Computer vision enables computers to interpret and analyze visual data, much as human vision does. Azure provides powerful AI-based computer vision tools that allow applications to recognize objects, extract text from images, and even understand emotions from facial expressions.
Azure Computer Vision API is a cloud-based service that provides advanced image and video analysis using deep learning models.
The Azure Computer Vision API allows developers to process and analyze images in various ways, including:
- Tagging images with recognized objects and concepts
- Generating natural-language descriptions (captions)
- Detecting objects and faces
- Extracting printed and handwritten text (OCR)
The service processes an image and returns a structured JSON response containing the detected objects, text, faces, and metadata.
For Python users, install the azure-cognitiveservices-vision-computervision package (used by the SDK examples later in this section); the first example below calls the REST API directly with the requests library:
pip install azure-cognitiveservices-vision-computervision
import requests
#Define API endpoint and key
endpoint = "https://your-computer-vision-endpoint.com"
api_key = "your_api_key"
#Image URL to analyze
image_url = "https://example.com/sample-image.jpg"
#Set up headers
headers = {
    "Ocp-Apim-Subscription-Key": api_key,
    "Content-Type": "application/json"
}
#Define API request payload
data = {
    "url": image_url
}
#Make a request to analyze the image
response = requests.post(f"{endpoint}/vision/v3.2/analyze?visualFeatures=Tags,Description", headers=headers, json=data)
#Print the response
print(response.json())
When the request is processed, the API returns a JSON response like this:
{
  "description": {
    "captions": [
      {
        "text": "A man riding a bicycle in a park",
        "confidence": 0.95
      }
    ]
  },
  "tags": [
    {"name": "man", "confidence": 0.98},
    {"name": "bicycle", "confidence": 0.95},
    {"name": "park", "confidence": 0.90}
  ]
}
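The fields in this response map directly onto application features. As a short sketch, the following reuses the response and image_url objects from the example above to build alt text and category labels (the 0.9 confidence threshold is arbitrary):
result = response.json()
#Use the top caption as alt text for accessibility
caption = result["description"]["captions"][0]["text"]
alt_text = f'<img src="{image_url}" alt="{caption}">'
#Use high-confidence tags for categorization
categories = [tag["name"] for tag in result["tags"] if tag["confidence"] > 0.9]
print(alt_text)
print(categories)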
"A man riding a bicycle in a park") can be used for automated accessibility features.man, bicycle, park) can be used for image categorization.Object detection enables AI to identify objects in an image and classify them into categories. Azure Computer Vision API provides pre-trained object detection models, but you can also train custom object detection models using Azure Custom Vision.
Here’s how to use Azure’s Object Detection API to detect objects in an image.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
#Azure Computer Vision Credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"
#Create a Computer Vision client
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
#Define the image URL
image_url = "https://example.com/car.jpg"
#Perform object detection
objects = client.analyze_image(image_url, visual_features=["Objects"])
#Print detected objects
for obj in objects.objects:
    print(f"Detected {obj.object_property} with confidence {obj.confidence}")
The response will look like this:
{
  "objects": [
    {
      "object": "Car",
      "confidence": 0.98,
      "rectangle": {
        "x": 120,
        "y": 200,
        "w": 400,
        "h": 300
      }
    }
  ]
}
Azure Custom Vision allows you to train your own object detection models if the pre-trained model does not detect specific objects relevant to your business. The workflow has three steps, sketched in code below:
1. Upload Training Data
2. Train the Model
3. Deploy the Model
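A minimal sketch of these steps using the Custom Vision training SDK (the training key, endpoint, project and tag names, image file, and prediction resource ID are all placeholders):
import time
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from msrest.authentication import ApiKeyCredentials
#Placeholder training credentials
TRAINING_KEY = "your_training_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
credentials = ApiKeyCredentials(in_headers={"Training-key": TRAINING_KEY})
trainer = CustomVisionTrainingClient(ENDPOINT, credentials)
#1. Upload training data: create a project and a tag, then add labeled images
project = trainer.create_project("my-detector")
tag = trainer.create_tag(project.id, "forklift")
with open("forklift1.jpg", "rb") as image_file:
    trainer.create_images_from_data(project.id, image_file.read(), tag_ids=[tag.id])
#2. Train the model and wait for the training iteration to complete
iteration = trainer.train_project(project.id)
while iteration.status != "Completed":
    time.sleep(5)
    iteration = trainer.get_iteration(project.id, iteration.id)
#3. Deploy the model by publishing the iteration to a prediction resource
trainer.publish_iteration(project.id, iteration.id, "my_model", "your_prediction_resource_id")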
Optical Character Recognition (OCR) is a key feature of Azure Computer Vision that allows AI to extract text from images, scanned documents, and handwritten notes. This capability is essential for applications in document automation, digital archiving, and accessibility services.
OCR is a technology that enables computers to read text from images or documents and convert it into machine-readable text.
Azure OCR can analyze images containing printed or handwritten text and return structured text data.
pip install azure-cognitiveservices-vision-computervision
We can send an image containing text to the Azure OCR API and retrieve recognized words.
import time
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
#Azure Computer Vision credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"
#Create a Computer Vision client
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
#Define image URL (could also use local image)
image_url = "https://example.com/sample-text-image.jpg"
#Perform OCR on the image
ocr_results = client.read(image_url, raw=True)
#Wait for OCR results to be available
operation_location = ocr_results.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]
while True:
    result = client.get_read_result(operation_id)
    if result.status not in ["notStarted", "running"]:
        break
    time.sleep(1)
#Print extracted text
for page in result.analyze_result.read_results:
    for line in page.lines:
        print(line.text)
When the OCR API processes an image, it returns a JSON response containing recognized text and position coordinates.
{
  "read_results": [
    {
      "page": 1,
      "lines": [
        {
          "text": "Welcome to Azure AI",
          "bounding_box": [50, 50, 200, 50]
        },
        {
          "text": "Computer Vision is powerful",
          "bounding_box": [50, 100, 300, 100]
        }
      ]
    }
  ]
}
Azure OCR can extract structured text from:
- Printed documents and scanned images
- Handwritten notes
- Structured documents such as invoices, receipts, and forms
If an image contains an invoice, Azure OCR can recognize structured fields:
| Field | Extracted Value |
|---|---|
| Invoice Number | 12345 |
| Customer Name | John Doe |
| Total Amount | $250.00 |
These fields can be extracted programmatically with Azure Form Recognizer's prebuilt invoice model:
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
#Azure Form Recognizer credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-form-recognizer-endpoint.com"
#Create a client
client = DocumentAnalysisClient(ENDPOINT, AzureKeyCredential(API_KEY))
#Image URL of an invoice
invoice_url = "https://example.com/sample-invoice.jpg"
#Analyze the invoice
poller = client.begin_analyze_document_from_url("prebuilt-invoice", invoice_url)
result = poller.result()
#Extract key fields
for field_name, field in result.fields.items():
    print(f"{field_name}: {field.value}")
Azure OCR also supports handwriting recognition. The same Read call is used; point it at an image that contains handwriting:
#Image containing handwritten text (placeholder URL)
image_url = "https://example.com/handwritten-note.jpg"
#Start the Read operation
ocr_results = client.read(image_url, raw=True)
operation_location = ocr_results.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]
while True:
    result = client.get_read_result(operation_id)
    if result.status not in ["notStarted", "running"]:
        break
    time.sleep(1)
#Print extracted handwritten text
for page in result.analyze_result.read_results:
    for line in page.lines:
        print(line.text)
Example output:
Meeting at 3 PM
Call John for updates
Buy groceries
After implementing OCR, the next step is deploying it at scale.
| Deployment Method | Best For | Example |
|---|---|---|
| Azure Functions | Real-time text extraction | Chat applications that moderate offensive text |
| Azure Kubernetes Service (AKS) | High-scale document processing | Bank processing millions of receipts daily |
| Azure Form Recognizer | Structured document parsing | Automating invoice and contract processing |
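As a rough sketch of the Azure Functions option, an HTTP-triggered function (Python v1 programming model; the credentials are placeholders) could accept an image URL and return the extracted text:
import json
import time
import azure.functions as func
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
#Placeholder credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
def main(req: func.HttpRequest) -> func.HttpResponse:
    image_url = req.params.get("url")
    if not image_url:
        return func.HttpResponse("Pass an image URL via ?url=...", status_code=400)
    #Start the asynchronous Read operation and poll until it finishes
    read_response = client.read(image_url, raw=True)
    operation_id = read_response.headers["Operation-Location"].split("/")[-1]
    while True:
        result = client.get_read_result(operation_id)
        if result.status not in ["notStarted", "running"]:
            break
        time.sleep(1)
    #Collect the recognized lines into a JSON payload
    lines = [line.text for page in result.analyze_result.read_results for line in page.lines]
    return func.HttpResponse(json.dumps({"lines": lines}), mimetype="application/json")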
Image captioning is a computer vision technique that allows AI to automatically generate textual descriptions of images. It combines deep learning-based image recognition with Natural Language Processing (NLP) to produce human-like descriptions of visual content.
Image captioning enables AI to analyze an image and describe it in natural language. This is useful for:
- Accessibility features, such as generating alt text for screen readers
- Organizing and searching large image libraries
- Automating content tagging for social media and e-commerce
Azure's Computer Vision API uses deep learning models to:
- Analyze the visual content of an image
- Identify objects, scenes, and actions
- Generate a natural-language caption with a confidence score
To use image captioning, first set up Azure Computer Vision:
pip install azure-cognitiveservices-vision-computervision
The following Python script sends an image to Azure Computer Vision and returns a caption.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
#Azure Computer Vision credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"
#Create a Computer Vision client
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
#Define image URL
image_url = "https://example.com/sample-image.jpg"
#Analyze the image for captions
description_results = client.describe_image(image_url)
#Print generated captions
for caption in description_results.captions:
    print(f"Caption: {caption.text}, Confidence: {caption.confidence:.2f}")
When the API processes an image, it returns a JSON response with generated captions and confidence scores.
{
  "captions": [
    {
      "text": "A cat sitting on a couch",
      "confidence": 0.96
    }
  ]
}
While Azure's default image captioning is powerful, you can customize it for specific use cases.
from azure.cognitiveservices.speech.translation import SpeechTranslationConfig
#Set up Azure Translator
translator = SpeechTranslationConfig(subscription="your_translator_api_key", region="your_region")
translator.speech_recognition_language = "en"
translator.add_target_language("es") # Translate to Spanish
#Input caption
caption_text = "A dog playing in the park"
#Translate caption
translated_caption = translator.speech_synthesis_voice_name(caption_text, "es")
print(f"Translated Caption: {translated_caption}")
| Deployment Method | Best For | Example Use Case |
|---|---|---|
| Azure Functions | Captioning individual images in real-time | Social media post automation |
| Azure Kubernetes Service (AKS) | Large-scale image analysis | E-commerce product descriptions |
| Azure Batch Processing | Bulk captioning of images | News agency tagging historical images |
Face recognition is a key capability of Azure AI's Computer Vision services, enabling applications to detect, analyze, and recognize human faces. This feature is widely used in security systems, identity verification, emotion analysis, and social media applications.
Face recognition is an AI-driven technology that allows systems to identify and analyze human faces in images and videos. Azure Face API, part of Azure Cognitive Services, provides advanced face detection and recognition functionalities.
Azure Face API processes images containing faces and returns structured metadata about detected individuals.
pip install azure-cognitiveservices-vision-face
Face detection allows you to find faces in an image and analyze their features.
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials
#Azure Face API credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-face-api-endpoint.com"
#Create a FaceClient instance
face_client = FaceClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
#Image containing faces
image_url = "https://example.com/group-photo.jpg"
#Detect faces
faces = face_client.face.detect_with_url(image_url, return_face_landmarks=True, return_face_attributes=["age", "gender", "emotion"])
#Print detected faces and attributes
for face in faces:
    print(f"Detected face at location {face.face_rectangle}")
    print(f"Age: {face.face_attributes.age}, Gender: {face.face_attributes.gender}")
    print(f"Emotions: {face.face_attributes.emotion}")
The API response contains bounding box coordinates, age, gender, and emotion analysis.
{
  "faceId": "abcd1234",
  "faceRectangle": {
    "top": 100,
    "left": 200,
    "width": 80,
    "height": 80
  },
  "faceAttributes": {
    "age": 29,
    "gender": "male",
    "emotion": {
      "happiness": 0.95,
      "anger": 0.02,
      "sadness": 0.01
    }
  }
}
Face verification allows applications to compare two images and determine if they belong to the same person.
#Define two images to compare
image1 = "https://example.com/user-photo-1.jpg"
image2 = "https://example.com/user-photo-2.jpg"
#Detect faces in both images
faces1 = face_client.face.detect_with_url(image1)
faces2 = face_client.face.detect_with_url(image2)
#Extract face IDs
face_id1 = faces1[0].face_id
face_id2 = faces2[0].face_id
#Verify if they belong to the same person
verify_result = face_client.face.verify_face_to_face(face_id1, face_id2)
if verify_result.is_identical:
    print(f"Faces match with confidence {verify_result.confidence:.2f}")
else:
    print("Faces do not match")
Face identification allows applications to recognize known individuals from a database of faces.
#Create a new person group
person_group_id = "my_users"
face_client.person_group.create(person_group_id, name="Users Database")
#Add a new user to the database
user_id = face_client.person_group_person.create(person_group_id, name="John Doe").person_id
#Add multiple images of the person
image_urls = ["https://example.com/john1.jpg", "https://example.com/john2.jpg"]
for img in image_urls:
    face_client.person_group_person.add_face_from_url(person_group_id, user_id, img)
#Train the model
face_client.person_group.train(person_group_id)
#Identify a face against the database
face_image = "https://example.com/test-photo.jpg"
faces = face_client.face.detect_with_url(face_image)
face_ids = [face.face_id for face in faces]
#Identify faces in the database
identify_results = face_client.face.identify(face_ids, person_group_id)
#Print results
for result in identify_results:
    for candidate in result.candidates:
        print(f"Identified User ID: {candidate.person_id}, Confidence: {candidate.confidence:.2f}")
| Deployment Method | Best For | Example Use Case |
|---|---|---|
| Azure Functions | Small-scale authentication | Face login for mobile apps |
| Azure Kubernetes Service (AKS) | Large-scale identity verification | Face recognition in airports |
| Azure IoT Edge | On-device face detection | Smart cameras in security systems |
Azure Custom Vision is a powerful service that allows you to train and deploy custom image classification and object detection models. Unlike the pre-trained models in Azure Computer Vision API, Custom Vision lets you define specific objects or patterns that your AI should recognize.
Azure Custom Vision is a cloud-based tool that enables training AI models to recognize images based on labeled data. It is useful when pre-trained models do not meet your specific requirements.
Azure Custom Vision supports two types of tasks:
| Feature | Image Classification | Object Detection |
|---|---|---|
| Purpose | Classifies images into categories | Identifies and localizes objects in images |
| Example | "Cat" vs. "Dog" image classification | Detecting a cat in an image and marking its location |
| Use Case | Sorting defective vs. non-defective products | Identifying individual components on a factory belt |
After training the model, use the Custom Vision API to send images and get predictions.
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
#Azure Custom Vision credentials
API_KEY = "your_prediction_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
PROJECT_ID = "your_project_id"
MODEL_NAME = "your_model_name"
#Create a prediction client
credentials = ApiKeyCredentials(in_headers={"Prediction-key": API_KEY})
predictor = CustomVisionPredictionClient(ENDPOINT, credentials)
#Define image URL for classification
image_url = "https://example.com/sample-image.jpg"
#Send image for prediction
results = predictor.classify_image_url(PROJECT_ID, MODEL_NAME, image_url)
#Print prediction results
for prediction in results.predictions:
    print(f"Tag: {prediction.tag_name}, Confidence: {prediction.probability:.2f}")
Unlike classification, object detection returns a bounding box locating each detected object in the image.
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
#Define Custom Vision credentials
PREDICTION_KEY = "your_prediction_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
PROJECT_ID = "your_project_id"
MODEL_NAME = "your_model_name"
#Create a prediction client
predictor = CustomVisionPredictionClient(ENDPOINT, ApiKeyCredentials(in_headers={"Prediction-key": PREDICTION_KEY}))
#Image for object detection
image_url = "https://example.com/sample-image.jpg"
#Perform object detection
results = predictor.detect_image_url(PROJECT_ID, MODEL_NAME, image_url)
#Print detected objects and their bounding boxes
for prediction in results.predictions:
    box = prediction.bounding_box
    print(f"Detected: {prediction.tag_name} with confidence {prediction.probability:.2f} at (left={box.left:.2f}, top={box.top:.2f}, width={box.width:.2f}, height={box.height:.2f})")
After training and testing, deploy the model for real-world use.
| Deployment Method | Best For | Example Use Case |
|---|---|---|
| Azure Cloud API | Scalable cloud-based predictions | Real-time image classification in a mobile app |
| Azure IoT Edge | Low-latency, on-device predictions | Factory defect detection in real-time |
| Embedded AI (ONNX, TensorFlow) | Running AI offline | AI-powered security cameras |
Once a computer vision model is trained and tested, the next step is to deploy it for real-world applications. Deployment ensures that AI-powered image recognition, object detection, or face recognition can be used in applications at scale.
Azure provides multiple ways to deploy a computer vision model, depending on performance, scalability, and cost considerations.
| Deployment Method | Best For | Example Use Case |
|---|---|---|
| Azure Cloud API (Serverless) | Scalable AI services | Mobile apps performing real-time image classification |
| Azure Kubernetes Service (AKS) | Large-scale AI model deployment | AI-driven security systems monitoring city-wide cameras |
| Azure IoT Edge | On-device AI inference | AI-powered smart cameras in retail stores |
| Azure App Services | Deploying AI-powered web applications | E-commerce platforms using image search |
The easiest way to deploy a trained AI model is as an Azure Cloud API, allowing applications to send images and receive predictions in real-time.
The following Flask-based Python API allows users to upload an image and get classification results from a Custom Vision model.
from flask import Flask, request, jsonify
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
#Azure Custom Vision API credentials
PREDICTION_KEY = "your_prediction_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
PROJECT_ID = "your_project_id"
MODEL_NAME = "your_model_name"
#Create Flask app
app = Flask(__name__)
#Define the prediction function
@app.route('/predict', methods=['POST'])
def predict():
    file = request.files['image']
    image_data = file.read()
    #Create prediction client
    credentials = ApiKeyCredentials(in_headers={"Prediction-key": PREDICTION_KEY})
    predictor = CustomVisionPredictionClient(ENDPOINT, credentials)
    #Send image to model
    results = predictor.classify_image(PROJECT_ID, MODEL_NAME, image_data)
    #Prepare the response
    predictions = [{"tag": p.tag_name, "confidence": p.probability} for p in results.predictions]
    return jsonify(predictions)
#Run the Flask app
if __name__ == '__main__':
    app.run(debug=True)
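A quick way to exercise this endpoint (assuming the app is running locally on Flask's default port and test.jpg is any local image):
import requests
#Send a local image to the prediction endpoint
with open("test.jpg", "rb") as f:
    response = requests.post("http://localhost:5000/predict", files={"image": f})
print(response.json())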
For high-scale applications, AI models can be containerized and deployed using Azure Kubernetes Service (AKS).
#Use Python as base image
FROM python:3.9
#Install dependencies
RUN pip install flask azure-cognitiveservices-vision-customvision
#Copy the model script
COPY app.py /app/app.py
#Run the API server
CMD ["python", "/app/app.py"]
For real-time, low-latency AI inference, deploying models to edge devices (such as security cameras, drones, or industrial machines) is the best approach.
import tf2onnx
import tensorflow as tf
#Load trained TensorFlow model
model = tf.keras.models.load_model("custom_model.h5")
#Convert to ONNX format for IoT Edge
onnx_model, _ = tf2onnx.convert.from_keras(model)
with open("custom_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
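On the device, the converted model can then be scored locally. A minimal sketch with the onnxruntime package (the input shape and preprocessing depend on the exported model and are placeholders here):
import numpy as np
import onnxruntime as ort
#Load the converted model on the edge device
session = ort.InferenceSession("custom_model.onnx")
input_name = session.get_inputs()[0].name
#Dummy preprocessed frame; real code would resize and normalize a camera image
image_batch = np.random.rand(1, 224, 224, 3).astype(np.float32)
#Run inference locally, with no cloud round-trip
outputs = session.run(None, {input_name: image_batch})
print(outputs[0])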
Once an AI model is deployed, it needs to be monitored for performance, drift, and errors. As improved models are produced, Azure Machine Learning can register them as new versions:
from azureml.core import Workspace, Model
#Connect to Azure ML workspace
ws = Workspace.from_config()
#Register a new version of the AI model
model = Model.register(
    workspace=ws,
    model_name="new_vision_model",
    model_path="./new_model.onnx"
)
| Deployment Option | Best For | Example Use Case |
|---|---|---|
| Azure Functions (Cloud API) | Real-time, low-traffic AI services | AI chatbot that detects objects in images |
| Azure Kubernetes Service (AKS) | Large-scale AI inference | AI-powered security monitoring system |
| Azure IoT Edge | Offline, real-time AI processing | AI-powered quality control on factory floors |
| Azure App Services | AI-powered web applications | AI-driven product recommendations |
Understanding the limitations of Azure’s prebuilt computer vision models is critical for both solution architecture and exam performance, especially in scenarios where resource constraints, time sensitivity, or edge compatibility are involved.
| Aspect | Limitation |
|---|---|
| Image Size | Maximum input size is typically 50 MB. Resolutions above 4200×4200 pixels may be rejected. |
| Image Formats | Supported formats: JPEG, PNG, BMP, and GIF. RAW and TIFF are not supported. |
| API Rate Limits | For standard pricing tiers: ~10–20 requests/sec per subscription (subject to change by region and tier). |
| Latency | Varies by model type. For Computer Vision 4.0, response time is typically less than 1 second for standard images. |
| Timeout Handling | Requests that exceed processing limits (e.g., large files or network lag) will return HTTP 408/504 errors. |
| Language Support | OCR and captioning may not support all languages. Check documentation for language-specific limitations. |
A user attempts to analyze a high-resolution satellite image (8000x8000 pixels, 80MB file size) using Azure’s Analyze Image API. What is the likely result?
Correct Answer: The request will fail due to exceeding image size and resolution limits.
Azure provides multiple vision-related services, each optimized for different use cases. Candidates are often asked to select the most appropriate service based on a described scenario.
| Service | Best For | Customization | Example Use Case |
|---|---|---|---|
| Computer Vision | General-purpose image analysis, OCR, spatial analysis | Pre-trained only | Extracting text from invoices |
| Custom Vision | Domain-specific image classification or object detection | Train your own models | Defect detection on factory assembly lines |
| Face API | Detecting faces, identifying individuals, emotion estimation | Pre-trained (Face data only) | Multi-user access control with facial login |
| Form Recognizer | Structured document extraction (tables, fields, key-values) | Custom models supported | Reading data from scanned insurance forms |
| Azure OpenAI DALL·E | Generating synthetic images from prompts | Pre-trained | Creating marketing assets from descriptions |
| Feature | Computer Vision | Custom Vision | Face API |
|---|---|---|---|
| Can detect faces? | Yes (but limited) | No | Yes (advanced) |
| Custom training? | No | Yes | No |
| Emotion detection? | No | No | Yes |
| Object detection? | Yes (pre-trained) | Yes | No |
| OCR support? | Yes | No | No |
- If the question mentions domain-specific classification (e.g., classifying different types of mushrooms), Custom Vision is the right choice.
- If it requires face emotion or identity recognition, use Face API.
- If it's about reading text or analyzing general images, use Computer Vision API.
You are developing a system to classify different types of machinery parts from images. You need to train a model using a labeled dataset. Which Azure service should you choose?
Correct Answer: Custom Vision
An image processed with the Azure Vision Read API returns an empty text result even though the image clearly contains text. What is the most likely cause?
Correct Answer: The image resolution or text clarity is insufficient for OCR detection.
Azure Vision OCR models require sufficient image quality to identify text regions accurately. If the text is too small, blurry, rotated excessively, or obscured by background noise, the model may fail to detect it. Another common issue occurs when the image resolution is very low or heavily compressed, reducing character clarity. Developers sometimes assume OCR failures indicate API problems, but in most cases the underlying cause is input quality. Pre-processing steps such as increasing resolution, improving contrast, or correcting orientation can significantly improve OCR results. Understanding these limitations helps design reliable document-processing pipelines using computer vision services.
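A small preprocessing sketch along those lines using the Pillow library (the file names and enhancement factors are illustrative):
from PIL import Image, ImageEnhance, ImageOps
#Load a low-quality scan
image = Image.open("scan.jpg")
#Correct orientation recorded in EXIF metadata
image = ImageOps.exif_transpose(image)
#Upscale to give the OCR model more pixels per character
image = image.resize((image.width * 2, image.height * 2), Image.LANCZOS)
#Boost contrast to separate text from background noise
image = ImageEnhance.Contrast(image).enhance(2.0)
image.save("scan_preprocessed.jpg")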
When processing structured documents such as invoices or forms, why might Azure Document Intelligence be preferred over the Vision OCR API?
Correct Answer: Document Intelligence can extract structured fields and layout information in addition to text.
The Vision Read API focuses primarily on recognizing text within images and returning the detected content along with bounding boxes. However, many document-processing scenarios require understanding document structure such as tables, key-value pairs, and form fields. Azure Document Intelligence provides pretrained models specifically designed for structured documents like invoices, receipts, and contracts. These models identify semantic elements such as totals, vendor names, and table rows. Developers designing document-automation workflows often use Document Intelligence because it eliminates the need to manually parse OCR text output to reconstruct document structure. Selecting the correct service ensures efficient processing and accurate extraction of structured information.
Why might the Azure Vision Read API fail to detect text when the image contains handwritten notes?
Correct Answer: The handwriting style may fall outside the model's recognition capabilities.
Although Azure Vision OCR supports handwriting recognition, accuracy depends heavily on handwriting clarity and style. Highly cursive writing, irregular character spacing, or overlapping characters can reduce detection accuracy. The OCR model performs best with clear block handwriting or printed text. Developers processing handwritten forms often encounter inconsistent results when handwriting quality varies significantly. In such cases, image preprocessing or specialized document models may be required. Understanding the limitations of handwriting recognition helps determine whether OCR alone is sufficient or whether alternative document processing techniques should be used.
What is a common cause of OCR misreading characters such as “0” and “O” or “1” and “I”?
Correct Answer: Similar character shapes combined with low image quality lead to recognition ambiguity.
OCR models analyze visual patterns to classify characters. When characters have nearly identical shapes—such as zero and the letter O—the model must rely on contextual cues to distinguish them. If the image quality is poor or the surrounding context is limited, the model may select the incorrect interpretation. Fonts with minimal visual differences between characters further increase ambiguity. Developers often encounter this issue when processing scanned documents with low resolution or compression artifacts. Improving image quality or validating extracted text against expected formats can reduce the impact of these OCR ambiguities.
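As a sketch of that kind of validation (the invoice-number format and correction rules here are hypothetical):
import re
#Hypothetical OCR output for an invoice number field
raw_value = "INV-0O123"  # the letter O here may really be a zero
#Expected format: "INV-" followed by digits only
if not re.fullmatch(r"INV-\d+", raw_value):
    #Correct common OCR confusions in the digit portion
    prefix, _, digits = raw_value.partition("-")
    digits = digits.replace("O", "0").replace("I", "1")
    raw_value = f"{prefix}-{digits}"
print(raw_value)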