AI-102 Implement computer vision solutions

Detailed list of AI-102 knowledge points

Computer vision enables computers to interpret and analyze visual data, just like the human eye. Azure provides powerful AI-based computer vision tools that allow applications to recognize objects, extract text from images, and even understand emotions from facial expressions.

1. Azure Computer Vision API: Understanding the Basics

Azure Computer Vision API is a cloud-based service that provides advanced image and video analysis using deep learning models.

1.1 What is Azure Computer Vision API?

The Azure Computer Vision API allows developers to process and analyze images in various ways, including:

  • Object Detection – Identifying objects in images and classifying them into categories.
  • Optical Character Recognition (OCR) – Extracting text from scanned documents, signs, and handwritten notes.
  • Image Captioning – Generating automatic descriptions for images using natural language processing (NLP).
  • Facial Recognition – Detecting faces, landmarks (eyes, nose, mouth), and emotions in images.

1.2 How Does the Azure Computer Vision API Work?

The service processes an image and returns a structured JSON response containing the detected objects, text, faces, and metadata.

Key Features of the Azure Computer Vision API
  1. Pre-trained AI models – No need to train models from scratch.
  2. Supports multiple languages – OCR can detect over 100 languages.
  3. Accessible via REST API and SDKs – Supports Python, C#, Java.
  4. Integration with other Azure services – Works with Azure AI Search, Azure Cognitive Services, and Azure Machine Learning.

1.3 Setting Up the Azure Computer Vision API

Step 1: Create a Computer Vision Resource in Azure
  1. Sign in to Azure Portal (https://portal.azure.com).
  2. Navigate to Azure AI Services → Computer Vision.
  3. Click Create and select the region, pricing tier, and resource group.
  4. After creation, go to the Keys and Endpoints section to get your API Key and Endpoint URL.
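In practice, the key and endpoint from Step 1 are usually kept out of source code. A minimal sketch reading them from environment variables (the names VISION_KEY and VISION_ENDPOINT are our own convention, not fixed by Azure):

```python
import os

# Hypothetical variable names; set these in your shell first, e.g.:
#   export VISION_ENDPOINT="https://<your-resource>.cognitiveservices.azure.com"
#   export VISION_KEY="<your-api-key>"
endpoint = os.environ.get("VISION_ENDPOINT", "https://your-computer-vision-endpoint.com")
api_key = os.environ.get("VISION_KEY", "")

# Warn loudly rather than silently calling the API with no key
if not api_key:
    print("Warning: VISION_KEY is not set; API calls will fail with 401")
```

The same pattern works for every key and endpoint used in the examples below.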
Step 2: Install the Required SDKs

For Python users, install the azure-cognitiveservices-vision-computervision package:

pip install azure-cognitiveservices-vision-computervision
Step 3: Making a Basic API Call to Analyze an Image
Python Example: Using the API to Analyze an Image
import requests

#Define API endpoint and key
endpoint = "https://your-computer-vision-endpoint.com"
api_key = "your_api_key"

#Image URL to analyze
image_url = "https://example.com/sample-image.jpg"

#Set up headers
headers = {
    "Ocp-Apim-Subscription-Key": api_key,
    "Content-Type": "application/json"
}

#Define API request payload
data = {
    "url": image_url
}

#Make a request to analyze the image
response = requests.post(f"{endpoint}/vision/v3.2/analyze?visualFeatures=Tags,Description", headers=headers, json=data)

#Print the response
print(response.json())
Step 4: Understanding the API Response

When the request is processed, the API returns a JSON response like this:

{
    "description": {
        "captions": [
            {
                "text": "A man riding a bicycle in a park",
                "confidence": 0.95
            }
        ]
    },
    "tags": [
        {"name": "man", "confidence": 0.98},
        {"name": "bicycle", "confidence": 0.95},
        {"name": "park", "confidence": 0.90}
    ]
}
How to Use This Data?
  • The image caption ("A man riding a bicycle in a park") can be used for automated accessibility features.
  • The detected objects (man, bicycle, park) can be used for image categorization.
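As a sketch of how an application might consume this response, the following parses the JSON shown above into a caption plus a list of high-confidence tags (the summarize helper is illustrative, not part of any SDK):

```python
# Sample data in the shape of the Analyze Image response above
analysis = {
    "description": {
        "captions": [
            {"text": "A man riding a bicycle in a park", "confidence": 0.95}
        ]
    },
    "tags": [
        {"name": "man", "confidence": 0.98},
        {"name": "bicycle", "confidence": 0.95},
        {"name": "park", "confidence": 0.90},
    ],
}

def summarize(analysis, min_confidence=0.90):
    """Return the top caption plus the tags above a confidence threshold."""
    captions = analysis.get("description", {}).get("captions", [])
    caption = captions[0]["text"] if captions else None
    tags = [t["name"] for t in analysis.get("tags", []) if t["confidence"] >= min_confidence]
    return caption, tags

caption, tags = summarize(analysis)
print(caption)  # A man riding a bicycle in a park
print(tags)     # ['man', 'bicycle', 'park']
```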

2. Implementing Object Detection

Object detection enables AI to identify objects in an image and classify them into categories. Azure Computer Vision API provides pre-trained object detection models, but you can also train custom object detection models using Azure Custom Vision.

2.1 How Object Detection Works

  • The AI model scans the image for known objects (e.g., cars, animals, furniture).
  • Bounding boxes are drawn around detected objects.
  • The API returns object names and confidence scores.

2.2 Using the Object Detection API

Here’s how to use Azure’s Object Detection API to detect objects in an image.

Python Example: Detecting Objects in an Image
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

#Azure Computer Vision Credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"

#Create a Computer Vision client
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))

#Define the image URL
image_url = "https://example.com/car.jpg"

#Perform object detection
objects = client.analyze_image(image_url, visual_features=["Objects"])

#Print detected objects
for obj in objects.objects:
    print(f"Detected {obj.object_property} with confidence {obj.confidence}")

2.3 Understanding the API Response

The response will look like this:

{
    "objects": [
        {
            "object": "Car",
            "confidence": 0.98,
            "rectangle": {
                "x": 120,
                "y": 200,
                "w": 400,
                "h": 300
            }
        }
    ]
}
How to Use This Data?
  • Draw bounding boxes around detected objects for visualization.
  • Use object labels to categorize images in a database.
  • Filter images based on detected objects (e.g., block weapons from being uploaded).
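The filtering idea in the last bullet can be sketched with plain Python over the objects array shown above (should_block is a hypothetical helper, and the sample detections are invented):

```python
# Detections in the shape of the "objects" array from the API response above
detected = [
    {"object": "Car", "confidence": 0.98},
    {"object": "Knife", "confidence": 0.91},
]

def should_block(detected_objects, blocklist, min_confidence=0.5):
    """True if any detected object appears in the blocklist with enough confidence."""
    return any(
        obj["object"].lower() in blocklist and obj["confidence"] >= min_confidence
        for obj in detected_objects
    )

print(should_block(detected, blocklist={"knife", "gun"}))  # True
print(should_block(detected, blocklist={"dog"}))           # False
```

The confidence threshold matters: setting it too low blocks images on spurious detections, too high lets real matches through.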

2.4 Customizing Object Detection with Azure Custom Vision

Azure Custom Vision allows you to train your own object detection models if the pre-trained model does not detect specific objects relevant to your business.

Steps to Train a Custom Object Detection Model
  1. Upload Training Data

    • Collect images containing the objects you want to detect.
    • Label objects manually in the Azure Custom Vision portal.
  2. Train the Model

    • Train in the Azure Custom Vision portal or via the training SDK.
    • Use a dataset of at least 50 labeled images per object category; more varied images improve accuracy.
  3. Deploy the Model

    • Deploy as an API endpoint for real-time predictions.
    • Export as ONNX or TensorFlow model for edge devices.

3. Implementing OCR (Optical Character Recognition)

Optical Character Recognition (OCR) is a key feature of Azure Computer Vision that allows AI to extract text from images, scanned documents, and handwritten notes. This capability is essential for applications in document automation, digital archiving, and accessibility services.

3.1 What is OCR?

OCR is a technology that enables computers to read text from images or documents and convert it into machine-readable text.

Use Cases for OCR
  • Digitizing Printed Documents: Extracting text from scanned invoices, receipts, and reports.
  • License Plate Recognition: Reading vehicle license plates from traffic cameras.
  • Handwritten Text Recognition: Converting handwritten forms into structured digital text.
  • Extracting Text from Signs and Labels: Identifying text in street signs, product labels, or restaurant menus.

3.2 How Azure OCR Works

Azure OCR can analyze images containing printed or handwritten text and return structured text data.

Features of Azure OCR:
  • Supports Multi-Language Recognition: Can extract text in over 100 languages.
  • Detects Structured Data: Recognizes text inside tables, paragraphs, and key-value pairs.
  • Handles Handwriting Recognition: Can process handwritten forms and notes.

3.3 Implementing OCR with Azure Computer Vision API

Step 1: Setting Up OCR in Azure
  1. Create an Azure Computer Vision resource in the Azure Portal.
  2. Get the API Key and Endpoint from the Azure portal.
  3. Install the Azure OCR SDK for Python:
pip install azure-cognitiveservices-vision-computervision
Step 2: Using the OCR API to Extract Text from an Image

We can send an image containing text to the Azure OCR API and retrieve recognized words.

Python Example: Performing OCR on an Image
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

#Azure Computer Vision credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"

#Create a Computer Vision client
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))

#Define image URL (could also use local image)
image_url = "https://example.com/sample-text-image.jpg"

#Perform OCR on the image
ocr_results = client.read(image_url, raw=True)

#Wait for OCR results to be available
import time
operation_location = ocr_results.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]

while True:
    result = client.get_read_result(operation_id)
    if result.status not in ["notStarted", "running"]:
        break
    time.sleep(1)

#Print extracted text
for page in result.analyze_result.read_results:
    for line in page.lines:
        print(line.text)
Step 3: Understanding the OCR API Response

When the OCR API processes an image, it returns a JSON response containing recognized text and position coordinates.

{
    "read_results": [
        {
            "page": 1,
            "lines": [
                {
                    "text": "Welcome to Azure AI",
                    "bounding_box": [50, 50, 200, 50]
                },
                {
                    "text": "Computer Vision is powerful",
                    "bounding_box": [50, 100, 300, 100]
                }
            ]
        }
    ]
}
How to Use This Data?
  • Convert scanned documents into searchable text (for digital archives).
  • Extract key-value pairs from invoices and forms.
  • Process text in images for accessibility features (screen readers).
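For the first bullet, a small sketch that flattens the JSON shown above into a single searchable string (to_plain_text is our own helper, not part of the SDK):

```python
# OCR output in the shape of the JSON response shown above
ocr_response = {
    "read_results": [
        {
            "page": 1,
            "lines": [
                {"text": "Welcome to Azure AI", "bounding_box": [50, 50, 200, 50]},
                {"text": "Computer Vision is powerful", "bounding_box": [50, 100, 300, 100]},
            ],
        }
    ]
}

def to_plain_text(ocr_response):
    """Join recognized lines page by page into one searchable string."""
    pages = []
    for page in ocr_response["read_results"]:
        pages.append("\n".join(line["text"] for line in page["lines"]))
    return "\n\n".join(pages)

print(to_plain_text(ocr_response))
```

The resulting string can be stored alongside the original scan and indexed for full-text search.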

3.4 Advanced OCR: Extracting Text from Structured Documents

Azure OCR can extract structured text from:

  • Invoices and Receipts
  • Tables and Forms
  • Identity Documents (Passports, Driver’s Licenses, ID Cards)
Example: Extracting Text from an Invoice

If an image contains an invoice, Azure OCR can recognize structured fields:

Field            Extracted Value
Invoice Number   12345
Customer Name    John Doe
Total Amount     $250.00
Code Example: Extracting Key-Value Pairs from an Invoice
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

#Azure Form Recognizer credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-form-recognizer-endpoint.com"

#Create a client
client = DocumentAnalysisClient(ENDPOINT, AzureKeyCredential(API_KEY))

#Image URL of an invoice
invoice_url = "https://example.com/sample-invoice.jpg"

#Analyze the invoice
poller = client.begin_analyze_document_from_url("prebuilt-invoice", invoice_url)
result = poller.result()

#Extract key fields
for field_name, field in result.fields.items():
    print(f"{field_name}: {field.value}")

3.5 Implementing Handwritten Text Recognition

Azure OCR also supports handwriting recognition.

Use Cases for Handwriting OCR
  • Digitizing old handwritten documents.
  • Processing handwritten exam answer sheets.
  • Reading handwritten notes in applications like OneNote.
Example: Extracting Handwritten Text from an Image
#Reuses the client and the time import from the printed-text example above
ocr_results = client.read(image_url, raw=True)
operation_location = ocr_results.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]

while True:
    result = client.get_read_result(operation_id)
    if result.status not in ["notStarted", "running"]:
        break
    time.sleep(1)

#Print extracted handwritten text
for page in result.analyze_result.read_results:
    for line in page.lines:
        print(line.text)
Expected Output for a Handwritten Note
Meeting at 3 PM
Call John for updates
Buy groceries

3.6 Deploying OCR Solutions

After implementing OCR, the next step is deploying it at scale.

Deployment Options for OCR Solutions
Deployment Method               Best For                        Example
Azure Functions                 Real-time text extraction       Chat applications that moderate offensive text
Azure Kubernetes Service (AKS)  High-scale document processing  A bank processing millions of receipts daily
Azure Form Recognizer           Structured document parsing     Automating invoice and contract processing

3.7 Real-World Use Cases of OCR

1. Automating Document Processing in Banks
  • Banks use OCR to extract text from checks, invoices, and loan applications.
  • Azure Form Recognizer helps automate document workflows, reducing manual work.
2. License Plate Recognition for Traffic Management
  • Traffic systems use OCR to read vehicle license plates.
  • Azure extracts plate numbers from security camera footage for law enforcement.
3. Accessibility Solutions for the Visually Impaired
  • Screen reader applications use OCR to read text from images aloud.
  • Visually impaired users can use Azure OCR to scan restaurant menus, books, and street signs.

4. Implementing Image Captioning

Image captioning is a computer vision technique that allows AI to automatically generate textual descriptions of images. It combines deep learning-based image recognition with Natural Language Processing (NLP) to produce human-like descriptions of visual content.

4.1 What is Image Captioning?

Image captioning enables AI to analyze an image and describe it in natural language. This is useful for:

  • Accessibility: Helping visually impaired users understand images.
  • Content Organization: Automatically tagging and categorizing images in databases.
  • Search and SEO: Improving search engine indexing for image-heavy websites.
  • Social Media: Auto-generating captions for images posted online.
How Does Image Captioning Work?

Azure's Computer Vision API uses deep learning models to:

  1. Detect objects, people, and environments in an image.
  2. Identify relationships between objects (e.g., "a person riding a bicycle").
  3. Generate a natural language description based on detected objects and context.

4.2 Using Azure Computer Vision for Image Captioning

Step 1: Setting Up the Computer Vision API

To use image captioning, first set up Azure Computer Vision:

  1. Create an Azure Computer Vision resource in the Azure Portal.
  2. Obtain the API Key and Endpoint URL.
  3. Install the Azure SDK for Python:
pip install azure-cognitiveservices-vision-computervision
Step 2: Using the Image Captioning API

The following Python script sends an image to Azure Computer Vision and returns a caption.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials

#Azure Computer Vision credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"

#Create a Computer Vision client
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))

#Define image URL
image_url = "https://example.com/sample-image.jpg"

#Analyze the image for captions
description_results = client.describe_image(image_url)

#Print generated captions
for caption in description_results.captions:
    print(f"Caption: {caption.text}, Confidence: {caption.confidence:.2f}")
Step 3: Understanding the API Response

When the API processes an image, it returns a JSON response with generated captions and confidence scores.

{
    "captions": [
        {
            "text": "A cat sitting on a couch",
            "confidence": 0.96
        }
    ]
}
How to Use This Data?
  • Enhance accessibility: Automatically add captions for visually impaired users.
  • Improve search results: Use captions as alt text for images.
  • Generate social media captions: Auto-caption images for Instagram, Facebook, or Twitter.
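For the alt-text bullet, a minimal sketch that turns a generated caption into an HTML img tag (img_with_alt is a hypothetical helper; html.escape guards against markup in captions):

```python
import html

def img_with_alt(src, caption):
    """Build an <img> tag whose alt attribute is the generated caption."""
    return f'<img src="{html.escape(src, quote=True)}" alt="{html.escape(caption, quote=True)}">'

print(img_with_alt("cat.jpg", "A cat sitting on a couch"))
# <img src="cat.jpg" alt="A cat sitting on a couch">
```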

4.3 Customizing Image Captioning

While Azure's default image captioning is powerful, you can customize it for specific use cases.

1. Adjusting Caption Sensitivity
  • By default, Azure returns one or more captions with confidence scores.
  • You can filter low-confidence captions to improve quality.
  • Example: Only use captions with confidence > 0.80.
2. Combining Captioning with Custom Object Detection
  • Standard captions may be too generic (e.g., "a person sitting").
  • If you need domain-specific captions, train a Custom Vision model to recognize specialized objects.
3. Translating Captions into Multiple Languages
  • Azure supports automatic translation of captions using Azure Translator API.
  • Example: Convert an English caption into Spanish or French.
Python Example: Translating an Image Caption
import requests

#Azure Translator (Text) REST API credentials
TRANSLATOR_KEY = "your_translator_api_key"
TRANSLATOR_REGION = "your_region"
TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com"

#Input caption
caption_text = "A dog playing in the park"

#Translate the caption from English into Spanish
response = requests.post(
    f"{TRANSLATOR_ENDPOINT}/translate",
    params={"api-version": "3.0", "from": "en", "to": "es"},
    headers={
        "Ocp-Apim-Subscription-Key": TRANSLATOR_KEY,
        "Ocp-Apim-Subscription-Region": TRANSLATOR_REGION,
        "Content-Type": "application/json",
    },
    json=[{"text": caption_text}],
)

translated_caption = response.json()[0]["translations"][0]["text"]
print(f"Translated Caption: {translated_caption}")
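Point 1 above, filtering out low-confidence captions, can be sketched as follows (best_caption is an illustrative helper, and the sample captions are invented):

```python
def best_caption(captions, min_confidence=0.80):
    """Return the highest-confidence caption above the threshold, else None."""
    good = [c for c in captions if c["confidence"] > min_confidence]
    return max(good, key=lambda c: c["confidence"])["text"] if good else None

captions = [
    {"text": "a person sitting", "confidence": 0.62},
    {"text": "A dog playing in the park", "confidence": 0.93},
]
print(best_caption(captions))  # A dog playing in the park
print(best_caption([{"text": "a blur", "confidence": 0.40}]))  # None
```

Returning None lets the caller fall back to a manual caption instead of publishing a low-quality one.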

4.4 Real-World Applications of Image Captioning

1. Automatic Alt Text for Websites
  • Websites like Google, Wikipedia, and e-commerce sites can automatically generate alt text for images.
  • Example: A shopping website uses image captioning to describe products.
2. AI-Powered Accessibility for the Blind
  • Apps like Seeing AI use image captioning to describe surroundings to visually impaired users.
  • Example: A phone app can narrate what is in an image, such as "A person standing in front of a red car."
3. Social Media Auto-Captioning
  • Instagram and Facebook use AI-powered captions to automatically describe user-uploaded images.
  • Example: A social media post of a sunset by the beach could generate the caption:
    "A beautiful sunset over the ocean with orange and pink clouds."

4.5 Deploying Image Captioning at Scale

Deployment Options
Deployment Method               Best For                                   Example Use Case
Azure Functions                 Captioning individual images in real time  Social media post automation
Azure Kubernetes Service (AKS)  Large-scale image analysis                 E-commerce product descriptions
Azure Batch Processing          Bulk captioning of images                  News agency tagging historical images

4.6 Example: Deploying Image Captioning in an E-commerce Website

Challenge
  • An online store sells thousands of products but many items lack descriptions.
  • Users rely on product images, but they need automated captions.
Solution
  • Use Azure Computer Vision API to auto-generate product descriptions.
  • If an image shows a blue jacket, the AI could caption it as:
    • "A stylish blue jacket for winter wear."
Implementation Steps
  1. Upload images to Azure Blob Storage.
  2. Azure Functions triggers captioning when an image is uploaded.
  3. Store captions in a database for use in product descriptions.

5. Implementing Face Recognition

Face recognition is a key capability of Azure AI's Computer Vision services, enabling applications to detect, analyze, and recognize human faces. This feature is widely used in security systems, identity verification, emotion analysis, and social media applications.

5.1 What is Face Recognition?

Face recognition is an AI-driven technology that allows systems to identify and analyze human faces in images and videos. Azure Face API, part of Azure Cognitive Services, provides advanced face detection and recognition functionalities.

Key Features of Face Recognition
  • Face Detection: Identifies faces in images and returns details like location, size, and facial landmarks.
  • Face Verification: Compares two faces to determine if they belong to the same person.
  • Emotion Analysis: Detects emotions such as happiness, sadness, anger, and surprise.
  • Face Identification: Recognizes known individuals from a database of faces.
  • Age and Gender Estimation: Predicts a person’s age range and gender based on facial features.

5.2 How Azure Face API Works

Azure Face API processes images containing faces and returns structured metadata about detected individuals.

Step 1: Setting Up Azure Face API
  1. Create an Azure Face API Resource in the Azure Portal.
  2. Obtain the API Key and Endpoint URL from the Azure portal.
  3. Install the Azure Face API SDK:
pip install azure-cognitiveservices-vision-face

5.3 Detecting Faces in an Image

Face detection allows you to find faces in an image and analyze their features.

Python Example: Detecting Faces in an Image
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

#Azure Face API credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-face-api-endpoint.com"

#Create a FaceClient instance
face_client = FaceClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))

#Image containing faces
image_url = "https://example.com/group-photo.jpg"

#Detect faces
faces = face_client.face.detect_with_url(image_url, return_face_landmarks=True, return_face_attributes=["age", "gender", "emotion"])

#Print detected faces and attributes
for face in faces:
    print(f"Detected face at location {face.face_rectangle}")
    print(f"Age: {face.face_attributes.age}, Gender: {face.face_attributes.gender}")
    print(f"Emotions: {face.face_attributes.emotion}")

5.4 Understanding the API Response

The API response contains bounding box coordinates, age, gender, and emotion analysis.

{
    "faceId": "abcd1234",
    "faceRectangle": {
        "top": 100,
        "left": 200,
        "width": 80,
        "height": 80
    },
    "faceAttributes": {
        "age": 29,
        "gender": "male",
        "emotion": {
            "happiness": 0.95,
            "anger": 0.02,
            "sadness": 0.01
        }
    }
}
How to Use This Data?
  • Blur faces for privacy in public images.
  • Personalize user experiences based on detected emotions.
  • Automate customer feedback analysis by recognizing emotional reactions.
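To blur or crop faces for privacy (first bullet), the faceRectangle returned by the API has to be converted into corner coordinates, e.g. the (left, top, right, bottom) box that PIL's Image.crop expects. A minimal sketch (face_box is our own helper):

```python
def face_box(face_rectangle):
    """Convert the API's faceRectangle (top/left/width/height) to a
    (left, top, right, bottom) box, e.g. for PIL's Image.crop()."""
    left = face_rectangle["left"]
    top = face_rectangle["top"]
    return (left, top, left + face_rectangle["width"], top + face_rectangle["height"])

# faceRectangle values from the sample response above
rect = {"top": 100, "left": 200, "width": 80, "height": 80}
print(face_box(rect))  # (200, 100, 280, 180)
```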

5.5 Implementing Face Verification

Face verification allows applications to compare two images and determine if they belong to the same person.

Use Cases
  • Identity verification in banking and secure logins.
  • Fraud prevention in financial transactions.
  • User authentication in mobile apps (face login).
Python Example: Face Verification
#Define two images to compare
image1 = "https://example.com/user-photo-1.jpg"
image2 = "https://example.com/user-photo-2.jpg"

#Detect faces in both images
faces1 = face_client.face.detect_with_url(image1)
faces2 = face_client.face.detect_with_url(image2)

#Extract face IDs
face_id1 = faces1[0].face_id
face_id2 = faces2[0].face_id

#Verify if they belong to the same person
verify_result = face_client.face.verify_face_to_face(face_id1, face_id2)

if verify_result.is_identical:
    print(f"Faces match with confidence {verify_result.confidence:.2f}")
else:
    print("Faces do not match")

5.6 Implementing Face Identification

Face identification allows applications to recognize known individuals from a database of faces.

How It Works
  1. Create a "Face Database" of registered users.
  2. Add multiple face images per person for better accuracy.
  3. Compare a new face against the database to identify the person.
Python Example: Face Identification
#Create a new person group
person_group_id = "my_users"
face_client.person_group.create(person_group_id, name="Users Database")

#Add a new user to the database
user_id = face_client.person_group_person.create(person_group_id, name="John Doe").person_id

#Add multiple images of the person
image_urls = ["https://example.com/john1.jpg", "https://example.com/john2.jpg"]
for img in image_urls:
    face_client.person_group_person.add_face_from_url(person_group_id, user_id, img)

#Train the model
face_client.person_group.train(person_group_id)

#Identify a face against the database
face_image = "https://example.com/test-photo.jpg"
faces = face_client.face.detect_with_url(face_image)
face_ids = [face.face_id for face in faces]

#Identify faces in the database
identify_results = face_client.face.identify(face_ids, person_group_id)

#Print results
for result in identify_results:
    for candidate in result.candidates:
        print(f"Identified User ID: {candidate.person_id}, Confidence: {candidate.confidence:.2f}")

5.7 Deploying Face Recognition Solutions

Deployment Options
Deployment Method               Best For                           Example Use Case
Azure Functions                 Small-scale authentication         Face login for mobile apps
Azure Kubernetes Service (AKS)  Large-scale identity verification  Face recognition in airports
Azure IoT Edge                  On-device face detection           Smart cameras in security systems

5.8 Real-World Applications of Face Recognition

1. Face-Based Attendance Systems
  • Schools and offices use face recognition to track attendance automatically.
  • Example: Employees check in using facial authentication instead of ID cards.
2. Smart Security and Access Control
  • Airports and offices use face verification for identity checks.
  • Example: Biometric security gates open for authorized personnel only.
3. Customer Experience Enhancement
  • Stores use emotion recognition to analyze customer reactions.
  • Example: If a customer looks frustrated, staff can offer assistance.

6. Training Custom Image Models with Azure Custom Vision

Azure Custom Vision is a powerful service that allows you to train and deploy custom image classification and object detection models. Unlike the pre-trained models in Azure Computer Vision API, Custom Vision lets you define specific objects or patterns that your AI should recognize.

6.1 What is Azure Custom Vision?

Azure Custom Vision is a cloud-based tool that enables training AI models to recognize images based on labeled data. It is useful when pre-trained models do not meet your specific requirements.

Key Features of Azure Custom Vision
  • Train custom AI models to detect domain-specific objects.
  • No programming expertise required – UI-based training available in Azure Portal.
  • Supports image classification and object detection.
  • Export models to edge devices for offline processing.

6.2 Image Classification vs. Object Detection

Azure Custom Vision supports two types of tasks:

Feature    Image Classification                          Object Detection
Purpose    Classifies images into categories             Identifies and localizes objects in images
Example    "Cat" vs. "Dog" image classification          Detecting a cat in an image and marking its location
Use Case   Sorting defective vs. non-defective products  Identifying individual components on a factory belt

6.3 Setting Up Azure Custom Vision

Step 1: Create a Custom Vision Project
  1. Create a Custom Vision resource in the Azure Portal (https://portal.azure.com).
  2. Open the Custom Vision portal (https://www.customvision.ai) → Click Create a New Project.
  3. Select Project Type:
    • Classification (identify image categories).
    • Object Detection (detect specific objects in images).
  4. Choose a training domain:
    • General (for most use cases).
    • Retail (for fashion, e-commerce).
    • Landmarks (for geographical recognition).
Step 2: Upload Training Data
  • Upload at least 50 images per category for classification.
  • For object detection, label objects using bounding boxes.
  • Example: If training an AI model to recognize dog breeds, provide multiple labeled images per breed.
Step 3: Train the Model
  1. Click Train Model in Azure Custom Vision.
  2. The system analyzes the images and trains a neural network.
  3. After training, view performance metrics (accuracy, precision, recall).
Step 4: Test the Model
  • Upload a new image that was not used in training.
  • The model will predict the category or detect objects.
  • Example: If the model is trained on cars, uploading a new car image should classify the brand correctly.
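To interpret the precision and recall figures Azure reports after training, here is how the two metrics are computed (the counts below are invented for illustration):

```python
def precision_recall(tp, fp, fn):
    """Precision: of everything predicted positive, how much was right.
    Recall: of everything actually positive, how much was found."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts: 45 correct detections, 5 false alarms, 5 missed objects
p, r = precision_recall(tp=45, fp=5, fn=5)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.90, recall=0.90
```

High precision with low recall means the model is cautious (few false alarms, many misses); the reverse means it over-detects.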

6.4 Implementing Custom Vision in Python

After training the model, use the Custom Vision API to send images and get predictions.

Python Example: Classifying an Image Using Custom Vision
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

#Azure Custom Vision credentials
API_KEY = "your_prediction_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
PROJECT_ID = "your_project_id"
MODEL_NAME = "your_model_name"

#Create a prediction client
credentials = ApiKeyCredentials(in_headers={"Prediction-key": API_KEY})
predictor = CustomVisionPredictionClient(ENDPOINT, credentials)

#Define image URL for classification
image_url = "https://example.com/sample-image.jpg"

#Send image for prediction
results = predictor.classify_image_url(PROJECT_ID, MODEL_NAME, image_url)

#Print prediction results
for prediction in results.predictions:
    print(f"Tag: {prediction.tag_name}, Confidence: {prediction.probability:.2f}")

6.5 Training an Object Detection Model in Custom Vision

Unlike classification, object detection requires bounding boxes.

Steps to Train an Object Detection Model
  1. Upload images containing objects to detect.
  2. Label objects by drawing bounding boxes.
  3. Train the model using Azure Custom Vision.
  4. Test detection on new images.
Python Example: Detecting Objects with Custom Vision
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

#Define Custom Vision credentials
PREDICTION_KEY = "your_prediction_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
PROJECT_ID = "your_project_id"
MODEL_NAME = "your_model_name"

#Create a prediction client
predictor = CustomVisionPredictionClient(ENDPOINT, ApiKeyCredentials(in_headers={"Prediction-key": PREDICTION_KEY}))

#Image for object detection
image_url = "https://example.com/sample-image.jpg"

#Perform object detection
results = predictor.detect_image_url(PROJECT_ID, MODEL_NAME, image_url)

#Print detected objects
for prediction in results.predictions:
    print(f"Detected: {prediction.tag_name} with confidence {prediction.probability:.2f}")

6.6 Deploying Custom Vision Models

After training and testing, deploy the model for real-world use.

Deployment Options
Deployment Method               Best For                            Example Use Case
Azure Cloud API                 Scalable cloud-based predictions    Real-time image classification in a mobile app
Azure IoT Edge                  Low-latency, on-device predictions  Factory defect detection in real time
Embedded AI (ONNX, TensorFlow)  Running AI offline                  AI-powered security cameras

6.7 Real-World Applications of Custom Vision

1. Defect Detection in Manufacturing
  • AI detects scratches or damages in factory-produced goods.
  • Example: A car assembly line identifies misaligned parts automatically.
2. Personalized Fashion Recommendations
  • AI detects clothing style, colors, and brands in shopping apps.
  • Example: A fashion retailer suggests outfits based on a user’s uploaded photos.
3. Smart Agricultural Monitoring
  • AI identifies crop diseases or pests from field images.
  • Example: A farm monitoring system sends alerts when crops need attention.

7. Deploying Computer Vision Models

Once a computer vision model is trained and tested, the next step is to deploy it for real-world applications. Deployment ensures that AI-powered image recognition, object detection, or face recognition can be used in applications at scale.

7.1 Deployment Methods for Computer Vision Models

Azure provides multiple ways to deploy a computer vision model, depending on performance, scalability, and cost considerations.

Deployment Method               Best For                               Example Use Case
Azure Cloud API (Serverless)    Scalable AI services                   Mobile apps performing real-time image classification
Azure Kubernetes Service (AKS)  Large-scale AI model deployment        AI-driven security systems monitoring city-wide cameras
Azure IoT Edge                  On-device AI inference                 AI-powered smart cameras in retail stores
Azure App Services              Deploying AI-powered web applications  E-commerce platforms using image search

7.2 Deploying Computer Vision as a Cloud API

The easiest way to deploy a trained AI model is as an Azure Cloud API, allowing applications to send images and receive predictions in real-time.

Steps to Deploy an AI Model as a Cloud API
  1. Train and export the model (Custom Vision, TensorFlow, ONNX).
  2. Deploy the model as an Azure Function or Web App.
  3. Expose a REST API endpoint for applications to use the model.
  4. Scale using Azure Load Balancer and Autoscale.
Python Example: Deploying an AI Model as an API

The following Flask-based Python API allows users to upload an image and get classification results from a Custom Vision model.

from flask import Flask, request, jsonify
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials

# Azure Custom Vision API credentials
PREDICTION_KEY = "your_prediction_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
PROJECT_ID = "your_project_id"
MODEL_NAME = "your_model_name"

# Create the Flask app
app = Flask(__name__)

# Define the prediction endpoint
@app.route('/predict', methods=['POST'])
def predict():
    file = request.files['image']
    image_data = file.read()

    # Create the prediction client
    credentials = ApiKeyCredentials(in_headers={"Prediction-key": PREDICTION_KEY})
    predictor = CustomVisionPredictionClient(ENDPOINT, credentials)

    # Send the image to the model
    results = predictor.classify_image(PROJECT_ID, MODEL_NAME, image_data)

    # Prepare the response
    predictions = [{"tag": p.tag_name, "confidence": p.probability} for p in results.predictions]
    return jsonify(predictions)

# Run the Flask app
if __name__ == '__main__':
    app.run(debug=True)
How It Works:
  • Users send an image via POST request.
  • The Flask API calls the Azure Custom Vision model.
  • The API returns classification results in JSON format.
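
A client consuming this endpoint typically picks the highest-confidence tag from the returned JSON list. A minimal sketch of that step — the `tag` and `confidence` field names match the response built in the Flask example above, while the values are purely illustrative:

```python
def best_prediction(predictions):
    """Return the (tag, confidence) pair with the highest confidence.

    `predictions` mirrors the JSON list the Flask endpoint above returns,
    e.g. [{"tag": "cat", "confidence": 0.93}, ...].
    """
    if not predictions:
        return None
    top = max(predictions, key=lambda p: p["confidence"])
    return top["tag"], top["confidence"]


# Example response from the /predict endpoint (illustrative values)
response = [
    {"tag": "scratch", "confidence": 0.91},
    {"tag": "dent", "confidence": 0.07},
    {"tag": "ok", "confidence": 0.02},
]
print(best_prediction(response))  # prints ('scratch', 0.91)
```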

7.3 Deploying AI Models with Azure Kubernetes Service (AKS)

For high-scale applications, AI models can be containerized and deployed using Azure Kubernetes Service (AKS).

Why Use AKS for AI Deployment?
  • Handles large-scale AI requests by distributing the load across multiple servers.
  • Enables AI model versioning (A/B testing of new model versions).
  • Automatically scales based on usage.
Steps to Deploy a Computer Vision Model with AKS
  1. Containerize the AI model using Docker.
  2. Push the Docker image to Azure Container Registry (ACR).
  3. Deploy the containerized AI model to an AKS cluster.
  4. Expose the model as an API endpoint for applications to use.
Example: Deploying an AI Model in a Docker Container
# Use Python as the base image
FROM python:3.9

# Install dependencies
RUN pip install flask azure-cognitiveservices-vision-customvision

# Copy the model script
COPY app.py /app/app.py

# Expose the Flask port (inside a container, the app must bind 0.0.0.0 to be reachable)
EXPOSE 5000

# Run the API server
CMD ["python", "/app/app.py"]

7.4 Deploying AI Models to Azure IoT Edge

For real-time, low-latency AI inference, deploying models to edge devices (such as security cameras, drones, or industrial machines) is the best approach.

Why Use Azure IoT Edge for AI Deployment?
  • Reduces latency (AI runs locally without cloud dependency).
  • Works in offline environments (factories, remote locations).
  • Optimized for low-power devices (Raspberry Pi, Nvidia Jetson).
Steps to Deploy an AI Model to IoT Edge
  1. Train and optimize the model for edge devices (convert to ONNX format).
  2. Deploy the AI model as an IoT Edge module.
  3. Use IoT Edge Hub for real-time AI processing.
  4. Send only important results to the cloud (reduces data costs).
Example: Converting a TensorFlow Model to ONNX for Edge Deployment
import tf2onnx
import tensorflow as tf

# Load the trained TensorFlow model
model = tf.keras.models.load_model("custom_model.h5")

# Convert to ONNX format for IoT Edge deployment
onnx_model, _ = tf2onnx.convert.from_keras(model)
with open("custom_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
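
Step 4 of the IoT Edge workflow — sending only important results to the cloud — is usually a simple confidence filter applied on the device. A minimal sketch, assuming predictions arrive as tag/confidence dictionaries; the field names and the 0.8 threshold are illustrative, not an Azure API:

```python
def filter_for_cloud(predictions, threshold=0.8):
    """Keep only high-confidence detections worth uploading.

    Everything below `threshold` stays on-device, reducing data costs.
    """
    return [p for p in predictions if p["confidence"] >= threshold]


local_results = [
    {"tag": "defect", "confidence": 0.95},
    {"tag": "shadow", "confidence": 0.30},
    {"tag": "defect", "confidence": 0.82},
]
to_upload = filter_for_cloud(local_results)
print(to_upload)  # the two high-confidence detections; the 0.30 result is dropped
```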

7.5 Monitoring and Updating AI Models

Once an AI model is deployed, it needs to be monitored for performance, drift, and errors.

Monitoring AI Models with Azure Machine Learning
  • Track model accuracy and confidence scores over time.
  • Detect "model drift" (when real-world data changes, making the model outdated).
  • Automate AI model updates when new data is available.
Example: Automating Model Updates
from azureml.core import Workspace, Model

# Connect to the Azure ML workspace
ws = Workspace.from_config()

# Register a new version of the AI model
model = Model.register(
    workspace=ws,
    model_name="new_vision_model",
    model_path="./new_model.onnx"
)
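
The drift check itself can be as simple as comparing the average confidence of recent predictions against a baseline recorded at deployment time. A minimal sketch of that idea — the tolerance and window are illustrative assumptions, not an Azure ML feature:

```python
def drift_detected(baseline_mean, recent_confidences, tolerance=0.10):
    """Flag possible model drift when average confidence drops noticeably.

    `baseline_mean` is the average confidence measured at deployment time;
    `recent_confidences` is a window of confidence scores from production.
    """
    recent_mean = sum(recent_confidences) / len(recent_confidences)
    return (baseline_mean - recent_mean) > tolerance


# Confidence averaged ~0.90 at deployment; recent predictions are much weaker.
print(drift_detected(0.90, [0.71, 0.65, 0.70, 0.68]))  # prints True
print(drift_detected(0.90, [0.88, 0.91, 0.86, 0.89]))  # prints False
```

In production this check would feed an alert or trigger retraining, rather than just printing a flag.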

7.6 Real-World Applications of AI Deployment

1. AI-Powered Retail Image Search
  • Users upload a photo of a product and the AI finds similar items in stock.
  • Deployed as an API in an e-commerce platform.
2. Smart Traffic Surveillance
  • AI detects violations (speeding, red-light running) using camera feeds.
  • Deployed on Azure IoT Edge for real-time AI inference.
3. AI in Healthcare
  • AI models analyze X-rays and MRI scans for disease detection.
  • Deployed in hospitals as an AKS-based cloud service.

7.7 Choosing the Best Deployment Strategy

Deployment Option | Best For | Example Use Case
Azure Functions (Cloud API) | Real-time, low-traffic AI services | AI chatbot that detects objects in images
Azure Kubernetes Service (AKS) | Large-scale AI inference | AI-powered security monitoring system
Azure IoT Edge | Offline, real-time AI processing | AI-powered quality control on factory floors
Azure App Services | AI-powered web applications | AI-driven product recommendations

Implement computer vision solutions (Additional Content)

1. Technical Limitations of Azure Computer Vision Models

Understanding the limitations of Azure’s prebuilt computer vision models is critical for both solution architecture and exam performance, especially in scenarios where resource constraints, time sensitivity, or edge compatibility are involved.

1.1 Common Technical Constraints

Aspect | Limitation
Image Size | Maximum input size is typically 50 MB. Resolutions above 4200×4200 pixels may be rejected.
Image Formats | Supported formats: JPEG, PNG, BMP, and GIF. RAW and TIFF are not supported.
API Rate Limits | For standard pricing tiers: ~10–20 requests/sec per subscription (subject to change by region and tier).
Latency | Varies by model type. For Computer Vision 4.0, response time is typically less than 1 second for standard images.
Timeout Handling | Requests that exceed processing limits (e.g., large files or network lag) will return HTTP 408/504 errors.
Language Support | OCR and captioning may not support all languages. Check documentation for language-specific limitations.

1.2 Example Exam-Relevant Scenario

A user attempts to analyze a high-resolution satellite image (8000×8000 pixels, 80 MB file size) using Azure’s Analyze Image API. What is the likely result?

Correct Answer: The request will fail due to exceeding image size and resolution limits.
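
A pre-flight check like this scenario calls for can be coded directly from the limits table above. The 50 MB and 4200×4200 figures are taken from that table and may vary by API version and tier:

```python
def check_input_limits(width, height, size_bytes,
                       max_dim=4200, max_bytes=50 * 1024 * 1024):
    """Validate an image against the documented input limits before calling
    the Analyze Image API. Returns a list of violations; empty means OK."""
    problems = []
    if size_bytes > max_bytes:
        problems.append("file size exceeds 50 MB limit")
    if width > max_dim or height > max_dim:
        problems.append("resolution exceeds 4200x4200 limit")
    return problems


# The satellite image from the scenario: 8000x8000 pixels, 80 MB.
print(check_input_limits(8000, 8000, 80 * 1024 * 1024))
# lists both the size violation and the resolution violation
```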

2. Comparing Azure Computer Vision Services

Azure provides multiple vision-related services, each optimized for different use cases. Candidates are often asked to select the most appropriate service based on a described scenario.

2.1 Service Comparison Table

Service | Best For | Customization | Example Use Case
Computer Vision | General-purpose image analysis, OCR, spatial analysis | Pre-trained only | Extracting text from invoices
Custom Vision | Domain-specific image classification or object detection | Train your own models | Defect detection on factory assembly lines
Face API | Detecting faces, identifying individuals, emotion estimation | Pre-trained (face data only) | Multi-user access control with facial login
Form Recognizer | Structured document extraction (tables, fields, key-value pairs) | Custom models supported | Reading data from scanned insurance forms
Azure OpenAI DALL·E | Generating synthetic images from prompts | Pre-trained | Creating marketing assets from descriptions

2.2 Key Distinctions

Feature | Computer Vision | Custom Vision | Face API
Can detect faces? | Yes (but limited) | No | Yes (advanced)
Custom training? | No | Yes | No
Emotion detection? | No | No | Yes
Object detection? | Yes (generic objects) | Yes (custom objects) | No
OCR support? | Yes | No | No

2.3 Decision-Making Tips for Exams

  • If the question mentions domain-specific classification (e.g., classifying different types of mushrooms), Custom Vision is the right choice.

  • If it requires face emotion or identity recognition, use Face API.

  • If it's about reading text or analyzing general images, use Computer Vision API.

2.4 Example Question for Comparison

You are developing a system to classify different types of machinery parts from images. You need to train a model using a labeled dataset. Which Azure service should you choose?

Correct Answer: Custom Vision

Frequently Asked Questions

An image processed with the Azure Vision Read API returns an empty text result even though the image clearly contains text. What is the most likely cause?

Answer:

The image resolution or text clarity is insufficient for OCR detection.

Explanation:

Azure Vision OCR models require sufficient image quality to identify text regions accurately. If the text is too small, blurry, rotated excessively, or obscured by background noise, the model may fail to detect it. Another common issue occurs when the image resolution is very low or heavily compressed, reducing character clarity. Developers sometimes assume OCR failures indicate API problems, but in most cases the underlying cause is input quality. Pre-processing steps such as increasing resolution, improving contrast, or correcting orientation can significantly improve OCR results. Understanding these limitations helps design reliable document-processing pipelines using computer vision services.
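
Of the pre-processing steps mentioned (resolution, contrast, orientation), increasing resolution is the simplest to illustrate. A toy sketch of nearest-neighbour upscaling on a plain 2-D list of pixel values, standing in for a real resize with an imaging library such as Pillow:

```python
def upscale_nearest(pixels, factor):
    """Nearest-neighbour upscale of a 2-D grid of pixel values.

    Enlarging small text before OCR can make characters easier to detect;
    a real pipeline would use an imaging library instead of plain lists.
    """
    return [
        [row[x // factor] for x in range(len(row) * factor)]
        for row in pixels
        for _ in range(factor)
    ]


tiny = [[0, 255],
        [255, 0]]
big = upscale_nearest(tiny, 2)
print(big)  # each original pixel becomes a 2x2 block
```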

Demand Score: 72

Exam Relevance Score: 81

When processing structured documents such as invoices or forms, why might Azure Document Intelligence be preferred over the Vision OCR API?

Answer:

Document Intelligence can extract structured fields and layout information in addition to text.

Explanation:

The Vision Read API focuses primarily on recognizing text within images and returning the detected content along with bounding boxes. However, many document-processing scenarios require understanding document structure such as tables, key-value pairs, and form fields. Azure Document Intelligence provides pretrained models specifically designed for structured documents like invoices, receipts, and contracts. These models identify semantic elements such as totals, vendor names, and table rows. Developers designing document-automation workflows often use Document Intelligence because it eliminates the need to manually parse OCR text output to reconstruct document structure. Selecting the correct service ensures efficient processing and accurate extraction of structured information.

Demand Score: 65

Exam Relevance Score: 83

Why might the Azure Vision Read API fail to detect text when the image contains handwritten notes?

Answer:

The handwriting style may fall outside the model’s recognition capabilities.

Explanation:

Although Azure Vision OCR supports handwriting recognition, accuracy depends heavily on handwriting clarity and style. Highly cursive writing, irregular character spacing, or overlapping characters can reduce detection accuracy. The OCR model performs best with clear block handwriting or printed text. Developers processing handwritten forms often encounter inconsistent results when handwriting quality varies significantly. In such cases, image preprocessing or specialized document models may be required. Understanding the limitations of handwriting recognition helps determine whether OCR alone is sufficient or whether alternative document processing techniques should be used.

Demand Score: 64

Exam Relevance Score: 74

What is a common cause of OCR misreading characters such as “0” and “O” or “1” and “I”?

Answer:

Similar character shapes combined with low image quality lead to recognition ambiguity.

Explanation:

OCR models analyze visual patterns to classify characters. When characters have nearly identical shapes—such as zero and the letter O—the model must rely on contextual cues to distinguish them. If the image quality is poor or the surrounding context is limited, the model may select the incorrect interpretation. Fonts with minimal visual differences between characters further increase ambiguity. Developers often encounter this issue when processing scanned documents with low resolution or compression artifacts. Improving image quality or validating extracted text against expected formats can reduce the impact of these OCR ambiguities.
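
The validation step mentioned at the end can be a simple character map applied only to fields known to be numeric. A minimal sketch — the confusion pairs chosen here are illustrative, and should be tuned to the fonts and documents actually being processed:

```python
def normalize_numeric_field(text):
    """Fix common OCR confusions in a field expected to contain digits only.

    Maps letters OCR often confuses with digits (O→0, I/l→1, S→5, B→8).
    Apply only to digit-only fields, never to free text.
    """
    table = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1",
                           "S": "5", "B": "8"})
    return text.translate(table)


# A digit-only field misread by OCR
print(normalize_numeric_field("4O7I"))  # prints 4071
```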

Demand Score: 60

Exam Relevance Score: 71
