Computer vision enables computers to interpret and analyze visual data, much as human vision does. Azure provides powerful AI-based computer vision tools that allow applications to recognize objects, extract text from images, and even understand emotions from facial expressions.
Azure Computer Vision API is a cloud-based service that provides advanced image and video analysis using deep learning models.
The Azure Computer Vision API allows developers to process and analyze images in various ways, including:
- Tagging images with recognized objects and concepts
- Generating natural-language descriptions (captions)
- Detecting objects and faces
- Extracting printed and handwritten text (OCR)
The service processes an image and returns a structured JSON response containing the detected objects, text, faces, and metadata.
For Python users, install the azure-cognitiveservices-vision-computervision package (used by the SDK examples later in this section); the first example below calls the REST API directly with the requests library:
pip install azure-cognitiveservices-vision-computervision
import requests
#Define API endpoint and key
endpoint = "https://your-computer-vision-endpoint.com"
api_key = "your_api_key"
#Image URL to analyze
image_url = "https://example.com/sample-image.jpg"
#Set up headers
headers = {
    "Ocp-Apim-Subscription-Key": api_key,
    "Content-Type": "application/json"
}
#Define API request payload
data = {
    "url": image_url
}
#Make a request to analyze the image
response = requests.post(f"{endpoint}/vision/v3.2/analyze?visualFeatures=Tags,Description", headers=headers, json=data)
#Print the response
print(response.json())
When the request is processed, the API returns a JSON response like this:
{
  "description": {
    "captions": [
      {
        "text": "A man riding a bicycle in a park",
        "confidence": 0.95
      }
    ]
  },
  "tags": [
    {"name": "man", "confidence": 0.98},
    {"name": "bicycle", "confidence": 0.95},
    {"name": "park", "confidence": 0.90}
  ]
}
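The fields in this response map directly onto application features. As a short sketch, the following reuses the response and image_url objects from the example above to build alt text and category labels (the 0.9 confidence threshold is arbitrary):
result = response.json()
#Use the top caption as alt text for accessibility
caption = result["description"]["captions"][0]["text"]
alt_text = f'<img src="{image_url}" alt="{caption}">'
#Use high-confidence tags for categorization
categories = [tag["name"] for tag in result["tags"] if tag["confidence"] > 0.9]
print(alt_text)
print(categories)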
"A man riding a bicycle in a park") can be used for automated accessibility features.man, bicycle, park) can be used for image categorization.Object detection enables AI to identify objects in an image and classify them into categories. Azure Computer Vision API provides pre-trained object detection models, but you can also train custom object detection models using Azure Custom Vision.
Here’s how to use Azure’s Object Detection API to detect objects in an image.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
#Azure Computer Vision Credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"
#Create a Computer Vision client
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
#Define the image URL
image_url = "https://example.com/car.jpg"
#Perform object detection
objects = client.analyze_image(image_url, visual_features=["Objects"])
#Print detected objects
for obj in objects.objects:
    print(f"Detected {obj.object_property} with confidence {obj.confidence}")
The response will look like this:
{
  "objects": [
    {
      "object": "Car",
      "confidence": 0.98,
      "rectangle": {
        "x": 120,
        "y": 200,
        "w": 400,
        "h": 300
      }
    }
  ]
}
Azure Custom Vision allows you to train your own object detection models if the pre-trained model does not detect specific objects relevant to your business. The workflow has three steps, sketched in code below:
1. Upload Training Data
2. Train the Model
3. Deploy the Model
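A minimal sketch of these steps using the Custom Vision training SDK (the training key, endpoint, project and tag names, image file, and prediction resource ID are all placeholders):
import time
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from msrest.authentication import ApiKeyCredentials
#Placeholder training credentials
TRAINING_KEY = "your_training_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
credentials = ApiKeyCredentials(in_headers={"Training-key": TRAINING_KEY})
trainer = CustomVisionTrainingClient(ENDPOINT, credentials)
#1. Upload training data: create a project and a tag, then add labeled images
project = trainer.create_project("my-detector")
tag = trainer.create_tag(project.id, "forklift")
with open("forklift1.jpg", "rb") as image_file:
    trainer.create_images_from_data(project.id, image_file.read(), tag_ids=[tag.id])
#2. Train the model and wait for the training iteration to complete
iteration = trainer.train_project(project.id)
while iteration.status != "Completed":
    time.sleep(5)
    iteration = trainer.get_iteration(project.id, iteration.id)
#3. Deploy the model by publishing the iteration to a prediction resource
trainer.publish_iteration(project.id, iteration.id, "my_model", "your_prediction_resource_id")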
Optical Character Recognition (OCR) is a key feature of Azure Computer Vision that allows AI to extract text from images, scanned documents, and handwritten notes. This capability is essential for applications in document automation, digital archiving, and accessibility services.
OCR is a technology that enables computers to read text from images or documents and convert it into machine-readable text.
Azure OCR can analyze images containing printed or handwritten text and return structured text data.
pip install azure-cognitiveservices-vision-computervision
We can send an image containing text to the Azure OCR API and retrieve recognized words.
import time
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
#Azure Computer Vision credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"
#Create a Computer Vision client
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
#Define image URL (could also use local image)
image_url = "https://example.com/sample-text-image.jpg"
#Perform OCR on the image
ocr_results = client.read(image_url, raw=True)
#Wait for OCR results to be available
operation_location = ocr_results.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]
while True:
    result = client.get_read_result(operation_id)
    if result.status not in ["notStarted", "running"]:
        break
    time.sleep(1)
#Print extracted text
for page in result.analyze_result.read_results:
    for line in page.lines:
        print(line.text)
When the OCR API processes an image, it returns a JSON response containing recognized text and position coordinates.
{
  "read_results": [
    {
      "page": 1,
      "lines": [
        {
          "text": "Welcome to Azure AI",
          "bounding_box": [50, 50, 200, 50]
        },
        {
          "text": "Computer Vision is powerful",
          "bounding_box": [50, 100, 300, 100]
        }
      ]
    }
  ]
}
Azure OCR can extract structured text from:
- Printed documents and scanned images
- Handwritten notes
- Structured documents such as invoices, receipts, and forms
If an image contains an invoice, Azure OCR can recognize structured fields:
| Field | Extracted Value |
|---|---|
| Invoice Number | 12345 |
| Customer Name | John Doe |
| Total Amount | $250.00 |
These fields can be extracted programmatically with Azure Form Recognizer's prebuilt invoice model:
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
#Azure Form Recognizer credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-form-recognizer-endpoint.com"
#Create a client
client = DocumentAnalysisClient(ENDPOINT, AzureKeyCredential(API_KEY))
#Image URL of an invoice
invoice_url = "https://example.com/sample-invoice.jpg"
#Analyze the invoice
poller = client.begin_analyze_document_from_url("prebuilt-invoice", invoice_url)
result = poller.result()
#Extract key fields
for field_name, field in result.fields.items():
    print(f"{field_name}: {field.value}")
Azure OCR also supports handwriting recognition. The same Read call is used; point it at an image that contains handwriting:
#Image containing handwritten text (placeholder URL)
image_url = "https://example.com/handwritten-note.jpg"
#Start the Read operation
ocr_results = client.read(image_url, raw=True)
operation_location = ocr_results.headers["Operation-Location"]
operation_id = operation_location.split("/")[-1]
while True:
    result = client.get_read_result(operation_id)
    if result.status not in ["notStarted", "running"]:
        break
    time.sleep(1)
#Print extracted handwritten text
for page in result.analyze_result.read_results:
    for line in page.lines:
        print(line.text)
Example output:
Meeting at 3 PM
Call John for updates
Buy groceries
After implementing OCR, the next step is deploying it at scale.
| Deployment Method | Best For | Example |
|---|---|---|
| Azure Functions | Real-time text extraction | Chat applications that moderate offensive text |
| Azure Kubernetes Service (AKS) | High-scale document processing | Bank processing millions of receipts daily |
| Azure Form Recognizer | Structured document parsing | Automating invoice and contract processing |
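As a rough sketch of the Azure Functions option, an HTTP-triggered function (Python v1 programming model; the credentials are placeholders) could accept an image URL and return the extracted text:
import json
import time
import azure.functions as func
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
#Placeholder credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
def main(req: func.HttpRequest) -> func.HttpResponse:
    image_url = req.params.get("url")
    if not image_url:
        return func.HttpResponse("Pass an image URL via ?url=...", status_code=400)
    #Start the asynchronous Read operation and poll until it finishes
    read_response = client.read(image_url, raw=True)
    operation_id = read_response.headers["Operation-Location"].split("/")[-1]
    while True:
        result = client.get_read_result(operation_id)
        if result.status not in ["notStarted", "running"]:
            break
        time.sleep(1)
    #Collect the recognized lines into a JSON payload
    lines = [line.text for page in result.analyze_result.read_results for line in page.lines]
    return func.HttpResponse(json.dumps({"lines": lines}), mimetype="application/json")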
Image captioning is a computer vision technique that allows AI to automatically generate textual descriptions of images. It combines deep learning-based image recognition with Natural Language Processing (NLP) to produce human-like descriptions of visual content.
Image captioning enables AI to analyze an image and describe it in natural language. This is useful for:
- Accessibility features, such as generating alt text for screen readers
- Organizing and searching large image libraries
- Automating content tagging for social media and e-commerce
Azure's Computer Vision API uses deep learning models to:
- Analyze the visual content of an image
- Identify objects, scenes, and actions
- Generate a natural-language caption with a confidence score
To use image captioning, first set up Azure Computer Vision:
pip install azure-cognitiveservices-vision-computervision
The following Python script sends an image to Azure Computer Vision and returns a caption.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
#Azure Computer Vision credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-computer-vision-endpoint.com"
#Create a Computer Vision client
client = ComputerVisionClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
#Define image URL
image_url = "https://example.com/sample-image.jpg"
#Analyze the image for captions
description_results = client.describe_image(image_url)
#Print generated captions
for caption in description_results.captions:
    print(f"Caption: {caption.text}, Confidence: {caption.confidence:.2f}")
When the API processes an image, it returns a JSON response with generated captions and confidence scores.
{
  "captions": [
    {
      "text": "A cat sitting on a couch",
      "confidence": 0.96
    }
  ]
}
While Azure's default image captioning is powerful, you can customize it for specific use cases.
from azure.cognitiveservices.speech.translation import SpeechTranslationConfig
#Set up Azure Translator
translator = SpeechTranslationConfig(subscription="your_translator_api_key", region="your_region")
translator.speech_recognition_language = "en"
translator.add_target_language("es") # Translate to Spanish
#Input caption
caption_text = "A dog playing in the park"
#Translate caption
translated_caption = translator.speech_synthesis_voice_name(caption_text, "es")
print(f"Translated Caption: {translated_caption}")
| Deployment Method | Best For | Example Use Case |
|---|---|---|
| Azure Functions | Captioning individual images in real-time | Social media post automation |
| Azure Kubernetes Service (AKS) | Large-scale image analysis | E-commerce product descriptions |
| Azure Batch Processing | Bulk captioning of images | News agency tagging historical images |
Face recognition is a key capability of Azure AI's Computer Vision services, enabling applications to detect, analyze, and recognize human faces. This feature is widely used in security systems, identity verification, emotion analysis, and social media applications.
Face recognition is an AI-driven technology that allows systems to identify and analyze human faces in images and videos. Azure Face API, part of Azure Cognitive Services, provides advanced face detection and recognition functionalities.
Azure Face API processes images containing faces and returns structured metadata about detected individuals.
pip install azure-cognitiveservices-vision-face
Face detection allows you to find faces in an image and analyze their features.
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials
#Azure Face API credentials
API_KEY = "your_api_key"
ENDPOINT = "https://your-face-api-endpoint.com"
#Create a FaceClient instance
face_client = FaceClient(ENDPOINT, CognitiveServicesCredentials(API_KEY))
#Image containing faces
image_url = "https://example.com/group-photo.jpg"
#Detect faces
faces = face_client.face.detect_with_url(image_url, return_face_landmarks=True, return_face_attributes=["age", "gender", "emotion"])
#Print detected faces and attributes
for face in faces:
    print(f"Detected face at location {face.face_rectangle}")
    print(f"Age: {face.face_attributes.age}, Gender: {face.face_attributes.gender}")
    print(f"Emotions: {face.face_attributes.emotion}")
The API response contains bounding box coordinates, age, gender, and emotion analysis.
{
  "faceId": "abcd1234",
  "faceRectangle": {
    "top": 100,
    "left": 200,
    "width": 80,
    "height": 80
  },
  "faceAttributes": {
    "age": 29,
    "gender": "male",
    "emotion": {
      "happiness": 0.95,
      "anger": 0.02,
      "sadness": 0.01
    }
  }
}
Face verification allows applications to compare two images and determine if they belong to the same person.
#Define two images to compare
image1 = "https://example.com/user-photo-1.jpg"
image2 = "https://example.com/user-photo-2.jpg"
#Detect faces in both images
faces1 = face_client.face.detect_with_url(image1)
faces2 = face_client.face.detect_with_url(image2)
#Extract face IDs
face_id1 = faces1[0].face_id
face_id2 = faces2[0].face_id
#Verify if they belong to the same person
verify_result = face_client.face.verify_face_to_face(face_id1, face_id2)
if verify_result.is_identical:
    print(f"Faces match with confidence {verify_result.confidence:.2f}")
else:
    print("Faces do not match")
Face identification allows applications to recognize known individuals from a database of faces.
#Create a new person group
person_group_id = "my_users"
face_client.person_group.create(person_group_id, name="Users Database")
#Add a new user to the database
user_id = face_client.person_group_person.create(person_group_id, name="John Doe").person_id
#Add multiple images of the person
image_urls = ["https://example.com/john1.jpg", "https://example.com/john2.jpg"]
for img in image_urls:
    face_client.person_group_person.add_face_from_url(person_group_id, user_id, img)
#Train the model
face_client.person_group.train(person_group_id)
#Identify a face against the database
face_image = "https://example.com/test-photo.jpg"
faces = face_client.face.detect_with_url(face_image)
face_ids = [face.face_id for face in faces]
#Identify faces in the database
identify_results = face_client.face.identify(face_ids, person_group_id)
#Print results
for result in identify_results:
    for candidate in result.candidates:
        print(f"Identified User ID: {candidate.person_id}, Confidence: {candidate.confidence:.2f}")
| Deployment Method | Best For | Example Use Case |
|---|---|---|
| Azure Functions | Small-scale authentication | Face login for mobile apps |
| Azure Kubernetes Service (AKS) | Large-scale identity verification | Face recognition in airports |
| Azure IoT Edge | On-device face detection | Smart cameras in security systems |
Azure Custom Vision is a powerful service that allows you to train and deploy custom image classification and object detection models. Unlike the pre-trained models in Azure Computer Vision API, Custom Vision lets you define specific objects or patterns that your AI should recognize.
Azure Custom Vision is a cloud-based tool that enables training AI models to recognize images based on labeled data. It is useful when pre-trained models do not meet your specific requirements.
Azure Custom Vision supports two types of tasks:
| Feature | Image Classification | Object Detection |
|---|---|---|
| Purpose | Classifies images into categories | Identifies and localizes objects in images |
| Example | "Cat" vs. "Dog" image classification | Detecting a cat in an image and marking its location |
| Use Case | Sorting defective vs. non-defective products | Identifying individual components on a factory belt |
After training the model, use the Custom Vision API to send images and get predictions.
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
#Azure Custom Vision credentials
API_KEY = "your_prediction_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
PROJECT_ID = "your_project_id"
MODEL_NAME = "your_model_name"
#Create a prediction client
credentials = ApiKeyCredentials(in_headers={"Prediction-key": API_KEY})
predictor = CustomVisionPredictionClient(ENDPOINT, credentials)
#Define image URL for classification
image_url = "https://example.com/sample-image.jpg"
#Send image for prediction
results = predictor.classify_image_url(PROJECT_ID, MODEL_NAME, image_url)
#Print prediction results
for prediction in results.predictions:
    print(f"Tag: {prediction.tag_name}, Confidence: {prediction.probability:.2f}")
Unlike classification, object detection returns a bounding box locating each detected object in the image.
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
#Define Custom Vision credentials
PREDICTION_KEY = "your_prediction_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
PROJECT_ID = "your_project_id"
MODEL_NAME = "your_model_name"
#Create a prediction client
predictor = CustomVisionPredictionClient(ENDPOINT, ApiKeyCredentials(in_headers={"Prediction-key": PREDICTION_KEY}))
#Image for object detection
image_url = "https://example.com/sample-image.jpg"
#Perform object detection
results = predictor.detect_image_url(PROJECT_ID, MODEL_NAME, image_url)
#Print detected objects and their bounding boxes
for prediction in results.predictions:
    box = prediction.bounding_box
    print(f"Detected: {prediction.tag_name} with confidence {prediction.probability:.2f} at (left={box.left:.2f}, top={box.top:.2f}, width={box.width:.2f}, height={box.height:.2f})")
After training and testing, deploy the model for real-world use.
| Deployment Method | Best For | Example Use Case |
|---|---|---|
| Azure Cloud API | Scalable cloud-based predictions | Real-time image classification in a mobile app |
| Azure IoT Edge | Low-latency, on-device predictions | Factory defect detection in real-time |
| Embedded AI (ONNX, TensorFlow) | Running AI offline | AI-powered security cameras |
Once a computer vision model is trained and tested, the next step is to deploy it for real-world applications. Deployment ensures that AI-powered image recognition, object detection, or face recognition can be used in applications at scale.
Azure provides multiple ways to deploy a computer vision model, depending on performance, scalability, and cost considerations.
| Deployment Method | Best For | Example Use Case |
|---|---|---|
| Azure Cloud API (Serverless) | Scalable AI services | Mobile apps performing real-time image classification |
| Azure Kubernetes Service (AKS) | Large-scale AI model deployment | AI-driven security systems monitoring city-wide cameras |
| Azure IoT Edge | On-device AI inference | AI-powered smart cameras in retail stores |
| Azure App Services | Deploying AI-powered web applications | E-commerce platforms using image search |
The easiest way to deploy a trained AI model is as an Azure Cloud API, allowing applications to send images and receive predictions in real-time.
The following Flask-based Python API allows users to upload an image and get classification results from a Custom Vision model.
from flask import Flask, request, jsonify
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
from msrest.authentication import ApiKeyCredentials
#Azure Custom Vision API credentials
PREDICTION_KEY = "your_prediction_key"
ENDPOINT = "https://your-custom-vision-endpoint.com"
PROJECT_ID = "your_project_id"
MODEL_NAME = "your_model_name"
#Create Flask app
app = Flask(__name__)
#Define the prediction function
@app.route('/predict', methods=['POST'])
def predict():
    file = request.files['image']
    image_data = file.read()
    #Create prediction client
    credentials = ApiKeyCredentials(in_headers={"Prediction-key": PREDICTION_KEY})
    predictor = CustomVisionPredictionClient(ENDPOINT, credentials)
    #Send image to model
    results = predictor.classify_image(PROJECT_ID, MODEL_NAME, image_data)
    #Prepare the response
    predictions = [{"tag": p.tag_name, "confidence": p.probability} for p in results.predictions]
    return jsonify(predictions)
#Run the Flask app
if __name__ == '__main__':
    app.run(debug=True)
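A quick way to exercise this endpoint (assuming the app is running locally on Flask's default port and test.jpg is any local image):
import requests
#Send a local image to the prediction endpoint
with open("test.jpg", "rb") as f:
    response = requests.post("http://localhost:5000/predict", files={"image": f})
print(response.json())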
For high-scale applications, AI models can be containerized and deployed using Azure Kubernetes Service (AKS).
#Use Python as base image
FROM python:3.9
#Install dependencies
RUN pip install flask azure-cognitiveservices-vision-customvision
#Copy the model script
COPY app.py /app/app.py
#Run the API server
CMD ["python", "/app/app.py"]
For real-time, low-latency AI inference, deploying models to edge devices (such as security cameras, drones, or industrial machines) is the best approach.
import tf2onnx
import tensorflow as tf
#Load trained TensorFlow model
model = tf.keras.models.load_model("custom_model.h5")
#Convert to ONNX format for IoT Edge
onnx_model, _ = tf2onnx.convert.from_keras(model)
with open("custom_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
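On the device, the converted model can then be scored locally. A minimal sketch with the onnxruntime package (the input shape and preprocessing depend on the exported model and are placeholders here):
import numpy as np
import onnxruntime as ort
#Load the converted model on the edge device
session = ort.InferenceSession("custom_model.onnx")
input_name = session.get_inputs()[0].name
#Dummy preprocessed frame; real code would resize and normalize a camera image
image_batch = np.random.rand(1, 224, 224, 3).astype(np.float32)
#Run inference locally, with no cloud round-trip
outputs = session.run(None, {input_name: image_batch})
print(outputs[0])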
Once an AI model is deployed, it needs to be monitored for performance, drift, and errors. As improved models are produced, Azure Machine Learning can register them as new versions:
from azureml.core import Workspace, Model
#Connect to Azure ML workspace
ws = Workspace.from_config()
#Register a new version of the AI model
model = Model.register(
    workspace=ws,
    model_name="new_vision_model",
    model_path="./new_model.onnx"
)
| Deployment Option | Best For | Example Use Case |
|---|---|---|
| Azure Functions (Cloud API) | Real-time, low-traffic AI services | AI chatbot that detects objects in images |
| Azure Kubernetes Service (AKS) | Large-scale AI inference | AI-powered security monitoring system |
| Azure IoT Edge | Offline, real-time AI processing | AI-powered quality control on factory floors |
| Azure App Services | AI-powered web applications | AI-driven product recommendations |
Understanding the limitations of Azure’s prebuilt computer vision models is critical for both solution architecture and exam performance, especially in scenarios where resource constraints, time sensitivity, or edge compatibility are involved.
| Aspect | Limitation |
|---|---|
| Image Size | Maximum input size is typically 50 MB. Resolutions above 4200×4200 pixels may be rejected. |
| Image Formats | Supported formats: JPEG, PNG, BMP, and GIF. RAW and TIFF are not supported. |
| API Rate Limits | For standard pricing tiers: ~10–20 requests/sec per subscription (subject to change by region and tier). |
| Latency | Varies by model type. For Computer Vision 4.0, response time is typically less than 1 second for standard images. |
| Timeout Handling | Requests that exceed processing limits (e.g., large files or network lag) will return HTTP 408/504 errors. |
| Language Support | OCR and captioning may not support all languages. Check documentation for language-specific limitations. |
A user attempts to analyze a high-resolution satellite image (8000x8000 pixels, 80MB file size) using Azure’s Analyze Image API. What is the likely result?
Correct Answer: The request will fail due to exceeding image size and resolution limits.
Azure provides multiple vision-related services, each optimized for different use cases. Candidates are often asked to select the most appropriate service based on a described scenario.
| Service | Best For | Customization | Example Use Case |
|---|---|---|---|
| Computer Vision | General-purpose image analysis, OCR, spatial analysis | Pre-trained only | Extracting text from invoices |
| Custom Vision | Domain-specific image classification or object detection | Train your own models | Defect detection on factory assembly lines |
| Face API | Detecting faces, identifying individuals, emotion estimation | Pre-trained (Face data only) | Multi-user access control with facial login |
| Form Recognizer | Structured document extraction (tables, fields, key-values) | Custom models supported | Reading data from scanned insurance forms |
| Azure OpenAI DALL·E | Generating synthetic images from prompts | Pre-trained | Creating marketing assets from descriptions |
| Feature | Computer Vision | Custom Vision | Face API |
|---|---|---|---|
| Can detect faces? | Yes (but limited) | No | Yes (advanced) |
| Custom training? | No | Yes | No |
| Emotion detection? | No | No | Yes |
| Object detection? | Yes (pre-trained) | Yes | No |
| OCR support? | Yes | No | No |
- If the question mentions domain-specific classification (e.g., classifying different types of mushrooms), Custom Vision is the right choice.
- If it requires face emotion or identity recognition, use Face API.
- If it's about reading text or analyzing general images, use Computer Vision API.
You are developing a system to classify different types of machinery parts from images. You need to train a model using a labeled dataset. Which Azure service should you choose?
Correct Answer: Custom Vision
An image processed with the Azure Vision Read API returns an empty text result even though the image clearly contains text. What is the most likely cause?
Correct Answer: The image resolution or text clarity is insufficient for OCR detection.
Azure Vision OCR models require sufficient image quality to identify text regions accurately. If the text is too small, blurry, rotated excessively, or obscured by background noise, the model may fail to detect it. Another common issue occurs when the image resolution is very low or heavily compressed, reducing character clarity. Developers sometimes assume OCR failures indicate API problems, but in most cases the underlying cause is input quality. Pre-processing steps such as increasing resolution, improving contrast, or correcting orientation can significantly improve OCR results. Understanding these limitations helps design reliable document-processing pipelines using computer vision services.
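A small preprocessing sketch along those lines using the Pillow library (the file names and enhancement factors are illustrative):
from PIL import Image, ImageEnhance, ImageOps
#Load a low-quality scan
image = Image.open("scan.jpg")
#Correct orientation recorded in EXIF metadata
image = ImageOps.exif_transpose(image)
#Upscale to give the OCR model more pixels per character
image = image.resize((image.width * 2, image.height * 2), Image.LANCZOS)
#Boost contrast to separate text from background noise
image = ImageEnhance.Contrast(image).enhance(2.0)
image.save("scan_preprocessed.jpg")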
When processing structured documents such as invoices or forms, why might Azure Document Intelligence be preferred over the Vision OCR API?
Correct Answer: Document Intelligence can extract structured fields and layout information in addition to text.
The Vision Read API focuses primarily on recognizing text within images and returning the detected content along with bounding boxes. However, many document-processing scenarios require understanding document structure such as tables, key-value pairs, and form fields. Azure Document Intelligence provides pretrained models specifically designed for structured documents like invoices, receipts, and contracts. These models identify semantic elements such as totals, vendor names, and table rows. Developers designing document-automation workflows often use Document Intelligence because it eliminates the need to manually parse OCR text output to reconstruct document structure. Selecting the correct service ensures efficient processing and accurate extraction of structured information.
Why might the Azure Vision Read API fail to detect text when the image contains handwritten notes?
Correct Answer: The handwriting style may fall outside the model's recognition capabilities.
Although Azure Vision OCR supports handwriting recognition, accuracy depends heavily on handwriting clarity and style. Highly cursive writing, irregular character spacing, or overlapping characters can reduce detection accuracy. The OCR model performs best with clear block handwriting or printed text. Developers processing handwritten forms often encounter inconsistent results when handwriting quality varies significantly. In such cases, image preprocessing or specialized document models may be required. Understanding the limitations of handwriting recognition helps determine whether OCR alone is sufficient or whether alternative document processing techniques should be used.
What is a common cause of OCR misreading characters such as “0” and “O” or “1” and “I”?
Correct Answer: Similar character shapes combined with low image quality lead to recognition ambiguity.
OCR models analyze visual patterns to classify characters. When characters have nearly identical shapes—such as zero and the letter O—the model must rely on contextual cues to distinguish them. If the image quality is poor or the surrounding context is limited, the model may select the incorrect interpretation. Fonts with minimal visual differences between characters further increase ambiguity. Developers often encounter this issue when processing scanned documents with low resolution or compression artifacts. Improving image quality or validating extracted text against expected formats can reduce the impact of these OCR ambiguities.
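As a sketch of that kind of validation (the invoice-number format and correction rules here are hypothetical):
import re
#Hypothetical OCR output for an invoice number field
raw_value = "INV-0O123"  # the letter O here may really be a zero
#Expected format: "INV-" followed by digits only
if not re.fullmatch(r"INV-\d+", raw_value):
    #Correct common OCR confusions in the digit portion
    prefix, _, digits = raw_value.partition("-")
    digits = digits.replace("O", "0").replace("I", "1")
    raw_value = f"{prefix}-{digits}"
print(raw_value)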