Generative AI enables computers to produce human-like text, code, and images using pre-trained models. Azure provides this capability through Azure OpenAI Service, which offers access to GPT-4 for text generation, Codex for code generation, and DALL·E for image generation.
This guide covers implementing generative AI solutions on Azure, starting with Azure OpenAI Service and progressing through text generation, code generation, and image generation.
Azure OpenAI Service provides access to OpenAI’s state-of-the-art AI models via Azure’s secure cloud infrastructure. It allows developers to integrate GPT models for text generation, Codex for code generation, and DALL·E for image generation into business applications.
| Feature | Description |
|---|---|
| Text Generation (GPT-4, GPT-3.5) | Generates human-like text for chatbots, email composition, and document writing. |
| Code Generation (Codex) | Generates programming code from natural language instructions. |
| Image Generation (DALL·E) | Creates high-quality images from text descriptions. |
| Fine-Tuning | Allows customization of GPT models for specific business needs. |
| API Integration | Easily integrates with applications via REST APIs and SDKs. |
For Python users, install the required package:

```
pip install openai
```
Below is a simple Python script that sends a request to a GPT-4 deployment and prints the text response. Note that Azure OpenAI requires an API version, and that `model` refers to your deployment name rather than the base model name.

```python
from openai import AzureOpenAI

# Azure OpenAI credentials
client = AzureOpenAI(
    api_key="your_api_key",
    azure_endpoint="https://your-openai-endpoint.com",
    api_version="2024-02-01",
)

# Define the prompt
prompt_text = "Explain the importance of artificial intelligence in healthcare."

# Call the GPT-4 deployment
response = client.chat.completions.create(
    model="gpt-4",  # your deployment name
    messages=[{"role": "user", "content": prompt_text}],
    max_tokens=150,
)

# Print the response text
print(response.choices[0].message.content)
```
When the request is processed, the API returns a structured JSON response:

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Artificial intelligence is revolutionizing healthcare by enhancing diagnostics, enabling predictive analytics, and optimizing patient care through automation."
      }
    }
  ]
}
```
Text generation is one of the most powerful applications of Azure OpenAI GPT models. It allows businesses to automate content creation, improve chatbots, and generate human-like text for various industries.
Text generation enables AI to produce coherent and contextually relevant text based on a given prompt.
| Industry | Use Case |
|---|---|
| Customer Support | AI-powered chatbots that generate human-like responses. |
| Marketing & Content Creation | Automated blog writing, email composition, and ad generation. |
| Legal & Healthcare | Drafting contracts, summarizing medical documents. |
```python
# Define a chatbot prompt
chat_prompt = "Customer: Can you help me track my order?\nAI Assistant:"

# Generate a chatbot response (client is the AzureOpenAI client configured earlier)
response = client.chat.completions.create(
    model="gpt-4",  # your deployment name
    messages=[{"role": "user", "content": chat_prompt}],
    max_tokens=50,
)

# Print the response
print(response.choices[0].message.content)
```
Sample output:

```
Sure! Please provide your order number, and I will check the tracking details for you.
```
By fine-tuning GPT models, businesses can train AI on custom datasets to generate domain-specific text.
Codex is an AI model designed for code generation. It can translate natural language instructions into programming code.
Codex enables AI-assisted coding by generating Python, JavaScript, and C# code from natural language.
| Use Case | Example |
|---|---|
| Automated Code Writing | Convert user instructions into Python or JavaScript code. |
| Code Completion | Suggest code snippets for developers. |
| Debugging Assistance | Generate fixes for errors in existing code. |
```python
# Define a coding instruction
code_prompt = "Write a Python function to reverse a string."

# Generate code with a Codex-style completions call
# (client is the AzureOpenAI client configured earlier)
response = client.completions.create(
    model="code-davinci-002",  # your Codex deployment name
    prompt=code_prompt,
    max_tokens=100,
)

# Print the generated code
print(response.choices[0].text)
```

Sample output:

```python
def reverse_string(s):
    return s[::-1]

print(reverse_string("hello"))
```
DALL·E generates high-quality images from text descriptions. It is widely used for design, advertising, and content creation.
DALL·E is a generative AI model that creates images based on textual prompts.
| Industry | Use Case |
|---|---|
| Marketing & Advertising | Generate unique product images. |
| E-commerce | Create realistic product visuals. |
| Game Development | Design characters and backgrounds. |
```python
# Define an image prompt
image_prompt = "A futuristic cityscape at sunset."

# Generate an image using DALL·E (client is the AzureOpenAI client configured earlier)
response = client.images.generate(
    model="dall-e-3",  # your DALL·E deployment name
    prompt=image_prompt,
    n=1,
    size="1024x1024",
)

# Get the image URL
image_url = response.data[0].url
print(f"Generated Image URL: {image_url}")
```

Sample output:

```
Generated Image URL: https://dalle-generated-image-link.com
```
Fine-tuning allows customizing GPT models to perform industry-specific tasks by training them on specialized datasets. This ensures higher accuracy, improved context understanding, and domain adaptation for specific applications.
Fine-tuning is the process of training a pre-trained AI model on a custom dataset to optimize it for a specific use case. Instead of training a model from scratch (which requires massive data and computational power), fine-tuning adapts an existing model to learn industry jargon, business processes, or task-specific patterns.
| Benefit | Description |
|---|---|
| Higher Accuracy | Reduces errors by training AI on domain-specific content. |
| Better Context Retention | Understands and responds more accurately to specialized queries. |
| Customization | Creates AI models tailored for legal, medical, financial, or engineering use cases. |
| Efficiency | Requires less training time than building a model from scratch. |
| Industry | Use Case |
|---|---|
| Healthcare | Train AI to generate medical reports and interpret clinical notes. |
| Legal | Improve contract analysis by training on legal documents. |
| Finance | Enhance risk assessment models by training on financial datasets. |
| Customer Support | Personalize chatbot responses for specific products or services. |
Fine-tuning requires a dataset formatted as conversational pairs or task-specific text examples.
Azure OpenAI expects datasets in JSONL (JSON Lines) format, where each line is a complete JSON object:

```jsonl
{"messages": [{"role": "system", "content": "You are a legal assistant AI."}, {"role": "user", "content": "What is a non-disclosure agreement?"}, {"role": "assistant", "content": "A non-disclosure agreement (NDA) is a legal contract that protects confidential information shared between parties."}]}
{"messages": [{"role": "user", "content": "Explain the term 'force majeure' in contracts."}, {"role": "assistant", "content": "Force majeure is a contractual clause that frees parties from liability in the event of unforeseen, unavoidable circumstances such as natural disasters or war."}]}
```
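Before uploading, it can help to confirm that every line of the dataset is well-formed. The helper below is an illustrative sketch (not part of the Azure tooling): it checks that each line parses as JSON and contains a `messages` list whose entries carry `role` and `content` fields.

```python
import json

REQUIRED_KEYS = {"role", "content"}

def validate_jsonl_line(line: str) -> bool:
    """Return True if the line is a valid chat fine-tuning example."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    # Every message needs both a role and a content field
    return all(isinstance(m, dict) and REQUIRED_KEYS <= m.keys() for m in messages)

good = '{"messages": [{"role": "user", "content": "What is an NDA?"}]}'
bad = '{"messages": [{"role": "user"}]}'
print(validate_jsonl_line(good))  # True
print(validate_jsonl_line(bad))   # False
```

Running this over each line of the file before upload catches formatting errors early, before they surface as a failed fine-tuning job.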
Fine-tuning datasets must be stored and accessed in Azure Storage before training.
```
pip install azure-storage-blob
```
```python
from azure.storage.blob import BlobServiceClient

# Azure Blob Storage credentials
STORAGE_ACCOUNT_NAME = "your_storage_account"
STORAGE_ACCOUNT_KEY = "your_storage_key"
CONTAINER_NAME = "fine-tuning-datasets"
FILE_PATH = "fine_tuning_data.jsonl"

# Create the Blob Service Client
blob_service_client = BlobServiceClient(
    account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
    credential=STORAGE_ACCOUNT_KEY,
)

# Upload the JSONL file (overwrite=True allows re-running the script)
blob_client = blob_service_client.get_blob_client(
    container=CONTAINER_NAME, blob="fine_tuning_data.jsonl"
)
with open(FILE_PATH, "rb") as data:
    blob_client.upload_blob(data, overwrite=True)

print("Dataset uploaded successfully.")
```
Once the dataset is uploaded, the next step is creating the fine-tuning job. The training file is first uploaded through the Files API and then referenced by its file ID.

```
pip install openai
```

```python
from openai import AzureOpenAI

# Azure OpenAI credentials
client = AzureOpenAI(
    api_key="your_api_key",
    azure_endpoint="https://your-openai-endpoint.com",
    api_version="2024-02-01",
)

# Upload the training file; the fine-tuning job references it by file ID
training_file = client.files.create(
    file=open("fine_tuning_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Create the fine-tuning job on a model that supports fine-tuning in your region
response = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-35-turbo",
    hyperparameters={"n_epochs": 5, "batch_size": 4, "learning_rate_multiplier": 0.1},
)

# Print the fine-tuning job ID
print(f"Fine-tuning Job ID: {response.id}")
```

Sample output:

```
Fine-tuning Job ID: ft-12345xyz
```
Azure OpenAI allows monitoring fine-tuning progress in real time.

```python
# Check the fine-tuning job status
job_id = "ft-12345xyz"
response = client.fine_tuning.jobs.retrieve(job_id)
print(f"Fine-Tuning Status: {response.status}")
```

Sample output:

```
Fine-Tuning Status: completed
```
Once fine-tuning is complete, the model must be deployed before it can be called. In Azure OpenAI, fine-tuned models are deployed from the Azure portal (Azure AI Studio) or via the Azure management API rather than through the `openai` SDK; the deployment name you assign is then used in API calls.

```python
# Generate text using the fine-tuned model's deployment name
# ("legal-assistant-gpt" is an illustrative deployment name)
response = client.chat.completions.create(
    model="legal-assistant-gpt",
    messages=[{"role": "user", "content": "Explain contract termination clauses."}],
    max_tokens=150,
)

# Print the fine-tuned model's response
print(response.choices[0].message.content)
```

Sample output:

```
A contract termination clause outlines the conditions under which a contract can be legally ended, such as breach of contract, force majeure, or mutual agreement.
```
After fine-tuning, evaluate model performance to ensure accuracy.
| Metric | Description |
|---|---|
| Accuracy | Measures how well the model generates correct responses. |
| Response Consistency | Ensures the AI provides consistent and logical answers. |
| Bias Reduction | Checks if the model generates neutral and unbiased responses. |
| Prompt | Default GPT-4 Output | Fine-Tuned GPT-4 Output |
|---|---|---|
| "What is an NDA?" | "An NDA is a confidentiality agreement." | "A non-disclosure agreement (NDA) is a legal contract that prevents parties from sharing confidential information." |
| "Explain force majeure." | "It is a clause that excuses liability in disasters." | "Force majeure is a contractual provision that frees parties from liability due to events beyond their control, such as natural disasters, war, or government regulations." |
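The accuracy metric above can be approximated with a simple evaluation loop. The sketch below uses hypothetical helper names and hard-coded sample outputs; in practice the generated answers would come from calls to the deployed model.

```python
def keyword_accuracy(examples):
    """Score each generated answer by whether it contains all expected keywords."""
    hits = 0
    for generated, keywords in examples:
        if all(k.lower() in generated.lower() for k in keywords):
            hits += 1
    return hits / len(examples)

# Hypothetical model outputs paired with the keywords we expect to see
examples = [
    ("A non-disclosure agreement (NDA) is a legal contract...", ["NDA", "legal contract"]),
    ("It is a clause that excuses liability in disasters.", ["force majeure"]),
]
print(f"Keyword accuracy: {keyword_accuracy(examples):.2f}")  # 0.50
```

Keyword checks are a coarse proxy for accuracy; production evaluation typically adds human review or model-graded scoring on a held-out test set.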
Once a generative AI model is trained and fine-tuned, the next step is deployment to make it available for real-world applications. Deployment ensures the model is accessible, scalable, and optimized for performance.
Generative AI applications can be deployed using different methods depending on scalability, latency, and integration needs.
| Deployment Method | Best For | Example Use Case |
|---|---|---|
| Azure OpenAI API (Cloud API) | Real-time AI services | AI-powered chatbots, content automation |
| Azure Kubernetes Service (AKS) | Large-scale AI inference | High-traffic AI-powered customer support |
| Azure IoT Edge | Offline, on-device AI | AI-powered voice assistants, AI-enhanced devices |
| Azure Batch Processing | Processing large datasets | Generating reports, analyzing massive text datasets |
The easiest way to deploy a generative AI model is by using Azure OpenAI Service API, which provides secure, scalable access to GPT models via REST API.
```
pip install flask openai
```
Below is a Flask-based API that allows users to send a message to GPT-4 and receive a response.
```python
from flask import Flask, request, jsonify
from openai import AzureOpenAI

# Azure OpenAI credentials
client = AzureOpenAI(
    api_key="your_api_key",
    azure_endpoint="https://your-openai-endpoint.com",
    api_version="2024-02-01",
)

# Initialize the Flask app
app = Flask(__name__)

@app.route('/chatbot', methods=['POST'])
def chatbot():
    user_input = request.json.get("message")

    # Call the GPT-4 deployment
    response = client.chat.completions.create(
        model="gpt-4",  # your deployment name
        messages=[{"role": "user", "content": user_input}],
        max_tokens=100,
    )

    # Return the response as JSON
    return jsonify({"response": response.choices[0].message.content})

# Run the Flask API
if __name__ == '__main__':
    app.run(debug=True)
```
This exposes a POST endpoint at `/chatbot`. For large-scale AI applications, deploying in Azure Kubernetes Service (AKS) provides high availability and auto-scaling.
```dockerfile
# Use Python as the base image
FROM python:3.9

# Install dependencies
RUN pip install flask openai

# Copy application files
COPY app.py /app/app.py

# Run the Flask API server
CMD ["python", "/app/app.py"]
```
For real-time AI inference on edge devices, deploying AI models to IoT Edge allows on-device processing without cloud dependency.
| Advantage | Description |
|---|---|
| Low Latency | Processes AI locally, reducing response time. |
| Offline Processing | Works without an internet connection. |
| Reduced Cloud Costs | Lowers API request costs by running AI on devices. |
```python
import torch

# Load the fine-tuned model (assumes a fully serialized PyTorch module)
model = torch.load("fine_tuned_gpt_model.pth")
model.eval()

# Transformer models take integer token IDs, so the dummy input
# for tracing must be an integer tensor, not random floats
dummy_input = torch.randint(0, 50000, (1, 512))

# Convert the model to ONNX format
onnx_model_path = "fine_tuned_gpt_model.onnx"
torch.onnx.export(model, dummy_input, onnx_model_path)

print("Model converted to ONNX for IoT Edge deployment.")
```
Once deployed, AI applications must be monitored and optimized for performance, accuracy, and cost efficiency.
| Metric | Description |
|---|---|
| Latency | Measures how fast AI generates responses. |
| API Usage & Cost | Tracks how many API calls are made per hour/day. |
| Model Accuracy | Ensures AI generates relevant and correct responses. |
```python
import requests

# Query Azure Monitor metrics for the Azure OpenAI resource
# (replace the subscription, resource group, and account placeholders)
API_USAGE_ENDPOINT = (
    "https://management.azure.com/subscriptions/YOUR_SUBSCRIPTION_ID"
    "/resourceGroups/YOUR_RESOURCE_GROUP"
    "/providers/Microsoft.CognitiveServices/accounts/YOUR_OPENAI_ACCOUNT"
    "/providers/microsoft.insights/metrics?api-version=2018-01-01"
)
headers = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}

response = requests.get(API_USAGE_ENDPOINT, headers=headers)
print(response.json())
```
When deploying AI solutions, it is important to ensure responsible use, including content moderation and filtering. Azure OpenAI applies built-in content filtering to requests and responses; the example below adds a simple, purely illustrative application-level keyword check on top.

```python
# Call GPT-4 (client is the AzureOpenAI client configured earlier;
# Azure OpenAI's built-in content filters also apply to this request)
response = client.chat.completions.create(
    model="gpt-4",  # your deployment name
    messages=[{"role": "user", "content": "Tell me something offensive."}],
    max_tokens=50,
)

content = response.choices[0].message.content

# Naive application-level keyword check (illustrative only)
if "offensive" in content.lower():
    print("Warning: AI-generated content may not be appropriate.")
else:
    print(content)
```
| Deployment Option | Best For | Example Use Case |
|---|---|---|
| Cloud API (Azure OpenAI) | Real-time AI services | Chatbots, AI-powered customer support |
| Azure Kubernetes Service (AKS) | High-traffic AI workloads | AI-powered legal document analysis |
| IoT Edge | AI processing on devices | AI voice assistants, smart automation |
| Batch Processing | Large-scale AI workloads | AI-generated reports, automated translations |
Prompt engineering is the strategic design of input instructions to guide large language models (LLMs) such as GPT-4 toward desired outputs. A well-crafted prompt directly affects accuracy, relevance, and tone.
| Technique | Description |
|---|---|
| Zero-Shot | The model is given a task without any examples. |
| Few-Shot | The model is shown several examples before being asked to perform the task. |
| Chain-of-Thought | Prompts include reasoning steps or instructions for the model to "think out loud." |
Example – Few-Shot Prompt:

```
Translate the following phrases to French:
English: How are you? → French: Comment ça va ?
English: Good morning → French: Bonjour
English: I love learning → French:
```
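In the Chat Completions API, the same few-shot pattern can be expressed as alternating user/assistant messages rather than a single text block. A minimal sketch (the helper name is illustrative):

```python
def build_few_shot_messages(instruction, examples, query):
    """Build a chat message list from few-shot input/output pairs."""
    messages = [{"role": "system", "content": instruction}]
    for source, target in examples:
        messages.append({"role": "user", "content": source})
        messages.append({"role": "assistant", "content": target})
    messages.append({"role": "user", "content": query})
    return messages

messages = build_few_shot_messages(
    "Translate the user's phrase to French.",
    [("How are you?", "Comment ça va ?"), ("Good morning", "Bonjour")],
    "I love learning",
)
print(len(messages))  # 6: system + two example pairs + final query
```

The resulting list is passed as the `messages` parameter of a chat completion request; the example pairs prime the model for the final query.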
| Parameter | Effect |
|---|---|
| temperature | Controls randomness. 0.0 = deterministic, 1.0+ = more creative output. |
| top_p | Controls nucleus sampling. Lower values = conservative, higher = diverse vocabulary. |
| max_tokens | Limits the response length. Must be within model’s token window (see below). |
Guidelines:

- Use low `temperature` + low `top_p` for formal/business outputs.
- Use high `temperature` for creative writing or brainstorming.
A token is a unit of text; in English, one token averages roughly 0.75 words (about 4 characters). GPT models tokenize both the input prompt and the generated output.

| Phrase | Token Count Estimate |
|---|---|
| "Hello, world!" | ~4 tokens |
| "The quick brown fox jumps over the lazy dog." | ~10 tokens |
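Exact counts come from the model's tokenizer (the `tiktoken` library for OpenAI models). As a rough rule of thumb, English text averages about 4 characters per token, which gives a crude but dependency-free estimator:

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4 characters/token rule of thumb."""
    return max(1, math.ceil(len(text) / chars_per_token))

print(estimate_tokens("Hello, world!"))  # 4 (13 chars / 4, rounded up)
```

This heuristic is only suitable for budgeting and cost estimates; for enforcing hard context limits, count tokens with the model's actual tokenizer.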
| Model | Max Context Window |
|---|---|
| GPT-3.5-Turbo | 4,096 tokens |
| GPT-4 | 8,192 tokens |
| GPT-4-32K | 32,768 tokens |
Context window includes both prompt + response. Exceeding it will cause truncation or rejection.
| Model | Input Cost (per 1K tokens) | Output Cost (per 1K tokens) |
|---|---|---|
| GPT-3.5-Turbo | $0.0015 | $0.002 |
| GPT-4 (8K) | $0.03 | $0.06 |
| GPT-4 (32K) | $0.06 | $0.12 |
Example Cost Calculation:

- Prompt: 800 tokens
- Expected response: 200 tokens
- Model: GPT-4 (32K)
- Estimated cost: (800 / 1,000) × $0.06 + (200 / 1,000) × $0.12 = $0.048 + $0.024 ≈ $0.072
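The calculation above can be wrapped in a small helper. Prices are taken from the table; check current Azure pricing before relying on these numbers.

```python
# Per-1K-token (input, output) prices from the pricing table above
PRICING = {
    "gpt-3.5-turbo": (0.0015, 0.002),
    "gpt-4-8k": (0.03, 0.06),
    "gpt-4-32k": (0.06, 0.12),
}

def estimate_cost(model: str, prompt_tokens: int, response_tokens: int) -> float:
    """Estimate request cost in USD from input/output token counts."""
    input_price, output_price = PRICING[model]
    return (prompt_tokens / 1000) * input_price + (response_tokens / 1000) * output_price

print(f"${estimate_cost('gpt-4-32k', 800, 200):.3f}")  # $0.072
```

Helpers like this are useful for pre-flight budget checks and for logging estimated spend per request.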
- Use short prompts and a tight `max_tokens` for cost control.
- Implement response truncation logic in API calls.
- Monitor usage via Azure Billing and Usage reports.
Azure OpenAI GPT models support dozens of languages out of the box, including:

- European languages (French, German, Spanish)
- Asian languages (Japanese, Chinese, Korean)
- Right-to-left (RTL) languages (Arabic, Hebrew)

The models automatically infer the language from the prompt and generate the response accordingly.
| Use Case | Details |
|---|---|
| Global Customer Support Chatbots | Serve users in multiple languages from a single model instance |
| Cross-Language Summarization | Summarize Chinese documents in English |
| Multilingual Email Drafting | Use prompt hints to format emails by culture (e.g., formal German) |
Prompt Example:

```
Please write a formal apology email in Japanese to a customer about a shipping delay.
```
DALL·E is a text-to-image model in the Azure OpenAI Service. It generates images from natural language prompts with controllable style, realism, and composition.
| Prompt Modifier | Effect |
|---|---|
| Style Keywords | "digital painting", "3D render", "pencil sketch", etc. |
| Realism Modifiers | "photorealistic", "ultra-realistic", "cartoon style" |
| Composition Hints | "centered", "high angle", "close-up", "minimalist" |
Prompt Example:

```
A photorealistic portrait of a cat wearing glasses in a Victorian-style library, close-up, soft lighting.
```
| Goal | Prompt Guidance |
|---|---|
| Consistent Branding | Include colors, themes, and style tags |
| Abstract Art | Use conceptual language + "digital painting" or "surrealist" |
| E-commerce Product Mock | Use "on white background", "realistic", "3D render" |
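The modifier categories above can be combined programmatically when generating many images with a consistent style. The helper below is an illustrative sketch, not an Azure API:

```python
def build_image_prompt(subject, style=None, realism=None, composition=None):
    """Compose a DALL·E prompt from a subject plus optional modifiers."""
    parts = [subject]
    for modifier in (style, realism, composition):
        if modifier:
            parts.append(modifier)
    return ", ".join(parts)

prompt = build_image_prompt(
    "a cat wearing glasses in a Victorian-style library",
    realism="photorealistic",
    composition="close-up, soft lighting",
)
print(prompt)
```

Keeping the brand's style and realism modifiers in one place makes batches of generated images visually consistent.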
A developer using Azure OpenAI function calling receives the error “Missing functions[0].name parameter.” What configuration mistake causes this error?
The function definition in the API request does not include a required name field.
Azure OpenAI function calling requires each function object in the request payload to include several mandatory properties. The most important is the name field, which uniquely identifies the callable function. If this field is omitted or incorrectly structured, the API validation process fails and returns an error indicating that the parameter is missing. Developers sometimes define only the function description and parameters while forgetting to include the function name. Because the model must reference the function by name when generating a function call, this field is required for proper execution. Ensuring that each function definition includes a valid name resolves the issue.
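A defensive check before sending the request can catch this error early. The validator below is an illustrative sketch; it only checks for the required `name` field described above, not the full function-calling schema:

```python
def validate_function_defs(functions):
    """Return a list of error strings for function definitions missing 'name'."""
    errors = []
    for i, fn in enumerate(functions):
        if "name" not in fn:
            errors.append(f"functions[{i}] is missing the required 'name' field")
    return errors

functions = [
    # Missing 'name' — this is the mistake that triggers the API error
    {"description": "Look up an order", "parameters": {"type": "object", "properties": {}}},
    {"name": "get_weather", "description": "Get the weather", "parameters": {"type": "object", "properties": {}}},
]
print(validate_function_defs(functions))  # flags functions[0]
```

Running a check like this in application code surfaces the misconfiguration with a clear message instead of an opaque API validation error.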
A chat application using Azure OpenAI streaming responses does not display incremental output even though streaming is enabled. What is a common cause?
The client application is not processing the streamed response events correctly.
When streaming is enabled, Azure OpenAI returns partial response chunks rather than a single complete response. These chunks must be processed as they arrive using an event-driven or asynchronous mechanism. If the client application waits for the entire response body before processing it, the streamed output will appear as a single response rather than incremental tokens. Developers often encounter this issue when using HTTP clients that buffer responses instead of handling streaming events. Correctly implementing streaming requires reading each chunk from the response stream and updating the user interface incrementally. Proper event handling enables real-time chat responses in generative AI applications.
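The chunk-handling logic can be separated from the API call so it is easy to test. With the real SDK you would iterate over `client.chat.completions.create(..., stream=True)` and read `chunk.choices[0].delta.content`; the fake string deltas below stand in for that stream:

```python
def consume_stream(chunks, on_token=print):
    """Process streamed content deltas incrementally and return the full text."""
    collected = []
    for delta in chunks:
        if delta:  # final chunks may carry no content
            on_token(delta)          # update the UI as each token arrives
            collected.append(delta)
    return "".join(collected)

# Fake delta payloads standing in for a streamed response
fake_chunks = ["Sure", "!", " Please", " provide", " your", " order", " number", ".", None]
full_text = consume_stream(fake_chunks, on_token=lambda t: None)
print(full_text)  # "Sure! Please provide your order number."
```

The key point is that `on_token` fires per chunk rather than once at the end; a client that buffers the whole body before calling it reproduces the bug described above.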
When implementing a retrieval-augmented generation (RAG) architecture in Azure, what role do embeddings play?
Embeddings convert text into vector representations that enable semantic similarity search.
In RAG architectures, embeddings represent text as numeric vectors that capture semantic meaning. When a user submits a query, the query text is converted into an embedding vector. The system then searches a vector index—often stored in Azure AI Search—to find documents with vectors most similar to the query vector. These retrieved documents provide contextual information that is passed to the generative model to produce a response. Without embeddings, the system would rely only on keyword matching rather than semantic similarity. This embedding-based retrieval process significantly improves the relevance of generated responses in knowledge-based AI systems.
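The retrieval step can be illustrated with toy vectors. The sketch below uses hand-made 3-dimensional vectors; real embeddings come from an embedding model (e.g. via `client.embeddings.create`) and have on the order of a thousand dimensions, but the similarity math is the same:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy document embeddings (real ones come from an embedding model)
docs = {
    "contract law overview": [0.9, 0.1, 0.0],
    "chocolate cake recipe": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in embedding for "what is an NDA?"

best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # "contract law overview"
```

A vector store such as Azure AI Search performs this same nearest-neighbor ranking at scale; the top-ranked documents are then injected into the generative model's prompt as context.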
Why might a generative AI application produce hallucinated information even when using Azure OpenAI?
The model generates responses based on learned patterns rather than verified knowledge sources.
Generative language models predict text by analyzing statistical relationships in training data rather than retrieving verified facts from a knowledge base. When a prompt lacks sufficient context or contains ambiguous information, the model may generate plausible but incorrect answers. This phenomenon is commonly referred to as hallucination. Developers designing production AI systems often mitigate hallucinations using techniques such as retrieval-augmented generation, prompt constraints, or system instructions that limit speculation. Providing reliable context from trusted data sources reduces the likelihood that the model will generate unsupported information. Understanding this behavior is essential when designing responsible generative AI solutions.
Why might an Azure OpenAI deployment return a “model not found” or “deployment not available” error after a successful deployment?
The application is referencing the model name instead of the deployment name.
Azure OpenAI separates model deployments from base model identifiers. When developers deploy a model, they assign a custom deployment name that must be used in API calls. If the application sends requests referencing the base model name instead of the deployment identifier, the service cannot locate the deployment and returns an error. This issue frequently occurs when developers reuse code examples from the OpenAI public API, where model names are used directly. In Azure OpenAI environments, the correct deployment name must be supplied in every request to route the call to the appropriate model instance.
Why might an Azure OpenAI chat completion request fail when the prompt exceeds token limits?
The combined token count of the prompt and expected response exceeds the model’s maximum context length.
Azure OpenAI models enforce strict limits on the number of tokens that can be processed in a single request. Tokens represent pieces of text such as words or subword fragments. When a request includes long prompts, conversation history, or large context documents, the total token count may exceed the model’s context window. In such cases the API rejects the request with a token limit error. Developers typically mitigate this by truncating conversation history, summarizing previous messages, or splitting long documents into smaller chunks. Understanding token limits is essential when designing scalable conversational AI systems.
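A common mitigation is trimming the oldest turns until the conversation fits the budget. The sketch below uses the rough 4-characters-per-token heuristic in place of a real tokenizer, and always preserves system messages:

```python
def trim_history(messages, max_tokens, chars_per_token=4):
    """Drop the oldest non-system messages until the estimated total fits."""
    def est(m):
        return max(1, len(m["content"]) // chars_per_token)

    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    while turns and sum(est(m) for m in system + turns) > max_tokens:
        turns.pop(0)  # drop the oldest turn first
    return system + turns

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "First question " * 50},
    {"role": "assistant", "content": "First answer " * 50},
    {"role": "user", "content": "Latest question?"},
]
trimmed = trim_history(history, max_tokens=60)
print(len(trimmed))  # 2: the system message and the most recent user turn
```

Production systems often combine trimming with summarization, replacing dropped turns with a short model-generated summary so long-running conversations keep their earlier context.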