Implement AI Solutions using Microsoft Foundry

Implement AI Solutions using Microsoft Foundry Detailed Explanation

Build and Test a Foundry Chat and Single-Agent Solution

Exam Radar

Microscopic Technical Focus: System prompt design, model deployment interaction, lightweight chat client, agent instruction, tool boundary, and test trace validation
Core Priority: The largest AI-901 domain includes practical Foundry tasks: deploying a model, interacting with it, creating a lightweight client, and testing a single-agent flow.
Confusion Alert: Distractors often skip the deployment name, confuse prompt engineering with agent tool wiring, or change code before validating behavior in the Foundry portal.
Scenario Logic: Separate prompt behavior from runtime routing: the prompt or agent instruction controls behavior, while endpoint, deployment name, agent ID, credential, and tool configuration control whether the app reaches the correct runtime object.
Failure Trigger: The app returns routing errors, ignores tool boundaries, or produces off-scope answers when the deployment identity, agent instruction, or tool configuration is not validated in the portal first.
Operational Dependency: The first dependency is a verified Foundry deployment or agent test, followed by client configuration that exactly matches the tested object.
Topic-Specific Exam Cue: Questions that mention playground success, deployment-not-found errors, agent tests, tools, system prompts, or lightweight clients usually ask for the first validation point.

Practice question: A lightweight chat client returns a deployment-not-found error after the model works in the Foundry playground. What should the developer verify first?

A. Whether the system prompt includes more examples
B. Whether the temperature is lower than 0.2
C. Whether the application uses image input
D. Whether the client configuration uses the exact Foundry deployment name and endpoint

Correct Answer: D

Explanation: D is correct because the portal test proves the deployment works, so the client route is the likely failure. A and B affect response behavior after routing succeeds. C is unrelated unless the app sends images.

Atomic Deconstruction - Operational Level

Foundry chat and agent questions are runtime-routing questions. The core objects are the system prompt or agent instruction, the deployment or agent ID, configured tools, and the client settings that send the request to the intended runtime object.

The drill is to validate in the portal before expanding code. If the playground or agent test works, a client failure usually points to endpoint, deployment name, agent ID, or credential mismatch. If the portal behavior is wrong, the instruction or tool boundary must be corrected first.

For exam transformation, treat every option as a proposed dependency. The correct option is the one that unlocks the blocked workflow and can be verified. Wrong options are often useful in another situation, but they fail here because they tune a later step, address a different modality, or repair a symptom without satisfying the scenario requirement.

Component Specifications

For a beginner, this table means: prove the Foundry object works in the portal, then make sure the client is calling that exact object.

Object	Attribute	Value Range	Default State	Dependency	Failure State
System prompt	Behavior constraint	Role, boundaries, response format, safety rule	Empty or generic	Model request	Model produces off-scope or unstructured responses
Model deployment	Invocation target	Deployment name and endpoint	Not callable until created	Client SDK configuration	404 deployment not found or wrong model family
Agent instruction	Task policy	Goal, allowed actions, tool limits	Undefined	Agent runtime	Agent chooses unsupported actions or ignores constraints
Client application	Configuration source	Endpoint, deployment/agent ID, credential	Local placeholder	Environment variables and SDK	Authentication or routing failure

Step-by-Step Execution Path

Create the system prompt or agent instruction before testing so the model has a stable behavioral contract.
Deploy or select the model in Microsoft Foundry and verify the deployment name from the portal instead of inventing a model identifier.
Test the prompt or single-agent solution in the Foundry portal to observe response quality, tool selection, and error messages before coding.
Build the lightweight client with endpoint, credential, and deployment or agent identifier stored in configuration.
Run a minimal conversation and inspect response text, status, and trace or portal test output to confirm the app calls the intended Foundry object.

Suggested Lab Validation Ideas:

The following paths and commands are conceptual lab-style examples for practice. Adapt them to the current Microsoft documentation, SDK/API version, subscription permissions, and project environment before using them in a real implementation.

Portal path: Microsoft Foundry > Models + endpoints > Playground - portal test for model deployment response behavior.
Portal path: Microsoft Foundry > Agents > Test - portal test for single-agent instruction and tool behavior.
Python SDK rehearsal: python app.py - local client validation after setting endpoint, credential, and deployment or agent ID environment variables.

Technical Chain

The system prompt or agent instruction sets the behavior contract. The client request then targets a Foundry deployment or agent endpoint with credentials. The runtime loads the model and any configured agent tools, produces a response, and records evidence in the portal test surface or trace. If the deployment name, agent ID, or instruction is wrong, the failure appears as routing errors, unsupported behavior, or off-scope output.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Validate deployment identity	Microsoft Foundry portal > Models + endpoints > deployment details	Deployment name in code exactly matches the Foundry deployment
Validate agent behavior	Microsoft Foundry portal > Agents > Test conversation	Agent follows instruction boundaries and uses only configured tools
Validate client call path	Run lightweight client and inspect response/status	Client receives a successful response from the intended deployment or agent

Implement Text and Speech Solutions with Foundry Tools

Exam Radar

Microscopic Technical Focus: Text analysis output contracts, spoken prompt response, Azure Speech tool selection, audio format, and lightweight app validation
Core Priority: AI-901 includes practical text and speech implementation, so candidates must distinguish text analysis, speech recognition, speech synthesis, and multimodal spoken interactions.
Confusion Alert: A common wrong answer is to use a generative chat model for every language task or to choose speech synthesis when the requirement is transcribing spoken input.
Scenario Logic: Determine the direction of the language or audio transformation: text to labels, audio to text, text to audio, or a combined spoken interaction.
Failure Trigger: The workflow breaks when raw audio is sent to a text-only path, when synthesis is chosen before transcription, or when the response format does not match the application need.
Operational Dependency: The blocking dependency is the correct direction and output contract for Speech or text analysis before voice, language, or formatting options are tuned.
Topic-Specific Exam Cue: Words such as listen, transcribe, speak, voice, sentiment, entities, key phrases, or spoken answer usually identify the required tool direction.

Practice question: A kiosk must listen to a visitor question and answer aloud. Which implementation sequence best matches the requirement?

A. Recognize speech to text, generate or retrieve the answer, then synthesize the answer to speech
B. Use sentiment analysis first, then generate an image response
C. Send the raw audio to a text-only chat model without transcription
D. Use text-to-speech before the question is converted to text

Correct Answer: A

Explanation: A is correct because the workflow requires audio input and audio output. B targets the wrong output. C breaks when the model cannot consume raw audio. D reverses the dependency.

Atomic Deconstruction - Operational Level

Text and speech implementation questions are direction-sensitive. Spoken input must become text before a text-only reasoning step can use it, while a spoken answer requires generated or selected text to become audio.

The drill is to label the conversion direction and the expected response field. A transcript, sentiment label, key phrase list, synthesized audio stream, and spoken answer are different completion states, so the selected tool must create the one requested by the scenario.

Component Specifications

Read this table as a direction check: decide whether the app is converting speech to text, text to speech, or text to analysis output before writing code.

Object	Attribute	Value Range	Default State	Dependency	Failure State
Text analysis call	Output type	Key phrases, entities, sentiment, summary	No analysis until requested	Text input and service capability	Application receives free-form text instead of required labels
Speech recognition	Direction	Audio to text	Not configured	Audio stream and language	Spoken input is not converted into prompt text
Speech synthesis	Direction	Text to audio	Not configured	Voice selection and output format	Application cannot return spoken responses
Audio format	Encoding/sample assumptions	Service-supported WAV/PCM or SDK stream	Client dependent	Speech call fails or returns poor recognition	undefined

Step-by-Step Execution Path

Write the required output contract first: sentiment label, extracted entities, recognized transcript, or spoken answer.
Select the Foundry Tool or multimodal model path that matches the direction of the audio or text transformation.
Configure language, voice, and audio format only after the workload direction is correct.
Run a minimal app invocation and verify the returned transcript, text labels, or playable speech output before expanding the workflow.

Suggested Lab Validation Ideas:

Portal path: Microsoft Foundry > Tools > Speech - portal verification of speech capability and configuration.
Python SDK rehearsal: python speech_demo.py --input prompt.wav - local validation for audio-to-text or spoken-response flow; adapt to current SDK package names.
Python SDK rehearsal: python text_analysis_demo.py --text reviews.txt - local validation for key phrase, entity, sentiment, or summary output.

Technical Chain

Audio and text solutions are directional. A microphone input must become text before a text-only model can reason over it, while a spoken answer requires generated text to be synthesized into audio. Text analysis tasks are evaluated by structured labels or extracted values. The exam answer must keep input direction, service capability, and expected output connected.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Validate speech direction	Inspect app requirement and Speech tool configuration	Configuration is speech-to-text for prompts or text-to-speech for spoken output
Validate text analysis contract	Run sample input through the text analysis call	Response contains the required entities, key phrases, sentiment, or summary fields
Validate audio compatibility	Inspect audio format or SDK stream settings	Audio input is accepted and produces transcript or playable output

Implement Vision and Image Generation Solutions with Foundry

Exam Radar

Microscopic Technical Focus: Visual prompt interpretation, multimodal model input, generated image output, vision app capability, and result validation
Core Priority: The exam expects candidates to separate vision understanding from image generation and to know when a deployed multimodal model is required.
Confusion Alert: Distractors choose image generation when the app must understand an image, or choose OCR-only extraction when the scenario asks for broad visual interpretation.
Scenario Logic: Decide whether the workflow inspects an existing image or creates a new image from a prompt, then validate that the selected model accepts or returns the correct visual artifact.
Failure Trigger: The app fails conceptually when image generation is selected for image understanding, or when a text-only model is asked to reason over visual input.
Operational Dependency: The first dependency is the transformation direction: image-to-answer for vision understanding, or prompt-to-image for generation.
Topic-Specific Exam Cue: Interpret, describe, inspect, detect, and identify point to vision understanding; generate, create, or produce an image points to image generation.

Practice question: An app must inspect a photo of equipment and explain whether a warning light is visible. Which capability is required?

A. An image generation model
B. A deployed multimodal model that can receive the image and prompt
C. Text-to-speech output
D. A document extraction schema

Correct Answer: B

Explanation: B is correct because the app must interpret an existing image. A creates new images. C changes text into audio. D extracts structured document fields and is not the first fit for visual reasoning.

Atomic Deconstruction - Operational Level

Vision and image generation questions are modality-transformation questions. Vision interpretation starts from an existing visual input and returns labels or reasoning; image generation starts from a prompt and returns a new visual artifact.

The drill is to read scenario verbs carefully. Interpret, describe, identify, and detect point to visual understanding. Generate, create, or produce a picture points to image generation. The validation evidence must match that direction.

Component Specifications

For exam use, this table helps separate two easy-to-confuse choices: understanding an image that already exists versus generating a new image.

Object	Attribute	Value Range	Default State	Dependency	Failure State
Visual input	Prompt attachment	Image plus text instruction	Not included unless sent	Multimodal deployment	Model answers without seeing image evidence
Vision interpretation	Result type	Description, classification, detected content	Depends on prompt and model	Image quality and capability	Wrong labels or missing visual reasoning
Image generation	Output artifact	Generated image from prompt	No image until model call	Generative image model and safety filters	No visual asset is produced
Safety filter	Content policy response	Allowed, revised, or blocked	Service controlled	Prompt and policy	Request is blocked or altered without handling

Step-by-Step Execution Path

Decide whether the app must interpret an existing image or create a new image. This is the critical workload split.
For interpretation, send the visual input with a text instruction to a deployed multimodal model and verify that the response references visible evidence.
For generation, send a precise prompt to an image generation model and handle safety or policy responses.
Record the output contract: classification, description, detected attribute, or generated asset location.

Suggested Lab Validation Ideas:

Portal path: Microsoft Foundry > Playground > multimodal test - portal validation that image plus prompt input is accepted.
Python SDK rehearsal: python vision_prompt.py --image meter.jpg --question 'What reading is visible?' - local validation for visual interpretation.
Python SDK rehearsal: python image_generate.py --prompt 'product mockup on white background' - local validation for generated image output.

Technical Chain

Vision interpretation sends existing pixels to a model that can inspect visual features and combine them with the text instruction. Image generation starts with text and returns new pixels. The underlying request shape is therefore different. A correct exam choice follows the direction of transformation: image-to-answer for vision, prompt-to-image for generation.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Validate multimodal input	Run a visual prompt test with an attached image	Response uses visible image evidence rather than generic text
Validate generated asset	Inspect image generation response or output file path	A new image artifact is returned or a policy response is handled
Validate workload split	Compare scenario verb: interpret, describe, detect, generate, create	Selected service direction matches the requested transformation

Extract Information with Azure Content Understanding in Foundry Tools

Exam Radar

Microscopic Technical Focus: Schema-driven extraction from documents, images, audio, and video using Content Understanding and lightweight client validation
Core Priority: AI-901 specifically includes Content Understanding for information extraction across documents, forms, images, audio, and video.
Confusion Alert: Wrong options often use generic chat summarization or vision description when the requirement is a repeatable schema with fields that an app can consume.
Scenario Logic: Look for repeatable business fields from documents, images, audio, or video rather than a general natural-language answer.
Failure Trigger: Generic chat or summarization returns prose that cannot reliably populate application fields, preserve confidence evidence, or trigger human review.
Operational Dependency: The first dependency is a schema or extraction objective that returns structured fields the app can map and validate.
Topic-Specific Exam Cue: Field names, tables, invoice data, call attributes, dates, amounts, confidence, analyzer, or structured output usually signal Content Understanding.

Practice question: A lightweight app must process recorded service calls and extract customer name, requested product, issue category, and promised follow-up date. What should the developer validate first?

A. That an image generation model can create a visual summary
B. That text-to-speech produces a natural voice
C. That a chat prompt can produce a paragraph summary
D. That Content Understanding returns the required structured fields from the audio

Correct Answer: D

Explanation: D is correct because the scenario requires structured extraction from audio. A and B target visual or spoken output. C may summarize but does not guarantee field-level output for the app.

Atomic Deconstruction - Operational Level

Content Understanding questions are schema-evidence questions. The operational object is not a paragraph summary; it is the repeatable extraction result that an application can map into fields, tables, confidence checks, and review workflows.

The drill is to design from the target fields backward. If the app needs customer name, issue category, line items, dates, or amounts, the correct answer must preserve schema, status, confidence, and client mapping rather than only producing natural-language output.

Component Specifications

For this table, think like an app developer: the key question is whether the tool returns fields your program can store, check, and send for review.

Object	Attribute	Value Range	Default State	Dependency	Failure State
Extraction schema	Target fields	Names, types, tables, confidence	Absent until defined	Business output contract	Response is unstructured and cannot populate the app
Content source	Media type	Document, image, audio, video	Not uploaded or referenced	Tool support and file quality	No extraction or partial extraction
Analyzer/test run	Validation state	Succeeded, failed, needs review	Untested	Sample content and schema	Fields are missing before client integration
Client mapping	Field binding	JSON field to app property	Manual or absent	Extraction response	Application stores wrong values or drops confidence signals

Step-by-Step Execution Path

Define the business fields that must be extracted before choosing the tool. This turns the requirement into a schema rather than a broad summary.
Use Content Understanding in Foundry Tools with representative documents, images, audio, or video to test extraction quality.
Inspect field values, confidence, and missing-field behavior so the app can decide when human review is required.
Build a lightweight client that submits content, reads structured output, maps fields to application objects, and logs extraction failures.

Suggested Lab Validation Ideas:

Portal path: Microsoft Foundry > Tools > Content Understanding - portal validation for analyzer/schema and test results.
Python SDK rehearsal: python extract_content.py --file invoice.pdf - local client validation for structured extraction; adapt to current Content Understanding SDK/API surface.
API rehearsal: inspect JSON response fields, confidence scores, and status - response-shape validation rather than deployment command.

Technical Chain

The app sends a content item to an analyzer that applies a schema or extraction objective. The service processes the media, returns structured fields with status and confidence evidence, and the client maps those fields into business objects. If the answer uses generic summarization, the chain loses schema, confidence, and repeatable field binding.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Validate schema coverage	Microsoft Foundry > Tools > Content Understanding analyzer view	All required business fields are represented in the analyzer or extraction output
Validate extraction result	Run a sample file and inspect JSON/status output	Expected fields, values, and confidence indicators are present
Validate client mapping	Review application mapping from extraction response to data model	Each required field is stored correctly and low-confidence cases are handled

Shopping cart

Subtotal:

AI-901 Implement AI Solutions using Microsoft Foundry

Detailed list of AI-901 knowledge points