Shopping cart

Subtotal:

$0.00

AI-901 Implement AI Solutions using Microsoft Foundry

Implement AI Solutions using Microsoft Foundry

Detailed list of AI-901 knowledge points

Implement AI Solutions using Microsoft Foundry Detailed Explanation

Build and Test a Foundry Chat and Single-Agent Solution

Exam Radar

  • Microscopic Technical Focus: System prompt design, model deployment interaction, lightweight chat client, agent instruction, tool boundary, and test trace validation
  • Core Priority: The largest AI-901 domain includes practical Foundry tasks: deploying a model, interacting with it, creating a lightweight client, and testing a single-agent flow.
  • Confusion Alert: Distractors often skip the deployment name, confuse prompt engineering with agent tool wiring, or change code before validating behavior in the Foundry portal.
  • Scenario Logic: Separate prompt behavior from runtime routing: the prompt or agent instruction controls behavior, while endpoint, deployment name, agent ID, credential, and tool configuration control whether the app reaches the correct runtime object.
  • Failure Trigger: The app returns routing errors, ignores tool boundaries, or produces off-scope answers when the deployment identity, agent instruction, or tool configuration is not validated in the portal first.
  • Operational Dependency: The first dependency is a verified Foundry deployment or agent test, followed by client configuration that exactly matches the tested object.
  • Topic-Specific Exam Cue: Questions that mention playground success, deployment-not-found errors, agent tests, tools, system prompts, or lightweight clients usually ask for the first validation point.

Practice question: A lightweight chat client returns a deployment-not-found error after the model works in the Foundry playground. What should the developer verify first?

A. Whether the system prompt includes more examples
B. Whether the temperature is lower than 0.2
C. Whether the application uses image input
D. Whether the client configuration uses the exact Foundry deployment name and endpoint

Correct Answer: D

Explanation: D is correct because the portal test proves the deployment works, so the client route is the likely failure. A and B affect response behavior after routing succeeds. C is unrelated unless the app sends images.

Atomic Deconstruction - Operational Level

Foundry chat and agent questions are runtime-routing questions. The core objects are the system prompt or agent instruction, the deployment or agent ID, configured tools, and the client settings that send the request to the intended runtime object.

The drill is to validate in the portal before expanding code. If the playground or agent test works, a client failure usually points to endpoint, deployment name, agent ID, or credential mismatch. If the portal behavior is wrong, the instruction or tool boundary must be corrected first.

For exam transformation, treat every option as a proposed dependency. The correct option is the one that unlocks the blocked workflow and can be verified. Wrong options are often useful in another situation, but they fail here because they tune a later step, address a different modality, or repair a symptom without satisfying the scenario requirement.

Component Specifications

For a beginner, this table means: prove the Foundry object works in the portal, then make sure the client is calling that exact object.

Object Attribute Value Range Default State Dependency Failure State
System prompt Behavior constraint Role, boundaries, response format, safety rule Empty or generic Model request Model produces off-scope or unstructured responses
Model deployment Invocation target Deployment name and endpoint Not callable until created Client SDK configuration 404 deployment not found or wrong model family
Agent instruction Task policy Goal, allowed actions, tool limits Undefined Agent runtime Agent chooses unsupported actions or ignores constraints
Client application Configuration source Endpoint, deployment/agent ID, credential Local placeholder Environment variables and SDK Authentication or routing failure

Step-by-Step Execution Path

  1. Create the system prompt or agent instruction before testing so the model has a stable behavioral contract.
  2. Deploy or select the model in Microsoft Foundry and verify the deployment name from the portal instead of inventing a model identifier.
  3. Test the prompt or single-agent solution in the Foundry portal to observe response quality, tool selection, and error messages before coding.
  4. Build the lightweight client with endpoint, credential, and deployment or agent identifier stored in configuration.
  5. Run a minimal conversation and inspect response text, status, and trace or portal test output to confirm the app calls the intended Foundry object.

Suggested Lab Validation Ideas:

The following paths and commands are conceptual lab-style examples for practice. Adapt them to the current Microsoft documentation, SDK/API version, subscription permissions, and project environment before using them in a real implementation.

  • Portal path: Microsoft Foundry > Models + endpoints > Playground - portal test for model deployment response behavior.
  • Portal path: Microsoft Foundry > Agents > Test - portal test for single-agent instruction and tool behavior.
  • Python SDK rehearsal: python app.py - local client validation after setting endpoint, credential, and deployment or agent ID environment variables.

Technical Chain

The system prompt or agent instruction sets the behavior contract. The client request then targets a Foundry deployment or agent endpoint with credentials. The runtime loads the model and any configured agent tools, produces a response, and records evidence in the portal test surface or trace. If the deployment name, agent ID, or instruction is wrong, the failure appears as routing errors, unsupported behavior, or off-scope output.

Operational Skills Matrix

Task Precise Command or Path Verification Standard
Validate deployment identity Microsoft Foundry portal > Models + endpoints > deployment details Deployment name in code exactly matches the Foundry deployment
Validate agent behavior Microsoft Foundry portal > Agents > Test conversation Agent follows instruction boundaries and uses only configured tools
Validate client call path Run lightweight client and inspect response/status Client receives a successful response from the intended deployment or agent

Implement Text and Speech Solutions with Foundry Tools

Exam Radar

  • Microscopic Technical Focus: Text analysis output contracts, spoken prompt response, Azure Speech tool selection, audio format, and lightweight app validation
  • Core Priority: AI-901 includes practical text and speech implementation, so candidates must distinguish text analysis, speech recognition, speech synthesis, and multimodal spoken interactions.
  • Confusion Alert: A common wrong answer is to use a generative chat model for every language task or to choose speech synthesis when the requirement is transcribing spoken input.
  • Scenario Logic: Determine the direction of the language or audio transformation: text to labels, audio to text, text to audio, or a combined spoken interaction.
  • Failure Trigger: The workflow breaks when raw audio is sent to a text-only path, when synthesis is chosen before transcription, or when the response format does not match the application need.
  • Operational Dependency: The blocking dependency is the correct direction and output contract for Speech or text analysis before voice, language, or formatting options are tuned.
  • Topic-Specific Exam Cue: Words such as listen, transcribe, speak, voice, sentiment, entities, key phrases, or spoken answer usually identify the required tool direction.

Practice question: A kiosk must listen to a visitor question and answer aloud. Which implementation sequence best matches the requirement?

A. Recognize speech to text, generate or retrieve the answer, then synthesize the answer to speech
B. Use sentiment analysis first, then generate an image response
C. Send the raw audio to a text-only chat model without transcription
D. Use text-to-speech before the question is converted to text

Correct Answer: A

Explanation: A is correct because the workflow requires audio input and audio output. B targets the wrong output. C breaks when the model cannot consume raw audio. D reverses the dependency.

Atomic Deconstruction - Operational Level

Text and speech implementation questions are direction-sensitive. Spoken input must become text before a text-only reasoning step can use it, while a spoken answer requires generated or selected text to become audio.

The drill is to label the conversion direction and the expected response field. A transcript, sentiment label, key phrase list, synthesized audio stream, and spoken answer are different completion states, so the selected tool must create the one requested by the scenario.

For exam transformation, treat every option as a proposed dependency. The correct option is the one that unlocks the blocked workflow and can be verified. Wrong options are often useful in another situation, but they fail here because they tune a later step, address a different modality, or repair a symptom without satisfying the scenario requirement.

Component Specifications

Read this table as a direction check: decide whether the app is converting speech to text, text to speech, or text to analysis output before writing code.

Object Attribute Value Range Default State Dependency Failure State
Text analysis call Output type Key phrases, entities, sentiment, summary No analysis until requested Text input and service capability Application receives free-form text instead of required labels
Speech recognition Direction Audio to text Not configured Audio stream and language Spoken input is not converted into prompt text
Speech synthesis Direction Text to audio Not configured Voice selection and output format Application cannot return spoken responses
Audio format Encoding/sample assumptions Service-supported WAV/PCM or SDK stream Client dependent Speech call fails or returns poor recognition undefined

Step-by-Step Execution Path

  1. Write the required output contract first: sentiment label, extracted entities, recognized transcript, or spoken answer.
  2. Select the Foundry Tool or multimodal model path that matches the direction of the audio or text transformation.
  3. Configure language, voice, and audio format only after the workload direction is correct.
  4. Run a minimal app invocation and verify the returned transcript, text labels, or playable speech output before expanding the workflow.

Suggested Lab Validation Ideas:

The following paths and commands are conceptual lab-style examples for practice. Adapt them to the current Microsoft documentation, SDK/API version, subscription permissions, and project environment before using them in a real implementation.

  • Portal path: Microsoft Foundry > Tools > Speech - portal verification of speech capability and configuration.
  • Python SDK rehearsal: python speech_demo.py --input prompt.wav - local validation for audio-to-text or spoken-response flow; adapt to current SDK package names.
  • Python SDK rehearsal: python text_analysis_demo.py --text reviews.txt - local validation for key phrase, entity, sentiment, or summary output.

Technical Chain

Audio and text solutions are directional. A microphone input must become text before a text-only model can reason over it, while a spoken answer requires generated text to be synthesized into audio. Text analysis tasks are evaluated by structured labels or extracted values. The exam answer must keep input direction, service capability, and expected output connected.

Operational Skills Matrix

Task Precise Command or Path Verification Standard
Validate speech direction Inspect app requirement and Speech tool configuration Configuration is speech-to-text for prompts or text-to-speech for spoken output
Validate text analysis contract Run sample input through the text analysis call Response contains the required entities, key phrases, sentiment, or summary fields
Validate audio compatibility Inspect audio format or SDK stream settings Audio input is accepted and produces transcript or playable output

Implement Vision and Image Generation Solutions with Foundry

Exam Radar

  • Microscopic Technical Focus: Visual prompt interpretation, multimodal model input, generated image output, vision app capability, and result validation
  • Core Priority: The exam expects candidates to separate vision understanding from image generation and to know when a deployed multimodal model is required.
  • Confusion Alert: Distractors choose image generation when the app must understand an image, or choose OCR-only extraction when the scenario asks for broad visual interpretation.
  • Scenario Logic: Decide whether the workflow inspects an existing image or creates a new image from a prompt, then validate that the selected model accepts or returns the correct visual artifact.
  • Failure Trigger: The app fails conceptually when image generation is selected for image understanding, or when a text-only model is asked to reason over visual input.
  • Operational Dependency: The first dependency is the transformation direction: image-to-answer for vision understanding, or prompt-to-image for generation.
  • Topic-Specific Exam Cue: Interpret, describe, inspect, detect, and identify point to vision understanding; generate, create, or produce an image points to image generation.

Practice question: An app must inspect a photo of equipment and explain whether a warning light is visible. Which capability is required?

A. An image generation model
B. A deployed multimodal model that can receive the image and prompt
C. Text-to-speech output
D. A document extraction schema

Correct Answer: B

Explanation: B is correct because the app must interpret an existing image. A creates new images. C changes text into audio. D extracts structured document fields and is not the first fit for visual reasoning.

Atomic Deconstruction - Operational Level

Vision and image generation questions are modality-transformation questions. Vision interpretation starts from an existing visual input and returns labels or reasoning; image generation starts from a prompt and returns a new visual artifact.

The drill is to read scenario verbs carefully. Interpret, describe, identify, and detect point to visual understanding. Generate, create, or produce a picture points to image generation. The validation evidence must match that direction.

For exam transformation, treat every option as a proposed dependency. The correct option is the one that unlocks the blocked workflow and can be verified. Wrong options are often useful in another situation, but they fail here because they tune a later step, address a different modality, or repair a symptom without satisfying the scenario requirement.

Component Specifications

For exam use, this table helps separate two easy-to-confuse choices: understanding an image that already exists versus generating a new image.

Object Attribute Value Range Default State Dependency Failure State
Visual input Prompt attachment Image plus text instruction Not included unless sent Multimodal deployment Model answers without seeing image evidence
Vision interpretation Result type Description, classification, detected content Depends on prompt and model Image quality and capability Wrong labels or missing visual reasoning
Image generation Output artifact Generated image from prompt No image until model call Generative image model and safety filters No visual asset is produced
Safety filter Content policy response Allowed, revised, or blocked Service controlled Prompt and policy Request is blocked or altered without handling

Step-by-Step Execution Path

  1. Decide whether the app must interpret an existing image or create a new image. This is the critical workload split.
  2. For interpretation, send the visual input with a text instruction to a deployed multimodal model and verify that the response references visible evidence.
  3. For generation, send a precise prompt to an image generation model and handle safety or policy responses.
  4. Record the output contract: classification, description, detected attribute, or generated asset location.

Suggested Lab Validation Ideas:

The following paths and commands are conceptual lab-style examples for practice. Adapt them to the current Microsoft documentation, SDK/API version, subscription permissions, and project environment before using them in a real implementation.

  • Portal path: Microsoft Foundry > Playground > multimodal test - portal validation that image plus prompt input is accepted.
  • Python SDK rehearsal: python vision_prompt.py --image meter.jpg --question 'What reading is visible?' - local validation for visual interpretation.
  • Python SDK rehearsal: python image_generate.py --prompt 'product mockup on white background' - local validation for generated image output.

Technical Chain

Vision interpretation sends existing pixels to a model that can inspect visual features and combine them with the text instruction. Image generation starts with text and returns new pixels. The underlying request shape is therefore different. A correct exam choice follows the direction of transformation: image-to-answer for vision, prompt-to-image for generation.

Operational Skills Matrix

Task Precise Command or Path Verification Standard
Validate multimodal input Run a visual prompt test with an attached image Response uses visible image evidence rather than generic text
Validate generated asset Inspect image generation response or output file path A new image artifact is returned or a policy response is handled
Validate workload split Compare scenario verb: interpret, describe, detect, generate, create Selected service direction matches the requested transformation

Extract Information with Azure Content Understanding in Foundry Tools

Exam Radar

  • Microscopic Technical Focus: Schema-driven extraction from documents, images, audio, and video using Content Understanding and lightweight client validation
  • Core Priority: AI-901 specifically includes Content Understanding for information extraction across documents, forms, images, audio, and video.
  • Confusion Alert: Wrong options often use generic chat summarization or vision description when the requirement is a repeatable schema with fields that an app can consume.
  • Scenario Logic: Look for repeatable business fields from documents, images, audio, or video rather than a general natural-language answer.
  • Failure Trigger: Generic chat or summarization returns prose that cannot reliably populate application fields, preserve confidence evidence, or trigger human review.
  • Operational Dependency: The first dependency is a schema or extraction objective that returns structured fields the app can map and validate.
  • Topic-Specific Exam Cue: Field names, tables, invoice data, call attributes, dates, amounts, confidence, analyzer, or structured output usually signal Content Understanding.

Practice question: A lightweight app must process recorded service calls and extract customer name, requested product, issue category, and promised follow-up date. What should the developer validate first?

A. That an image generation model can create a visual summary
B. That text-to-speech produces a natural voice
C. That a chat prompt can produce a paragraph summary
D. That Content Understanding returns the required structured fields from the audio

Correct Answer: D

Explanation: D is correct because the scenario requires structured extraction from audio. A and B target visual or spoken output. C may summarize but does not guarantee field-level output for the app.

Atomic Deconstruction - Operational Level

Content Understanding questions are schema-evidence questions. The operational object is not a paragraph summary; it is the repeatable extraction result that an application can map into fields, tables, confidence checks, and review workflows.

The drill is to design from the target fields backward. If the app needs customer name, issue category, line items, dates, or amounts, the correct answer must preserve schema, status, confidence, and client mapping rather than only producing natural-language output.

For exam transformation, treat every option as a proposed dependency. The correct option is the one that unlocks the blocked workflow and can be verified. Wrong options are often useful in another situation, but they fail here because they tune a later step, address a different modality, or repair a symptom without satisfying the scenario requirement.

Component Specifications

For this table, think like an app developer: the key question is whether the tool returns fields your program can store, check, and send for review.

Object Attribute Value Range Default State Dependency Failure State
Extraction schema Target fields Names, types, tables, confidence Absent until defined Business output contract Response is unstructured and cannot populate the app
Content source Media type Document, image, audio, video Not uploaded or referenced Tool support and file quality No extraction or partial extraction
Analyzer/test run Validation state Succeeded, failed, needs review Untested Sample content and schema Fields are missing before client integration
Client mapping Field binding JSON field to app property Manual or absent Extraction response Application stores wrong values or drops confidence signals

Step-by-Step Execution Path

  1. Define the business fields that must be extracted before choosing the tool. This turns the requirement into a schema rather than a broad summary.
  2. Use Content Understanding in Foundry Tools with representative documents, images, audio, or video to test extraction quality.
  3. Inspect field values, confidence, and missing-field behavior so the app can decide when human review is required.
  4. Build a lightweight client that submits content, reads structured output, maps fields to application objects, and logs extraction failures.

Suggested Lab Validation Ideas:

The following paths and commands are conceptual lab-style examples for practice. Adapt them to the current Microsoft documentation, SDK/API version, subscription permissions, and project environment before using them in a real implementation.

  • Portal path: Microsoft Foundry > Tools > Content Understanding - portal validation for analyzer/schema and test results.
  • Python SDK rehearsal: python extract_content.py --file invoice.pdf - local client validation for structured extraction; adapt to current Content Understanding SDK/API surface.
  • API rehearsal: inspect JSON response fields, confidence scores, and status - response-shape validation rather than deployment command.

Technical Chain

The app sends a content item to an analyzer that applies a schema or extraction objective. The service processes the media, returns structured fields with status and confidence evidence, and the client maps those fields into business objects. If the answer uses generic summarization, the chain loses schema, confidence, and repeatable field binding.

Operational Skills Matrix

Task Precise Command or Path Verification Standard
Validate schema coverage Microsoft Foundry > Tools > Content Understanding analyzer view All required business fields are represented in the analyzer or extraction output
Validate extraction result Run a sample file and inspect JSON/status output Expected fields, values, and confidence indicators are present
Validate client mapping Review application mapping from extraction response to data model Each required field is stored correctly and low-confidence cases are handled

Frequently Asked Questions

When building a Foundry chat solution, what should be validated before expanding prompts or adding more examples?

Answer:

Validate that the selected deployment can be invoked successfully and supports the required chat or agent capability.

Explanation:

Implementation questions usually start with a working dependency chain: deployment, endpoint, authentication, and supported capability. Prompt examples can improve behavior only after the app can call the correct model. In Foundry scenarios, a minimal test invocation is strong evidence that the configuration path is ready.

Demand Score: 92

Exam Relevance Score: 97

When should a Foundry single-agent solution be used instead of a simple one-turn chat response?

Answer:

Use a single-agent solution when the scenario requires goal-driven interaction, tool use, or multi-step task handling within a controlled workflow.

Explanation:

A chat response is appropriate for direct text generation or question answering. An agent is more suitable when the solution must coordinate steps, use configured tools, or follow a task flow. AI-901 candidates should look for workflow cues rather than assuming every conversational requirement is just a basic chat completion.

Demand Score: 86

Exam Relevance Score: 93

What is the best Foundry workload choice when an application must convert spoken customer audio into text for analysis?

Answer:

Use a speech-to-text capability, then process the resulting transcript as needed.

Explanation:

The required transformation is audio input to text output. Text-to-speech would solve the opposite problem, and generic chat does not by itself transcribe audio. Exam stems that use verbs such as transcribe, dictate, or capture spoken input usually indicate speech-to-text before later text analysis or summarization.

Demand Score: 89

Exam Relevance Score: 95

How should a learner choose between image generation and vision analysis in Foundry?

Answer:

Choose image generation to create new images, and choose vision analysis or a multimodal model to interpret existing images.

Explanation:

The difference is the direction of the task. If the app needs a generated visual from a prompt, image generation fits. If it needs to detect, describe, classify, or answer questions about an uploaded image, a vision or multimodal capability is required. This distinction is a common AI-901 workload-pattern trap.

Demand Score: 88

Exam Relevance Score: 96

What should be used when a Foundry solution must return invoice date, vendor name, line items, and totals from uploaded documents?

Answer:

Use Azure Content Understanding or an information extraction workflow with a schema for the required fields.

Explanation:

The output requirement is structured data from documents, not a freeform summary. Content Understanding and extraction workflows are designed to map document content into fields, tables, and entities. A generic chat answer may describe the invoice but would not reliably satisfy the business output contract.

Demand Score: 94

Exam Relevance Score: 98

Why should deterministic settings be considered for classification or extraction tasks in Foundry?

Answer:

They help produce more consistent outputs when the task requires stable labels, fields, or decisions.

Explanation:

Classification and extraction usually need repeatable results, so high randomness can make outputs inconsistent. Creative generation can benefit from more variation, but structured tasks are judged by reliability and format adherence. AI-901 scenarios may frame this as choosing parameters that match the task purpose after the model capability is already correct.

Demand Score: 85

Exam Relevance Score: 92

AI-901 Training Course