Generative AI is a type of artificial intelligence that can create new content. This includes writing text, drawing pictures, composing music, generating computer code, or even producing videos. It learns patterns from existing data and uses those patterns to produce original outputs.
If you tell a generative AI:
"Write a story about a dog who learns to fly,"
it can create an entirely new story on that topic — not just retrieve something already written.
| Traditional AI | Generative AI |
|---|---|
| Predicts or classifies (e.g., spam detection) | Creates new content (e.g., emails, summaries) |
| Solves narrow, specific problems | Handles flexible, open-ended tasks |
| Learns from labeled data | Learns mostly from unlabeled data |
| Produces structured outputs (numbers, labels) | Produces unstructured outputs (text, images, etc.) |
Let’s explore the first major concept: Foundation Models, the category that includes Large Language Models (LLMs).
Foundation models are large-scale AI models trained on massive datasets. These datasets include text from books, websites, conversations, code, and more. The goal is for the model to learn general patterns in language, reasoning, and knowledge that can be applied to many different tasks.
A foundation model is a general-purpose AI model trained on large and diverse data to support a wide range of tasks such as summarization, question answering, translation, content generation, and more.
GPT (used in ChatGPT, developed by OpenAI)
PaLM (developed by Google)
Gemini (developed by Google, designed for multimodal tasks)
Claude (developed by Anthropic)
Most foundation models use a special kind of neural network architecture called the Transformer. This architecture was introduced by researchers at Google in 2017 and became the foundation for almost all modern generative AI systems.
It allows the model to read and understand long pieces of text.
It uses a method called "self-attention" to focus on the most important parts of a sentence or paragraph.
It enables parallel processing, which speeds up training on large datasets.
In the sentence "The teacher asked the student because she was confused," the model uses self-attention to figure out who "she" refers to — which depends on the broader context.
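To make self-attention concrete, here is a minimal sketch of scaled dot-product attention in Python with NumPy. The token count, embedding size, and random projection matrices are toy values invented for illustration; real models learn these projections during training.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens into query/key/value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # similarity of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights per token
    return weights @ V                          # each output mixes the values it attends to

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                     # 6 tokens, 8-dim embeddings (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                                # (6, 8): one context-aware vector per token
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed in parallel, which is what makes training on large datasets fast.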
Foundation models are trained in several stages, each serving a different purpose. These include:
Pretraining
The model is exposed to a large volume of text or images and learns to predict the next word, sentence, or visual element. This stage does not require labeled data; the approach is called "self-supervised learning."
Example: Given the sentence "The Eiffel Tower is in ___", the model learns to complete it with "Paris."
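As a toy illustration of self-supervised next-word prediction, the sketch below counts adjacent word pairs in a tiny invented corpus and predicts the most likely continuation. Real pretraining does this with neural networks over trillions of tokens, but the key point is the same: no human labels are needed.

```python
from collections import Counter, defaultdict

corpus = "the eiffel tower is in paris . the eiffel tower is tall ."
words = corpus.split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):   # every adjacent pair is a free training example
    bigrams[prev][nxt] += 1               # the raw data labels itself

def predict_next(word):
    return bigrams[word].most_common(1)[0][0]

print(predict_next("in"))   # -> 'paris'
```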
Fine-tuning
After pretraining, the model can be further trained on a specific type of data for a specialized task.
Example: A general language model can be fine-tuned on legal documents to become better at answering legal questions.
Parameter-efficient tuning
Instead of changing the whole model, small components or prompt templates are trained or adjusted. This is a much lighter and more efficient method.
Example: You can teach the model to always return answers in bullet points by designing a specific prompt, or use adapter layers that tweak the model’s behavior without retraining everything.
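As a rough sketch of the adapter idea, the PyTorch module below adds a small trainable bottleneck after a frozen pretrained layer. The dimensions and placement are illustrative placeholders, not taken from any specific paper.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small trainable bottleneck added on top of a frozen pretrained layer."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # project to a tiny hidden size
        self.up = nn.Linear(bottleneck, dim)     # project back to the model dimension

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual: adapter nudges, not replaces

frozen = nn.Linear(512, 512)
for p in frozen.parameters():
    p.requires_grad = False                      # the big pretrained weights stay fixed

adapter = Adapter(512)                           # only these ~17k parameters get updated
x = torch.randn(4, 512)
out = adapter(frozen(x))
```

The residual connection is the important design choice: the adapter can shift the frozen model's behavior for the new task without being able to destroy what it already knows.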
Prompt engineering is the practice of designing effective input instructions (called "prompts") to guide a generative AI model to produce accurate, useful, and relevant outputs. Since generative models respond directly to user prompts, the way you ask a question or give instructions has a huge impact on the result.
Zero-shot prompting
This is when you ask the AI to complete a task without giving any examples.
Example:
"Translate this sentence into Spanish: Where is the train station?"
Few-shot prompting
In this method, you provide a few examples of input-output pairs to help the model understand what kind of response you want.
Example:
"Translate the following:
English: Good morning → Spanish: Buenos días
English: Thank you → Spanish: Gracias
English: How are you → Spanish:"
Chain-of-thought prompting
This technique asks the model to explain its reasoning step-by-step before giving the final answer. It's useful for complex tasks like math or logic questions.
Example:
"John has 3 apples. He buys 2 more. How many does he have now? Let's think step by step."
Use clear instructions
Avoid vague language. Be specific about what you want.
Define roles
Tell the model who it is acting as.
Example: "You are a customer service assistant."
Specify output format
Clearly describe how the answer should look.
Example: "List your response in bullet points."
Experiment with model parameters
These include:
Temperature: Controls how random the output is. A lower temperature (like 0.2) gives more consistent, predictable answers; a higher value (like 0.8) gives more varied, creative results.
Max tokens: Caps the length of the response.
Top-p (nucleus sampling): Another way to control randomness. Instead of rescaling probabilities like temperature, it samples only from the smallest set of tokens whose cumulative probability reaches the threshold p.
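To see what these parameters actually do, here is a minimal sketch of temperature scaling and top-p sampling applied to a toy next-token distribution. The vocabulary and logit scores are invented for illustration.

```python
import numpy as np

def sample(logits, temperature=1.0, top_p=1.0, rng=np.random.default_rng()):
    probs = np.exp(logits / temperature)      # lower temperature sharpens the distribution
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # most likely tokens first
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]  # smallest set covering top_p probability
    p = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=p)

vocab = ["Paris", "London", "Rome", "banana"]
logits = np.array([3.0, 1.5, 1.0, -2.0])      # toy scores for the next token
print(vocab[sample(logits, temperature=0.2, top_p=0.9)])  # almost always 'Paris'
```

Try raising the temperature to 1.5 and you will start seeing "London" and "Rome"; the unlikely "banana" stays excluded as long as top-p trims the tail.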
By combining these techniques, you can make a model generate much more precise or creative outputs depending on your needs.
Generative AI can perform many types of creative tasks across different formats and industries. Below are the most common capabilities:
This is the most common use of generative AI.
Writing articles, blogs, reports, emails
Answering questions in conversational style (like a chatbot)
Summarizing long documents
Translating between languages
Rewriting or editing existing text
Writing code in programming languages like Python, JavaScript, or SQL
Some generative models can create images or audio from text prompts.
Image generation tools:
DALL·E (by OpenAI)
Imagen (by Google)
Midjourney
Examples:
"Draw a futuristic city at sunset"
"Create a logo for a tech company"
Audio generation:
Music synthesis from prompts
Speech generation in natural voices
Text-to-speech tools
Multimodal AI can take in more than one type of input (text, image, audio) and respond accordingly.
Example:
You can give it a picture of a graph and ask:
"What does this chart say about company sales?"
The model interprets the image and gives a meaningful answer.
Models like Gemini are designed for this multimodal interaction, combining reasoning across text, image, and even video.
While generative AI is powerful and versatile, it also comes with significant risks that users and developers must understand and manage carefully.
Definition:
Hallucination occurs when a generative AI model produces information that sounds plausible but is factually incorrect or entirely made up.
Example:
You ask, “Who invented email?” and the model responds, “Elon Musk invented email in 1997.”
This is false, but may sound convincing to a reader.
Why it happens:
Generative models predict text based on patterns, not facts. If the training data contains misleading or ambiguous information, the model might generate errors.
Definition:
Bias in AI refers to the tendency of a model to reflect or amplify stereotypes or unfair assumptions present in its training data.
Examples:
Gender bias in job descriptions (e.g., assuming all engineers are male)
Racial or cultural bias in legal or healthcare advice
Why it happens:
If a model is trained on biased data — such as online forums or historical documents — it may learn and repeat those patterns.
Impact:
Biased outputs can cause harm to individuals or groups and may violate ethical or legal standards.
Definition:
Toxicity refers to the generation of harmful, offensive, or inappropriate language.
Examples:
Hate speech
Insults or slurs
Violent or disturbing content
Causes:
Presence of toxic language in the training data
Open-ended prompts with no safeguards
Solutions:
Use safety filters and moderation tools
Apply prompt restrictions or content classification
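Production systems use trained moderation models, but the basic shape of a safeguard can be sketched with a simple blocklist check on both the prompt and the response. The terms and the `generate` stub below are hypothetical placeholders.

```python
BLOCKLIST = {"slur_example", "threat_example"}   # hypothetical placeholder terms

def generate(prompt: str) -> str:                # hypothetical stand-in for a model API call
    return "placeholder model output"

def is_safe(text: str) -> bool:
    """Crude pre-filter: reject text containing blocklisted terms."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return BLOCKLIST.isdisjoint(words)

def moderated_generate(prompt: str) -> str:
    if not is_safe(prompt):                      # filter the input...
        return "Request declined by safety filter."
    response = generate(prompt)
    return response if is_safe(response) else "Response withheld by safety filter."  # ...and the output
```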
Definition:
Data privacy risk refers to the possibility that a model might unintentionally reveal sensitive or personal information.
Examples:
Echoing names, phone numbers, or addresses if present in training data
Leaking internal business information in outputs
Key concerns:
Using personal or proprietary data in training without consent
Storing user prompts in ways that violate data policies
Solutions:
Avoid using real personal data in training
Use private models with strict access control
Anonymize or redact sensitive information
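One common mitigation is redacting obvious identifiers before text enters training data or logs. The sketch below catches only simple patterns (emails and phone-like numbers); real pipelines use dedicated PII-detection tools with far broader coverage.

```python
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with typed placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact Jane at jane.doe@example.com or 555-123-4567."))
# -> Contact Jane at [EMAIL] or [PHONE].
```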
To ensure safe, ethical, and fair use of generative AI, organizations follow a set of guiding principles. These are often grouped under the term Responsible AI.
Definition:
Users should be able to understand how an AI system arrived at a particular response or decision.
Why it matters:
If someone is affected by an AI output (e.g., denied a loan), they deserve an explanation.
Methods:
Provide rationale or reasoning in outputs
Use step-by-step output methods (e.g., chain-of-thought prompting)
Definition:
AI systems and their creators should be open about how the system was trained, what data was used, and what limitations it has.
Best practices:
Publish model documentation
Provide terms of use and usage logs
Mark AI-generated content clearly
Definition:
AI should be designed to avoid harmful outputs and to perform reliably in real-world conditions.
Examples of safety mechanisms:
Offensive language filters
Prompt blocklists
Monitoring for abnormal behavior
Definition:
Clear human responsibility should be assigned for how AI is used and what it produces.
Who is accountable:
Developers (for safe model design)
Businesses (for deployment and usage policies)
Users (for how they apply AI in decision-making)
Although both traditional AI and generative AI use machine learning, they are designed for different goals, solve different types of problems, and produce different outputs.
Traditional AI:
Designed mainly for analysis and prediction.
It classifies data, detects patterns, forecasts outcomes, and makes decisions.
Generative AI:
Designed to create new content.
It produces text, images, sounds, and other media that did not exist before.
| Aspect | Traditional AI | Generative AI |
|---|---|---|
| Input Type | Structured data (numbers, tables) | Unstructured data (text, images, audio) |
| Output Type | Labels, numbers, categories | New content (sentences, images, music, code) |
| Task Example | Predict sales next month | Write a report explaining sales trends |
Traditional AI:
Often uses supervised learning, where the model is trained with input-output pairs (labeled data).
Example: Input = image of a dog, Output = label "dog"
Generative AI:
Typically uses self-supervised learning, where the model learns patterns from raw, unlabeled data by predicting missing parts.
Traditional AI:
Built for narrow tasks. Each model is trained for one job.
Generative AI:
Built on foundation models that can adapt to many tasks with small adjustments or prompts.
Traditional AI:
Models are often small or medium-sized. Easier to explain and control.
Generative AI:
Models are extremely large (billions of parameters), which allows for more general intelligence but makes them harder to fully understand and manage.
| Category | Traditional AI | Generative AI |
|---|---|---|
| Primary Use | Prediction, classification | Creation of new content |
| Learning Type | Supervised learning | Self-supervised learning |
| Output | Structured (labels, numbers) | Unstructured (text, images, code) |
| Example Tasks | Fraud detection, spam filtering | Writing essays, generating images |
| Task Flexibility | Task-specific | General-purpose |
| Model Examples | Decision Trees, SVM, Random Forests | GPT, PaLM, Gemini, Claude |
We’ve now covered the entire Fundamentals of Generative AI module in detail, including:
What Generative AI is and how it works
Foundation models and their architectures (like Transformers)
How to use prompts to control output (Prompt Engineering)
Capabilities of GenAI in text, images, audio, and multimodal contexts
Risks such as hallucination, bias, toxicity, and data privacy
Responsible AI principles: explainability, transparency, safety, and accountability
Clear distinctions between Generative AI and Traditional AI
High-quality, diverse, and representative data is essential to the performance and ethical reliability of generative AI models.
Why it matters:
Quality: If training data contains spelling errors, factual inaccuracies, or illogical content, the model may learn to reproduce those problems in its outputs.
Diversity: A wide variety of topics, styles, dialects, and domains allows the model to generalize across use cases (e.g., legal writing, creative fiction, medical summaries).
Representation: If the data reflects only one region, language, or demographic, the model may show bias, lack cultural understanding, or exclude minority viewpoints.
Impact on performance:
Bias and fairness issues increase if source data is imbalanced.
Repetitive or narrow data can lead to brittle models or overfitting.
High-quality, well-labeled, and balanced data enables models to generate content that is more factual, inclusive, and robust across tasks.
Best practice: Large-scale foundation models are often trained on multi-terabyte datasets pulled from the web, books, code repositories, and public documents. However, responsible curation and preprocessing (e.g., deduplication, toxicity filtering) are necessary to ensure safety and effectiveness.
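As a tiny illustration of one such preprocessing step, the sketch below deduplicates documents by hashing their whitespace-normalized text. Real pipelines use fuzzier techniques such as MinHash to catch near-duplicates, but the principle is the same.

```python
import hashlib

def dedupe(docs):
    """Keep only the first occurrence of each (normalized) document."""
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
        if key not in seen:                 # exact-duplicate check after normalization
            seen.add(key)
            unique.append(doc)
    return unique

docs = ["The Eiffel Tower is in Paris.", "the  eiffel tower is in paris.", "Gravity bends light."]
print(len(dedupe(docs)))   # -> 2
```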
Although both self-supervised and unsupervised learning use unlabeled data, they differ in how they define the learning task.
Unsupervised learning:
Finds structure in the data without any labels.
Examples: clustering, anomaly detection, topic modeling.
Goal: discover patterns or groups (e.g., K-means or PCA).
Self-supervised learning:
Generates pseudo-labels from the data itself to create a supervised-like task.
Examples: predicting the next word (language modeling), masked image patch reconstruction.
Goal: train a model with millions of input-output pairs derived from raw data.
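The "pseudo-labels" are just slices of the raw data. The sketch below turns one unlabeled sentence into (context, next-token) training pairs, which is exactly the supervised-looking task language models are pretrained on.

```python
tokens = "the eiffel tower is in paris".split()

# Every prefix of the raw text becomes an input; the token that follows is its label.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, label in pairs:
    print(context, "->", label)
# ['the'] -> eiffel
# ['the', 'eiffel'] -> tower
# ...
```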
Key comparison:
| Aspect | Unsupervised Learning | Self-Supervised Learning |
|---|---|---|
| Labels | None | Internally generated |
| Task | Discover structure | Predict missing data |
| Examples | Clustering, dimensionality reduction | Next-token prediction, contrastive learning |
| Usage | Analytics, anomaly detection | Foundation model pretraining |
Self-supervised learning is the default approach for training large-scale generative AI models like GPT or Gemini.
Generative AI outputs—especially text and image content—must be evaluated with both automated metrics and human judgment.
Text-based metrics:
BLEU (Bilingual Evaluation Understudy):
Used primarily in machine translation.
Compares the overlap of n-grams between the model output and a reference translation.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation):
Common in summarization.
Measures overlap of words or phrases between the generated summary and human-written summary.
METEOR, BERTScore:
METEOR refines BLEU-style matching with stemming and synonym handling; BERTScore compares outputs to references using contextual embeddings rather than exact word overlap.
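BLEU and ROUGE both boil down to n-gram overlap. The sketch below computes unigram precision (BLEU-1 flavored, without the brevity penalty) and unigram recall (ROUGE-1 flavored) for a toy candidate/reference pair.

```python
from collections import Counter

def unigram_overlap(candidate, reference):
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())      # clipped matching word counts
    precision = overlap / sum(cand.values())  # BLEU-1 style (no brevity penalty)
    recall = overlap / sum(ref.values())      # ROUGE-1 style
    return precision, recall

p, r = unigram_overlap("the cat sat on the mat", "the cat is on the mat")
print(f"precision={p:.2f} recall={r:.2f}")    # both 0.83 here
```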
Image-based metrics:
FID (Fréchet Inception Distance):
Measures how similar the distribution of generated images is to real ones.
Lower scores indicate better realism.
CLIPScore:
Measures how well a generated image matches its text prompt, based on similarity in CLIP's shared image-text embedding space.
Human evaluation dimensions:
Factuality
Coherence
Helpfulness
Toxicity
Style or tone alignment
Best practice: Use a combination of metrics to get a well-rounded view of output quality, especially for tasks involving open-ended generation.
Although transformer-based architectures dominate language models, diffusion models are currently the most successful approach for generating high-quality images.
What are diffusion models?
They start with random noise and gradually denoise it through many steps to produce an image.
The process is learned by reversing a simulated noise process during training.
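Here is a heavily simplified sketch of one training step in the common noise-prediction formulation: corrupt the data with a known amount of noise, then train the network to predict that noise. The tiny MLP, toy data, and schedule are illustrative placeholders; real models use large U-Nets over images.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 16))  # toy noise predictor
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x0 = torch.randn(32, 16)                      # stand-in "clean images" (toy data)
t = torch.rand(32, 1)                         # random noise level per sample
alpha_bar = torch.cos(t * torch.pi / 2) ** 2  # schedule: 1 at t=0 (clean), 0 at t=1 (pure noise)

noise = torch.randn_like(x0)
xt = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise  # forward process: add noise

pred = model(torch.cat([xt, t], dim=1))       # predict the noise from the noisy input and t
loss = nn.functional.mse_loss(pred, noise)    # learning this lets the model reverse the corruption
loss.backward()
opt.step()
```

At generation time the model runs this in reverse: starting from pure noise, it repeatedly subtracts its predicted noise until an image emerges.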
Key models:
Stable Diffusion:
An open-source latent diffusion model.
Allows text-to-image generation with control over style, resolution, and prompts.
DALL·E 2:
Developed by OpenAI.
Combines diffusion and transformer techniques.
Accepts text prompts and generates diverse, creative visual outputs.
Why diffusion?
Diffusion models produce high-resolution, photorealistic, and diverse images.
They are preferred over GANs (Generative Adversarial Networks) in many GenAI applications due to better stability and fewer artifacts.
Integration with GenAI tools:
Diffusion models power many mainstream products: Stable Diffusion underpins a large ecosystem of open-source image tools, DALL·E is built into ChatGPT, and Google offers Imagen through its cloud AI services.
A company executive asks how generative AI differs from traditional machine learning models used in earlier analytics systems. What is the most accurate explanation?
Generative AI models create new content such as text, images, or code by learning patterns from large datasets, while traditional machine learning models typically classify, predict, or detect patterns within existing data.
Traditional ML systems focus on predictive tasks like classification, regression, or anomaly detection. They usually require structured training data and are designed for specific tasks such as fraud detection or recommendation systems. Generative AI models—especially large language models and diffusion models—are trained on massive datasets and learn complex probability distributions of data. This enables them to generate entirely new outputs such as natural language responses, synthetic images, or code snippets. Because of this generative capability, they support broader applications like chatbots, creative tools, and knowledge assistants. A common misunderstanding is assuming generative AI simply retrieves existing information. Instead, it synthesizes new outputs based on learned patterns.
Demand Score: 81
Exam Relevance Score: 86
What is a foundation model in the context of generative AI?
A foundation model is a large-scale machine learning model trained on extensive datasets that can be adapted for many different tasks without being retrained from scratch.
Foundation models serve as a base model that supports multiple downstream tasks such as summarization, translation, question answering, or content generation. These models are typically trained on vast amounts of text, images, or multimodal data. Instead of building separate models for each task, developers adapt the foundation model through techniques like prompting, fine-tuning, or grounding. This greatly reduces development time and enables organizations to reuse powerful pre-trained capabilities. In enterprise environments, foundation models are often delivered through managed services so organizations can leverage them without managing infrastructure or large training datasets.
Demand Score: 78
Exam Relevance Score: 85
Why do large language models sometimes produce incorrect or fabricated information?
Large language models may generate incorrect information due to hallucination, where the model produces plausible-sounding but inaccurate content because it predicts text patterns rather than verifying factual correctness.
LLMs generate responses by predicting the most statistically likely next words in a sequence. They do not inherently verify facts or access real-time knowledge unless integrated with external data sources. As a result, they may confidently produce responses that appear accurate but are fabricated or outdated. This behavior is known as hallucination. It typically occurs when the model lacks relevant training data or when the prompt requires information outside the model’s knowledge boundaries. Organizations mitigate hallucinations using grounding techniques, retrieval-augmented generation, or validation workflows that combine generative outputs with trusted data sources.
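Retrieval-augmented generation can be sketched in a few lines: fetch the most relevant trusted documents and prepend them to the prompt so the model grounds its answer. Here `embed` and `generate` are hypothetical stand-ins (a hashed bag-of-words and a canned string) so the sketch runs; a real system would call an embedding model and an LLM.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: hashed bag-of-words vector."""
    v = np.zeros(64)
    for w in text.lower().split():
        v[hash(w) % 64] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

def generate(prompt: str) -> str:             # hypothetical stand-in for a text-model call
    return "model answer grounded in: " + prompt[:60] + "..."

def rag_answer(question, documents, k=3):
    """Ground the model's answer in the k most relevant trusted documents."""
    q = embed(question)
    scored = sorted(documents, key=lambda d: float(np.dot(embed(d), q)), reverse=True)
    context = "\n".join(scored[:k])
    return generate(
        "Answer using only the context below. Say 'unknown' if it is not there.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

docs = ["The Eiffel Tower is in Paris.", "Photosynthesis occurs in chloroplasts."]
print(rag_answer("Where is the Eiffel Tower?", docs, k=1))
```

The instruction to answer "only from the context" is what reduces hallucination: the model is steered toward the retrieved facts instead of its own pattern-based guesses.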
Demand Score: 75
Exam Relevance Score: 84
Which learning approach relies on labeled data where each input example is paired with the correct output?
Supervised learning.
Supervised learning is a machine learning approach where the model is trained using labeled datasets. Each training example includes both the input data and the expected output. The model learns the relationship between inputs and outputs so it can predict results for new data. This differs from unsupervised learning, which identifies patterns without labeled outputs, and reinforcement learning, which learns through rewards and penalties. Understanding these approaches is important for generative AI leadership roles because training data availability and labeling requirements directly influence project feasibility, costs, and model performance.
Demand Score: 70
Exam Relevance Score: 79