Snowflake uses the term AI Data Cloud to describe a unified, cloud-native platform that supports:
Data storage
Data processing
Data sharing
Data governance
Machine learning
AI capabilities
— all in a single logical system, even when running across multiple cloud providers such as AWS, Azure, and Google Cloud.
This vision is based on the idea that AI requires high-quality, well-governed, highly accessible data, and the best way to support this is by unifying all data workloads under one architecture.
“Unified platform” means:
Your Snowflake experience is the same everywhere
Snowflake abstracts away differences between AWS/Azure/GCP
You do not deal with cloud-specific services; Snowflake provides a standard interface
Key benefits:
Portability: replicate data across clouds or regions
Consistency: same SQL, same security model, same architecture
Reduced complexity: Snowflake manages infrastructure differences for you
A single company might run:
Sales analytics in AWS
Marketing data pipelines in Azure
AI models in GCP
…but Snowflake makes them feel like they’re running in one system.
Snowflake supports multiple workloads on the same platform, eliminating the need for disconnected tools:
Analytical SQL workloads
Reporting and dashboarding
Star/snowflake schema analysis
Large aggregations
Snowflake supports semi-structured data formats such as:
JSON
Avro
Parquet
ORC
XML
You can load raw data and query it directly through VARIANT and dedicated SQL functions.
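As a minimal sketch, loading raw JSON into a VARIANT column and querying nested fields might look like this (table and field names are illustrative):

```sql
-- A table with a VARIANT column to hold raw JSON as-is
CREATE TABLE raw_events (payload VARIANT);

-- Query nested fields with path notation, and explode an array
SELECT
    payload:user.id::NUMBER    AS user_id,
    payload:event_type::STRING AS event_type,
    f.value:sku::STRING        AS sku
FROM raw_events,
     LATERAL FLATTEN(input => payload:items) f;
```

The `::` casts convert VARIANT values to typed columns, and FLATTEN turns each element of the `items` array into its own row.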
Snowflake helps data engineers build pipelines using:
SQL (DDL, DML, CTAS)
Streams & Tasks (change data capture + scheduling)
Dynamic Tables (declarative pipelines)
Snowpark (Python/Scala/Java)
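A sketch of how these building blocks combine, with illustrative object names (a task must be resumed with ALTER TASK ... RESUME before it runs):

```sql
-- Change data capture: track inserts/updates/deletes on a source table
CREATE STREAM orders_stream ON TABLE raw_orders;

-- Scheduled task that runs only when the stream has new changes
CREATE TASK load_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT INTO clean_orders
  SELECT order_id, amount FROM orders_stream;

-- Declarative alternative: a Dynamic Table refreshed to a target lag
CREATE DYNAMIC TABLE daily_totals
  TARGET_LAG = '10 minutes'
  WAREHOUSE = etl_wh
AS
  SELECT order_date, SUM(amount) AS total
  FROM clean_orders
  GROUP BY order_date;
```

Streams + Tasks give imperative control; Dynamic Tables let Snowflake manage the refresh logic from the declared query alone.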
Snowflake supports zero-copy Secure Data Sharing, letting users share live data:
Without copying
Without ETL
With strict governance
Used for sharing data internally across teams or externally with partners.
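A sketch of both sides of a share, with hypothetical account and object names:

```sql
-- Provider side: create a share and grant access to specific objects
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = partner_account;

-- Consumer side: mount the share as a read-only database
CREATE DATABASE sales_from_partner FROM SHARE provider_account.sales_share;
```

No data is copied; the consumer queries the provider's live micro-partitions under the provider's governance rules.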
Snowflake enables ML workflows with:
Snowpark for Python
Feature engineering in SQL
Storing features/predictions
Integrations with ML platforms
In-database inference (depending on region/edition)
Developers can build applications backed by Snowflake, including:
Analytical applications
Data-intensive services
Snowflake Native Apps (installed into customer accounts)
Snowflake is adding native AI and ML features to keep AI next to the data.
Cortex offers:
Built-in LLM functions
Vector search for semantic retrieval
Embeddings generation
AI-powered SQL assistance
Must be understood conceptually (not deeply) for the exam.
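For a conceptual feel, Cortex LLM functions are called directly from SQL. A sketch (model availability varies by region; table and column names are illustrative):

```sql
-- Call a built-in LLM completion function
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large',
    'Summarize: Snowflake separates storage and compute.'
);

-- Score sentiment over a column, in place, next to the data
SELECT review_text,
       SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment
FROM product_reviews;
```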
Snowpark allows:
Python-based ML workflows
Pushing compute to Snowflake
Eliminating the need to move data out
A powerful tool for governed ML pipelines.
Snowflake Marketplace provides:
External datasets
Data applications
AI/ML services
Data enrichment tools
Allows easy integration of 3rd-party intelligence into your workflows.
For the exam:
Prioritize architecture and platform design
Know Snowflake supports multiple workloads in one system
Understand multi-cloud, governance, and unified access
AI-specific product features are less important
Snowflake’s architectural foundation is:
Storage is shared. Compute is independent and elastic.
Meaning:
All compute clusters read/write the same data
Compute can scale up/down/out independently
Workloads do not compete for local resources
You don't manage storage layout or indexes
This separation is essential for Snowflake’s performance, simplicity, and elasticity.
Snowflake consists of three logical layers, each with a clear role.
Snowflake stores data in micro-partitions, which are:
Immutable
Columnar
Compressed
Typically 50–500 MB of uncompressed data each (stored compressed, often around 16 MB)
Automatically created
Each partition contains metadata:
Min/max column values
Distinct count
Null count
Other statistics
This metadata enables partition pruning, allowing Snowflake to skip irrelevant micro-partitions during query execution.
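To illustrate pruning (table and column names assumed): if events arrive roughly in date order, each micro-partition's stored min/max for `event_date` lets Snowflake skip partitions entirely outside the filter range.

```sql
-- Only micro-partitions whose min/max event_date overlaps this range
-- are scanned; the rest are pruned using metadata alone
SELECT COUNT(*)
FROM events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-07';
```

The query profile shows this as "Partitions scanned" versus "Partitions total".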
Snowflake handles:
Partitioning
Compression
Statistics
Metadata organization
File lifecycle
Users do not manage:
Indexes
Vacuuming
Physical layout
Partition definitions
Snowflake fully abstracts these responsibilities.
A virtual warehouse is a compute engine responsible for:
Running queries
Executing DML (INSERT, UPDATE, DELETE, MERGE)
Performing COPY INTO loads
Running Tasks (scheduled operations)
Warehouses:
Do not share memory
Do not share local disk
Each has its own caching layer
All access the same central storage
This allows complete workload isolation.
Warehouses can scale:
Up → change size (XS → S → M, etc.)
Out → add clusters (multi-cluster warehouse)
Scale up = faster single queries
Scale out = better concurrency
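Both scaling directions are single ALTER statements. A sketch with an assumed warehouse name (multi-cluster warehouses require Enterprise Edition or above):

```sql
-- Scale up: a bigger cluster, so individual queries run faster
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Scale out: more clusters of the same size, for higher concurrency
ALTER WAREHOUSE analytics_wh SET
  MIN_CLUSTER_COUNT = 1,
  MAX_CLUSTER_COUNT = 4;
```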
The Cloud Services layer is the “control plane”, managing:
Authentication
Authorization
Query parsing
Query optimization
Metadata
Transactions
Result cache
Billing
Orchestration
Runs independently of warehouses, allowing some operations (such as metadata-only queries and result-cache hits) to proceed without compute.
All warehouses across the account:
Access the same micro-partitions
Operate on one version of truth
Avoid data duplication
Because compute does not store data locally:
No competition for I/O
Easier to scale compute elastically
Multi-cluster warehouses solve concurrency bottlenecks.
When queues form:
Cluster #1 is busy
Snowflake automatically adds Cluster #2
Then Cluster #3 if needed
When load decreases, extra clusters are automatically suspended.
Ideal for:
BI dashboards
Shared analyst workloads
Spiky workloads
Multiple warehouses for different workloads
Each warehouse scales independently
Auto-suspend and auto-resume optimize cost
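These practices can be combined in a single warehouse definition. A sketch with illustrative names and settings:

```sql
-- One warehouse per workload, with concurrency and cost controls
CREATE WAREHOUSE bi_wh
  WAREHOUSE_SIZE   = 'SMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3        -- scale out under dashboard load
  SCALING_POLICY   = 'STANDARD'
  AUTO_SUSPEND     = 60        -- suspend after 60 idle seconds
  AUTO_RESUME      = TRUE;     -- wake on the next query
```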
Three caches:
Result Cache: stored in Cloud Services; reused for up to 24 hours if the query text and underlying data are unchanged
Metadata Cache: stored in Cloud Services; enables pruning
Data Cache: stored on warehouse local SSD; lost on suspend
You must know where each cache lives.
Zero-copy cloning clones databases/schemas/tables instantly:
No data copied
New objects reference micro-partitions
Only changed data creates new partitions
Used for:
Dev/test
What-if analysis
Point-in-time recovery
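A sketch of both forms, with assumed object names:

```sql
-- Zero-copy clone: a metadata-only operation, completes in seconds
CREATE DATABASE dev_db CLONE prod_db;

-- Clone a single table as of a past point in time (via Time Travel)
CREATE TABLE orders_backup CLONE orders
  AT (OFFSET => -3600);  -- the table's state one hour ago
```

The clone shares all existing micro-partitions with the source; new partitions are created only as either side changes.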
Time Travel: restore/query data as of a past point in time; retention is 1 day by default, configurable up to 90 days (Enterprise Edition and above)
Fail-safe: an additional 7 days after Time Travel expires; recoverable only by Snowflake Support
Know the difference.
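Time Travel is user-accessible through SQL, which is the key contrast with Fail-safe. A sketch with assumed names:

```sql
-- Query a table as it was 30 minutes ago
SELECT * FROM orders AT (OFFSET => -1800);

-- Query as of a specific timestamp
SELECT * FROM orders
  AT (TIMESTAMP => '2024-06-01 09:00:00'::TIMESTAMP_LTZ);

-- Restore a dropped table within the retention period
UNDROP TABLE orders;
```

Fail-safe has no SQL interface at all: those 7 days are reachable only by contacting Snowflake Support.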
Secure Data Sharing: live data sharing without copying
Replication: cross-account, cross-region, cross-cloud
Failover/Failback: DR capability
Snowflake Marketplace
Snowgrid for cross-cloud interoperability
Integrations with ETL, BI, ML tools
Snowgrid is Snowflake’s global control and coordination layer that operates across all supported cloud providers and regions. It is one of the least understood but most important architectural components of the platform.
Snowgrid provides the metadata, governance, and orchestration backbone that enables Snowflake to function as a unified AI Data Cloud despite running on different clouds and regions.
Key capabilities include:
Global metadata orchestration
Cross-region and cross-cloud replication management
Governance consistency across all regions
Support for global services such as data sharing, Marketplace distribution, and application deployment
Without Snowgrid, Snowflake would behave like isolated deployments in each cloud. Snowgrid ensures:
Consistent semantics and APIs across AWS, Azure, and GCP
Interoperability for global organizations
The ability to seamlessly replicate data across clouds
Centralized policy enforcement and governance
Snowgrid is the foundation for:
Cross-cloud data sharing
Cross-region database and share replication
Failover and failback management
Snowflake Marketplace global distribution
Native Application Framework app deployment
For the SnowPro exam, it is essential to know that Snowgrid is the underlying layer that makes Snowflake a true multi-cloud unified platform.
Snowflake implements a modern transaction system that guarantees full ACID compliance without using locking mechanisms typical in traditional databases.
Snowflake guarantees:
Atomicity
Consistency
Isolation
Durability
All transactions operate on consistent snapshots of data.
Snowflake’s concurrency model relies on MVCC, which allows:
Multiple readers and writers to operate concurrently
Readers to see a consistent snapshot of data without being blocked by writers
Writers to generate new versions of micro-partitions
Snowflake does not use:
Row locks
Table locks
Page locks
Instead, updates create new micro-partitions, and queries read the correct version based on transaction timestamps.
MVCC enables:
High concurrency for analytics workloads
Isolation without blocking
Support for Time Travel by retaining old versions of partitions
Fast cloning using metadata pointers
Understanding MVCC is essential for interpreting Snowflake’s performance and behavior under concurrent workloads.
External Tables allow Snowflake to query data stored in external cloud storage without loading it into internal Snowflake-managed storage.
External Tables are used primarily in cloud data lake architectures where data remains in:
Amazon S3
Azure Data Lake Storage (ADLS)
Google Cloud Storage (GCS)
External Tables rely on:
External file metadata stored in Snowflake
A metadata cache for file characteristics
The external storage location for actual file content
External Tables do not automatically detect new files unless specifically refreshed:
ALTER EXTERNAL TABLE my_table REFRESH;
This updates Snowflake’s metadata cache to recognize new or removed data files.
Common use cases:
Querying data lakes without ingesting data
Blending data lake and warehouse architectures
Gradual migration to Snowflake from legacy lake architectures
Combining data from internal and external tables
External Tables support both structured and semi-structured data.
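A sketch of an External Table over a data lake, with hypothetical bucket, stage, and integration names:

```sql
-- External stage pointing at a data lake location
CREATE STAGE lake_stage
  URL = 's3://my-bucket/events/'
  STORAGE_INTEGRATION = my_s3_int;

-- External table over Parquet files; data stays in S3
-- (VALUE is the implicit VARIANT holding each row's content)
CREATE EXTERNAL TABLE ext_events (
  event_date DATE AS (value:event_date::DATE)
)
LOCATION = @lake_stage
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = FALSE;

-- Pick up newly added or removed files
ALTER EXTERNAL TABLE ext_events REFRESH;
```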
Snowflake provides support for Apache Iceberg, a high-performance table format widely used in modern data lake and lakehouse systems.
Snowflake supports two operational modes:
Externally managed Iceberg tables (external catalog):
Snowflake reads Iceberg metadata maintained outside of Snowflake
Data remains in external storage
Snowflake acts as a query engine without managing the table lifecycle
Snowflake-managed Iceberg tables:
Iceberg metadata and table lifecycle are fully managed by Snowflake
Data is stored in customer cloud storage
Provides consistent performance and Snowflake-level governance
Iceberg tables allow Snowflake to:
Interoperate with lakehouse ecosystems like Delta Lake and Hudi
Serve as a central data access layer for existing data lakes
Offer ACID-compliant operations on open formats
Iceberg support enables Snowflake to operate seamlessly in mixed architectures.
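A sketch of the Snowflake-managed mode; the external volume name is illustrative and must be configured beforehand:

```sql
-- Snowflake-managed Iceberg table; data files live in customer
-- cloud storage referenced by the external volume
CREATE ICEBERG TABLE sales_iceberg (
  id     NUMBER,
  amount NUMBER(10,2)
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'my_ext_volume'
BASE_LOCATION = 'sales/';
```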
Materialized Views (MVs) are stored query results that are automatically refreshed by Snowflake.
Snowflake maintains MVs by:
Tracking changes at the micro-partition level
Incrementally updating the MV as source data changes
Storing precomputed results for fast access
Benefits:
Significantly faster query performance
Ideal for dashboards
Reduced compute for frequently repeated queries
Costs:
MVs consume storage for materialized results
Maintenance consumes compute credits
MVs have limitations:
Cannot reference another MV
Must reference a single base table
Limited support for complex SQL constructs
Understanding MV limitations is important for exam questions about architecture and cost.
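A sketch that stays within the limitations above — one base table, a simple aggregate (object names assumed; MVs require Enterprise Edition or above):

```sql
-- MV over a single base table; Snowflake maintains it
-- incrementally as micro-partitions in orders change
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM orders
GROUP BY order_date;
```

Queries against `daily_revenue` read the precomputed results rather than re-aggregating the base table.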
Search Optimization Service (SOS) accelerates highly selective queries that would otherwise require scanning large numbers of micro-partitions.
SOS builds additional persistent search structures to enable faster evaluation of:
Equality predicates
IN list queries
Highly selective filters
Certain semi-structured search conditions
Improves performance without traditional indexing
Fully managed by Snowflake
Adds both compute and storage cost
Does not replace clustering for range-based pruning
Appropriate for:
High-selectivity lookups
Large tables frequently queried on low-cardinality filters
Text searches or semi-structured field lookups
Not appropriate for:
Range queries (improved by clustering keys instead)
Full-scan analytical queries
Search Optimization is a powerful but optional performance service.
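Enabling SOS is a table-level property rather than a query-level hint. A sketch with assumed names:

```sql
-- Enable search optimization for the whole table
ALTER TABLE events ADD SEARCH OPTIMIZATION;

-- Or target a specific column and access method
ALTER TABLE events ADD SEARCH OPTIMIZATION ON EQUALITY(user_id);
```

Once built, the search structures are used automatically by the optimizer for qualifying point-lookup queries; no query changes are needed.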
What are the three main layers of Snowflake architecture and their responsibilities?
The three layers are Database Storage, Compute (Virtual Warehouses), and Cloud Services. Storage holds data in compressed micro-partitions. Compute executes queries independently using virtual warehouses. Cloud Services manages metadata, authentication, and query optimization.
Snowflake separates storage and compute, enabling independent scaling. Storage is centralized and persistent, while compute clusters are transient and scalable. Cloud Services acts as the coordination layer, handling query parsing and access control. A common mistake is assuming compute stores data—it does not.
Demand Score: 82
Exam Relevance Score: 90
How do virtual warehouses scale in Snowflake?
Virtual warehouses scale either by resizing (increasing compute size) or by enabling multi-cluster mode to handle concurrency. Resizing increases resources per query, while multi-cluster adds parallel clusters for concurrent workloads.
Scaling up improves query performance, while scaling out handles multiple users. Snowflake allows auto-suspend and auto-resume to optimize cost. A common misunderstanding is using larger warehouses for concurrency instead of multi-cluster scaling.
Demand Score: 80
Exam Relevance Score: 88
What role does the Cloud Services layer play in query execution?
The Cloud Services layer handles query parsing, optimization, metadata management, and access control before sending execution tasks to the compute layer.
It acts as the brain of Snowflake, coordinating all operations. It does not execute queries itself but determines execution plans. A common mistake is assuming it consumes user credits—it generally does not for most operations.
Demand Score: 76
Exam Relevance Score: 85
How does Snowflake separate storage and compute, and why is it important?
Snowflake stores data in centralized storage while compute resources (virtual warehouses) are independent. This allows scaling compute without affecting storage and vice versa.
This separation enables concurrency, cost control, and performance tuning. Users can run multiple warehouses on the same data simultaneously. A common mistake is assuming scaling storage affects performance—it does not directly.
Demand Score: 79
Exam Relevance Score: 90