Loading and unloading are essential Snowflake operations: they define how data enters and leaves Snowflake.
The exam expects you to understand stages, file formats, COPY INTO behavior, and Snowpipe.
1. Stages
A stage is a location where data files are stored for loading or unloading.
Internal stages are fully managed by Snowflake; you do not need cloud credentials to use them.
External stages reference cloud storage (S3, Azure Blob/ADLS, Google Cloud Storage) and require credentials or a storage integration.
2. File Formats
Snowflake must understand how to parse incoming or outgoing data.
File formats tell Snowflake how to interpret fields, delimiters, compression, and more.
You can create named file format objects, or you can specify format options inline in COPY INTO commands.
Snowflake supports structured formats (CSV) and semi-structured formats (JSON, Avro, ORC, Parquet) equally well.
These format options determine how Snowflake parses files during COPY.
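A minimal sketch of a named file format; the name my_csv_format matches the COPY example below, and the specific options are illustrative:
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1            -- ignore the header row in each file
  COMPRESSION = GZIP;        -- input files are gzip-compressed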
3. Bulk Loading – COPY INTO
COPY INTO is Snowflake’s primary command for bulk data ingestion.
3.1 Typical Copy Pattern
COPY INTO my_table
FROM @mystage/path/
FILE_FORMAT = (FORMAT_NAME = my_csv_format)
ON_ERROR = 'CONTINUE';
This statement loads every file under the stage path, parses it with the named file format, and skips bad rows because of ON_ERROR = 'CONTINUE'.
3.2 COPY INTO Highlights
3.2.1 Parallel Loading
If there are many files:
Snowflake loads them in parallel
Data ingestion is fast and scalable
Parallelism depends on warehouse size
3.2.2 Loading Semi-Structured Data
Semi-structured formats (JSON, Avro, ORC, Parquet) can be loaded directly into:
A single VARIANT column holding each record
Typed columns, by applying transformations inside the COPY statement
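A short sketch of loading JSON into a VARIANT column (table and stage names are illustrative):
CREATE OR REPLACE TABLE raw_json (v VARIANT);

COPY INTO raw_json
FROM @mystage/json/
FILE_FORMAT = (TYPE = JSON);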
3.2.3 Error Handling
ON_ERROR controls what happens during ingestion:
'ABORT_STATEMENT' (default) → stop on error
'CONTINUE' → skip bad rows
'SKIP_FILE' → skip whole files with errors
3.2.4 Validation Mode
Useful for checking data before loading: VALIDATION_MODE runs the COPY without loading anything and returns the problems it finds.
COPY INTO my_table
FROM @mystage/path/
FILE_FORMAT = (FORMAT_NAME = my_csv_format)
VALIDATION_MODE = 'RETURN_ERRORS';
Shows exactly which rows or files have issues. Other modes include RETURN_n_ROWS and RETURN_ALL_ERRORS.
3.2.5 Column Mapping
Two modes:
By position (the default) → fields map to target columns in order
By name → MATCH_BY_COLUMN_NAME matches fields in semi-structured data to identically named columns
Mapping by name makes COPY more robust for changing schemas.
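A sketch of name-based mapping, assuming a Parquet file whose field names match the target table’s column names:
COPY INTO my_table
FROM @mystage/parquet/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;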
4. Continuous Loading – Snowpipe / Snowpipe Streaming
Snowflake offers two continuous-ingestion mechanisms: Snowpipe (classic, file-based) and Snowpipe Streaming (row-based).
4.1 Snowpipe (Classic)
4.1.1 What Snowpipe Does
Snowpipe loads data automatically after files land in a stage.
Key characteristics:
Serverless (no warehouse needed)
Continuous ingestion
Delay usually seconds to minutes
Ideal for micro-batch architectures
4.1.2 Pipe Object
Snowpipe uses a pipe, which contains a COPY INTO command:
CREATE PIPE mypipe AS
COPY INTO raw_events
FROM @mystage/events/
FILE_FORMAT = (TYPE=JSON);
4.1.3 Trigger Methods
Snowpipe can be triggered by:
Cloud storage notifications (S3 Events, GCS Pub/Sub, Azure Event Grid; requires AUTO_INGEST = TRUE on the pipe)
REST API calls
Manual refresh (ALTER PIPE ... REFRESH)
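Two quick operational commands, using the mypipe example above:
ALTER PIPE mypipe REFRESH;             -- re-scan the stage for files not yet loaded
SELECT SYSTEM$PIPE_STATUS('mypipe');   -- returns JSON describing the pipe's current state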
4.2 Snowpipe Streaming
4.2.1 What It Is
Snowpipe Streaming provides:
Row-level ingestion through a streaming client SDK, with no staged files
Latency measured in milliseconds to seconds
Snowpipe Streaming writes data directly into a target table without waiting for batch files.
4.2.2 Usage Scenarios
Commonly used with:
Kafka (via the Snowflake Kafka connector)
CDC systems
Event buses
5. Data Unloading (Export)
Unloading exports data from Snowflake into files in an internal or external stage.
5.1 COPY INTO (Unloading)
A typical unload pattern:
COPY INTO @mystage/unload/path/
FROM (SELECT * FROM my_table)
FILE_FORMAT = (TYPE = CSV);
This writes a set of output CSV files into the stage.
5.2 What You Do After Unloading
5.2.1 If It’s an Internal Stage
You download files using:
GET @mystage/unload/path/ file:///localfolder/
(The GET command is run from a client such as SnowSQL; it is not supported in web worksheets.)
5.2.2 If It’s an External Stage
There is no Snowflake download step. Snowflake simply writes files to the bucket path; you manage them with your cloud provider’s tools (for example, the AWS CLI or console).
Data Loading and Unloading (Additional Content)
1. Key COPY INTO Options (Advanced)
Snowflake’s COPY INTO <table> command supports several advanced options that give fine control over load behavior. These options are frequently tested because they influence both correctness and performance in production pipelines.
1.1 FORCE = TRUE
Forces Snowflake to reload files even if they are already recorded in the load history.
Normally, Snowflake prevents reloading previously loaded files to ensure idempotency.
FORCE overrides this safeguard.
Useful when reprocessing files after schema changes or corrections.
1.2 PURGE = TRUE
Deletes files from the stage after successful load.
Helps prevent duplicate processing.
Typically used with external stages when files are only needed once.
1.3 PATTERN = '<regex>'
Filters which files Snowflake loads from a stage path.
Supports full regular expressions.
Important when stage contains many files but only certain subsets should be loaded.
1.4 FILES = ('file1', 'file2')
Restricts the load to specific files.
Overrides directory scanning and pattern matching.
Ideal for retrying or testing a small subset of files.
1.5 TRUNCATECOLUMNS = TRUE
Automatically truncates values that exceed the target column size.
Prevents load failures and allows partial ingestion.
Useful in semi-trusted data ingestion scenarios.
1.6 ERROR_ON_COLUMN_COUNT_MISMATCH
Controls behavior when the number of fields in a row does not match the target table schema.
Common settings:
TRUE (default) → the load fails on rows with a mismatched column count
FALSE → shorter rows are padded with NULLs and extra fields are ignored
This is required knowledge for handling CSV and semi-structured ingestion inconsistencies.
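A sketch combining several of these options; the stage, format, and file-name pattern reuse earlier hypothetical examples:
COPY INTO my_table
FROM @mystage/path/
FILE_FORMAT = (FORMAT_NAME = my_csv_format)
PATTERN = '.*sales_.*[.]csv'   -- only load files matching this regex
PURGE = TRUE                   -- delete files from the stage after a successful load
TRUNCATECOLUMNS = TRUE;        -- truncate oversized values instead of failing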
2. Load Monitoring and History
Snowflake provides extensive tooling to analyze, monitor, and validate bulk loading operations.
2.1 COPY_HISTORY Table Function
Returns historical load operations for a given table or stage.
Includes:
File names
Load status
Rows loaded
Load timestamps
Useful for debugging pipeline failures.
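A typical call (the table name is illustrative; COPY_HISTORY lives in INFORMATION_SCHEMA):
SELECT file_name, status, row_count, last_load_time
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'MY_TABLE',
  START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
));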
2.2 LOAD_HISTORY Views
Available via:
ACCOUNT_USAGE
INFORMATION_SCHEMA
These views store system-wide load audit data and are often used in governance and monitoring dashboards.
2.3 VALIDATE() Function
Validates the files from a previous COPY INTO execution and returns the errors that were encountered, without loading any data.
Returns details such as:
Parsing errors
Invalid rows
Incorrect column counts
Used during schema evolution and load testing.
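A sketch validating the most recent load into a table (the table name is illustrative; '_last' refers to the most recent COPY job):
SELECT * FROM TABLE(VALIDATE(my_table, JOB_ID => '_last'));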
2.4 RESULT_SCAN() After COPY
After a COPY INTO operation, results can be retrieved using:
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
Provides row counts, skipped rows, and summary details.
Critical for automated ingestion monitoring.
3. Storage Integrations for External Stages
Storage integrations define secure connections between Snowflake and cloud storage platforms.
3.1 Supported Providers
Amazon S3
Azure Blob / ADLS
Google Cloud Storage
3.2 Purpose
Provide secure, credential-free access from Snowflake to external storage.
Remove the need to embed hard-coded credentials (keys, secrets).
Required when using event-based ingestion (Snowpipe auto-notifications).
3.3 Behavior
Snowflake stores and manages the integration metadata.
The cloud provider grants Snowflake IAM-level permissions via trust relationships.
Ensures secure, auditable, least-privilege access.
Storage integrations are essential for enterprises with strong security policies.
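A minimal S3 sketch; the role ARN, bucket, and object names are placeholders, and the equivalent Azure and GCS parameters differ:
CREATE STORAGE INTEGRATION my_s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_access'
  STORAGE_ALLOWED_LOCATIONS = ('s3://mybucket/data/');

CREATE STAGE my_ext_stage
  URL = 's3://mybucket/data/'
  STORAGE_INTEGRATION = my_s3_int;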
4. Transformations During LOAD
Snowflake allows transformations to be embedded inside the COPY INTO command through an inline SELECT.
4.1 Supported Transformations
CAST or TRY_CAST
Date/time conversions such as TO_TIMESTAMP
JSON field extraction using $1:property syntax
Column reordering or selective extraction
Data type normalization
4.2 Pattern Example
COPY INTO target_table
FROM (
  SELECT
    $1::NUMBER,
    TO_TIMESTAMP($2),
    PARSE_JSON($3):field::STRING
  FROM @stage
)
FILE_FORMAT = (TYPE = CSV);
Note that with CSV input, a column containing JSON text must be parsed explicitly with PARSE_JSON; the bare $1:property path syntax applies when the file format itself is semi-structured (for example TYPE = JSON, where $1 is a VARIANT).
This lets you preprocess data during ingestion, reducing or eliminating the need for post-load transformation steps.
5. MERGE for Upsert Pipelines
MERGE is a foundational command for incremental ingestion patterns.
5.1 Typical Pattern
Load raw files into a staging table via COPY
Execute MERGE into the final modeled table
5.2 What MERGE Handles
Inserts for new keys
Updates for existing keys whose values changed
Optional deletes via WHEN MATCHED ... THEN DELETE
5.3 Integration with Other Features
Can be paired with Streams and Tasks for automation
Works well for incremental ELT pipelines
Commonly used in production data warehousing workloads
MERGE is one of the most important transformation commands to know for the exam.
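A minimal upsert sketch; the staging table, target table, and key column are hypothetical:
MERGE INTO dim_customer AS t
USING stg_customer AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET t.name = s.name, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (customer_id, name, updated_at)
  VALUES (s.customer_id, s.name, s.updated_at);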
6. Snowpipe Cost and Behavior Details
Snowpipe provides continuous, serverless ingestion of staged files.
6.1 Compute Model
Does not use customer warehouses
Uses Snowflake-managed serverless compute
Credits are billed for the serverless compute used, plus a small per-file overhead charge
6.2 Triggering Methods
Cloud storage event notifications (AUTO_INGEST = TRUE)
REST API calls (the insertFiles endpoint)
Manual refresh via ALTER PIPE ... REFRESH
6.3 Latency Profile
Latency is typically seconds to a few minutes between file arrival and data availability in the table.
Snowpipe is ideal for continuous ingestion of small files.
7. Snowpipe Streaming Cost Model
Snowpipe Streaming provides ultra-low-latency ingestion without requiring staged files.
7.1 Characteristics
Ingests rows directly into Snowflake tables
Latency measured in milliseconds to seconds
Suitable for Kafka, CDC systems, and event buses
7.2 Cost Model
No customer warehouse is required; Snowflake-managed compute does the work.
Costs scale primarily with the volume of data ingested rather than with file counts.
Snowpipe Streaming is the preferred option for real-time pipelines.
8. Unloading Advanced Options
Snowflake supports flexible data export via COPY INTO <location>.
8.1 OVERWRITE = TRUE
Replaces existing files in the target stage
Useful for periodic full exports
Prevents directory accumulation
8.2 SINGLE = TRUE
Produces a single output file
Useful for downstream systems requiring single-file loads
Not recommended for very large datasets due to size limits
8.3 MAX_FILE_SIZE
Sets an upper bound, in bytes, on each output file (the default for unloading is 16 MB)
Larger values produce fewer, bigger files, which helps when tuning for downstream consumers
8.4 HEADER = TRUE
Writes the column names as the first line of each output file
8.5 COMPRESSION
Supports:
GZIP (the default via AUTO)
BZ2, BROTLI, ZSTD, DEFLATE
NONE for uncompressed output
Compression affects both file size and export performance.
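A sketch combining several unload options; the stage path and table name are hypothetical:
COPY INTO @mystage/unload/daily_
FROM my_table
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
HEADER = TRUE
OVERWRITE = TRUE
MAX_FILE_SIZE = 104857600;    -- cap each output file at roughly 100 MB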
Frequently Asked Questions
What is the purpose of the COPY INTO command?
Answer: COPY INTO is used to load data from staged files into Snowflake tables or unload data from tables into external locations.
Explanation: It is the primary bulk data loading mechanism. It supports multiple file formats and parallel loading. A common mistake is confusing load vs unload syntax.
Demand Score: 83
Exam Relevance Score: 93
What is a stage in Snowflake?
Answer: A stage is a location where data files are stored for loading or unloading.
Explanation: Stages can be internal (managed by Snowflake) or external (e.g., S3, Azure Blob). They act as intermediaries between storage and tables. A common mistake is thinking data loads directly from local systems without staging.
Demand Score: 82
Exam Relevance Score: 92
What is the difference between internal and external stages?
Answer: Internal stages are managed by Snowflake, while external stages reference cloud storage like S3 or Azure.
Explanation: Internal stages are easier to use but less flexible. External stages enable integration with existing data lakes. A common mistake is not configuring credentials correctly for external stages.
Demand Score: 80
Exam Relevance Score: 91
What file formats does Snowflake support for loading?
Answer: Snowflake supports formats like CSV, JSON, Avro, ORC, and Parquet.
Explanation: Semi-structured formats like JSON are automatically parsed into VARIANT columns. A common mistake is not defining proper file format options.
Demand Score: 78
Exam Relevance Score: 88
What happens if data loading fails partially?
Answer: Snowflake can continue loading valid rows while logging errors, depending on COPY options.
Explanation: Options like ON_ERROR control behavior. A common mistake is not reviewing load error logs.
Demand Score: 79
Exam Relevance Score: 90