This section provides a step-by-step overview of the “Connecting to and Preparing Data” knowledge area in Tableau Desktop. It focuses on connecting to data sources and clarifies key concepts, including Live Connections, Extracts, and supported data sources.
The very first step in Tableau is to connect to a data source. Tableau is versatile and allows connections to multiple types of data sources, such as files, databases, cloud systems, and web-based data.
When connecting to a data source in Tableau, you will come across two main types of connections:
Example: Connecting Tableau to an online database like SQL Server to track real-time sales.
.hyper (newer format) or .tde (legacy format) extension. The extracted data is saved locally on your computer or server.Example: Extracting monthly sales data from an Excel spreadsheet to improve performance and work offline.
| Feature | Live Connection | Extract Connection |
|---|---|---|
| Data Freshness | Real-time updates | Static (requires manual refresh) |
| Performance | Slower for large datasets | Faster (optimized for Tableau) |
| Offline Access | Not available | Available |
| Storage Requirement | Minimal | Requires storage space |
Key Tip: Start with a Live Connection for small datasets or real-time needs. Use an Extract for better performance with large datasets or when working offline.
Tableau provides robust connectivity options to a variety of data sources. Let’s explore them in detail:
Files are one of the most common data sources for Tableau beginners. Tableau supports:
.xlsx, .xls).csv, .txt)Example: Connecting Tableau to a quarterly sales report stored in an Excel file.
Tableau can connect to relational databases and retrieve data using SQL queries. Supported databases include:
How It Works: You provide connection credentials (host, username, password), and Tableau queries the database to retrieve data.
Example: Analyzing a customer orders database stored in MySQL.
Tableau connects seamlessly to cloud-based data sources, allowing analysis of data stored on platforms like:
Example: Pulling data from Google Sheets for a shared sales team report.
Tableau can connect to web-based data sources using a Web Data Connector. A WDC allows you to fetch data from REST APIs or online systems.
Example: Connecting Tableau to a REST API to fetch real-time weather data.
Here’s a step-by-step guide to connect Tableau to any data source:
The Data Source Page is the interface where you can prepare and organize your connected data. Key components include:
Once you connect to a data source in Tableau, the next step is to prepare and clean the data to ensure it is ready for analysis. This involves combining data, managing metadata, filtering at the connection level, and creating extracts.
After connecting to a data source, you may need to manage or modify the connection. Tableau makes it simple to adjust these connections.
Example: If the Excel file path changes, you can edit the connection to point to the new location.
In Tableau, you can combine data from multiple tables or sources. There are three primary methods: Relationships, Joins, and Unions.
What Are Relationships?
Key Features:
How to Create Relationships:
Example: Connecting a Sales Table with an Orders Table using the Order ID field.
Inner Join: Keeps only rows with matching values in both tables.
Left Join: Keeps all rows from the left table and matching rows from the right table.
Right Join: Keeps all rows from the right table and matching rows from the left table.
Full Outer Join: Keeps all rows from both tables, including unmatched rows.
| Join Type | Description | Diagram |
|---|---|---|
| Inner Join | Keeps only matching rows. | Two overlapping circles (middle). |
| Left Join | Keeps all rows from the left. | Full left circle + overlap. |
| Right Join | Keeps all rows from the right. | Full right circle + overlap. |
| Full Outer Join | Keeps all rows from both tables. | Full circles (left and right). |
What Is a Union?
Example Use Case: Combining Q1, Q2, Q3, and Q4 sales data into a single table.
Steps to Perform Union:
Metadata refers to information about the structure of your data, such as field names, data types, and roles. Tableau allows you to manage metadata for better clarity and accuracy.
Example: Rename Cust_ID to Customer ID.
Example: Change TX to Texas and NY to New York for clarity.
Example: Convert Order Date from a string format to a date field.
Example: Assign State to the geographic role to create a map of sales by state.
When working with large datasets, it is often unnecessary to load all the data into Tableau. You can filter data at the connection level to improve performance and focus on relevant subsets of data. There are two main ways to filter data: Data Source Filters and Extract Filters.
| Feature | Data Source Filters | Extract Filters |
|---|---|---|
| When Applied | Before data is loaded into Tableau | When creating the extract file |
| Effect on Data | Limits data during live connections | Reduces the size of extracted data |
| Performance | Improves performance during data load | Improves performance of the extract |
| Connection Type | Works with Live and Extract connections | Only works with Extract connections |
An extract is a static snapshot of the data stored in a .hyper or .tde file. Extracts improve performance and allow offline access.
When creating an extract, Tableau allows you to customize how much data is included:
Filter Rows: Apply extract filters to include only specific subsets of data.
Aggregate Data: Combine rows of data by grouping fields and applying calculations (e.g., SUM, AVG).
Limit Rows: Load only a specific number of rows for quick testing.
Once you prepare and connect to a data source, you can save the connection for reuse in other workbooks. Tableau offers two formats for saving data sources:
Tableau Data Source (.tds)
Tableau Packaged Data Source (.tdsx)
.tds or .tdsx format.| Benefit | Explanation |
|---|---|
| Performance | Extracts are faster for analysis, especially for large datasets. |
| Offline Access | Extracts allow you to analyze data without a live connection. |
| Reusable Connections | Saved data sources can be reused in multiple workbooks. |
| Optimized Data Size | Filters and aggregations reduce the size of the extract. |
.tds or .tdsx formats for consistency.This concludes the "Connecting to and Preparing Data" knowledge area. We have covered:
.hyper and .tdsx files for better performance.Understanding the difference between Relationships and Joins is critical not only for modeling data correctly, but also for answering TDS-C01 questions that ask about when to use each.
| Aspect | Relationships | Joins |
|---|---|---|
| Layer | Logical Layer | Physical Layer |
| Data Merge Timing | During query time | At data source level |
| Output | Multiple tables remain separate until query | One flattened table |
| Performance | More efficient for large datasets | Can cause data duplication |
| Use Case | Multi-fact data models (e.g. Sales + Returns) | Simple row-level combinations |
Tip: Think of Relationships as a dynamic linkage that respects the context of the visualization, while Joins are a static merge that happens up front.
For exam and real-world use, it's helpful to visualize the four main types of joins as overlapping circles (like Venn diagrams):
This helps you quickly identify which join to use based on whether you want to preserve unmatched data.
When you first connect data, Tableau takes you to the Data Source Page, which includes:
Understanding this layout helps during both exam navigation and real Tableau usage.
The TDS-C01 exam often tests your ability to distinguish between similar concepts and apply them to real-world scenarios. Here are common formats and how to prepare:
You may be asked to:
Memory Tip:
Live = “Linked to source”
Extract = “Exported snapshot”
You may encounter questions like:
“You have four quarterly sales files with the same structure. How should you combine them?”
Correct Answer: Union
“You want to combine Orders and Returns but keep them logically separate in the view.”
Correct Answer: Relationship
“You need a single flat table of Orders and Customer Details.”
Correct Answer: Join
Understanding when to apply each method is key.
A common question may be:
“Which filtering method reduces extract size and improves workbook performance?”
Correct Answer: Extract Filter
Even though not technically part of this module, it is useful to briefly introduce field roles to prepare for the next topic.
After data is connected:
This sets the stage for the next section, Exploring and Analyzing Data, where we work with pills, marks, and visual encodings.
| Exam Trap | Clarification | Strategy |
|---|---|---|
| Confusing Live and Extract capabilities | Only Extract supports Extract Filters and offline use | Look for keywords like "snapshot", "refresh", "offline" |
| Assuming Join is always better | Relationships preserve data integrity and are more flexible | If multiple tables are needed with different grains, use Relationships |
| Overusing Worksheet Filters for optimization | Data Source and Extract Filters are better for performance | Choose filters before the worksheet level |
| Thinking Union = Join | Union appends rows; Join merges columns | Ask: Do you need more rows or more columns? |
When should you use a relationship instead of a join in Tableau?
Use relationships when working with logical layer modeling and preserving table-level granularity; use joins when you need row-level combination of data.
Relationships keep tables separate and only combine them at query time, avoiding duplication issues. Joins physically merge tables, which can create duplicated rows if cardinality is not handled correctly. Relationships are preferred for flexibility and accuracy, especially in multi-fact models. A common mistake is using joins for unrelated tables, leading to inflated measures.
Demand Score: 85
Exam Relevance Score: 95
What is the key difference between a live connection and an extract?
A live connection queries the data source in real time, while an extract is a snapshot stored locally for faster performance.
Live connections depend on database performance and reflect real-time updates, making them suitable for dynamic data. Extracts improve speed by storing optimized data locally but require refreshes to stay updated. A common mistake is using live connections on slow databases, causing dashboard lag. Extracts are preferred for performance optimization.
Demand Score: 82
Exam Relevance Score: 90
What happens if you join two tables with duplicate keys?
It results in row duplication, potentially inflating measures.
When keys are not unique, Tableau creates a many-to-many join, multiplying rows across matching records. This leads to incorrect aggregations like inflated sums. A common mistake is not validating key uniqueness before joining. Using relationships instead can prevent this issue by aggregating separately before combining.
Demand Score: 80
Exam Relevance Score: 92
When should you use a union instead of a join?
Use a union when combining tables with the same structure vertically.
Unions append rows from multiple tables with identical columns, while joins combine columns based on keys. A common mistake is attempting to join tables that should be stacked, leading to missing or null values. Unions are ideal for combining datasets like monthly files.
Demand Score: 78
Exam Relevance Score: 88
Why is an extract often faster than a live connection?
Because extracts are optimized, compressed, and stored locally.
Extracts use Tableau’s Hyper engine, enabling faster queries compared to external databases. They reduce dependency on network latency and database load. A common mistake is assuming live connections are always better for freshness, ignoring performance trade-offs. Extracts balance speed and usability when real-time data is not required.
Demand Score: 81
Exam Relevance Score: 90