Shopping cart

Subtotal:

$0.00

TDS-C01 Connecting to and Preparing Data

Connecting to and Preparing Data

Detailed list of TDS-C01 knowledge points

Connecting to and Preparing Data Detailed Explanation

This section provides a step-by-step overview of the “Connecting to and Preparing Data” knowledge area in Tableau Desktop. It focuses on connecting to data sources and clarifies key concepts, including Live Connections, Extracts, and supported data sources.

1. Connecting to Data Sources

The very first step in Tableau is to connect to a data source. Tableau is versatile and allows connections to multiple types of data sources, such as files, databases, cloud systems, and web-based data.

Types of Data Connections

When connecting to a data source in Tableau, you will come across two main types of connections:

1. Live Connection
  • Definition: A Live Connection is a direct link between Tableau and the original data source. Tableau queries the database or file every time you interact with the visualization.
  • Key Features:
    • Data updates in real time as the original source changes.
    • Ideal for small to medium-sized datasets or scenarios requiring up-to-the-minute accuracy.
    • Best suited for relational databases and dynamic data systems like MySQL or SQL Server.
  • Advantages:
    • Always uses the latest data.
    • No storage required on the local machine.
  • Disadvantages:
    • May slow performance for large datasets because Tableau queries the database repeatedly.
    • Requires a continuous connection to the data source.

Example: Connecting Tableau to an online database like SQL Server to track real-time sales.

2. Extract Connection
  • Definition: An Extract Connection takes a snapshot of the data and stores it in a file with a .hyper (newer format) or .tde (legacy format) extension. The extracted data is saved locally on your computer or server.
  • Key Features:
    • Data is static and does not update in real time. You must manually refresh the extract to pull new data.
    • Extract files improve performance, especially when working with large datasets.
    • Allows you to work offline without a connection to the original data source.
  • Advantages:
    • Faster performance since Tableau works on a local copy.
    • Supports offline analysis.
  • Disadvantages:
    • Data is not live, and you need to refresh the extract manually to include updated records.
    • Requires local storage space for the extracted file.

Example: Extracting monthly sales data from an Excel spreadsheet to improve performance and work offline.

Comparison: Live vs. Extract Connection
Feature Live Connection Extract Connection
Data Freshness Real-time updates Static (requires manual refresh)
Performance Slower for large datasets Faster (optimized for Tableau)
Offline Access Not available Available
Storage Requirement Minimal Requires storage space

Key Tip: Start with a Live Connection for small datasets or real-time needs. Use an Extract for better performance with large datasets or when working offline.

Supported Data Sources in Tableau

Tableau provides robust connectivity options to a variety of data sources. Let’s explore them in detail:

1. Files

Files are one of the most common data sources for Tableau beginners. Tableau supports:

  • Excel (.xlsx, .xls)
    • Common for small datasets and manual entry data.
  • Text Files (.csv, .txt)
    • Ideal for exported reports or large tabular data.
  • JSON Files
    • Used for semi-structured data like API responses.
  • PDF
    • Allows Tableau to extract tables directly from PDFs.

Example: Connecting Tableau to a quarterly sales report stored in an Excel file.

2. Databases

Tableau can connect to relational databases and retrieve data using SQL queries. Supported databases include:

  • MySQL: Open-source relational database.
  • SQL Server: Microsoft’s relational database management system.
  • PostgreSQL: Popular open-source database for structured data.
  • Oracle: Enterprise-grade database solution.

How It Works: You provide connection credentials (host, username, password), and Tableau queries the database to retrieve data.

Example: Analyzing a customer orders database stored in MySQL.

3. Cloud Platforms

Tableau connects seamlessly to cloud-based data sources, allowing analysis of data stored on platforms like:

  • Google Sheets: Collaboratively maintained spreadsheet data.
  • Google BigQuery: Google’s cloud-based database.
  • Amazon Redshift: Amazon’s cloud data warehouse.
  • Snowflake: Cloud-based data platform for big data analysis.

Example: Pulling data from Google Sheets for a shared sales team report.

4. Web Data Connectors (WDC)

Tableau can connect to web-based data sources using a Web Data Connector. A WDC allows you to fetch data from REST APIs or online systems.

  • Examples: Fetching social media data, public datasets, or web analytics.
  • Requires URL endpoint or a custom-built connector.

Example: Connecting Tableau to a REST API to fetch real-time weather data.

Steps to Connect to a Data Source

Here’s a step-by-step guide to connect Tableau to any data source:

  1. Launch Tableau Desktop: Open Tableau Desktop on your computer.
  2. Go to the "Connect" Pane:
    • Under To a File: Select options like Microsoft Excel, Text, or JSON.
    • Under To a Server: Choose databases or cloud connections like MySQL, Google Sheets, or Snowflake.
  3. Select or Enter Connection Details:
    • For files: Browse to the file location and select it.
    • For databases: Enter credentials like server name, username, and password.
  4. Load the Data:
    • Tableau opens the Data Source Page, where you can preview, filter, and prepare the data before visualizing it.
  5. Switch Between Live and Extract:
    • Choose Live for real-time updates or Extract for improved performance.

Data Source Page

The Data Source Page is the interface where you can prepare and organize your connected data. Key components include:

  • Connections: Shows connected data sources.
  • Tables: Lists available tables in the selected source.
  • Data Grid: A preview of the data, showing rows and columns.
  • Filters: Allows you to filter the data before loading it into Tableau.
  • Relationships/Joins/Unions: Combine data from multiple tables (covered later).

2. Preparing Data for Analysis

Once you connect to a data source in Tableau, the next step is to prepare and clean the data to ensure it is ready for analysis. This involves combining data, managing metadata, filtering at the connection level, and creating extracts.

1. Managing Data Connections

After connecting to a data source, you may need to manage or modify the connection. Tableau makes it simple to adjust these connections.

1.1 Edit Connection
  • What It Does: Modify an existing data connection (e.g., change the file path, credentials, or server details).
  • Steps:
    1. Go to the top menu > Data > Edit Connection.
    2. Modify the connection settings, such as credentials or file paths.
    3. Reload the data to reflect the changes.

Example: If the Excel file path changes, you can edit the connection to point to the new location.

1.2 Refresh Data
  • What It Does: Reloads the data from the original source to include updates.
  • Live Connection: Automatically reflects real-time updates.
  • Extract Connection: Requires manual refresh to pull the latest data.
  • Steps:
    1. Click the Refresh button in the toolbar (circular arrow icon).
    2. Alternatively, press F5 to reload the data.

2. Combining Data

In Tableau, you can combine data from multiple tables or sources. There are three primary methods: Relationships, Joins, and Unions.

2.1 Relationships (Logical Layer)
  • What Are Relationships?

    • Relationships are the default method in Tableau to combine tables.
    • They are defined at the logical layer, which means Tableau does not merge data physically but keeps it at the original grain.
  • Key Features:

    • Relationships are flexible: Tableau dynamically adjusts the level of detail based on the visualization.
    • Avoids data duplication issues seen with joins.
    • Suitable for multi-fact table analysis (e.g., Sales table + Returns table).
  • How to Create Relationships:

    1. Drag tables into the Logical Layer (Data Source Page).
    2. Define the relationship by selecting the matching fields between tables.

Example: Connecting a Sales Table with an Orders Table using the Order ID field.

2.2 Joins (Physical Layer)
  • What Are Joins?
    • Joins are used to combine two or more tables based on a common key field.
    • Joins occur at the physical layer and merge data into a single table.
Types of Joins
  1. Inner Join: Keeps only rows with matching values in both tables.

    • Example: Show orders with customer information when the Order ID matches in both tables.
  2. Left Join: Keeps all rows from the left table and matching rows from the right table.

    • Example: Show all customers, including those who may not have placed an order.
  3. Right Join: Keeps all rows from the right table and matching rows from the left table.

    • Example: Show all orders, even if there is no customer data.
  4. Full Outer Join: Keeps all rows from both tables, including unmatched rows.

    • Example: Show all customers and all orders, even if there is no match.
Steps to Create Joins
  1. Drag two tables onto the Physical Layer of the Data Source Page.
  2. Tableau automatically suggests an Inner Join.
  3. To change the join type:
    • Click the Join icon between the tables.
    • Select the desired join type (Inner, Left, Right, Full Outer).
  4. Define the matching fields (keys) for the join.
Visualizing Joins
Join Type Description Diagram
Inner Join Keeps only matching rows. Two overlapping circles (middle).
Left Join Keeps all rows from the left. Full left circle + overlap.
Right Join Keeps all rows from the right. Full right circle + overlap.
Full Outer Join Keeps all rows from both tables. Full circles (left and right).
2.3 Union
  • What Is a Union?

    • A union appends rows from one table to another.
    • Suitable when datasets have similar structures (i.e., same columns).
  • Example Use Case: Combining Q1, Q2, Q3, and Q4 sales data into a single table.

  • Steps to Perform Union:

    1. Drag the first table onto the Data Source Page.
    2. Drag the second table directly below the first table.
    3. Tableau automatically appends the rows.

3. Managing Metadata and Data Properties

Metadata refers to information about the structure of your data, such as field names, data types, and roles. Tableau allows you to manage metadata for better clarity and accuracy.

3.1 Rename Fields
  • Why Rename Fields?
    • Improve readability and ensure consistency in naming conventions.
  • Steps:
    1. Right-click the field name in the Data Pane.
    2. Select Rename.
    3. Enter the new name.

Example: Rename Cust_ID to Customer ID.

3.2 Assign Aliases
  • What Are Aliases?
    • Aliases allow you to change the display name of specific field values without altering the source data.
  • Steps:
    1. Right-click the field in the Data Pane.
    2. Select Aliases.
    3. Assign custom display names for values.

Example: Change TX to Texas and NY to New York for clarity.

3.3 Change Data Types
  • What It Does: Adjust the field type based on the content (e.g., convert a string to a date).
  • Steps:
    1. Click the data type icon next to the field name in the Data Pane.
    2. Select the correct data type:
      • String (Text)
      • Number (Integer or Float)
      • Date or DateTime
      • Boolean (True/False)

Example: Convert Order Date from a string format to a date field.

3.4 Assign Geographic Roles
  • What It Does: Converts fields into geographic roles for creating maps.
  • Examples: Country, State, City, ZIP Code, Latitude/Longitude.
  • Steps:
    1. Right-click a field in the Data Pane.
    2. Select Geographic Role > Choose the role (e.g., State).

Example: Assign State to the geographic role to create a map of sales by state.

4. Filtering Data at the Connection Level

When working with large datasets, it is often unnecessary to load all the data into Tableau. You can filter data at the connection level to improve performance and focus on relevant subsets of data. There are two main ways to filter data: Data Source Filters and Extract Filters.

4.1 Data Source Filters
What Are Data Source Filters?
  • Filters applied before data is loaded into Tableau.
  • Limits the rows or columns of data Tableau retrieves, making the workbook more efficient.
  • Benefit: Reduces the data size and improves workbook performance.
Steps to Apply Data Source Filters:
  1. Go to the Data Source Page.
  2. Click Add under the Filters section.
  3. Select the field you want to filter and click OK.
  4. Define the filter condition:
    • Range of Values (e.g., Sales > 1000).
    • Top N values.
    • Selected Categories (e.g., only specific regions or products).
Example:
  • Filter data to include only orders from the East and West regions.
  • Steps:
    • Select the Region field > Choose East and West.
4.2 Extract Filters
What Are Extract Filters?
  • Filters applied when creating a data extract (snapshot of the data).
  • Reduces the size of the extracted file by limiting the rows or columns included.
  • Helps optimize extract performance and focus on specific data.
Steps to Apply Extract Filters:
  1. Go to the Data Source Page.
  2. Switch the connection type to Extract (check the "Extract" box).
  3. Click Edit under Filters.
  4. Add fields and define filter conditions (similar to Data Source Filters).
  5. Click OK to save the filters.
Example:
  • Extract only orders from 2023 to create a smaller, faster file.
  • Steps:
    • Select the Order Date field.
    • Define the filter: Include orders from January 1, 2023, to December 31, 2023.
4.3 Differences Between Data Source Filters and Extract Filters
Feature Data Source Filters Extract Filters
When Applied Before data is loaded into Tableau When creating the extract file
Effect on Data Limits data during live connections Reduces the size of extracted data
Performance Improves performance during data load Improves performance of the extract
Connection Type Works with Live and Extract connections Only works with Extract connections

3. Creating Extracts and Saving Data Sources

3.1 Creating Extracts

An extract is a static snapshot of the data stored in a .hyper or .tde file. Extracts improve performance and allow offline access.

Steps to Create an Extract:
  1. Go to the Data Source Page.
  2. In the top-right corner, check the Extract option.
  3. Click on Edit to set extract options:
    • Filters: Include only specific rows or fields.
    • Aggregation: Aggregate data at a specific level (e.g., sum sales by region).
    • Row Limits: Load only a certain number of rows (useful for testing).
  4. Click OK and save the extract.
Extract File Types:
  • .hyper: Default format for Tableau extracts (faster and optimized).
  • .tde: Older format (used in older Tableau versions).

3.2 Extract Options

When creating an extract, Tableau allows you to customize how much data is included:

  1. Filter Rows: Apply extract filters to include only specific subsets of data.

    • Example: Only load orders from 2023.
  2. Aggregate Data: Combine rows of data by grouping fields and applying calculations (e.g., SUM, AVG).

    • Example: Aggregate sales by Region to reduce data size.
  3. Limit Rows: Load only a specific number of rows for quick testing.

    • Example: Load the first 1000 rows to test the connection.

3.3 Saving Data Sources

Once you prepare and connect to a data source, you can save the connection for reuse in other workbooks. Tableau offers two formats for saving data sources:

  1. Tableau Data Source (.tds)

    • Contains the connection information and metadata (e.g., field names, calculated fields).
    • Does not include the actual data.
    • Suitable for live connections.
  2. Tableau Packaged Data Source (.tdsx)

    • Contains the connection information, metadata, and the data extract.
    • Allows for offline sharing.
    • Suitable for extract connections.
Steps to Save a Data Source:
  1. Go to Data in the top menu.
  2. Select "Save As Data Source".
  3. Choose either .tds or .tdsx format.
  4. Save the file to your desired location.

Benefits of Extracts and Saved Data Sources

Benefit Explanation
Performance Extracts are faster for analysis, especially for large datasets.
Offline Access Extracts allow you to analyze data without a live connection.
Reusable Connections Saved data sources can be reused in multiple workbooks.
Optimized Data Size Filters and aggregations reduce the size of the extract.

4. Best Practices for Data Preparation

  1. Start with Clean Data: Ensure your source data is well-organized and free of errors.
  2. Use Extracts for Performance: Create extracts for large datasets to improve speed.
  3. Apply Filters Early: Use Data Source Filters or Extract Filters to limit unnecessary data.
  4. Simplify Connections: Avoid unnecessary joins or unions that may complicate data preparation.
  5. Save and Reuse Data Sources: Save prepared connections in .tds or .tdsx formats for consistency.

Final Summary of "Connecting to and Preparing Data"

This concludes the "Connecting to and Preparing Data" knowledge area. We have covered:

  1. Types of Data Connections: Live vs. Extract.
  2. Supported Data Sources: Files, Databases, Cloud Platforms, and Web Data.
  3. Managing Data Connections: Editing and Refreshing connections.
  4. Combining Data: Relationships, Joins, and Unions.
  5. Filtering at the Connection Level: Data Source Filters and Extract Filters.
  6. Creating and Saving Extracts: .hyper and .tdsx files for better performance.

Connecting to and Preparing Data (Additional Content)

1. Visual Enhancements for Better Understanding

1.1 Relationships vs Joins – Visual Breakdown

Understanding the difference between Relationships and Joins is critical not only for modeling data correctly, but also for answering TDS-C01 questions that ask about when to use each.

Aspect Relationships Joins
Layer Logical Layer Physical Layer
Data Merge Timing During query time At data source level
Output Multiple tables remain separate until query One flattened table
Performance More efficient for large datasets Can cause data duplication
Use Case Multi-fact data models (e.g. Sales + Returns) Simple row-level combinations

Tip: Think of Relationships as a dynamic linkage that respects the context of the visualization, while Joins are a static merge that happens up front.

1.2 Join Types – Use Venn Diagram Logic

For exam and real-world use, it's helpful to visualize the four main types of joins as overlapping circles (like Venn diagrams):

  • Inner Join: Only overlapping data from both tables
  • Left Join: All records from the left table + matching ones from the right
  • Right Join: All records from the right table + matching ones from the left
  • Full Outer Join: All records from both tables, matched and unmatched

This helps you quickly identify which join to use based on whether you want to preserve unmatched data.

1.3 Data Source Page – Key Layout Elements

When you first connect data, Tableau takes you to the Data Source Page, which includes:

  • Left pane: Lists all available connections and tables
  • Canvas (Top): Logical layer – where you define relationships
  • Canvas (Bottom): Physical layer – where joins and unions occur
  • Data Grid Preview: Preview of actual data rows and columns
  • Filter & Extract Controls: Where you define filters and switch between Live vs Extract

Understanding this layout helps during both exam navigation and real Tableau usage.

2. Exam-Focused Tips for TDS-C01

The TDS-C01 exam often tests your ability to distinguish between similar concepts and apply them to real-world scenarios. Here are common formats and how to prepare:

2.1 Live vs Extract – Test Triggers

You may be asked to:

  • Identify which is best for real-time dashboards (Live)
  • Choose Extract when working offline or optimizing large datasets
  • Decide which connection type is required to apply Extract Filters (only Extract)

Memory Tip:
Live = “Linked to source”
Extract = “Exported snapshot”

2.2 Union vs Join vs Relationship – When to Use Each

You may encounter questions like:

“You have four quarterly sales files with the same structure. How should you combine them?”

Correct Answer: Union

“You want to combine Orders and Returns but keep them logically separate in the view.”

Correct Answer: Relationship

“You need a single flat table of Orders and Customer Details.”

Correct Answer: Join

Understanding when to apply each method is key.

2.3 Filtering for Performance – Key Triggers

  • Use Extract Filters to reduce the size of extract files (only available with Extract connections)
  • Use Data Source Filters to limit data regardless of connection type (Live or Extract)
  • Avoid using too many Worksheet Filters in the dashboard as they increase memory usage

A common question may be:

“Which filtering method reduces extract size and improves workbook performance?”

Correct Answer: Extract Filter

3. Transitional Knowledge: Field Roles and Categorization

Even though not technically part of this module, it is useful to briefly introduce field roles to prepare for the next topic.

After data is connected:

  • Tableau automatically classifies fields into:
    • Dimensions (categorical data – blue pills)
    • Measures (quantitative data – green pills)
  • These roles influence how fields behave in visualizations, especially with aggregations and axes.

This sets the stage for the next section, Exploring and Analyzing Data, where we work with pills, marks, and visual encodings.

4. Common Exam Traps and Test Strategies

Exam Trap Clarification Strategy
Confusing Live and Extract capabilities Only Extract supports Extract Filters and offline use Look for keywords like "snapshot", "refresh", "offline"
Assuming Join is always better Relationships preserve data integrity and are more flexible If multiple tables are needed with different grains, use Relationships
Overusing Worksheet Filters for optimization Data Source and Extract Filters are better for performance Choose filters before the worksheet level
Thinking Union = Join Union appends rows; Join merges columns Ask: Do you need more rows or more columns?

Frequently Asked Questions

When should you use a relationship instead of a join in Tableau?

Answer:

Use relationships when working with logical layer modeling and preserving table-level granularity; use joins when you need row-level combination of data.

Explanation:

Relationships keep tables separate and only combine them at query time, avoiding duplication issues. Joins physically merge tables, which can create duplicated rows if cardinality is not handled correctly. Relationships are preferred for flexibility and accuracy, especially in multi-fact models. A common mistake is using joins for unrelated tables, leading to inflated measures.

Demand Score: 85

Exam Relevance Score: 95

What is the key difference between a live connection and an extract?

Answer:

A live connection queries the data source in real time, while an extract is a snapshot stored locally for faster performance.

Explanation:

Live connections depend on database performance and reflect real-time updates, making them suitable for dynamic data. Extracts improve speed by storing optimized data locally but require refreshes to stay updated. A common mistake is using live connections on slow databases, causing dashboard lag. Extracts are preferred for performance optimization.

Demand Score: 82

Exam Relevance Score: 90

What happens if you join two tables with duplicate keys?

Answer:

It results in row duplication, potentially inflating measures.

Explanation:

When keys are not unique, Tableau creates a many-to-many join, multiplying rows across matching records. This leads to incorrect aggregations like inflated sums. A common mistake is not validating key uniqueness before joining. Using relationships instead can prevent this issue by aggregating separately before combining.

Demand Score: 80

Exam Relevance Score: 92

When should you use a union instead of a join?

Answer:

Use a union when combining tables with the same structure vertically.

Explanation:

Unions append rows from multiple tables with identical columns, while joins combine columns based on keys. A common mistake is attempting to join tables that should be stacked, leading to missing or null values. Unions are ideal for combining datasets like monthly files.

Demand Score: 78

Exam Relevance Score: 88

Why is an extract often faster than a live connection?

Answer:

Because extracts are optimized, compressed, and stored locally.

Explanation:

Extracts use Tableau’s Hyper engine, enabling faster queries compared to external databases. They reduce dependency on network latency and database load. A common mistake is assuming live connections are always better for freshness, ignoring performance trade-offs. Extracts balance speed and usability when real-time data is not required.

Demand Score: 81

Exam Relevance Score: 90

TDS-C01 Training Course
$68$29.99
TDS-C01 Training Course