Data Views

Real-world data rarely comes in the exact shape you need for analysis. Sales data lives in one system, customer attributes in another, and product information somewhere else. Even within a single dataset, you often need to filter, aggregate, or transform before the data is ready for causal discovery.

Data Views solve this problem. They're a flexible layer between your raw data and your analysis—letting you join tables, apply filters, create calculated fields, and reshape data without ever touching the originals. When your source data updates, your Data Views update too.

Think of Data Views as saved recipes for data preparation. Define the transformations once, and RootCause.ai applies them automatically whenever you need the data. This means your analyses stay consistent and reproducible, even as underlying data changes.

(SCREENSHOT: Data View editor showing a visual pipeline of operations connecting source datasets to output)


Why Data Views Matter

For Causal Discovery

Causal discovery works best with a single, coherent dataset that represents your analytical domain. If you're analyzing customer behavior, you want customer attributes, transactions, support interactions, and outcomes all joined together. Data Views let you build this unified view.

For Data Quality

Raw data often contains issues—duplicates, missing values, outliers. Data Views let you clean data consistently, so these issues don't propagate into your analyses.

For Reproducibility

When you build a Data View, you're documenting exactly how you prepared the data. Anyone on your team can see the transformations, understand the logic, and reproduce the results.


Creating a Data View

  1. Navigate to Data → Data Views

  2. Click New Data View

  3. Add one or more Sources (datasets or other Data Views)

  4. Apply Operations to transform the data

  5. Preview the result to verify it looks correct

  6. Save your Data View

(SCREENSHOT: New Data View button and empty editor state with "Add Source" prompt)


Adding Sources

A Data View starts with one or more sources. The source determines what data flows into your pipeline.

Dataset Source

The most common starting point. Select a dataset from your workspace, and all its columns and rows become available for transformation.

Data View Source

Use another Data View as a source. This lets you chain transformations—build a "cleaned customers" view, then use it as a source for more specific analyses. It's like building with Lego blocks.
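Chained Data Views behave much like SQL views built on top of other views. A rough analogy using Python's built-in sqlite3 module (the table and view names are illustrative, not part of RootCause.ai):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, status TEXT);
INSERT INTO customers VALUES (1, 'active'), (2, NULL), (3, 'churned');
-- First view: a shared cleaning step, like a "cleaned customers" Data View.
CREATE VIEW cleaned_customers AS
    SELECT * FROM customers WHERE status IS NOT NULL;
-- Second view: builds on the first, like a chained Data View.
CREATE VIEW active_customers AS
    SELECT * FROM cleaned_customers WHERE status = 'active';
""")
rows = conn.execute("SELECT id FROM active_customers").fetchall()
# rows == [(1,)]
```

Because each view is defined in terms of the one below it, a fix to the cleaning step automatically flows into every view that depends on it.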

Ontology Concept Source

Pull data based on an Ontology Concept, automatically gathering that concept from all datasets where it appears. Useful for building unified views across multiple data sources.

(SCREENSHOT: Source selector panel showing Dataset, Data View, and Concept options)


Operations Pipeline

Operations transform your data step by step. Each operation takes the output of the previous step as its input, creating a pipeline.

(SCREENSHOT: Operations pipeline showing multiple connected steps with data flowing through them)
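Conceptually, each operation is a function that takes the previous step's output and returns a new table. A minimal sketch in Python (the operation names here are illustrative, not RootCause.ai's actual API):

```python
# A pipeline is an ordered list of transformations: each step
# receives the previous step's output and returns a new result.
def run_pipeline(rows, operations):
    for op in operations:
        rows = op(rows)
    return rows

# Illustrative operations over a list of dicts (one dict per row).
filter_active = lambda rows: [r for r in rows if r["status"] == "active"]
drop_email = lambda rows: [
    {k: v for k, v in r.items() if k != "email"} for r in rows
]

customers = [
    {"id": 1, "status": "active", "email": "a@example.com"},
    {"id": 2, "status": "churned", "email": "b@example.com"},
]
result = run_pipeline(customers, [filter_active, drop_email])
# result == [{"id": 1, "status": "active"}]
```

Because every step only sees the output of the step before it, reordering or removing an operation changes everything downstream of it, and nothing upstream.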

Data Transformation Operations:

| Operation | Description | Example Use Case |
| --- | --- | --- |
| SQL Query | Execute custom SQL queries on your data | Complex calculations, window functions |
| Filter | Filter rows based on conditions | Remove test accounts, focus on recent data |
| Aggregate | Group and aggregate data by columns | Sum revenue by customer, count events by day |
| Join | Combine data from two sources | Link orders to customer attributes |
| Concatenate | Stack data sources vertically | Combine monthly files into one dataset |
| Pivot | Transform rows into columns | Turn event types into separate columns |
| Unpivot | Transform columns into rows | Normalize wide-format survey data |
| Sort | Sort data by one or more columns | Order by date for time series analysis |

Column Operations:

| Operation | Description |
| --- | --- |
| Split Column | Split a column into multiple columns (e.g., full name → first, last) |
| Drop Columns | Remove unwanted columns to simplify the dataset |

Data Cleaning Operations:

| Operation | Description |
| --- | --- |
| Remove Duplicates | Remove duplicate rows |
| Merge Duplicates | Merge duplicate rows with a strategy (keep first, sum values, etc.) |
| Impute Nulls | Fill missing values with mean, median, or custom values |
| Drop Nulls | Remove rows with missing values in specified columns |
| Replace Values | Replace specific values (fix typos, standardize categories) |
| Replace Type | Change column data types |
| Clip Outliers | Remove statistical outliers that might skew analysis |
| Normalize Numeric | Scale numeric values to a standard range |

Category Operations:

| Operation | Description |
| --- | --- |
| Merge Categories | Combine similar categories (e.g., "USA", "US", "United States" → "US") |
| Fill Missing Categories | Fill missing category values with a default |
| Map Categories | Map categories to new values (rename, recode) |

Time Series Operations:

| Operation | Description |
| --- | --- |
| Interpolate Time | Fill gaps in time series data |
| Time Series Aggregate | Aggregate data by time periods (daily → weekly → monthly) |
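As an analogy for the Time Series Aggregate operation, here is what a daily-to-monthly roll-up looks like in plain SQL, run through Python's built-in sqlite3 module (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (day TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?)",
    [("2024-01-05", 100.0), ("2024-01-20", 50.0), ("2024-02-01", 75.0)],
)
# Roll daily rows up into monthly totals.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', day) AS month, SUM(revenue) AS revenue
    FROM daily_revenue
    GROUP BY month
    ORDER BY month
    """
).fetchall()
# monthly == [('2024-01', 150.0), ('2024-02', 75.0)]
```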


Common Workflows

Example: Building a Customer 360 View

You have customer data in three places: demographics in your CRM, transactions in your database, and support tickets in your helpdesk. Here's how to combine them:

  1. Add all three datasets as sources

  2. Join demographics to transactions on customer_id

  3. Aggregate support tickets to get ticket_count per customer

  4. Join the aggregated tickets to your customer data

  5. Result: one row per customer with demographics, purchase history, and support metrics
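The steps above map naturally onto SQL joins and aggregations. A sketch using Python's built-in sqlite3 module, with made-up table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demographics (customer_id INTEGER, region TEXT);
CREATE TABLE transactions (customer_id INTEGER, amount REAL);
CREATE TABLE tickets (customer_id INTEGER);
INSERT INTO demographics VALUES (1, 'EU'), (2, 'US');
INSERT INTO transactions VALUES (1, 10.0), (1, 20.0), (2, 5.0);
INSERT INTO tickets VALUES (1), (1);
""")
# One row per customer: demographics + purchase history + support metrics.
rows = conn.execute("""
    SELECT d.customer_id,
           d.region,
           COALESCE(SUM(t.amount), 0) AS total_spent,
           (SELECT COUNT(*) FROM tickets k
            WHERE k.customer_id = d.customer_id) AS ticket_count
    FROM demographics d
    LEFT JOIN transactions t ON t.customer_id = d.customer_id
    GROUP BY d.customer_id, d.region
    ORDER BY d.customer_id
""").fetchall()
# rows == [(1, 'EU', 30.0, 2), (2, 'US', 5.0, 0)]
```

Note the LEFT JOIN: customers with no transactions still get a row, with total_spent defaulting to 0, which is usually what you want in a 360 view.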

(SCREENSHOT: Customer 360 Data View showing joined sources with preview of combined columns)

Filtering Data

Maybe you only want to analyze active customers from the last year:

  1. Add a Filter operation

  2. Define conditions: status = 'active' AND last_purchase_date >= '2024-01-01'

  3. Combine conditions with AND/OR logic as needed
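Expressed as SQL, the filter above is a simple WHERE clause. A runnable sketch using Python's built-in sqlite3 module (schema invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, status TEXT, last_purchase_date TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "active", "2024-06-01"),
    (2, "active", "2023-05-01"),   # purchase too old
    (3, "churned", "2024-06-01"),  # wrong status
])
# Both conditions combined with AND: only row 1 survives.
active_recent = conn.execute("""
    SELECT id FROM customers
    WHERE status = 'active' AND last_purchase_date >= '2024-01-01'
""").fetchall()
# active_recent == [(1,)]
```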

Aggregating Data

To analyze at a higher level (e.g., monthly revenue by region):

  1. Add an Aggregate operation

  2. Group by: region, month

  3. Aggregate: SUM(revenue), COUNT(orders)
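The same aggregation in SQL form, run through Python's built-in sqlite3 module (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, month TEXT, revenue REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("EU", "2024-01", 100.0),
    ("EU", "2024-01", 50.0),
    ("US", "2024-01", 75.0),
])
# Group by region and month, then sum revenue and count orders.
summary = conn.execute("""
    SELECT region, month, SUM(revenue), COUNT(*)
    FROM orders
    GROUP BY region, month
    ORDER BY region
""").fetchall()
# summary == [('EU', '2024-01', 150.0, 2), ('US', '2024-01', 75.0, 1)]
```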

Custom SQL

For complex transformations, use the SQL Query operation. You get full SQL power:
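For instance, a running total is the kind of window-function calculation that a plain Filter or Aggregate can't express. A sketch of such a query, run here through Python's built-in sqlite3 module with an invented table (your Data View's SQL dialect may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-02", 20.0), ("2024-01-03", 30.0)],
)
# Window function: cumulative revenue ordered by day.
running = conn.execute("""
    SELECT day,
           SUM(revenue) OVER (ORDER BY day) AS running_total
    FROM sales
""").fetchall()
# running == [('2024-01-01', 10.0), ('2024-01-02', 30.0), ('2024-01-03', 60.0)]
```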

(SCREENSHOT: SQL Query operation editor with syntax highlighting and preview results)


What Happens in the Background

Understanding the execution model helps you build efficient Data Views:

  1. Plan Building – RootCause.ai analyzes your operations and builds an execution plan

  2. Optimization – Operations are optimized for performance (reordering filters, pushing down predicates)

  3. Execution – The pipeline runs against your source data

  4. Schema Computation – Output columns and types are determined

  5. Preview Generation – A sample is generated so you can verify results

  6. Caching – Results are cached for quick access

When source data changes (e.g., a dataset sync), Data Views automatically reflect the new data—no manual refresh needed.


Using Data Views

For Causal Analysis

Data Views are the input for Digital Twins. When you create a Digital Twin, you select a Data View that contains all the variables you want to analyze. Go to Intelligence → Digital Twin and select your prepared Data View.

For Reports

Reference Data Views in your reports to include live, transformed data. Charts and tables pull from the Data View, so they update when the underlying data changes.

For Export

Sometimes you need to take your transformed data elsewhere. Export a Data View as CSV, Parquet, or JSON for use in other tools, sharing with stakeholders, or backup.

(SCREENSHOT: Export dialog showing format options and download button)


Best Practices

Keep It Simple

Start with fewer operations and add complexity as needed. A Data View with 20 operations is hard to debug when something goes wrong. If you need complex logic, consider breaking it into multiple Data Views that chain together.

Name Meaningfully

Future you will thank present you. Names like Customer_360_Monthly_Active tell you what's inside. Names like Data View 1 tell you nothing.

Document Purpose

Add descriptions explaining what the Data View contains and why. "Customer attributes joined with last 12 months of transactions, filtered to active customers only" is much more useful than a blank description.

Test Incrementally

Preview results after each operation. It's much easier to fix an issue when you know exactly which step introduced it.


Next Steps

With your data prepared in a Data View, you're ready to add meaning:

  • Tag columns with Ontology Concepts to link related data across your workspace

  • Create a Digital Twin to discover the causal relationships hidden in your data
