Data Views
Real-world data rarely comes in the exact shape you need for analysis. Sales data lives in one system, customer attributes in another, and product information somewhere else. Even within a single dataset, you often need to filter, aggregate, or transform before the data is ready for causal discovery.
Data Views solve this problem. They're a flexible layer between your raw data and your analysis—letting you join tables, apply filters, create calculated fields, and reshape data without ever touching the originals. When your source data updates, your Data Views update too.
Think of Data Views as saved recipes for data preparation. Define the transformations once, and RootCause.ai applies them automatically whenever you need the data. This means your analyses stay consistent and reproducible, even as underlying data changes.
(SCREENSHOT: Data View editor showing a visual pipeline of operations connecting source datasets to output)
Why Data Views Matter
For Causal Discovery
Causal discovery works best with a single, coherent dataset that represents your analytical domain. If you're analyzing customer behavior, you want customer attributes, transactions, support interactions, and outcomes all joined together. Data Views let you build this unified view.
For Data Quality
Raw data often contains issues—duplicates, missing values, outliers. Data Views let you clean data consistently, so these issues don't propagate into your analyses.
For Reproducibility
When you build a Data View, you're documenting exactly how you prepared the data. Anyone on your team can see the transformations, understand the logic, and reproduce the results.
Creating a Data View
Navigate to Data → Data Views
Click New Data View
Add one or more Sources (datasets or other Data Views)
Apply Operations to transform the data
Preview the result to verify it looks correct
Save your Data View
(SCREENSHOT: New Data View button and empty editor state with "Add Source" prompt)
Adding Sources
A Data View starts with one or more sources. The source determines what data flows into your pipeline.
Dataset Source
The most common starting point. Select a dataset from your workspace and all its columns and rows become available for transformation.
Data View Source
Use another Data View as a source. This lets you chain transformations—build a "cleaned customers" view, then use it as a source for more specific analyses. It's like building with Lego blocks.
Ontology Concept Source
Pull data based on an Ontology Concept, automatically gathering that concept from all datasets where it appears. Useful for building unified views across multiple data sources.
(SCREENSHOT: Source selector panel showing Dataset, Data View, and Concept options)
Operations Pipeline
Operations transform your data step by step. Each operation takes the output of the previous step as its input, creating a pipeline.
(SCREENSHOT: Operations pipeline showing multiple connected steps with data flowing through them)
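Conceptually, the pipeline behaves like function composition: each operation consumes the previous step's output. Here is a minimal Python sketch of that model (the operation names and data are illustrative, not RootCause.ai internals):

```python
# Each "operation" is a function that takes the previous step's output
# as its input. Hypothetical operations, for illustration only.

def filter_active(rows):
    return [r for r in rows if r["status"] == "active"]

def drop_columns(rows, cols):
    return [{k: v for k, v in r.items() if k not in cols} for r in rows]

def run_pipeline(rows, operations):
    for op in operations:
        rows = op(rows)          # output of one step feeds the next
    return rows

source = [
    {"id": 1, "status": "active", "debug": True},
    {"id": 2, "status": "churned", "debug": False},
]
result = run_pipeline(source, [
    filter_active,
    lambda rows: drop_columns(rows, {"debug"}),
])
# result == [{"id": 1, "status": "active"}]
```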
Data Transformation Operations:
SQL Query: execute custom SQL queries on your data (e.g., complex calculations, window functions)
Filter: filter rows based on conditions (e.g., remove test accounts, focus on recent data)
Aggregate: group and aggregate data by columns (e.g., sum revenue by customer, count events by day)
Join: combine data from two sources (e.g., link orders to customer attributes)
Concatenate: stack data sources vertically (e.g., combine monthly files into one dataset)
Pivot: transform rows into columns (e.g., turn event types into separate columns)
Unpivot: transform columns into rows (e.g., normalize wide-format survey data)
Sort: sort data by one or more columns (e.g., order by date for time series analysis)
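To make the Pivot operation concrete, here is a hypothetical sketch in Python that turns event types into separate count columns, one row per user (data and names are illustrative):

```python
# Pivot sketch: rows of (user, event type) become one row per user
# with one count column per event type.
from collections import Counter

events = [
    {"user": "ada", "type": "click"},
    {"user": "ada", "type": "click"},
    {"user": "ada", "type": "purchase"},
    {"user": "bob", "type": "click"},
]

types = sorted({e["type"] for e in events})        # new column names
counts = Counter((e["user"], e["type"]) for e in events)
pivoted = {
    user: {t: counts[(user, t)] for t in types}
    for user in sorted({e["user"] for e in events})
}
# pivoted["ada"] == {"click": 2, "purchase": 1}
```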
Column Operations:
Split Column: split a column into multiple columns (e.g., full name → first, last)
Drop Columns: remove unwanted columns to simplify the dataset
Data Cleaning Operations:
Remove Duplicates: remove duplicate rows
Merge Duplicates: merge duplicate rows with a strategy (keep first, sum values, etc.)
Impute Nulls: fill missing values with the mean, median, or a custom value
Drop Nulls: remove rows with missing values in specified columns
Replace Values: replace specific values (e.g., fix typos, standardize categories)
Replace Type: change column data types
Clip Outliers: cap extreme values that might otherwise skew analysis
Normalize Numeric: scale numeric values to a standard range
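Two of these cleaning operations, sketched with the Python standard library as simplified stand-ins (function names are illustrative; Clip Outliers is shown here with fixed bounds, though the product may use statistical thresholds):

```python
# Simplified stand-ins for two cleaning operations; not product APIs.
import statistics

def impute_nulls(values):
    # Fill missing values (None) with the median of the known values
    med = statistics.median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

def clip_outliers(values, lower, upper):
    # Cap values to the [lower, upper] range instead of dropping rows
    return [min(max(v, lower), upper) for v in values]

imputed = impute_nulls([120, None, 95, 110])       # -> [120, 110, 95, 110]
clipped = clip_outliers([1, 50, 999], 0, 100)      # -> [1, 50, 100]
```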
Category Operations:
Merge Categories: combine similar categories (e.g., "USA", "US", "United States" → "US")
Fill Missing Categories: fill missing category values with a default
Map Categories: map categories to new values (rename, recode)
Time Series Operations:
Interpolate Time: fill gaps in time series data
Time Series Aggregate: aggregate data by time periods (e.g., daily → weekly → monthly)
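As an illustration of gap filling, here is a minimal linear-interpolation sketch (it assumes the series starts and ends with known values; RootCause.ai's actual interpolation options may differ):

```python
# Fill None gaps in an evenly spaced series by linear interpolation.
def interpolate(series):
    out = list(series)
    for i, v in enumerate(out):
        if v is None:
            # Nearest known neighbors on each side (assumed to exist)
            prev = next(j for j in range(i - 1, -1, -1) if out[j] is not None)
            nxt = next(j for j in range(i + 1, len(out)) if out[j] is not None)
            frac = (i - prev) / (nxt - prev)
            out[i] = out[prev] + frac * (out[nxt] - out[prev])
    return out

filled = interpolate([10.0, None, None, 16.0])  # -> [10.0, 12.0, 14.0, 16.0]
```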
Common Workflows
Example: Building a Customer 360 View
You have customer data in three places: demographics in your CRM, transactions in your database, and support tickets in your helpdesk. Here's how to combine them:
Add all three datasets as sources
Join demographics to transactions on customer_id
Aggregate support tickets to get ticket_count per customer
Join the aggregated tickets to your customer data
Result: one row per customer with demographics, purchase history, and support metrics
(SCREENSHOT: Customer 360 Data View showing joined sources with preview of combined columns)
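The same joins and aggregation can be expressed in plain SQL. The sketch below runs against an in-memory SQLite database; the table and column names are illustrative, not a required schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE transactions (customer_id INTEGER, revenue REAL);
    CREATE TABLE tickets (customer_id INTEGER, ticket_id INTEGER);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
    INSERT INTO transactions VALUES (1, 100.0), (1, 50.0), (2, 75.0);
    INSERT INTO tickets VALUES (1, 10), (1, 11);
""")

# One row per customer: demographics + purchase history + support metrics
rows = conn.execute("""
    SELECT c.customer_id,
           c.region,
           COALESCE(SUM(t.revenue), 0) AS total_revenue,
           (SELECT COUNT(*) FROM tickets k
             WHERE k.customer_id = c.customer_id) AS ticket_count
    FROM customers c
    LEFT JOIN transactions t ON t.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
""").fetchall()
# rows -> [(1, 'EU', 150.0, 2), (2, 'US', 75.0, 0)]
```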
Filtering Data
Maybe you only want to analyze active customers from the last year:
Add a Filter operation
Define conditions: status = 'active' AND last_purchase_date >= '2024-01-01'
Combine conditions with AND/OR logic as needed
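In plain Python terms, that filter is equivalent to the following (the field names match the condition above; the data is illustrative):

```python
from datetime import date

customers = [
    {"name": "Ada",  "status": "active",  "last_purchase_date": date(2024, 3, 1)},
    {"name": "Bob",  "status": "churned", "last_purchase_date": date(2024, 5, 2)},
    {"name": "Cleo", "status": "active",  "last_purchase_date": date(2023, 9, 9)},
]

cutoff = date(2024, 1, 1)
filtered = [
    c for c in customers
    if c["status"] == "active" and c["last_purchase_date"] >= cutoff
]
# Only Ada satisfies both conditions
```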
Aggregating Data
To analyze at a higher level (e.g., monthly revenue by region):
Add an Aggregate operation
Group by: region, month
Aggregate: SUM(revenue), COUNT(orders)
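The same grouping, sketched in Python with illustrative data (SUM and COUNT accumulated per region/month key):

```python
from collections import defaultdict

orders = [
    {"region": "EU", "month": "2024-01", "revenue": 100.0},
    {"region": "EU", "month": "2024-01", "revenue": 50.0},
    {"region": "US", "month": "2024-01", "revenue": 75.0},
]

totals = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
for o in orders:
    key = (o["region"], o["month"])          # the "group by" columns
    totals[key]["revenue"] += o["revenue"]   # SUM(revenue)
    totals[key]["orders"] += 1               # COUNT(orders)

# totals[("EU", "2024-01")] == {"revenue": 150.0, "orders": 2}
```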
Custom SQL
For complex transformations, use the SQL Query operation. You get full SQL power:
(SCREENSHOT: SQL Query operation editor with syntax highlighting and preview results)
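For example, a window function computing running revenue per customer is the kind of transformation that is awkward to express with the visual operations alone. The sketch below runs against an in-memory SQLite database (window functions need SQLite 3.25+; the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT, revenue REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-05', 100.0),
        (1, '2024-02-10',  40.0),
        (2, '2024-01-20',  75.0);
""")

# Running revenue per customer via a window function
rows = conn.execute("""
    SELECT customer_id,
           order_date,
           SUM(revenue) OVER (
               PARTITION BY customer_id
               ORDER BY order_date
           ) AS running_revenue
    FROM orders
    ORDER BY customer_id, order_date
""").fetchall()
# rows -> [(1, '2024-01-05', 100.0), (1, '2024-02-10', 140.0), (2, '2024-01-20', 75.0)]
```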
What Happens in the Background
Understanding the execution model helps you build efficient Data Views:
Plan Building – RootCause.ai analyzes your operations and builds an execution plan
Optimization – Operations are optimized for performance (reordering filters, pushing down predicates)
Execution – The pipeline runs against your source data
Schema Computation – Output columns and types are determined
Preview Generation – A sample is generated so you can verify results
Caching – Results are cached for quick access
When source data changes (e.g., a dataset sync), Data Views automatically reflect the new data—no manual refresh needed.
Using Data Views
For Causal Analysis
Data Views are the input for Digital Twins. When you create a Digital Twin, you select a Data View that contains all the variables you want to analyze. Go to Intelligence → Digital Twin and select your prepared Data View.
For Reports
Reference Data Views in your reports to include live, transformed data. Charts and tables pull from the Data View, so they update when the underlying data changes.
For Export
Sometimes you need to take your transformed data elsewhere. Export a Data View as CSV, Parquet, or JSON for use in other tools, sharing with stakeholders, or backup.
(SCREENSHOT: Export dialog showing format options and download button)
Best Practices
Keep It Simple
Start with fewer operations and add complexity as needed. A Data View with 20 operations is hard to debug when something goes wrong. If you need complex logic, consider breaking it into multiple Data Views that chain together.
Name Meaningfully
Future you will thank present you. Names like Customer_360_Monthly_Active tell you what's inside. Names like Data View 1 tell you nothing.
Document Purpose
Add descriptions explaining what the Data View contains and why. "Customer attributes joined with last 12 months of transactions, filtered to active customers only" is much more useful than a blank description.
Test Incrementally
Preview results after each operation. It's much easier to fix an issue when you know exactly which step introduced it.
Next Steps
With your data prepared in a Data View, you're ready to add meaning:
Tag columns with Ontology Concepts to link related data across your workspace
Create a Digital Twin to discover the causal relationships hidden in your data