Selecting Data

RootCause.ai adapts rapidly to your data in its current format, without requiring lengthy data engineering work.

Upload a file, and the platform automatically detects column types, identifies patterns, and prepares your data for analysis.

Connect a database, and your data stays in sync without manual exports.

If the database connector you need isn't supported, let us know: we develop new connectors based on customer requirements. In the meantime, you can export the data from its original source and upload it directly to the platform.


Dataset Management

Once your data is in RootCause.ai, you can explore it, keep it fresh, and organize it for your team.

Viewing a Dataset

Click on any dataset to see its full details:

  • Schema (columns and data types)

  • Data preview (first rows)

  • Statistics (row count, column distributions)

  • Connection details (for connected sources)

(SCREENSHOT: Dataset detail view with schema panel, data preview table, and statistics sidebar)

Refreshing Data

For connected data sources, keeping data current is straightforward:

  • Sync Now – Click to manually refresh the data immediately

  • Schedule Sync – Set automatic refresh intervals (hourly, daily, weekly)

When a sync runs, RootCause.ai pulls fresh data from the source and updates all Data Views and analyses that depend on it.

(SCREENSHOT: Sync settings panel showing schedule options with last sync timestamp)

Renaming and Organizing

As your workspace grows, organization becomes important:

  • Click the dataset name to rename it—use names that describe the contents, not the source

  • Use folders to organize datasets by project, department, or topic

  • Add descriptions to help team members understand what each dataset contains and where it came from


Schema Detection

RootCause.ai automatically analyzes your data to detect column types. This matters because causal discovery algorithms treat numbers, categories, and dates differently.

Detected Type    Description
Number           Integers and decimals (revenue, counts, measurements)
Text             Strings and categorical values (names, IDs, labels)
DateTime         Dates and timestamps (order dates, event times)
Boolean          True/false values (flags, binary indicators)
Category         Columns with limited unique values (status, region, tier)
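To make the idea of type detection concrete, here is a minimal sketch of how a column's values could be mapped to the types listed above. It is an illustration under stated assumptions, not RootCause.ai's actual detection logic: the function name detect_type, the accepted boolean tokens, the date formats, and the cutoff of 10 unique values for a Category are all hypothetical.

```python
from datetime import datetime

# Hypothetical heuristics for illustration; not the platform's real rules.
BOOLEAN_TOKENS = {"true", "false", "yes", "no"}
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%m/%d/%Y")
CATEGORY_MAX_UNIQUE = 10  # assumed cutoff for "limited unique values"


def _is_number(value: str) -> bool:
    """True if the string parses as an integer or decimal."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_datetime(value: str) -> bool:
    """True if the string matches one of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def detect_type(values: list[str]) -> str:
    """Guess a column type from its non-empty string values."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "Text"
    if all(v.lower() in BOOLEAN_TOKENS for v in cleaned):
        return "Boolean"
    if all(_is_number(v) for v in cleaned):
        return "Number"
    if all(_is_datetime(v) for v in cleaned):
        return "DateTime"
    unique = set(cleaned)
    # Few distinct values that repeat across rows suggest a category.
    if len(unique) <= CATEGORY_MAX_UNIQUE and len(unique) < len(cleaned):
        return "Category"
    return "Text"


print(detect_type(["19.99", "5", "120.50"]))       # Number
print(detect_type(["east", "west", "east"]))       # Category
print(detect_type(["02134", "10001", "02134"]))    # Number
```

Note the last call: a purely value-based heuristic like this one classifies ZIP codes as numbers, which is the kind of case where a manual type adjustment is needed.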

Adjusting Types

Automatic detection is usually correct, but sometimes context matters. A column of ZIP codes might be detected as numbers when it should be categories. A date stored as text might need conversion.

If a column is detected incorrectly:

  1. Open the dataset

  2. Click on the column type

  3. Select the correct type from the dropdown

The new type is then applied to the dataset.

(SCREENSHOT: Column type dropdown showing available type options)
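The ZIP code case can be illustrated with a short sketch. It uses plain Python rather than anything specific to the platform, and the sample values are made up for the example.

```python
from collections import Counter

# Made-up sample values for illustration.
zip_codes = ["02134", "10001", "02134"]

# Treated as numbers: leading zeros are lost, so "02134" becomes 2134,
# and numeric operations such as averaging have no real meaning.
as_numbers = [int(z) for z in zip_codes]
print(as_numbers)  # [2134, 10001, 2134]

# Treated as categories: values stay intact and can be counted per group.
as_categories = Counter(zip_codes)
print(as_categories.most_common(1))  # [('02134', 2)]
```

The same reasoning applies to other numeric-looking identifiers, such as account or product codes, where the digits are labels rather than quantities.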


Best Practices

File Size

RootCause.ai handles large files well, but format matters:

  • Files up to several GB can be uploaded directly

  • For very large files, use Parquet format—it's compressed and faster to process

  • For massive datasets (tens of GB), consider using cloud storage connectors (S3, Azure) which stream data more efficiently

Data Quality

Better data in means better insights out:

  • Clean your data before uploading when possible—remove test records, fix obvious errors

  • Use consistent date formats within columns

  • Ensure column headers are meaningful names that describe the content
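As one way to make date formats consistent before uploading, the sketch below rewrites mixed date strings as ISO dates (YYYY-MM-DD) and flags anything it cannot parse. The function name to_iso_date and the list of accepted formats are assumptions for this example; adjust the formats to match your own data, and note that a value like 03/07/2024 is ambiguous and is read here as month/day/year.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats; extend or reorder this tuple for your own data.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def to_iso_date(value: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


raw = ["2024-03-05", "03/07/2024", "09.03.2024", "next Tuesday"]
cleaned = [to_iso_date(v) for v in raw]
print(cleaned)  # ['2024-03-05', '2024-03-07', '2024-03-09', None]

# Values that could not be parsed are worth fixing by hand before upload.
unparsed = [v for v, c in zip(raw, cleaned) if c is None]
print(unparsed)  # ['next Tuesday']
```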

Naming Conventions

Future you (and your teammates) will thank you:

  • Use descriptive dataset names: Sales_Transactions_2024_Q1 beats data_export_final_v2

  • Include date ranges or versions when relevant

  • Avoid special characters that might cause issues
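If you generate dataset names from file names or scripts, a small helper can strip special characters automatically. This is only a sketch: the function name clean_dataset_name is hypothetical, and the choice to keep just letters, digits, and underscores is an assumption, since the platform's exact naming rules are not specified here.

```python
import re


def clean_dataset_name(raw_name: str) -> str:
    """Keep letters, digits, and underscores; replace everything else."""
    # Replace each run of other characters (spaces, slashes, punctuation)
    # with a single underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", raw_name.strip())
    # Collapse repeated underscores and trim them from both ends.
    return re.sub(r"_+", "_", name).strip("_")


print(clean_dataset_name("Sales Transactions (2024/Q1)!"))
# Sales_Transactions_2024_Q1
```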


Next Steps

Once your data is uploaded or connected, you're ready to start preparing it for analysis:

  1. Create a Data View to transform and combine your datasets

  2. Tag columns with Ontology Concepts to link related data across sources

  3. Build a Digital Twin to discover causal relationships
