# Selecting Data

RootCause.ai works with your data in its current format, so you can get started without lengthy data engineering work.

**Upload a file**, and the platform automatically detects column types, identifies patterns, and prepares your data for analysis.

**Connect a database**, and your data stays in sync without manual exports.

<figure><img src="https://1662811113-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FBXg3gZLR0e2Q8SzeQmql%2Fuploads%2Fx24EOzecYknLQrMpEGKb%2FScreenshot%202025-12-05%20at%204.00.07%E2%80%AFPM.png?alt=media&#x26;token=4c2ec060-7985-49ea-b930-f1cc69693d62" alt=""><figcaption></figcaption></figure>

If the database connector you need isn't supported yet, let us know: we develop new connectors based on customer requirements. In the meantime, you can export the data from its original source and upload the file directly into the platform.

***

### Dataset Management

Once your data is in RootCause.ai, you can explore it, keep it fresh, and organize it for your team.

**Viewing a Dataset**

Click on any dataset to see its full details:

* Schema (columns and data types)
* Data preview (first rows)
* Statistics (row count, column distributions)
* Connection details (for connected sources)

(SCREENSHOT: Dataset detail view with schema panel, data preview table, and statistics sidebar)

**Refreshing Data**

For connected data sources, keeping data current is straightforward:

* **Sync Now** – Click to manually refresh the data immediately
* **Schedule Sync** – Set automatic refresh intervals (hourly, daily, weekly)

When a sync runs, RootCause.ai pulls fresh data from the source and updates all Data Views and analyses that depend on it.

(SCREENSHOT: Sync settings panel showing schedule options with last sync timestamp)

**Renaming and Organizing**

As your workspace grows, organization becomes important:

* Click the dataset name to rename it. Use names that describe the contents, not the source
* Use folders to organize datasets by project, department, or topic
* Add descriptions to help team members understand what each dataset contains and where it came from

***

### Schema Detection

RootCause.ai automatically analyzes your data to detect column types. This matters because causal discovery algorithms treat numbers, categories, and dates differently.

| Detected Type | Description                                               |
| ------------- | --------------------------------------------------------- |
| Number        | Integers and decimals (revenue, counts, measurements)     |
| Text          | Free-form strings (names, IDs, labels)                    |
| DateTime      | Dates and timestamps (order dates, event times)           |
| Boolean       | True/false values (flags, binary indicators)              |
| Category      | Columns with limited unique values (status, region, tier) |
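
To make the table concrete, here is a minimal sketch of how value-based type detection can work in general. It is not RootCause.ai's actual algorithm: the function name `infer_column_type`, the check order, the boolean tokens, and the `CATEGORY_MAX_UNIQUE` threshold are all assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Iterable

# Assumption: columns with at most this many distinct values count as Category.
CATEGORY_MAX_UNIQUE = 10

# Assumption: the spellings treated as true/false values.
BOOLEAN_TOKENS = {"true", "false", "yes", "no"}


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_datetime(value: str) -> bool:
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def infer_column_type(values: Iterable[str]) -> str:
    """Guess one of the table's type names from raw string cells."""
    cells = [v.strip() for v in values if v.strip()]  # ignore empty cells
    if not cells:
        return "Text"
    if all(c.lower() in BOOLEAN_TOKENS for c in cells):
        return "Boolean"
    if all(_is_number(c) for c in cells):
        return "Number"
    if all(_is_datetime(c) for c in cells):
        return "DateTime"
    if len(set(cells)) <= CATEGORY_MAX_UNIQUE:
        return "Category"
    return "Text"


print(infer_column_type(["2024-01-15", "2024-02-01"]))  # DateTime
print(infer_column_type(["gold", "silver", "gold"]))    # Category
print(infer_column_type(["90210", "10001", "60614"]))   # Number
```

The last call shows the limit of any purely value-based approach: a column of numeric codes looks like a Number even when it is really a label.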

**Adjusting Types**

Automatic detection is usually correct, but sometimes context matters. A column of ZIP codes might be detected as numbers when it should be categories. A date stored as text might need conversion.

If a column is detected incorrectly:

1. Open the dataset
2. Click on the column type
3. Select the correct type from the dropdown

The change is then applied to the dataset.

(SCREENSHOT: Column type dropdown showing available type options)

***

### Best Practices

**File Size**

RootCause.ai handles large files well, but format matters:

* Files up to several GB can be uploaded directly
* For very large files, use Parquet format, which is compressed and faster to process
* For massive datasets (tens of GB), consider a cloud storage connector (S3, Azure), which streams data more efficiently

**Data Quality**

Better data in means better insights out:

* Clean your data before uploading when possible: remove test records and fix obvious errors
* Use consistent date formats within columns
* Ensure column headers are meaningful names that describe the content
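
If you want to enforce consistent date formats before uploading, a short script is often enough. The sketch below uses only the Python standard library to rewrite one CSV column as ISO 8601 dates. The helper names, the `order_date` column, and the `KNOWN_FORMATS` list are assumptions for illustration, so adjust them to your own data. Ambiguous inputs such as `01/02/2024` are read month-first here.

```python
import csv
import io
from datetime import datetime

# Assumption: the input formats you expect to meet; extend for your own data.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def to_iso_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, or the original text if no format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return raw  # leave unrecognized values untouched for manual review


def normalize_date_column(csv_text: str, column: str) -> str:
    """Rewrite one column of a CSV document so every parsable date is ISO 8601."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = to_iso_date(row[column] or "")
        writer.writerow(row)
    return out.getvalue()


sample = "order_id,order_date\n1,2024-01-15\n2,01/16/2024\n3,17.01.2024\n"
print(normalize_date_column(sample, "order_date"))
```

Values that match none of the listed formats are passed through unchanged, so you can find and fix them by hand rather than have them silently altered.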

**Naming Conventions**

Future you (and your teammates) will thank you:

* Use descriptive dataset names: `Sales_Transactions_2024_Q1` beats `data_export_final_v2`
* Include date ranges or versions when relevant
* Avoid special characters that might cause issues
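
If you create many datasets, a small helper can keep names consistent. This is a sketch only: `safe_dataset_name` is a hypothetical function, and treating everything except ASCII letters and digits as a "special character" is an assumption, not a documented platform rule.

```python
import re


def safe_dataset_name(label: str, period: str = "") -> str:
    """Build a name like Sales_Transactions_2024_Q1 from free-form text."""
    combined = f"{label} {period}".strip()
    # Collapse every run of characters other than ASCII letters and digits
    # into a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", combined)
    return cleaned.strip("_")


print(safe_dataset_name("Sales Transactions", "2024 Q1"))  # Sales_Transactions_2024_Q1
print(safe_dataset_name("EU/orders (final!)"))             # EU_orders_final
```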

***

### Next Steps

Once your data is uploaded, you're ready to start preparing it for analysis:

1. Create a [Data View](https://docs.rootcause.ai/user-guide/data-management/data-views) to transform and combine your datasets
2. Tag columns with [Ontology Concepts](https://docs.rootcause.ai/user-guide/data-management/ontology-concepts) to link related data across sources
3. Build a [Digital Twin](https://docs.rootcause.ai/user-guide/digital-twin/creating-digital-twin) to discover causal relationships
