# Seven-Step Workflow

RootCause transforms your data into an explainable causal model through seven steps. Each step produces a specific artifact that feeds the next stage. The steps are sequential for your first project. Once you are familiar with the system, you may revisit earlier steps as your understanding evolves.

<figure><img src="https://1662811113-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FBXg3gZLR0e2Q8SzeQmql%2Fuploads%2Fgit-blob-bf0d18fe67a0b1d6f4adf4ed247c35faa91a629e%2Fseven-step-workflow.svg?alt=media" alt="The RootCause seven-step workflow: Connect Data, Build Ontology, Build 360 Table, Build Causal Graph, Build Digital Twin, Run Simulations, Produce Reports"><figcaption></figcaption></figure>

***

## 1. Connect Data

The first step is to get data into RootCause. The system supports direct file upload for one-off analysis and live data connectors for systems that change regularly.

**File uploads** support CSV, Parquet, JSON, and Excel. You can drag and drop files or use the file picker.

**Live connectors** keep data in sync with external systems. Currently supported connectors include [PostgreSQL](https://docs.rootcause.ai/data-connectors/postgresql), [MySQL](https://docs.rootcause.ai/data-connectors/mysql), [MongoDB](https://docs.rootcause.ai/data-connectors/mongodb), [Snowflake](https://docs.rootcause.ai/data-connectors/snowflake), [Google Analytics](https://docs.rootcause.ai/data-connectors/google-analytics), [REST API](https://docs.rootcause.ai/data-connectors/rest-api), [Amazon S3](https://docs.rootcause.ai/data-connectors/s3), and [Azure Data Lake](https://docs.rootcause.ai/data-connectors/azure-data-lake).

Once connected, data can be refreshed manually or on a schedule. After import, verify column types, preview rows, and check that the dataset name and description clearly reflect the business meaning.
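The post-import checks described above can also be run locally before you upload. A minimal sketch with pandas, using a hypothetical CSV (the column names are invented for illustration):

```python
import io
import pandas as pd

# Hypothetical sample standing in for a CSV you plan to upload.
csv_text = """customer_id,signup_date,monthly_revenue
C001,2023-01-15,49.99
C002,2023-02-03,19.99
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["signup_date"])

# Preview rows and verify the inferred column types before importing.
print(df.head())
print(df.dtypes)

# A date column read as plain text, or a numeric column read as text,
# is the kind of problem that is cheapest to catch at this stage.
assert pd.api.types.is_datetime64_any_dtype(df["signup_date"])
assert pd.api.types.is_float_dtype(df["monthly_revenue"])
```

Catching a mistyped column here is far cheaper than discovering it after the causal graph has been built on top of it.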

**Why this matters:** Everything downstream depends on having the right data in the workspace. Garbage in, garbage out.

See: [Selecting Data](https://docs.rootcause.ai/user-guide/data-management/uploading-datasets), [File Uploads](https://docs.rootcause.ai/user-guide/data-management/uploading-datasets/file-uploads), [Data Connectors](https://docs.rootcause.ai/user-guide/data-management/uploading-datasets/data-connectors)

***

## 2. Build Ontology

After the data is loaded, RootCause builds and maintains an ontology — the semantic map of your business. The ontology identifies when fields from different datasets represent the same underlying concept and classifies them in ways that matter for downstream analysis.

Concepts are classified as one of four types:

* **Identifier** — Fields used for linking records (e.g., Customer ID). Generally excluded from causal modeling.
* **Time** — Fields that tell the model to respect chronological ordering.
* **Location** — Fields that represent geographic or spatial dimensions.
* **Entity** — Ordinary business variables used in analysis (e.g., Revenue, Churn Status).

RootCause creates ontology concepts automatically when data is uploaded, inferring types and suggesting merges across similar columns. You should review the results and adjust where needed — merge concepts that refer to the same thing, split concepts that were incorrectly combined, and reclassify any that were tagged with the wrong type.
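To make the classification concrete, here is a minimal sketch of the four concept types and why the Identifier type matters downstream. The class names and example concepts are hypothetical, not RootCause's internal representation:

```python
from dataclasses import dataclass
from enum import Enum

class ConceptType(Enum):
    IDENTIFIER = "identifier"  # links records; generally excluded from causal modeling
    TIME = "time"              # tells the model to respect chronological ordering
    LOCATION = "location"      # geographic or spatial dimensions
    ENTITY = "entity"          # ordinary business variables used in analysis

@dataclass
class Concept:
    name: str
    concept_type: ConceptType
    source_columns: list  # columns across datasets merged into this one concept

# Hypothetical ontology: two source columns map to one Customer ID concept.
concepts = [
    Concept("Customer ID", ConceptType.IDENTIFIER, ["customers.id", "tickets.cust_id"]),
    Concept("Signup Date", ConceptType.TIME, ["customers.signup_date"]),
    Concept("Revenue", ConceptType.ENTITY, ["transactions.amount"]),
]

# Identifiers link records but carry no causal meaning, so they are filtered out.
modeling_vars = [c.name for c in concepts if c.concept_type != ConceptType.IDENTIFIER]
```

The point of the sketch is the last line: a misclassified concept (say, an ID tagged as Entity) would flow straight into causal modeling, which is why the review pass matters.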

**Why this matters:** The ontology is the foundation of the platform. If it misunderstands what your data means, everything built on top of it will be wrong.

See: [Ontology Concepts](https://docs.rootcause.ai/user-guide/data-management/ontology-concepts), [Ontology (Core Technology)](https://docs.rootcause.ai/core-technologies/ontology)

***

## 3. Build 360 Table

Once the ontology exists, you create the analysis-ready dataset. In RootCause, this is called a **Data View**. A Data View is a reusable preparation layer between raw data and modeling — it lets you join sources, filter rows, aggregate data, create calculated fields, clean duplicates and missing values, standardize categories, and perform time-series transformations, all without modifying the original datasets.

For most Digital Twins, you should prepare a single, coherent analysis table that represents the domain you want to model: for example, combining customer demographics, transactions, and support tickets into one row per customer. That is why "Build 360 Table" is a better way to think about this step than just "create a Data View" — it tells you what you are producing, not just the product term.

{% hint style="warning" %}
**Important:** Applying operations does not finalize the Data View. You must click **Save** before leaving the editor.
{% endhint %}
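The one-row-per-customer shape described above can be sketched with pandas. The table and column names are hypothetical; in RootCause the equivalent joins and aggregations happen inside the Data View editor:

```python
import pandas as pd

# Hypothetical source tables standing in for three connected datasets.
demographics = pd.DataFrame({"customer_id": ["C1", "C2"], "age": [34, 51]})
transactions = pd.DataFrame({"customer_id": ["C1", "C1", "C2"],
                             "amount": [20.0, 35.0, 15.0]})
tickets = pd.DataFrame({"customer_id": ["C2", "C2"], "ticket_id": [101, 102]})

# Aggregate child tables to one row per customer...
spend = (transactions.groupby("customer_id", as_index=False)["amount"]
         .sum().rename(columns={"amount": "total_spend"}))
ticket_counts = (tickets.groupby("customer_id", as_index=False)
                 .size().rename(columns={"size": "ticket_count"}))

# ...then join everything onto the demographics spine.
table_360 = (demographics
             .merge(spend, on="customer_id", how="left")
             .merge(ticket_counts, on="customer_id", how="left"))
table_360["ticket_count"] = table_360["ticket_count"].fillna(0).astype(int)
```

The left joins keep every customer even when a source table has no rows for them, which is exactly the property a 360 table needs.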

**Why this matters:** Causal discovery works best with a single, coherent dataset that represents your analytical domain. The 360 Table is the input for the Digital Twin.

See: [Data Views](https://docs.rootcause.ai/user-guide/data-management/data-views)

***

## 4. Build Causal Graph

The next step is to discover and refine the causal graph. A causal graph, or **Directed Acyclic Graph (DAG)**, is a visual map of what drives what in the system. It uses arrows to show cause-and-effect relationships, and it has no feedback loops — causes flow in one direction.

RootCause builds this graph from your 360 Table, combining domain rules with statistical evidence. The system initially produces a **Partially Directed Acyclic Graph (PDAG)** — a graph where some causal directions are still uncertain. You then use the Relationships tab to refine the discovered structure: resolving ambiguous directions, removing spurious links, and adding known relationships that the algorithm may have missed.
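RootCause handles graph construction and validation internally; as an illustration only, the defining property of a DAG — that causes can be ordered so every cause precedes its effects, with no cycles — can be checked with a topological sort. The variable names are hypothetical:

```python
from collections import defaultdict, deque

# Toy causal graph: each edge points from cause to effect.
edges = [
    ("marketing_spend", "site_traffic"),
    ("site_traffic", "signups"),
    ("price", "signups"),
    ("signups", "revenue"),
]

def topological_order(edges):
    """Return a cause-before-effect ordering, or None if the graph has a cycle."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # If some nodes never reached indegree 0, a cycle is blocking them.
    return order if len(order) == len(nodes) else None

order = topological_order(edges)
assert order is not None  # acyclic: a valid causal graph
```

This is also why orienting an ambiguous PDAG edge during review can fail: if the chosen direction would create a cycle, no valid ordering exists and the edge must point the other way.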

This step is distinct from building the Digital Twin. The graph is the structural layer — the map of what causes what. The twin (next step) is the operational model that quantifies those relationships and makes them runnable.

**Why this matters:** The causal graph is the heart of the Digital Twin. If the structure is wrong, simulations will give misleading answers. Human review at this stage catches errors that no algorithm can.

See: [Causal Graph](https://docs.rootcause.ai/user-guide/digital-twin/causal-graph), [Relationships Tab](https://docs.rootcause.ai/user-guide/digital-twin/tabs/relationships-tab), [Causal Discovery (Core Technology)](https://docs.rootcause.ai/core-technologies/causal-discovery)

***

## 5. Build Digital Twin

After the graph is discovered and reviewed, RootCause creates the Digital Twin — the causal model of your business or system and the execution layer of the platform. The twin takes the graph structure and fits equations, constraints, and optional confounder logic to create a model that can actually run simulations.

RootCause supports two twin types:

* **Static twins** for cross-sectional data (one snapshot in time).
* **Temporal twins** for time-series data with lags and changing effects over time.

**Advanced configuration** lets you refine the model before training:

* **Exogenous variables** — Factors determined outside the model that the business cannot directly control (e.g., gender, partner status, or variables defined by fundamental delivery mechanisms). Marking these correctly constrains the model to respect reality.
* **Known relationships** — Rules you already know about the system that you want the model to respect.
* **Variable ordering** — Constraints that tell the system some variables must logically come before others (e.g., a customer must have internet service before they can have streaming), narrowing the search space.

**What happens during training:** RootCause prepares the data, searches for plausible causal structure, tests for latent confounders (hidden common causes), fits equations to the discovered relationships, and packages the result into a Digital Twin ready for simulation.
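The "fits equations" part of training can be illustrated with a deliberately simple case: linear equations fit per node from its parents in the graph. The data, coefficients, and fitting method here are all hypothetical — RootCause's actual equation fitting is not documented as linear least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data consistent with the graph spend -> traffic -> signups.
n = 500
spend = rng.normal(10, 2, n)
traffic = 3.0 * spend + rng.normal(0, 1, n)
signups = 0.5 * traffic + rng.normal(0, 1, n)

def fit_linear(y, parents):
    """Least-squares fit of a node on its graph parents plus an intercept."""
    X = np.column_stack(parents + [np.ones(len(y))])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs  # parent coefficients first, intercept last

# One fitted equation per non-root node, parents taken from the graph.
traffic_eq = fit_linear(traffic, [spend])
signups_eq = fit_linear(signups, [traffic])
```

The key idea carries over to the real system: the graph decides *which* variables enter each equation, and training decides *how strongly*.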

**Why this matters:** The graph is the structure; the twin is the operational model. This is where cause-and-effect understanding becomes a tool for decision support.

See: [Creating a Digital Twin](https://docs.rootcause.ai/user-guide/digital-twin/creating-digital-twin), [Digital Twin (Core Technology)](https://docs.rootcause.ai/core-technologies/digital-twin-and-simulations)

***

## 6. Run Simulations

Once the twin exists, RootCause becomes actionable. Simulations are the payoff for all the prior work: connecting data, building ontology, preparing the 360 Table, and discovering causes.

The platform currently supports several simulation types:

| Simulation                | Question it answers                                                           |
| ------------------------- | ----------------------------------------------------------------------------- |
| **Intervention**          | "What happens if we do X?"                                                    |
| **Optimization**          | "What is the best action given goals and constraints?"                        |
| **Best Action**           | "What is the smallest change needed to hit a target?"                         |
| **Explanation**           | "Why did this outcome happen?" (works backward from outcome to likely causes) |
| **Prediction**            | "What is the likely outcome for this specific case?"                          |
| **Forecast**              | "What will happen over time?"                                                 |
| **Temporal Intervention** | "What happens if we make timed changes?" (for temporal twins)                 |
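The Intervention row above ("What happens if we do X?") can be sketched on a toy twin. The structural equations and coefficients are invented for illustration; a real twin uses the equations fitted during training:

```python
# Toy structural equations for spend -> traffic -> signups,
# with hypothetical coefficients.
def simulate(spend):
    traffic = 3.0 * spend
    signups = 0.5 * traffic
    return {"spend": spend, "traffic": traffic, "signups": signups}

# Intervention: hold spend at a chosen value and let effects propagate.
baseline = simulate(spend=10.0)
intervened = simulate(spend=12.0)  # "What happens if we raise spend to 12?"

effect = intervened["signups"] - baseline["signups"]
```

Because effects propagate through the graph rather than through correlations, the simulated change in `signups` reflects only the causal pathway from `spend`.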

**Why this matters:** This is the main decision surface of the system. Everything before this step was preparation — simulations are where you get answers.

See: [Simulations](https://docs.rootcause.ai/user-guide/digital-twin/simulations)

***

## 7. Produce Reports

The last step is reporting. Reports are not just presentation — they are documentation of the analytical work. A report packages the model's conclusions into an auditable narrative that others can review, challenge, and build on.

Reports can include:

* Headings and narrative text
* Live widgets — charts, KPIs (headline metrics), and tables
* Evidence links and inline mentions of workspace objects (Data Views, Digital Twins, simulation runs)
* Code blocks for technical documentation

Widgets can draw from Data Views, Digital Twin results, or custom queries, and they update when underlying data changes. Reports can be exported to PDF or Markdown for sharing outside the platform.
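As a rough illustration of the Markdown export target, a report is ultimately headings plus narrative in sequence. The section titles and helper below are hypothetical, not RootCause's export format:

```python
# Hypothetical report sections: (heading, narrative body) pairs.
sections = [
    ("## Churn Drivers",
     "Contract type and tenure are the strongest drivers of churn."),
    ("## Recommended Action",
     "Target month-to-month customers with a 12-month offer."),
]

def to_markdown(title, sections):
    """Assemble a title and sections into a single Markdown document."""
    lines = [f"# {title}", ""]
    for heading, body in sections:
        lines += [heading, "", body, ""]
    return "\n".join(lines)

report_md = to_markdown("Q3 Churn Analysis", sections)
```

A plain-text export like this is what makes reports reviewable in version control and shareable outside the platform.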

**Why this matters:** Analysis that lives only in a simulation tab gets forgotten. Reports make findings shareable, auditable, and connected to the evidence that produced them.

See: [Reports](https://docs.rootcause.ai/user-guide/reports)
