Creating a Digital Twin

A Digital Twin is where data becomes actionable. It's a causal model of your business that understands not just what happened, but why—and what would happen if you made different choices.

Traditional predictive models learn patterns from historical data. They can forecast what's likely to happen if conditions stay the same. But they can't reliably predict what happens when you intervene—when you change prices, launch a campaign, or restructure operations. That's because they confuse correlation with causation.

A Digital Twin is different. It learns the causal structure of your data: which variables actually influence which others, and how. This means you can ask "what if" questions and get meaningful answers. What if we raised prices 10%? What if we doubled marketing spend? What's the most efficient path to reduce churn?
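
To make the difference between correlation and causation concrete, here is a minimal simulation sketch. It is not RootCause.ai code: the variables (`season`, `ad_spend`, `sales`) and the coefficients are invented for illustration. A hidden seasonal factor drives both ad spend and sales, so the slope fitted on observed data overstates what would happen if you actually changed ad spend.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (an assumption for this sketch):
# season drives both ad spend and sales; the true effect of ad spend is 0.5.
season = rng.normal(size=n)
ad_spend = 2.0 * season + rng.normal(size=n)
sales = 0.5 * ad_spend + 3.0 * season + rng.normal(size=n)


def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# What a purely predictive model would learn from historical data.
observational = slope(ad_spend, sales)

# Intervention: ad spend is set independently of season, as in an experiment.
ad_spend_do = rng.normal(size=n)
sales_do = 0.5 * ad_spend_do + 3.0 * season + rng.normal(size=n)
interventional = slope(ad_spend_do, sales_do)

print(f"observational slope:  {observational:.2f}")
print(f"interventional slope: {interventional:.2f}")
```

Under these assumptions the observational slope lands near 1.7, while the interventional slope stays close to the true effect of 0.5. The gap is the confounding that a causal model is meant to account for.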

This guide walks you through creating your first Digital Twin—from selecting data to running causal discovery.

For the technical background, see Causal Discovery and Digital Twin & Simulations.

(SCREENSHOT: Digital Twin overview page showing the causal graph visualization)


When to Use a Digital Twin

Digital Twins are valuable whenever you need to make decisions based on causal understanding:

Strategic Planning: "What drives customer lifetime value? Which levers have the biggest impact?"

Marketing Optimization: "How do different channels actually affect conversions? What's the true ROI of each campaign?"

Operational Improvement: "What's causing production delays? Which process changes would have the most impact?"

Risk Management: "What factors lead to churn? How can we identify at-risk customers before it's too late?"

If you're asking "what if" or "what causes", a Digital Twin can help.


Before You Start

Ensure you have:

  1. A Data View prepared with your analysis data—this should contain all the variables you want to analyze

  2. Clean data with meaningful column names—causal discovery works better when variables are clearly defined

  3. At least a few hundred rows for reliable causal discovery—more data means more statistical power

The quality of your Digital Twin depends on the quality of your data. Garbage in, garbage out still applies.
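
If you can export your data before building the Data View, a quick pre-flight check along these lines may help. This is a sketch using pandas, not a RootCause.ai feature: the function name `readiness_report` and the `min_rows=300` default are assumptions standing in for "a few hundred rows".

```python
import pandas as pd


def readiness_report(df: pd.DataFrame, min_rows: int = 300) -> list[str]:
    """Return human-readable warnings about row count, gaps, and column names."""
    issues = []
    if len(df) < min_rows:
        issues.append(f"only {len(df)} rows (threshold {min_rows})")
    # Fraction of missing values per column; report only columns with gaps.
    missing = df.isna().mean()
    for col, frac in missing[missing > 0].items():
        issues.append(f"{col}: {frac:.0%} missing values")
    # Flag blank names and the default names pandas assigns to unnamed columns.
    for col in df.columns:
        name = str(col)
        if name.strip() == "" or name.startswith("Unnamed"):
            issues.append(f"{name!r}: column name is not descriptive")
    return issues


# Tiny made-up sample, just to exercise the checks.
sample = pd.DataFrame(
    {"monthly_spend": [10.0, None, 30.0, 40.0], "Unnamed: 0": [0, 1, 2, 3]}
)
for issue in readiness_report(sample):
    print(issue)
```

An empty list means none of these three checks fired. It does not guarantee the data is suitable for causal discovery.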


Starting from a Data View

  1. Navigate to Intelligence > Digital Twin

  2. Click New Digital Twin

  3. Select the Data View you want to analyze from the dropdown

  4. The available fields from your Data View are displayed

(SCREENSHOT: New Digital Twin dialog with Data View selector and field list)

You're choosing which slice of reality you want to model. A Data View containing customer transactions, attributes, and outcomes will produce a model of customer behavior. A Data View with production metrics will model your manufacturing process.


Choosing a Twin Type

RootCause.ai offers two types of Digital Twins, each suited to different analytical scenarios:

Static Twin

Best for data without significant time dependencies. Static twins discover relationships between variables at a point in time, treating each row as an independent observation.

Use static twins when:

  • Your data represents snapshots (customer attributes, survey responses, cross-sectional studies)

  • Time is not a meaningful factor in your analysis

  • You want to understand what drives outcomes without temporal dynamics

  • You have data aggregated to remove time variation (yearly totals, average metrics)

Example: Analyzing which customer attributes drive churn likelihood.

(SCREENSHOT: Twin type selector with Static option highlighted)

Temporal Twin

Best for time-series data where relationships evolve over time. Temporal twins discover how variables influence each other across time periods, capturing lags, trends, and dynamic effects.

Use temporal twins when:

  • Your data has timestamps or time periods as a key dimension

  • You want to forecast future values based on causal relationships

  • You're analyzing trends, seasonality, or lagged effects

  • Understanding "when" matters as much as "what"

Example: Analyzing how marketing spend in Q1 affects sales in Q2 and Q3.

Temporal twins require a DateTime field marked as an index in your Data View. If no suitable time field exists, you'll only see the static option.
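
To illustrate what a lagged effect looks like in time-indexed data, here is a small pandas sketch. The series names, the two-period delay, and the coefficients are made up for this example. It is a plain lagged-correlation scan, not the discovery method a temporal twin uses, and a correlation at some lag is not proof of causation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Weekly DateTime index, the kind of field a temporal analysis needs.
periods = pd.date_range("2022-01-01", periods=400, freq="W")
spend = pd.Series(rng.normal(size=len(periods)), index=periods)
noise = pd.Series(rng.normal(size=len(periods)), index=periods)

# Assumption for this sketch: sales respond to spend from two periods earlier.
sales = 0.8 * spend.shift(2) + 0.3 * noise

# Correlate sales with spend shifted by 0..4 periods (NaN pairs are dropped).
lag_corr = {lag: sales.corr(spend.shift(lag)) for lag in range(5)}
best_lag = max(lag_corr, key=lag_corr.get)

for lag, value in lag_corr.items():
    print(f"lag {lag}: correlation {value:+.2f}")
print("strongest lag:", best_lag)
```

With this synthetic setup the correlation peaks at lag 2 and stays near zero elsewhere, which is the kind of pattern the "lags" mentioned above refer to.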

(SCREENSHOT: Twin type selector with Temporal option highlighted, showing DateTime field requirement)


Selecting Fields

Not every column in your Data View should be included in causal discovery. Review the fields and decide what to include:

Include fields that are:

  • Potential causes (inputs, decisions, actions)

  • Potential effects (outcomes, KPIs, results)

  • Potential confounders (variables that might influence multiple others)

  • Relevant to your analytical questions

Exclude fields that are:

  • Unique identifiers (customer_id, order_id)—they don't have causal meaning

  • Metadata (created_at, updated_by)—unless actually relevant

  • Redundant (highly correlated with other included fields)

  • Irrelevant to your analysis

Click the toggle next to each field to include or exclude it.

(SCREENSHOT: Field selection panel with toggles, showing some fields included and others excluded)

Tips:

  • When in doubt, include it. RootCause.ai's algorithms handle spurious variables well, but omitting an important variable (especially a confounder) can bias results.

  • Fewer fields = faster discovery. Start focused, then expand if needed.

  • Think about your questions. What are you trying to predict? What might cause it?
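
Two of the exclusion criteria above (unique identifiers and redundant fields) can be screened for before you start toggling. The following is a rough pandas heuristic, not part of RootCause.ai: the function name `screen_fields`, the 0.95 threshold, and the sample columns are all assumptions, and the output is only a list of candidates to review by hand.

```python
import pandas as pd


def screen_fields(df: pd.DataFrame, corr_threshold: float = 0.95):
    """Suggest identifier-like columns and highly correlated numeric pairs."""
    # Identifier-like: every value is distinct and the column is not a float
    # measurement. Small tables can trigger false positives.
    id_like = [
        col for col in df.columns
        if df[col].nunique() == len(df)
        and not pd.api.types.is_float_dtype(df[col])
    ]
    # Redundant: absolute Pearson correlation at or above the threshold.
    numeric = df.select_dtypes("number").drop(columns=id_like, errors="ignore")
    corr = numeric.corr().abs()
    cols = list(corr.columns)
    redundant = [
        (a, b)
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if corr.loc[a, b] >= corr_threshold
    ]
    return id_like, redundant


# Made-up sample: price_eur is just price_usd converted at a fixed rate.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104, 105, 106],
    "price_usd": [10.0, 12.0, 9.0, 15.0, 11.0, 14.0],
    "price_eur": [9.2, 11.04, 8.28, 13.8, 10.12, 12.88],
    "support_tickets": [0, 3, 1, 0, 2, 5],
})
id_like, redundant = screen_fields(df)
print("identifier-like:", id_like)
print("redundant pairs:", redundant)
```

Here `customer_id` is flagged as identifier-like and the two price columns as a redundant pair. Which member of a redundant pair to keep remains a judgment call.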


Initialization Options

Once you've selected fields, you have two paths forward:

Auto Discovery (Recommended for most users)

Let RootCause.ai automatically discover causal relationships. This is the "just make it work" option.

  1. Click Auto Discovery

  2. Causal discovery begins immediately

  3. A progress indicator shows the current stage

  4. Wait for completion (minutes to hours depending on data size and complexity)

  5. Review the discovered causal graph

Auto Discovery applies ontology rules, temporal constraints, and statistical tests to identify the most likely causal structure. It handles the complexity so you don't have to.

(SCREENSHOT: Auto Discovery in progress with progress indicator showing stages)

Manual Configuration (For domain experts)

If you have strong prior knowledge about which relationships should or shouldn't exist, start with manual configuration:

  1. Click Manual Configuration

  2. Define Known relationships (edges that must exist based on domain knowledge)

  3. Define Blocked relationships (edges that definitely shouldn't exist)

  4. Run discovery with your constraints incorporated

Manual configuration is powerful when you know things the algorithm can't learn from data alone. For example, you might know that "marketing spend" can influence "brand awareness" but not vice versa, even if statistical tests are ambiguous.
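
Before entering constraints, it can help to write them down and sanity-check them. The sketch below is one way to do that in plain Python. `EdgeConstraints` is a hypothetical helper, not a RootCause.ai API, and the cycle check assumes the causal graph should be acyclic, which this guide does not state explicitly.

```python
from dataclasses import dataclass, field

Edge = tuple[str, str]  # (cause, effect)


@dataclass
class EdgeConstraints:
    """Hypothetical container for Known and Blocked relationships."""
    known: set[Edge] = field(default_factory=set)
    blocked: set[Edge] = field(default_factory=set)

    def problems(self) -> list[str]:
        """List contradictions: edges both known and blocked, or known cycles."""
        issues = [
            f"{cause} -> {effect} is both known and blocked"
            for cause, effect in sorted(self.known & self.blocked)
        ]
        if self._has_cycle():
            issues.append("known edges form a cycle")
        return issues

    def _has_cycle(self) -> bool:
        # Depth-first search over known edges; revisiting a node that is
        # still on the current path means there is a cycle.
        children: dict[str, list[str]] = {}
        for cause, effect in self.known:
            children.setdefault(cause, []).append(effect)
        visiting: set[str] = set()
        done: set[str] = set()

        def visit(node: str) -> bool:
            if node in done:
                return False
            if node in visiting:
                return True
            visiting.add(node)
            found = any(visit(child) for child in children.get(node, []))
            visiting.discard(node)
            done.add(node)
            return found

        return any(visit(node) for node in list(children))


# The example from the text: spend may drive awareness, but not the reverse.
constraints = EdgeConstraints(
    known={("marketing_spend", "brand_awareness")},
    blocked={("brand_awareness", "marketing_spend")},
)
print(constraints.problems())  # prints []
```

Keeping constraints in a small structure like this also gives you a natural place to record why each one was added.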

(SCREENSHOT: Manual configuration panel with Known and Blocked relationship editors)


What Happens During Discovery

Causal discovery is computationally intensive. Here's what happens behind the scenes:

  1. Data Preparation – Variables are standardized, missing values handled, and data formatted for analysis

  2. Constraint Application – Ontology rules kick in. Time flows forward (so effects can't precede causes). Entity relationships are respected. Location hierarchies are honored.

  3. Independence Testing – Statistical tests (conditional independence tests) identify potential causal relationships. If two variables become independent once you condition on some set of other variables, they are unlikely to be directly causally related.

  4. Graph Search – Optimization algorithms search through possible causal structures to find the one that best explains your data. This is where combinatorial explosion would normally be a problem—RootCause.ai uses specialized techniques to keep this tractable.

  5. Validation – The resulting graph is validated. Uncertain edges are flagged for human review. Model quality metrics are computed.

Discovery time varies from minutes (small datasets, few variables) to hours (large datasets, many variables). The progress indicator keeps you informed.
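
To give a feel for the independence-testing step, here is a minimal sketch of one common conditional independence check: partial correlation, which assumes roughly linear relationships. This guide does not say which tests RootCause.ai uses, so treat this purely as an illustration. The chain `discount -> order_volume -> delivery_delay` and its coefficients are invented.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    residual_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    residual_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(residual_x, residual_y)[0, 1]


rng = np.random.default_rng(42)
n = 5_000

# Hypothetical causal chain: discount -> order_volume -> delivery_delay.
discount = rng.normal(size=n)
order_volume = 0.9 * discount + rng.normal(size=n)
delivery_delay = 0.7 * order_volume + rng.normal(size=n)

marginal = np.corrcoef(discount, delivery_delay)[0, 1]
conditional = partial_corr(discount, delivery_delay, order_volume)

print(f"corr(discount, delay):                {marginal:+.2f}")
print(f"corr(discount, delay | order_volume): {conditional:+.2f}")
```

Discount and delay are clearly correlated on their own, but the correlation drops to roughly zero once order volume is accounted for. That is the signal suggesting there is no direct edge between them in this synthetic example.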

(SCREENSHOT: Completed discovery with success message and "View Results" button)


Versioning

Digital Twins support versioning, which is essential for iterative development:

  • New Version on Each Run – Every time you run discovery (with new data or settings), a new version is created

  • History Preserved – Previous versions remain accessible for comparison

  • Easy Switching – Toggle between versions to see how the model evolved

This is useful for:

  • Tracking how your model improves as you refine it

  • Comparing results when you add new data or change settings

  • Rolling back if a new version performs worse

  • Documenting the evolution of your understanding

(SCREENSHOT: Version selector dropdown showing multiple versions with timestamps)


Reviewing Results

Once discovery completes, you'll land on the Digital Twin detail page with several tabs:

  1. Home – Overview of the Digital Twin, current version, key statistics

  2. Config – View and modify discovery settings, see included fields

  3. Relationships – Full list of discovered causal relationships with strength indicators

  4. Path Analysis – Sankey diagrams showing causal paths and contribution weights

  5. Evaluation – Model performance metrics (accuracy, fit statistics)

  6. Simulate – Run simulations against the model

Take time to explore the Causal Graph before diving into simulations. Understanding the discovered structure helps you ask better questions.

(SCREENSHOT: Digital Twin detail page showing tabs and causal graph visualization)


Best Practices

Start Simple

For your first Digital Twin, pick a focused question and a manageable dataset. "What drives churn among our premium customers?" is better than "model everything about our business." Once you're comfortable, expand.

Leverage Domain Knowledge

Algorithms are powerful but not omniscient. If you know that A causes B (from experiments, physics, or business logic), tell the system. Use Manual Configuration to encode what you know.

Review Uncertain Edges

RootCause.ai flags edges where the evidence is ambiguous. These are opportunities for human judgment. Share the graph with domain experts—they often recognize which uncertain edges make sense and which don't.

Iterate

Causal discovery is rarely one-and-done. The first model reveals insights that prompt new questions. Add variables, refine constraints, re-run discovery. Each iteration deepens your understanding.

Document Your Reasoning

When you add Known or Blocked relationships, note why. Future you (or your colleagues) will appreciate understanding the reasoning behind modeling decisions.


Next Steps

Your Digital Twin is ready. Now it's time to use it:

  • Understand the Causal Graph visualization—learn to read and interpret the discovered relationships

  • Run Simulations to test interventions and optimize decisions
