> For the complete documentation index, see [llms.txt](https://docs.rootcause.ai/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.rootcause.ai/more-details/digital-twin/model-quality.md).

# Reviewing Model Quality

Causal discovery finds relationships; evaluation tells you how trustworthy the model actually is. The **Evaluation** tab shows overall predictive accuracy, per-variable performance, and side-by-side version comparisons.

Read this before you trust simulation results. A model that predicts poorly will also simulate poorly.

For context, see [Step 5: Build Digital Twin](/user-guide/creating-digital-twin.md) and [Exploring the Causal Model](/more-details/digital-twin/exploring-causal-model.md).

***

## Opening the Evaluation tab

The twin's Home shows the headline accuracy figure on its **Model evaluation** card. Open the card for the full view, which fills the page: the best-predicted charts and per-node metrics are on the left, and the version-comparison configuration and a **Model health** panel are on the right.

<figure><img src="/files/NvbZB3vUzE7jFZKb05uA" alt="The Model Evaluation page for a 19-variable, 31-relationship Churn model, with best-predicted charts, the version comparison selector, and a Model health panel reporting 69.6% predictive accuracy"><figcaption><p>The full Evaluation view. The Model health panel carries the headline verdict; the charts and table break it down per variable.</p></figcaption></figure>

***

## Predictive accuracy at a glance

The **Model health** panel is the summary verdict:

* A single headline accuracy number (here, 69.6%).
* How many nodes scored well, moderately, or poorly (here, 4 above 80%, 9 between 50% and 80%, and 1 below 50%, across the 14 nodes that can be evaluated).
* A note that deterministic variables are excluded from the aggregate score.
* A reminder that the variable you care most about may score very differently — check the per-node table.

If you recently removed redundant or derived columns, the headline number can drop because near-perfect-fit nodes are no longer in the average. That's not a degradation in quality; it's a change in what's being averaged.
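
The effect is easy to reproduce with plain arithmetic. The sketch below assumes the headline figure is a simple unweighted mean of per-node scores and that the bands split at 50% and 80%; the page does not state the exact aggregation, and the node names and scores are hypothetical.

```python
from statistics import mean

# Hypothetical per-node scores as fractions of 1.0 (not taken from a real twin).
scores = {
    "derived_total": 0.99,   # near-perfect fit: derived from other columns
    "duplicate_flag": 0.98,  # near-perfect fit: redundant column
    "plan_type": 0.85,
    "tenure": 0.70,
    "support_calls": 0.62,
    "churn": 0.45,
}

def summarize(node_scores):
    """Return (mean score, counts per band). Band edges are an assumption."""
    bands = {"above_80": 0, "between_50_80": 0, "below_50": 0}
    for score in node_scores.values():
        if score > 0.80:
            bands["above_80"] += 1
        elif score >= 0.50:
            bands["between_50_80"] += 1
        else:
            bands["below_50"] += 1
    return mean(node_scores.values()), bands

headline_before, bands_before = summarize(scores)

# Remove the two redundant/derived nodes; every remaining score is unchanged.
removed = {"derived_total", "duplicate_flag"}
trimmed = {name: s for name, s in scores.items() if name not in removed}
headline_after, bands_after = summarize(trimmed)

print(f"before: {headline_before:.1%}, after: {headline_after:.1%}")
```

No remaining node got worse, yet the average falls because the two highest scores left the pool.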

***

## Best predicted variables

Two bar charts highlight the model's strengths:

* **Best Predicted Categories** — ranked by accuracy: the percentage of correct predictions. 100% means every prediction was right.
* **Best Predicted Numeric Variables** — ranked by R²: how much of each variable's variation the model can explain. 1.0 is perfect; 0 is no better than guessing the mean; negative means worse than that.
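
To make the R² scale concrete, here is a minimal sketch using the standard coefficient-of-determination formula. It illustrates the three reference points above on made-up numbers; how the product computes R² internally (for example, on which data split) is not covered here.

```python
def r_squared(actual, predicted):
    """Standard R²: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [2.0, 4.0, 6.0, 8.0]  # made-up observations, mean is 5.0

perfect = r_squared(actual, actual)                    # every prediction exact
mean_guess = r_squared(actual, [5.0, 5.0, 5.0, 5.0])   # always predict the mean
reversed_guess = r_squared(actual, [8.0, 6.0, 4.0, 2.0])  # worse than the mean

print(perfect, mean_guess, reversed_guess)
```

The three calls land on 1.0, 0.0, and a negative value respectively, matching the interpretation given for the chart.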

***

## Per-variable metrics

The table at the bottom of the panel shows every variable and its performance. Click a row to expand a per-class accuracy breakdown.

| Variable type         | Metrics shown                                           |
| --------------------- | ------------------------------------------------------- |
| Boolean / Category    | Accuracy, Precision, Recall, F1, Weighted Accuracy, AUC |
| Numeric               | MSE, MAE, R², Log Likelihood                            |
| Numeric (time series) | MAPE per forecast horizon                               |
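
For the numeric rows, the error measures follow their usual textbook definitions. The sketch below is illustrative only: the helper names, the sample values, and the way forecasts are grouped by horizon are assumptions, not the product's implementation.

```python
def mse(actual, predicted):
    """Mean squared error: penalizes large misses more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error: average miss in the variable's own units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error; undefined when an actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mse(actual, predicted), mae(actual, predicted), mape(actual, predicted))

# Hypothetical time-series case: one (actual, forecast) pair of lists per horizon.
forecasts_by_horizon = {
    1: ([100.0, 200.0], [110.0, 180.0]),  # one step ahead
    7: ([100.0, 200.0], [150.0, 100.0]),  # seven steps ahead
}
mape_per_horizon = {h: mape(a, f) for h, (a, f) in forecasts_by_horizon.items()}
print(mape_per_horizon)
```

In this made-up example the error grows with the horizon, which is the pattern a per-horizon breakdown is meant to expose.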

Reading the classification metrics:

* **Accuracy** — overall correctness.
* **Precision** — when the model says "yes", how often is it right?
* **Recall** — of all true "yes" cases, how many did the model find?
* **F1** — the harmonic mean of precision and recall.
* **AUC** — discrimination ability. 1.0 is perfect; 0.5 is chance.

For rare-event variables, precision and recall are usually more informative than accuracy alone.
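
A small worked example shows why. The sketch below uses the standard definitions on hypothetical data with 5 "yes" cases out of 100; returning 0.0 when a denominator is zero is a convention chosen here, not something the page specifies.

```python
def classification_metrics(actual, predicted):
    """Accuracy, precision, recall, and F1 for boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical rare event: 5 true "yes" cases among 100 rows.
actual = [True] * 5 + [False] * 95
# The model finds 1 of the 5 and raises 1 false alarm.
predicted = [True] + [False] * 4 + [True] + [False] * 94

metrics = classification_metrics(actual, predicted)
print(metrics)
```

Accuracy comes out at 95% even though the model finds only one of the five real cases (recall 20%) and is right on half of its "yes" calls (precision 50%).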

***

## Comparing versions

The **Version Comparison** dropdown selects one or more versions to chart side by side. This is useful for confirming that a configuration change improved things, and for spotting versions that improved some metrics while quietly degrading others.
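
If you copy per-variable scores out of two versions, the same check can be scripted. This is a minimal sketch: the variable names, the scores, and the 0.01 tolerance are hypothetical, and it assumes a metric where higher is better (such as accuracy or R²).

```python
def compare_versions(old_scores, new_scores, tolerance=0.01):
    """Split shared variables into improved and degraded lists.

    Changes within +/- tolerance are treated as noise and ignored.
    """
    improved, degraded = [], []
    for variable in sorted(old_scores.keys() & new_scores.keys()):
        delta = new_scores[variable] - old_scores[variable]
        if delta > tolerance:
            improved.append(variable)
        elif delta < -tolerance:
            degraded.append(variable)
    return improved, degraded

# Hypothetical scores for two versions of the same twin.
version_1 = {"churn": 0.70, "tenure": 0.60, "plan_type": 0.82}
version_2 = {"churn": 0.78, "tenure": 0.52, "plan_type": 0.82}

improved, degraded = compare_versions(version_1, version_2)
print("improved:", improved)
print("degraded:", degraded)
```

Here the target variable improves while another variable drops by the same amount, a trade-off that a single headline number can hide.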

***

## When a variable scores poorly

A low metric is information, not a verdict. Common causes:

* **Missing causes.** The variable's true drivers aren't in the model.
* **Data quality.** Noise, errors, or too many missing values.
* **Wrong model type.** A time-dependent variable in a static twin.
* **Too little data.** Not enough examples to learn the pattern.

If the missing driver is already a column in the Data View, open the twin's Config tab, click **Modify model**, include the field, and retrain — see [Configuration for Existing Twin](/more-details/digital-twin/configuration.md). If it isn't in the Data View yet, add it there first (see [Build 360 Table](/user-guide/data-views.md)), then rebuild. If the variable is genuinely hard to predict, that's a constraint to remember for any simulation that touches it.

***

## Other "Working with a Digital Twin" pages

* [Exploring the Causal Model](/more-details/digital-twin/exploring-causal-model.md) — graph layouts and variable details.
* [Inspecting Causal Relationships](/more-details/digital-twin/causal-relationships.md) — individual edges and their statistics.
* [Configuration for Existing Twin](/more-details/digital-twin/configuration.md) — model settings, included variables, constraints.
* [Version History](/more-details/digital-twin/version-history.md) — multiple versions of the same twin.

For a general introduction, see the [Digital Twin overview](/more-details/digital-twin.md).