
Reviewing Model Quality

Causal discovery finds relationships; evaluation tells you how trustworthy the resulting model actually is. The Evaluation panel, opened from the right-hand panel of the Digital Twin Management view, shows overall predictive accuracy, per-variable performance, and side-by-side version comparisons.

Read this before you trust simulation results. A model that predicts poorly will also simulate poorly.

For context, see Step 5: Build Digital Twin and Exploring the Causal Model.


Opening the Evaluation panel

The Overview panel's Evaluation section shows a single accuracy bar. Click the View Evaluation link on the right of that section to open the full view.

The View Evaluation link on the right of the Evaluation section opens the full panel.

The graph stays visible on the left; the right panel switches its content to metrics. A link at the top of the panel returns you to the Overview.

The full Evaluation panel alongside the DAG view, for a 19-node Churn model: the predictive accuracy banner, two bar charts, and a per-node metrics table.

Predictive accuracy at a glance

The banner at the top is the summary verdict:

  • A single headline accuracy number (here, 72%).

  • How many nodes scored high, medium, and low (3 above 80%, 12 between 50% and 80%, 1 below 50%).

  • A note that deterministic variables are excluded from the aggregate score.

  • A reminder that the variable you care most about may score very differently from the headline number, so check the per-node table.
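This page doesn't spell out the banner's arithmetic. As a minimal sketch, assuming the headline is an unweighted mean of per-node scores in [0, 1] and that the buckets split at 0.8 and 0.5 as listed, the summary could be computed like this. The function name, input format, and variable names are hypothetical.

```python
from statistics import mean


def summarize_banner(scores, deterministic):
    """Return (headline, bucket counts) for per-node scores.

    scores: hypothetical dict of variable name -> score in [0, 1].
    deterministic: names excluded from the aggregate, as the banner notes.
    """
    kept = [s for name, s in scores.items() if name not in deterministic]
    headline = mean(kept)  # assumption: unweighted mean of the remaining nodes
    # Assumed bucket edges: above 0.8 is high, below 0.5 is low, the rest medium.
    high = sum(1 for s in kept if s > 0.8)
    low = sum(1 for s in kept if s < 0.5)
    return headline, {"high": high, "medium": len(kept) - high - low, "low": low}


# Hypothetical example: TotalCharges is derived deterministically, so it is skipped.
scores = {"Tenure": 0.90, "Plan": 0.70, "Churn": 0.45, "TotalCharges": 1.00}
headline, buckets = summarize_banner(scores, deterministic={"TotalCharges"})
print(f"{headline:.0%}", buckets)
```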

If you recently removed redundant or derived columns, the headline number can drop because near-perfect-fit nodes are no longer in the average. That's not a degradation in quality; it's a change in what's being averaged.
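A small arithmetic example of that effect, with made-up scores and assuming the headline is a plain mean: the three remaining nodes keep exactly the same scores, yet the headline falls once two near-perfect derived nodes leave the average.

```python
from statistics import mean

# Hypothetical per-node scores; the last two stand in for derived columns
# that the model reconstructs almost perfectly.
before = [0.72, 0.65, 0.58, 0.99, 1.00]
after = before[:3]  # same model, derived columns removed; other scores unchanged

print(f"before: {mean(before):.2f}")  # before: 0.79
print(f"after:  {mean(after):.2f}")   # after:  0.65
```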


Best predicted variables

Two bar charts highlight the model's strengths:

  • Best Predicted Categories — ranked by accuracy: the percentage of correct predictions. 100% means every prediction was right.

  • Best Predicted Numeric Variables — ranked by R²: how much of each variable's variation the model can explain. 1.0 is perfect; 0 is no better than guessing the mean; negative means worse than that.
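For reference, the standard definitions behind these two rankings can be sketched in plain Python. These are the textbook formulas; this page doesn't say which data split the panel scores on, so treat the function names and inputs as illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true category."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    ss_tot = sum((t - m) ** 2 for t in y_true)  # squared error of always guessing the mean
    return 1 - ss_res / ss_tot
```

With y = [1, 2, 3, 4], predicting y exactly gives R² = 1.0, predicting the mean 2.5 everywhere gives 0.0, and predicting [4, 3, 2, 1] gives -3.0, which is worse than guessing the mean.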


Per-variable metrics

The table at the bottom of the panel shows every variable and its performance. Click a row to expand a per-class accuracy breakdown.

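| Variable type         | Metrics shown                                            |
|-----------------------|----------------------------------------------------------|
| Boolean / Category    | Accuracy, Precision, Recall, F1, Weighted Accuracy, AUC  |
| Numeric               | MSE, MAE, R², Log Likelihood                             |
| Numeric (time series) | MAPE per forecast horizon                                |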
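MSE and MAE are standard averages of prediction error, and MAPE per forecast horizon can be read as the mean absolute percentage error computed separately for each step ahead. A minimal sketch, assuming forecasts are laid out as one row per forecast origin with one column per horizon (a hypothetical layout). Log Likelihood is left out because it depends on the model's predictive distribution, which this page doesn't describe.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape_per_horizon(actuals, forecasts):
    """MAPE in percent for each forecast horizon.

    actuals, forecasts: one row per forecast origin, one column per horizon
    (hypothetical layout). Actual values must be non-zero.
    """
    result = []
    for h in range(len(actuals[0])):
        errors = [abs((a[h] - f[h]) / a[h]) for a, f in zip(actuals, forecasts)]
        result.append(100 * sum(errors) / len(errors))
    return result


# Two forecast origins, each forecasting 2 steps ahead.
actuals = [[100, 100], [200, 200]]
forecasts = [[110, 120], [190, 160]]
print(mape_per_horizon(actuals, forecasts))  # roughly [7.5, 20.0]: step 2 is further off
```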

Reading the classification metrics:

  • Accuracy — overall correctness.

  • Precision — when the model says "yes", how often is it right?

  • Recall — of all true "yes" cases, how many did the model find?

  • F1 — the harmonic mean of precision and recall.

  • AUC — discrimination ability. 1.0 is perfect; 0.5 is chance.
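A minimal plain-Python sketch of these definitions for a binary variable. AUC is computed here as the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one (ties count half), which equals the area under the ROC curve; it needs predicted scores or probabilities, not just labels. Function names are hypothetical, and Weighted Accuracy is omitted because its exact weighting isn't specified on this page.

```python
def precision_recall_f1(y_true, y_pred, positive=True):
    """Precision, recall, and F1 for one class treated as "yes"."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted "yes", share correct
    recall = tp / (tp + fn) if tp + fn else 0.0  # of true "yes", share found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true, scores, positive=True):
    """Share of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(precision_recall_f1([True, True, False, False, True],
                          [True, False, True, False, True]))
print(auc([True, False, True, False], [0.9, 0.8, 0.4, 0.1]))
```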

For rare-event variables, precision and recall are usually more informative than accuracy alone.
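A quick illustration with made-up data: when only 1% of cases are "yes", a model that always answers "no" looks excellent on accuracy while finding none of the cases that matter.

```python
# Hypothetical rare-event variable: 10 positives in 1,000 rows.
y_true = [True] * 10 + [False] * 990
# A model that always predicts "no".
y_pred = [False] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t and p for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)

print(f"accuracy = {accuracy:.1%}")  # accuracy = 99.0%
print(f"recall   = {recall:.1%}")    # recall   = 0.0%
```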


Comparing versions

The Version Comparison dropdown selects one or more versions to chart side by side. Use it to confirm that a configuration change actually improved the model, and to spot versions that improved some metrics while quietly degrading others.
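The same check can be done by hand if you have per-variable metrics for two versions. In the sketch below, the dict layout, metric keys, and function name are all hypothetical (this page doesn't describe an export). Higher is treated as better for accuracy-style metrics and lower as better for error metrics such as MSE and MAE; no noise threshold is applied, so tiny differences are still reported.

```python
HIGHER_IS_BETTER = {"accuracy", "precision", "recall", "f1", "auc", "r2"}


def compare_versions(old, new):
    """Split per-variable metric changes into improvements and regressions.

    old, new: hypothetical dicts of {variable: {metric: value}}.
    Metrics not in HIGHER_IS_BETTER (e.g. mse, mae) count as lower-is-better.
    """
    improved, regressed = [], []
    for var, metrics in new.items():
        for metric, new_value in metrics.items():
            old_value = old.get(var, {}).get(metric)
            if old_value is None or new_value == old_value:
                continue  # metric absent in the old version, or unchanged
            if metric in HIGHER_IS_BETTER:
                better = new_value > old_value
            else:
                better = new_value < old_value
            (improved if better else regressed).append((var, metric, old_value, new_value))
    return improved, regressed


# Hypothetical versions: Churn accuracy rises while its F1 quietly drops.
v1 = {"Churn": {"accuracy": 0.81, "f1": 0.52}, "MonthlySpend": {"r2": 0.60, "mae": 12.0}}
v2 = {"Churn": {"accuracy": 0.84, "f1": 0.47}, "MonthlySpend": {"r2": 0.63, "mae": 11.5}}
improved, regressed = compare_versions(v1, v2)
print("regressed:", regressed)
```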


When a variable scores poorly

A low metric is information, not a verdict. Common causes:

  • Missing causes. The variable's true drivers aren't in the model.

  • Data quality. Noise, errors, or too many missing values.

  • Wrong model type. For example, a time-dependent variable modeled in a static twin.

  • Too little data. Not enough examples to learn the pattern.
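Two of these causes, missing values and too little data, are easy to check on the training data before retraining. A rough sketch: the function name is hypothetical, rows are plain dicts with None marking a missing value, and the thresholds (20% missing, 100 usable rows) are illustrative, not product defaults.

```python
def data_checks(rows, column, max_missing=0.2, min_rows=100):
    """Return warnings for one column; an empty list means both checks passed.

    rows: list of dicts; a missing key or a None value counts as missing.
    max_missing, min_rows: illustrative thresholds, not product defaults.
    """
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    missing_share = 1 - len(present) / len(values) if values else 1.0
    issues = []
    if missing_share > max_missing:
        issues.append(f"{missing_share:.0%} of values missing")
    if len(present) < min_rows:
        issues.append(f"only {len(present)} non-missing rows")
    return issues


rows = [{"Tenure": 12}] * 50 + [{"Tenure": None}] * 50
print(data_checks(rows, "Tenure"))
```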

If you can add the missing drivers, do so in the Config panel and retrain. If the variable is genuinely hard to predict, that's a constraint to remember for any simulation that touches it.


Other pages in Working with a Digital Twin

See Digital Twin overview for a general introduction.
