Reviewing Model Quality
Causal discovery finds relationships; evaluation tells you how far the resulting model can be trusted. The Evaluation panel, opened from the right-hand panel in the Digital Twin Management view, shows overall predictive accuracy, per-variable performance, and side-by-side version comparisons.
Read this before you trust simulation results. A model that predicts poorly will also simulate poorly.
For context, see Step 5: Build Digital Twin and Exploring the Causal Model.
Opening the Evaluation panel
The Overview panel's Evaluation section shows a single accuracy bar. Click the View Evaluation link on the right of that section to open the full view.

The graph stays visible on the left; the right panel switches its content to metrics. A link at the top returns you to Overview.

Predictive accuracy at a glance
The banner at the top is the summary verdict:
A single headline accuracy number (here, 72%).
A count of nodes in each score band: high, middle, and low (here, 3 above 80%, 12 between 50% and 80%, 1 below 50%).
A note that deterministic variables are excluded from the aggregate score.
A reminder that the variable you care most about may score very differently — check the per-node table.
If you recently removed redundant or derived columns, the headline number can drop because near-perfect-fit nodes are no longer in the average. That's not a degradation in quality; it's a change in what's being averaged.
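The effect is plain arithmetic. The sketch below assumes the headline number is an unweighted mean of per-node scores, which this page does not confirm. The node names and scores are hypothetical.

```python
from statistics import mean

# Hypothetical per-node scores; names and values are illustrative only.
scores = {
    "churned": 0.70,
    "plan_tier": 0.65,
    "monthly_spend": 0.75,
    "spend_with_tax": 0.99,  # derived column with a near-perfect fit
}

def headline(node_scores):
    """Unweighted mean of per-node scores (an assumed aggregation)."""
    return mean(node_scores.values())

before = headline(scores)
# Drop the derived column and average what remains.
after = headline({k: v for k, v in scores.items() if k != "spend_with_tax"})

print(f"before: {before:.4f}, after: {after:.4f}")
```

None of the three remaining nodes changed, yet the average falls from about 0.77 to 0.70 because the easiest node left the pool.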
Best predicted variables
Two bar charts highlight the model's strengths:
Best Predicted Categories — ranked by accuracy: the percentage of correct predictions. 100% means every prediction was right.
Best Predicted Numeric Variables — ranked by R²: how much of each variable's variation the model can explain. 1.0 is perfect; 0 is no better than guessing the mean; negative means worse than that.
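For reference, the usual definition of R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes the panel uses that standard definition, which this page does not state; the data is made up to show the three cases described above.

```python
def r_squared(actual, predicted):
    """Standard coefficient of determination: 1 - SS_res / SS_tot.

    Undefined when every actual value is identical (SS_tot is zero).
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(actual, actual)                # every prediction exact
mean_only = r_squared(actual, [2.5] * 4)           # always predict the mean
worse = r_squared(actual, [4.0, 3.0, 2.0, 1.0])    # predictions reversed

print(perfect, mean_only, worse)
```

The three results are 1.0, 0.0, and a negative value, matching the scale described above.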
Per-variable metrics
The table at the bottom of the panel shows every variable and its performance. Click a row to expand a per-class accuracy breakdown.
Boolean / Category: Accuracy, Precision, Recall, F1, Weighted Accuracy, AUC
Numeric: MSE, MAE, R², Log Likelihood
Numeric (time series): MAPE per forecast horizon
Reading the classification metrics:
Accuracy — overall correctness.
Precision — when the model says "yes", how often is it right?
Recall — of all true "yes" cases, how many did the model find?
F1 — the harmonic mean of precision and recall.
AUC — discrimination ability. 1.0 is perfect; 0.5 is chance.
For rare-event variables, precision and recall are usually more informative than accuracy alone.
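A small worked example shows why. The sketch below uses the standard textbook formulas on hypothetical confusion-matrix counts; it is not taken from the product, and the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical rare event: only 10 true "yes" cases in 1,000 rows.
m = classification_metrics(tp=2, fp=2, fn=8, tn=988)

for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Accuracy comes out at 0.99 because the model is almost always right about the common "no" case, while recall is only 0.2: it finds 2 of the 10 events that matter.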
Comparing versions
The Version Comparison dropdown selects one or more versions to chart side by side. Use it to confirm that a configuration change improved things, and to spot versions that improved some metrics while quietly degrading others.
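If you copy the numbers out of the panel by hand, the same check can be done offline. The sketch below is an assumption-laden illustration: the metric values, the dictionary layout, and the `compare_versions` helper are all hypothetical, and nothing here reflects an export feature of the product.

```python
def compare_versions(old, new, higher_is_better):
    """Split shared metrics into those that improved and those that degraded."""
    improved, degraded = [], []
    for metric in sorted(old.keys() & new.keys()):
        delta = new[metric] - old[metric]
        if delta == 0:
            continue  # unchanged metrics are ignored
        # For error metrics such as MAE, a decrease is the improvement.
        better = delta > 0 if higher_is_better[metric] else delta < 0
        (improved if better else degraded).append(metric)
    return improved, degraded

# Hypothetical values for one variable in two versions of a twin.
v1 = {"accuracy": 0.72, "recall": 0.60, "mae": 4.1}
v2 = {"accuracy": 0.75, "recall": 0.52, "mae": 3.8}
direction = {"accuracy": True, "recall": True, "mae": False}

improved, degraded = compare_versions(v1, v2, direction)
print("improved:", improved)
print("degraded:", degraded)
```

Here accuracy and MAE improve while recall drops, the kind of quiet regression a single headline number would hide.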
When a variable scores poorly
A low metric is information, not a verdict. Common causes:
Missing causes. The variable's true drivers aren't in the model.
Data quality. Noise, errors, or too many missing values.
Wrong model type. For example, a time-dependent variable modeled in a static twin.
Too little data. Not enough examples to learn the pattern.
If you can add the missing drivers, do so in the Config panel and retrain. If the variable is genuinely hard to predict, that's a constraint to remember for any simulation that touches it.
Other Working with a Digital Twin pages
Exploring the Causal Model — graph layouts and variable details.
Inspecting Causal Relationships — individual edges and their statistics.
Configuration — model settings, included variables, constraints.
Version History — multiple versions of the same twin.
Digital Twin overview — a general introduction.