Ontology Concepts

The ontology is your organization's semantic layer—a map of what your data actually means, independent of how it's stored. When "customer_id" in your sales database is the same concept as "cust_id" in your support system, the ontology makes that explicit.

This matters for causal discovery. When RootCause.ai builds a Digital Twin, it needs to understand that these two columns represent the same thing. Without an ontology, you'd be manually specifying joins every time. With one, the platform automatically knows how your datasets connect.

Think of it as the bridge between the physical reality of your data (column names, table structures, database schemas) and the business reality of what that data represents (customers, transactions, products, events).

(SCREENSHOT: Ontology concepts page showing network view with datasets and concepts interconnected)


Why Ontology Matters

Unified Data Understanding

Data rarely lives in one place. Customer information might span your CRM, billing system, support tickets, and web analytics. The ontology unifies these sources—one concept can map to fields across dozens of datasets.

Automatic Join Discovery

When you create a Data View or build a Digital Twin, RootCause.ai uses the ontology to automatically discover how datasets connect. Select a customer identifier concept, and the platform finds all datasets that share it.
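
As a rough illustration of the idea (not the platform's actual implementation), join discovery can be pictured as a lookup over concept-to-field mappings. The `ONTOLOGY` dictionary, the dataset and field names, and the helpers `datasets_sharing` and `join_pairs` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical ontology: concept name -> (dataset, field) mappings.
ONTOLOGY: Dict[str, List[Tuple[str, str]]] = {
    "Customer ID": [
        ("sales", "customer_id"),
        ("support", "cust_id"),
        ("billing", "cust_no"),
    ],
    "Order Date": [("sales", "order_date")],
    "Ticket Status": [("support", "status")],
}


def datasets_sharing(concept: str) -> Dict[str, str]:
    """Return {dataset: field} for every dataset mapped to the concept."""
    return {dataset: field for dataset, field in ONTOLOGY.get(concept, [])}


def join_pairs(concept: str) -> List[Tuple[str, str]]:
    """List one join condition per pair of datasets that share the concept."""
    fields = sorted(datasets_sharing(concept).items())
    return [
        (f"{left_ds}.{left_field}", f"{right_ds}.{right_field}")
        for i, (left_ds, left_field) in enumerate(fields)
        for right_ds, right_field in fields[i + 1:]
    ]


for left, right in join_pairs("Customer ID"):
    print(f"JOIN ON {left} = {right}")
```

Because the mapping lives in one place, adding a fourth dataset with its own customer column only requires one new entry under the concept, not a new hand-written join for every existing dataset.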

Consistent Semantics

Is "revenue" gross or net? Does "created_date" mean when the record was created or when the event occurred? The ontology provides a single source of truth for what each concept means in your organization.

Causal Model Quality

Causal discovery produces better models when it is given the semantic structure of your data. Marking a column as a "time" concept tells the model to respect temporal ordering. Marking a column as an "identifier" prevents it from being treated as a causal variable.
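
A minimal sketch of how classifications could steer column handling before causal discovery runs. The `Classification` enum, the `COLUMNS` mapping, and `plan_columns` are hypothetical names, and treating Location columns as candidate variables is an assumption made only for this example.

```python
from enum import Enum
from typing import Dict, List


class Classification(Enum):
    IDENTIFIER = "identifier"
    TIME = "time"
    LOCATION = "location"
    ENTITY = "entity"  # the default classification


# Hypothetical columns and their classifications.
COLUMNS: Dict[str, Classification] = {
    "customer_id": Classification.IDENTIFIER,
    "order_date": Classification.TIME,
    "region": Classification.LOCATION,
    "revenue": Classification.ENTITY,
    "discount": Classification.ENTITY,
}


def plan_columns(columns: Dict[str, Classification]) -> Dict[str, List[str]]:
    """Split columns into join keys, ordering columns, and candidate variables."""
    plan: Dict[str, List[str]] = {
        "join_keys": [],
        "order_by": [],
        "candidate_variables": [],
    }
    for name, classification in columns.items():
        if classification is Classification.IDENTIFIER:
            plan["join_keys"].append(name)  # kept out of the causal model
        elif classification is Classification.TIME:
            plan["order_by"].append(name)  # defines temporal ordering
        else:
            # Assumption: Location and Entity columns stay in the model.
            plan["candidate_variables"].append(name)
    return plan


print(plan_columns(COLUMNS))
```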


Concept Classifications

Concepts can be tagged with classifications that carry semantic meaning:

Identifier

Unique keys that link records across datasets. Customer ID, Product SKU, Transaction ID, Order Number. These serve as join keys and are typically excluded from causal analysis. You don't model "customer_id causes revenue"; the customer drives revenue, not the key that labels them.

Time

Temporal attributes that enable time-based analysis. Order Date, Timestamp, Created At, Event Time. Essential for temporal Digital Twins and time-series operations. Time concepts define the ordering of events.

Location

Geographic or spatial attributes. City, Region, Country, Postal Code, Latitude/Longitude. Enable location-based filtering and geographic analysis.

Entity (Default)

General business concepts without special classification. Revenue, Quantity, Status, Price, Rating. These are the variables typically involved in causal relationships.

(SCREENSHOT: Classification dropdown showing Identifier, Time, Location options)


View Modes

The ontology can be explored in three different views:

Network View

Visualizes relationships between concepts and datasets as an interactive graph. Concepts appear as nodes, with edges showing which datasets contain them. Datasets sharing concepts are visually connected.

This view is best for:

  • Understanding how datasets relate to each other

  • Finding concepts that bridge multiple data sources

  • Discovering the "shape" of your data landscape

Table View

A traditional list of concepts with sortable columns: name, classification, data type, number of sources, creation date. Supports filtering and search.

This view is best for:

  • Quickly finding a specific concept

  • Bulk operations

  • Detailed metadata inspection

Card View

Concepts organized by dataset or classification in a card layout. Each card shows the concept name, type, and connected sources.

This view is best for:

  • Browsing concepts by category

  • Visual exploration

  • Understanding what data each dataset contributes

(SCREENSHOT: Toggle between Network, Table, and Card views)


Network Anchor Modes

In Network View, you can change how the graph is organized using anchor modes:

Dataset Anchor

Groups concepts around their source datasets. Each dataset appears as a hub, with its concepts radiating outward. Concepts shared between datasets create bridges.

Use this when you want to understand what data each source contributes.

Concept Type Anchor

Groups concepts by their classification (Identifier, Time, Location, Entity). See all your identifier concepts in one cluster, all time concepts in another.

Use this when you want to understand the semantic structure of your ontology.

Identifier Anchor

Centers the view around identifier concepts. Shows how entities (customers, products, orders) connect to their attributes across datasets.

Use this when you want to understand the key entities in your data and what you know about them.

Time Anchor

Displays a timeline view showing when data is available for each concept. Useful for understanding temporal coverage and identifying gaps.

Use this when planning time-series analysis or checking data availability.
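
If you want to reason about temporal coverage outside the UI, the kind of gap the timeline exposes can be sketched in a few lines. The function name `coverage_gaps` and the one-day default threshold are assumptions for illustration, not platform behavior.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def coverage_gaps(
    days: Iterable[date], max_gap: timedelta = timedelta(days=1)
) -> List[Tuple[date, date]]:
    """Return (last_seen, next_seen) pairs where consecutive dates are too far apart."""
    ordered = sorted(set(days))
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]


observed = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
print(coverage_gaps(observed))  # one gap: 2024-01-02 to 2024-01-05
```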

(SCREENSHOT: Network view with Dataset anchor mode showing datasets as hubs)


Concept Details

Click any concept to open the detail panel:

Basic Information

  • Name: The semantic name for this concept

  • Description: What this concept represents (editable)

  • Classification: Identifier, Time, Location, or Entity

  • Data Type: String, Number, DateTime, Boolean, Category

  • Schema Field Name: The canonical field name used in Data Views

Metadata

Additional properties depending on the concept type:

  • Categories: For categorical concepts, the list of valid values

  • Min/Max Values: For numeric concepts, the expected range

  • Is Unique: Whether values should be unique (useful for identifiers)

  • Is Monotonically Increasing/Decreasing: For time-ordered values

  • Location Type: For location concepts—country, city, postal code, coordinates
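
To make these properties concrete, here is a sketch of how such metadata could be used to check a column's values. The `ConceptMetadata` class and `check_values` function are hypothetical; how the platform applies metadata internally is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ConceptMetadata:
    categories: Optional[Sequence[str]] = None  # valid values, if categorical
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    is_unique: bool = False
    is_monotonic_increasing: bool = False


def check_values(values: Sequence, meta: ConceptMetadata) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    if meta.categories is not None:
        unexpected = sorted({v for v in values if v not in meta.categories})
        if unexpected:
            problems.append(f"unexpected categories: {unexpected}")
    if meta.min_value is not None and any(v < meta.min_value for v in values):
        problems.append("value below minimum")
    if meta.max_value is not None and any(v > meta.max_value for v in values):
        problems.append("value above maximum")
    if meta.is_unique and len(set(values)) != len(values):
        problems.append("duplicate values")
    # Non-strict check: equal neighbours are allowed, decreases are not.
    if meta.is_monotonic_increasing and any(
        later < earlier for earlier, later in zip(values, values[1:])
    ):
        problems.append("values are not monotonically increasing")
    return problems


print(check_values([1, 2, 2, 5], ConceptMetadata(is_unique=True, max_value=4)))
```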

Data Sources

A table showing every dataset that contains this concept, with:

  • Dataset name

  • Field name in that dataset

  • Actions (Split)

Data Preview

Sample rows from each connected dataset, showing actual values for this concept.

(SCREENSHOT: Concept detail panel showing metadata, sources, and data preview)


Merge and Split Operations

Merging Concepts

When two concepts represent the same thing but were created separately, merge them:

  1. Open the concept you want to keep

  2. Click "Merge"

  3. Select the concept to merge into this one

  4. Confirm

All field mappings from the concept you selected in step 3 (the source) transfer to the concept you opened in step 1 (the target). The source concept is then deleted.

Example: You have "customer_id" from your sales data and "cust_identifier" from support tickets. After merging, one concept maps to both fields.
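
The effect of a merge on the underlying mappings can be sketched as follows. This is an illustrative model only: the dictionary layout and the `merge_concepts` helper are hypothetical, and the dataset names are made up to match the example above.

```python
from typing import Dict, List, Tuple

Mapping = Tuple[str, str]  # (dataset, field)


def merge_concepts(
    ontology: Dict[str, List[Mapping]], keep: str, absorb: str
) -> Dict[str, List[Mapping]]:
    """Move every mapping from `absorb` into `keep`, then drop `absorb`."""
    if keep == absorb:
        raise ValueError("cannot merge a concept into itself")
    merged = {name: list(mappings) for name, mappings in ontology.items()}
    for mapping in merged.pop(absorb):
        if mapping not in merged[keep]:
            merged[keep].append(mapping)
    return merged


before = {
    "customer_id": [("sales", "customer_id")],
    "cust_identifier": [("support_tickets", "cust_identifier")],
}
after = merge_concepts(before, keep="customer_id", absorb="cust_identifier")
print(after)
```

The sketch returns a new dictionary instead of mutating its input, which makes it easy to compare the ontology before and after the merge.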

Splitting Concepts

When a concept incorrectly combines fields that mean different things, split them:

  1. Open the concept

  2. In the Data Sources table, find the mapping that should be separate

  3. Click "Split" on that row

  4. A new concept is created with just that mapping

Example: An "id" concept accidentally maps to both customer IDs and product IDs. Split the product mapping into its own concept.
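
In the same spirit, a split can be pictured as moving one mapping out of a concept into a new one. The `split_mapping` helper and the way the new concept gets its name are assumptions for this sketch; the product may name the new concept differently.

```python
from typing import Dict, List, Tuple

Mapping = Tuple[str, str]  # (dataset, field)


def split_mapping(
    ontology: Dict[str, List[Mapping]],
    concept: str,
    mapping: Mapping,
    new_name: str,
) -> Dict[str, List[Mapping]]:
    """Move one mapping out of `concept` into a newly created concept."""
    if new_name in ontology:
        raise ValueError(f"concept {new_name!r} already exists")
    if mapping not in ontology[concept]:
        raise ValueError(f"{mapping} is not mapped to {concept!r}")
    result = {name: list(mappings) for name, mappings in ontology.items()}
    result[concept].remove(mapping)
    result[new_name] = [mapping]
    return result


before = {"id": [("customers", "id"), ("products", "id")]}
after = split_mapping(before, "id", ("products", "id"), new_name="product_id")
print(after)
```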

(SCREENSHOT: Merge dialog showing concept selection and preview)


Data View Composer

The ontology includes a visual composer for creating Data Views:

  1. Click "New View" in the sidebar

  2. The graph enters composer mode

  3. Click concepts to add them as anchors

  4. The system automatically discovers all connected concepts transitively

  5. Configure join behavior for each anchor

  6. Preview the resulting Data View

  7. Save

Anchor Concepts

When you select a concept as an anchor, you're saying "I want data about this thing." The composer then finds all datasets containing that concept and all other concepts that can be joined through identifier relationships.
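
One way to picture this transitive discovery is a breadth-first search that starts at the anchor and hops between datasets through shared identifier concepts. The sketch below assumes that model; `DATASETS`, `IDENTIFIERS`, and `discover` are hypothetical and the composer's real algorithm may differ.

```python
from collections import deque
from typing import Dict, Set, Tuple

# Hypothetical catalog: dataset -> concepts it contains.
DATASETS: Dict[str, Set[str]] = {
    "sales": {"Customer ID", "Product SKU", "Revenue"},
    "support": {"Customer ID", "Ticket Status"},
    "catalog": {"Product SKU", "Price"},
    "weather": {"Region", "Temperature"},
}
IDENTIFIERS = {"Customer ID", "Product SKU"}


def discover(anchor: str) -> Tuple[Set[str], Set[str]]:
    """Return (datasets, concepts) reachable from the anchor concept."""
    reached_datasets: Set[str] = set()
    reached_concepts = {anchor}
    queue = deque([anchor])
    while queue:
        concept = queue.popleft()
        for dataset, concepts in DATASETS.items():
            if concept not in concepts or dataset in reached_datasets:
                continue
            reached_datasets.add(dataset)
            for other in concepts:
                if other not in reached_concepts:
                    reached_concepts.add(other)
                    # Only identifiers can act as join keys to further datasets.
                    if other in IDENTIFIERS:
                        queue.append(other)
    return reached_datasets, reached_concepts


datasets, concepts = discover("Customer ID")
print(sorted(datasets))
print(sorted(concepts))
```

Here the catalog dataset is reached even though it has no customer column, because it shares Product SKU with sales. The weather dataset shares no identifier with anything and stays out of the view.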

Join Configuration

For each anchor, you can configure:

  • Which fields to include

  • Join type (inner, left, right)

  • Aggregation if needed
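
If the join types are unfamiliar, this small sketch shows how inner, left, and right joins differ on two toy datasets. It uses plain Python lists of dictionaries; the `join_rows` helper and the sample rows are hypothetical and say nothing about how the composer executes joins.

```python
from typing import Dict, List

Row = Dict[str, object]


def join_rows(left: List[Row], right: List[Row], key: str, how: str = "inner") -> List[Row]:
    """Join two lists of rows on `key` using inner, left, or right semantics."""
    if how not in {"inner", "left", "right"}:
        raise ValueError(f"unsupported join type: {how}")
    if how == "right":
        # A right join is a left join with the two sides swapped.
        return join_rows(right, left, key, how="left")
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined: List[Row] = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            joined.extend({**match, **row} for match in matches)
        elif how == "left":
            joined.append(dict(row))  # unmatched rows keep only their own columns
    return joined


customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Lin"}]
tickets = [{"customer_id": 1, "status": "open"}, {"customer_id": 3, "status": "closed"}]

print(join_rows(customers, tickets, "customer_id", how="inner"))
print(join_rows(customers, tickets, "customer_id", how="left"))
print(join_rows(customers, tickets, "customer_id", how="right"))
```

An inner join keeps only customers with tickets, a left join keeps every customer, and a right join keeps every ticket, including the one whose customer is missing from the first dataset.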

(SCREENSHOT: Composer mode with selected anchor concepts and join configuration)


Automatic Concept Creation

RootCause.ai automatically creates ontology concepts when you upload data:

  1. Each column becomes a concept

  2. Data types are inferred from values

  3. Columns with similar names across datasets may be suggested for merging

  4. Common patterns (emails, dates, IDs) are detected and classified

You can always edit, merge, or reclassify concepts after automatic creation.
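
The steps above can be approximated with simple heuristics. The sketch below is an assumption about what such inference might look like, not RootCause.ai's actual logic: the regular expression, the `_id` naming rule, and the function names `infer_data_type` and `create_concept` are all invented for illustration.

```python
import re
from datetime import datetime
from typing import Callable, Dict, List, Optional

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def infer_data_type(values: List[str]) -> str:
    """Guess a data type from string values, falling back to String."""

    def all_parse(parse: Callable[[str], object]) -> bool:
        try:
            for value in values:
                parse(value)
        except ValueError:
            return False
        return True

    if not values:
        return "String"
    if all(value.lower() in {"true", "false"} for value in values):
        return "Boolean"
    if all_parse(float):
        return "Number"
    if all_parse(datetime.fromisoformat):
        return "DateTime"
    return "String"


def create_concept(column_name: str, values: List[str]) -> Dict[str, Optional[str]]:
    """Build a draft concept for one uploaded column."""
    data_type = infer_data_type(values)
    name = column_name.lower()
    if data_type == "DateTime":
        classification = "Time"
    elif name == "id" or name.endswith("_id"):
        classification = "Identifier"  # naming heuristic, easy to fool
    else:
        classification = "Entity"
    is_email = bool(values) and all(EMAIL_RE.match(value) for value in values)
    return {
        "name": column_name,
        "data_type": data_type,
        "classification": classification,
        "pattern": "email" if is_email else None,
    }


print(create_concept("order_date", ["2024-01-15", "2024-02-01"]))
print(create_concept("customer_id", ["17", "42"]))
```

Heuristics like these are why a review pass matters: a column named `valid` would not be flagged as an identifier, but a numeric postal code would be typed as a Number even though it is really a location.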


Best Practices

Name Concepts Clearly

Use business terminology, not technical column names. "Customer Lifetime Value" is better than "cust_ltv_amt".

Add Descriptions

Future you (and your colleagues) will thank you. Describe what the concept represents, not just what column it came from.

Classify Intentionally

Identifier, Time, and Location classifications carry semantic meaning that affects how RootCause.ai treats the data. Don't leave everything as Entity.

Merge Aggressively

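If two concepts mean the same thing, merge them. The more fields a concept maps to, the more datasets the platform can connect through it. Merge only when the meaning really is shared; fields that were combined by mistake have to be separated again with a split.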

Review After Uploads

Automatic concept creation infers data types and classifications from column names and values, so it can get them wrong. Always review new concepts after uploading data.


For a deeper understanding of how ontologies enable causal discovery, see Ontology in Core Technologies.
