Azure Data Lake

Connect RootCause.ai to Azure Data Lake Storage Gen2 to import files from your storage containers.


Prerequisites

Before connecting, ensure you have:

  • Azure Storage account with Data Lake Storage Gen2 enabled

  • Container name within the storage account

  • Authentication credentials (see options below)

  • Read access to the container and blobs you want to import


Setting Up the Connection

  1. Navigate to Data → Datasets in your workspace

  2. Click New Connection and select Azure Data Lake

  3. Enter your Container Name

  4. Select an authentication method (see below)

  5. Click Test Connection to verify your credentials

  6. Click Save to create the connector


Authentication Methods

Connection String (Simplest)

Copy the connection string from the Azure Portal:

  1. Go to your Storage Account → Access keys

  2. Copy the Connection string

  • Connection String – Full connection string from Azure Portal
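An Azure Storage connection string is a semicolon-separated list of key=value pairs. A minimal stdlib sketch (no Azure SDK required) showing how the fields map to the account details used elsewhere on this page; the account name and key below are placeholders, not real credentials.

```python
# Parse an Azure Storage connection string into its fields.
# partition("=") splits on the first "=" only, so base64 account keys
# ending in "=" padding are preserved intact.
def parse_connection_string(conn_str: str) -> dict:
    fields = {}
    for part in conn_str.strip().split(";"):
        if part:
            key, _, value = part.partition("=")
            fields[key] = value
    return fields

# Placeholder values, not real credentials.
sample = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=mystorageaccount;"
    "AccountKey=abc123;"
    "EndpointSuffix=core.windows.net"
)
fields = parse_connection_string(sample)
print(fields["AccountName"])  # mystorageaccount
```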

Account Key

Use storage account name and key:

  • Account Name – Storage account name (e.g., mystorageaccount)

  • Account Key – Access key from Azure Portal (e.g., abc123...)

  • Endpoint Suffix (Optional) – Custom endpoint (e.g., core.windows.net)
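With account-key authentication, the account is reached at endpoints derived from the account name and the endpoint suffix. A small sketch of that derivation, assuming the public-cloud default suffix when none is supplied; the custom suffix is what the optional field above overrides (e.g., for sovereign clouds).

```python
# Build the Blob and Data Lake (DFS) endpoints for a storage account.
# "core.windows.net" is the default public-cloud suffix; the optional
# Endpoint Suffix field replaces it for non-public clouds.
def endpoints(account_name: str, suffix: str = "core.windows.net") -> dict:
    return {
        "blob": f"https://{account_name}.blob.{suffix}",
        "dfs": f"https://{account_name}.dfs.{suffix}",
    }

print(endpoints("mystorageaccount")["dfs"])
# https://mystorageaccount.dfs.core.windows.net
```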

Service Principal (RBAC) – Recommended for production

Use Azure AD service principal with role-based access:

  • Account Name – Storage account name

  • Tenant ID – Azure AD tenant ID (GUID)

  • Client ID – App registration client ID (GUID)

  • Client Secret – App registration client secret

  • Endpoint Suffix (Optional) – Custom endpoint

For Service Principal authentication, ensure the service principal has the Storage Blob Data Reader or Storage Blob Data Contributor role assigned on the storage account in the Azure Portal.
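Under the hood, service principal authentication is the OAuth 2.0 client-credentials flow: the tenant ID, client ID, and client secret are exchanged at Azure AD for a token scoped to Azure Storage. A stdlib sketch that only builds the token request (it does not send it; in practice an SDK such as azure-identity performs this exchange for you). The tenant and client IDs are placeholders.

```python
import urllib.parse

# Build (but do not send) the OAuth 2.0 client-credentials token request
# that service principal authentication performs against Azure AD.
def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Requests a token scoped to Azure Storage data-plane access.
        "scope": "https://storage.azure.com/.default",
    })
    return url, body

# Placeholder IDs and secret for illustration only.
url, body = build_token_request(
    "11111111-2222-3333-4444-555555555555", "my-client-id", "my-secret"
)
print(url)
```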


Importing Data

Once your connector is saved:

  1. Click Import Data on the connector

  2. Enter the Path to your files:

    • Specific file: data/sales.parquet

    • Directory: data/2024/

  3. Enable Recursive to include files in subdirectories

  4. Select Formats to import (Parquet, CSV, JSON)
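The Path, Recursive, and Formats options above amount to a prefix-and-extension filter over blob names. A minimal sketch of that selection logic; the blob names are illustrative, and the extension mapping (including .jsonl/.ndjson for JSON) is an assumption.

```python
# Select blobs by path prefix, recursion flag, and allowed formats,
# mirroring the Import Data options. The extension mapping is an
# assumption for illustration.
FORMAT_EXTENSIONS = {
    "parquet": (".parquet",),
    "csv": (".csv",),
    "json": (".json", ".jsonl", ".ndjson"),
}

def select_blobs(blob_names, path, recursive, formats):
    exts = tuple(e for f in formats for e in FORMAT_EXTENSIONS[f])
    selected = []
    for name in blob_names:
        if not name.startswith(path):
            continue
        # Non-recursive: skip files nested in subdirectories below the path.
        if not recursive and "/" in name[len(path):]:
            continue
        if name.lower().endswith(exts):
            selected.append(name)
    return selected

blobs = ["data/2024/sales.parquet", "data/2024/archive/old.csv", "data/readme.txt"]
print(select_blobs(blobs, "data/2024/", recursive=False, formats=["parquet", "csv"]))
# ['data/2024/sales.parquet']
```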

Supported File Formats:

  • Parquet – Columnar storage format, recommended for large datasets

  • CSV – Comma-separated values

  • JSON – JSON or newline-delimited JSON


What Happens When You Import

  1. RootCause.ai connects to your Azure Storage container

  2. Files matching your path and format criteria are listed

  3. Files are downloaded and parsed

  4. Multiple files are combined into a single dataset

  5. Schema is automatically detected from the file content

  6. Data is stored in your workspace's data lake

The imported dataset becomes available for use in Data Views and Ontology mapping.
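Steps 4 and 5 above, combining multiple files and detecting a schema, can be illustrated with the CSV case: rows from each file are unioned and the combined column set becomes the detected schema. A stdlib sketch under that assumption; the in-memory file contents are made up.

```python
import csv
import io

# Combine several CSV "files" into one dataset and detect the schema as
# the union of their columns, illustrating import steps 4 and 5.
def combine_csv(files):
    rows, schema = [], []
    for content in files:
        reader = csv.DictReader(io.StringIO(content))
        for col in reader.fieldnames or []:
            if col not in schema:
                schema.append(col)
        rows.extend(reader)
    return schema, rows

# Made-up file contents; the second file adds a "region" column.
schema, rows = combine_csv([
    "id,amount\n1,9.50\n2,3.25\n",
    "id,amount,region\n3,7.00,EMEA\n",
])
print(schema)     # ['id', 'amount', 'region']
print(len(rows))  # 3
```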


Live Data Sync

You can configure automatic syncing to keep your dataset current:

  • Manual – Only sync when you click "Sync Now"

  • Hourly – Refresh every hour

  • Daily – Refresh once per day

  • Weekly – Refresh once per week

When a sync runs, the container path is re-scanned and data is refreshed.
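The schedule options reduce to adding a fixed interval to the last sync time, with Manual having no interval. A sketch of that mapping; the scheduling internals are an assumption for illustration.

```python
from datetime import datetime, timedelta

# Map each sync schedule to a refresh interval; "manual" has none.
# The exact scheduling internals are an assumption for illustration.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_sync(last_sync: datetime, schedule: str):
    interval = INTERVALS.get(schedule.lower())
    return None if interval is None else last_sync + interval

last = datetime(2024, 6, 1, 12, 0)
print(next_sync(last, "daily"))   # 2024-06-02 12:00:00
print(next_sync(last, "manual"))  # None
```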


Setting Up Service Principal

To create a service principal for RootCause.ai:

  1. Create an App Registration in Azure AD:

    • Go to Azure Active Directory → App registrations → New registration

    • Name it (e.g., "RootCause.ai Integration")

    • Note the Application (client) ID and Directory (tenant) ID

  2. Create a Client Secret:

    • Go to Certificates & secrets → New client secret

    • Copy the Value immediately (it won't be shown again)

  3. Assign Storage Role:

    • Go to your Storage Account → Access control (IAM)

    • Add role assignment → Storage Blob Data Reader

    • Select your app registration as the member
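The three steps above can also be collapsed into a single Azure CLI command, which creates the app registration, the service principal, and the role assignment together. A sketch: substitute your own subscription ID, resource group, and storage account name for the placeholders.

```shell
# Create an app registration + service principal and grant it
# Storage Blob Data Reader on one storage account, in a single step.
# Replace <subscription-id>, <resource-group>, and <storage-account>
# with your own values. The command's output includes the client ID
# ("appId"), tenant ID ("tenant"), and client secret ("password").
az ad sp create-for-rbac \
  --name "RootCause.ai Integration" \
  --role "Storage Blob Data Reader" \
  --scopes "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```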


Troubleshooting

Access Denied / AuthorizationPermissionMismatch

  • Verify authentication credentials are correct

  • For Service Principal: ensure the app has Storage Blob Data Reader role

  • Check that the container name is correct

Container not found

  • Verify the container name is spelled correctly

  • Ensure the storage account has Data Lake Gen2 enabled

No files found

  • Verify the path prefix is correct

  • Check that files exist in the specified location

  • Ensure the selected formats match your file types

Connection timeout

  • Verify network connectivity to Azure

  • Check any firewall or virtual network settings

  • Ensure the storage account allows access from your network
