Azure Data Lake
Connect RootCause.ai to Azure Data Lake Storage Gen2 to import files from your storage containers.
Prerequisites
Before connecting, ensure you have:
Azure Storage account with Data Lake Storage Gen2 (hierarchical namespace) enabled
Container name within the storage account
Authentication credentials (see options below)
Read access to the container and blobs you want to import
Setting Up the Connection
Navigate to Data → Datasets in your workspace
Click New Connection and select Azure Data Lake
Enter your Container Name
Select an authentication method (see below)
Click Test Connection to verify your credentials
Click Save to create the connector
Authentication Methods
Connection String (Simplest)
Copy the connection string from the Azure Portal:
Go to your Storage Account → Access keys
Copy the Connection string
Connection String – Full connection string from the Azure Portal
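A connection string is a semicolon-delimited list of key/value pairs. As a minimal sketch of what it carries (the account name and key below are placeholders, not real credentials):

```python
# Parse an Azure storage connection string into its key/value fields.
# The string below is a placeholder, not a real credential.
conn_str = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=mystorageaccount;"
    "AccountKey=abc123==;"
    "EndpointSuffix=core.windows.net"
)

# Split on ';', then on the FIRST '=' only, since base64 keys end in '='.
fields = dict(part.split("=", 1) for part in conn_str.split(";") if part)

print(fields["AccountName"])     # mystorageaccount
print(fields["EndpointSuffix"])  # core.windows.net
```

Because the string bundles the account name, key, and endpoint suffix, it is the only field you need to fill in for this method.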
Account Key
Use your storage account name and access key:
Account Name – Storage account name (e.g., mystorageaccount)
Account Key – Access key from the Azure Portal (e.g., abc123...)
Endpoint Suffix (Optional) – Custom endpoint suffix (default: core.windows.net)
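With account-name/key authentication, the connector reaches your account at an endpoint built from the account name and the endpoint suffix; the default suffix produces the familiar core.windows.net hostnames. A sketch of that construction (the account name is illustrative):

```python
def account_endpoint(account_name: str, endpoint_suffix: str = "core.windows.net") -> str:
    """Build the Data Lake Storage Gen2 (DFS) endpoint for a storage account."""
    return f"https://{account_name}.dfs.{endpoint_suffix}"

print(account_endpoint("mystorageaccount"))
# https://mystorageaccount.dfs.core.windows.net
```

You only need to change the suffix for sovereign clouds (for example, Azure Government or Azure China), which use a different DNS suffix than the public cloud.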
Service Principal (RBAC) – Recommended for production
Use an Azure AD service principal with role-based access:
Account Name – Storage account name
Tenant ID – Azure AD tenant ID (GUID)
Client ID – App registration client ID (GUID)
Client Secret – App registration client secret
Endpoint Suffix (Optional) – Custom endpoint suffix (default: core.windows.net)
For Service Principal authentication, ensure the service principal has the Storage Blob Data Reader or Storage Blob Data Contributor role assigned on the storage account in the Azure Portal.
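Under the hood, service-principal authentication is an OAuth 2.0 client-credentials flow against Azure AD: the tenant ID selects the token endpoint, and the client ID and secret are exchanged for a token scoped to Azure Storage. A sketch of the request a client would form (the GUIDs and secret are placeholders, and no request is actually sent):

```python
from urllib.parse import urlencode

tenant_id = "00000000-0000-0000-0000-000000000000"  # placeholder tenant GUID

# Azure AD v2.0 token endpoint for the tenant.
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

# Client-credentials grant, scoped to Azure Storage.
body = urlencode({
    "grant_type": "client_credentials",
    "client_id": "11111111-1111-1111-1111-111111111111",  # placeholder client GUID
    "client_secret": "<client-secret>",                   # placeholder secret
    "scope": "https://storage.azure.com/.default",
})

print(token_url)
```

The resulting token only grants what the role assignment allows, which is why the Storage Blob Data Reader role on the storage account is required in addition to valid credentials.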
Importing Data
Once your connector is saved:
Click Import Data on the connector
Enter the Path to your files:
Specific file: data/sales.parquet
Directory: data/2024/
Enable Recursive to include files in subdirectories
Select Formats to import (Parquet, CSV, JSON)
Supported File Formats:
Parquet
Columnar storage format, recommended for large datasets
CSV
Comma-separated values
JSON
JSON or newline-delimited JSON
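The path, the Recursive flag, and the format selection combine into a simple filter over blob names. A sketch of that logic (the helper and blob names are illustrative, not the connector's actual code):

```python
def matches(blob_name: str, path: str, recursive: bool, formats: set[str]) -> bool:
    """Would a blob be picked up by an import with these settings?"""
    if not blob_name.startswith(path):
        return False
    remainder = blob_name[len(path):]
    # Non-recursive: no '/' may remain after the path prefix.
    if not recursive and "/" in remainder:
        return False
    # Format selection: match on the file extension.
    return blob_name.rsplit(".", 1)[-1].lower() in formats

blobs = ["data/2024/jan.parquet", "data/2024/q1/feb.csv", "data/readme.txt"]
picked = [b for b in blobs if matches(b, "data/2024/", False, {"parquet", "csv"})]
print(picked)  # ['data/2024/jan.parquet']
```

With Recursive enabled, `data/2024/q1/feb.csv` would also match, since subdirectories under the path are then included.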
What Happens When You Import
RootCause.ai connects to your Azure Storage container
Files matching your path and format criteria are listed
Files are downloaded and parsed
Multiple files are combined into a single dataset
Schema is automatically detected from the file content
Data is stored in your workspace's data lake
The imported dataset becomes available for use in Data Views and Ontology mapping.
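The combine-and-detect steps above can be sketched for CSV with the standard library. The file contents here are made up, and the real connector also handles Parquet and JSON:

```python
import csv
import io

# Two CSV "files" with the same schema, as they might come from a container.
files = {
    "data/jan.csv": "region,sales\nnorth,100\nsouth,80\n",
    "data/feb.csv": "region,sales\nnorth,120\nsouth,90\n",
}

rows, schema = [], None
for name, content in files.items():
    reader = csv.DictReader(io.StringIO(content))
    if schema is None:
        schema = reader.fieldnames  # schema detected from the first file's header
    rows.extend(reader)             # files combined into a single dataset

print(schema)     # ['region', 'sales']
print(len(rows))  # 4
```

This is why files imported together should share a schema: rows from every file land in the same combined dataset.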
Live Data Sync
You can configure automatic syncing to keep your dataset current:
Manual – Only sync when you click "Sync Now"
Hourly – Refresh every hour
Daily – Refresh once per day
Weekly – Refresh once per week
When a sync runs, the container path is re-scanned and data is refreshed.
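The schedules above are fixed intervals, so the next refresh is just the last successful sync plus the chosen interval. A sketch (the function name and mapping are illustrative):

```python
from datetime import datetime, timedelta

INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_sync(last_sync: datetime, schedule: str):
    """Return the next scheduled refresh, or None for manual-only syncing."""
    interval = INTERVALS.get(schedule)
    return last_sync + interval if interval else None

print(next_sync(datetime(2024, 3, 1, 9, 0), "daily"))  # 2024-03-02 09:00:00
```

A Manual schedule maps to no interval at all, so nothing runs until you click "Sync Now".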
Setting Up Service Principal
To create a service principal for RootCause.ai:
Create an App Registration in Azure AD:
Go to Azure Active Directory → App registrations → New registration
Name it (e.g., "RootCause.ai Integration")
Note the Application (client) ID and Directory (tenant) ID
Create a Client Secret:
Go to Certificates & secrets → New client secret
Copy the Value immediately (it won't be shown again)
Assign Storage Role:
Go to your Storage Account → Access control (IAM)
Add role assignment → Storage Blob Data Reader
Select your app registration as the member
Troubleshooting
Access Denied / AuthorizationPermissionMismatch
Verify authentication credentials are correct
For Service Principal: ensure the app registration has the Storage Blob Data Reader role assigned on the storage account
Check that the container name is correct
Container not found
Verify the container name is spelled correctly
Ensure the storage account has Data Lake Storage Gen2 (hierarchical namespace) enabled
No files found
Verify the path prefix is correct
Check that files exist in the specified location
Ensure the selected formats match your file types
Connection timeout
Verify network connectivity to Azure
Check any firewall or virtual network settings
Ensure the storage account allows access from your network