# Azure Data Lake

Connect RootCause.ai to Azure Data Lake Storage Gen2 to import files from your storage containers.

***

### Prerequisites

Before connecting, ensure you have:

* An Azure Storage account with the hierarchical namespace (Data Lake Storage Gen2) enabled
* The name of the container within the storage account
* Authentication credentials (see options below)
* Read access to the container and the blobs you want to import

***

### Setting Up the Connection

1. Navigate to **Data** → **Datasets** in your workspace
2. Click **New Connection** and select **Azure Data Lake**
3. Enter your **Container Name**
4. Select an authentication method (see below)
5. Click **Test Connection** to verify your credentials
6. Click **Save** to create the connector
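
A failed **Test Connection** is often just a mistyped container name. Azure's container naming rules are published and can be checked locally before saving; a minimal sketch (the function name is illustrative, not part of RootCause.ai):

```python
import re

# Azure container naming rules: 3-63 characters; lowercase letters, digits,
# and hyphens only; must start and end with a letter or digit; every hyphen
# must be followed by a letter or digit (so no consecutive or trailing hyphens).
_CONTAINER_RE = re.compile(r"^[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}$")

def is_valid_container_name(name: str) -> bool:
    """Check a container name against Azure's published naming rules."""
    return bool(_CONTAINER_RE.match(name))
```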

***

### Authentication Methods

**Connection String** (Simplest)

Copy the connection string from the Azure Portal:

1. Go to your Storage Account → **Access keys**
2. Copy the **Connection string**

| Field             | Description                              |
| ----------------- | ---------------------------------------- |
| Connection String | Full connection string from Azure Portal |
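
The connection string is a semicolon-separated list of `key=value` pairs (the account key itself may contain `=` characters, so only the first `=` in each segment is a separator). If a pasted string fails validation, inspecting its parts locally can help; a minimal stdlib sketch:

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure storage connection string into its key=value parts."""
    parts = {}
    for segment in conn_str.split(";"):
        if "=" in segment:
            # partition on the first "=" only: account keys are base64
            # and often end in "=" padding
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

conn = ("DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
        "AccountKey=abc123==;EndpointSuffix=core.windows.net")
fields = parse_connection_string(conn)
```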

**Account Key**

Use the storage account name and an access key:

| Field                      | Description                  | Example            |
| -------------------------- | ---------------------------- | ------------------ |
| Account Name               | Storage account name         | `mystorageaccount` |
| Account Key                | Access key from Azure Portal | `abc123...`        |
| Endpoint Suffix (Optional) | Custom endpoint              | `core.windows.net` |
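
With account-key authentication, the storage endpoints are derived from the account name and the endpoint suffix. The default suffix applies to the public Azure cloud; sovereign clouds use a different one, which is why the field is optional. A sketch of how the two fields combine:

```python
def data_lake_endpoints(account_name: str,
                        endpoint_suffix: str = "core.windows.net") -> dict:
    """Derive the Blob and Data Lake (dfs) endpoints for a storage account."""
    return {
        # classic Blob endpoint
        "blob": f"https://{account_name}.blob.{endpoint_suffix}",
        # Data Lake Storage Gen2 (hierarchical namespace) endpoint
        "dfs": f"https://{account_name}.dfs.{endpoint_suffix}",
    }
```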

**Service Principal (RBAC)** – Recommended for production

Use an Azure AD service principal with role-based access:

| Field                      | Description                       |
| -------------------------- | --------------------------------- |
| Account Name               | Storage account name              |
| Tenant ID                  | Azure AD tenant ID (GUID)         |
| Client ID                  | App registration client ID (GUID) |
| Client Secret              | App registration client secret    |
| Endpoint Suffix (Optional) | Custom endpoint                   |

For Service Principal authentication, ensure the service principal has **Storage Blob Data Reader** or **Storage Blob Data Contributor** role assigned on the storage account in Azure Portal.
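
Behind the scenes, service principal authentication is the standard OAuth 2.0 client credentials flow against Azure AD. A sketch of how the three fields map onto the token request (it only builds the request, no call is made; in practice an SDK handles this for you):

```python
from urllib.parse import urlencode

def token_request(tenant_id: str, client_id: str, client_secret: str) -> tuple:
    """Build the Azure AD client-credentials token request for storage access."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # the .default scope requests whatever roles were granted to the app,
        # e.g. Storage Blob Data Reader
        "scope": "https://storage.azure.com/.default",
    })
    return url, body
```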

***

### Importing Data

Once your connector is saved:

1. Click **Import Data** on the connector
2. Enter the **Path** to your files:
   * Specific file: `data/sales.parquet`
   * Directory: `data/2024/`
3. Enable **Recursive** to include files in subdirectories
4. Select **Formats** to import (Parquet, CSV, JSON)

**Supported File Formats:**

| Format  | Description                                             |
| ------- | ------------------------------------------------------- |
| Parquet | Columnar storage format, recommended for large datasets |
| CSV     | Comma-separated values                                  |
| JSON    | JSON or newline-delimited JSON                          |
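
The path, **Recursive**, and format options above combine into a simple selection rule over blob paths. A hedged sketch of how a blob might be matched (the extension mapping, including `.jsonl` for JSON, is an assumption for illustration, not the connector's exact behavior):

```python
from pathlib import PurePosixPath

# assumed mapping from format choice to file extensions
FORMAT_EXTENSIONS = {
    "parquet": {".parquet"},
    "csv": {".csv"},
    "json": {".json", ".jsonl"},
}

def matches(blob_path: str, prefix: str, formats: set, recursive: bool) -> bool:
    """Decide whether a blob under the container should be imported."""
    if not blob_path.startswith(prefix):
        return False
    ext = PurePosixPath(blob_path).suffix.lower()
    if not any(ext in FORMAT_EXTENSIONS[f] for f in formats):
        return False
    if not recursive:
        # without Recursive, the file must sit directly under the prefix
        remainder = blob_path[len(prefix):].lstrip("/")
        return "/" not in remainder
    return True
```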

***

### What Happens When You Import

1. RootCause.ai connects to your Azure Storage container
2. Files matching your path and format criteria are listed
3. Files are downloaded and parsed
4. Multiple files are combined into a single dataset
5. Schema is automatically detected from the file content
6. Data is stored in your workspace's data lake
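
Steps 3–5 can be illustrated with a simplified stdlib sketch that parses CSV and newline-delimited JSON, concatenates the rows, and detects the union schema (the real import also handles Parquet and richer type inference):

```python
import csv
import io
import json

def load_rows(name: str, content: str) -> list:
    """Parse one downloaded file (CSV or newline-delimited JSON) into rows."""
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(content)))
    return [json.loads(line) for line in content.splitlines() if line.strip()]

def combine(files: dict) -> tuple:
    """Concatenate rows from all files and detect the union schema."""
    rows = [row for name, content in files.items()
            for row in load_rows(name, content)]
    schema = []
    for row in rows:
        for column in row:
            if column not in schema:
                schema.append(column)  # keep first-seen column order
    return rows, schema
```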

The imported dataset becomes available for use in [Data Views](https://docs.rootcause.ai/user-guide/data-management/data-views) and [Ontology](https://docs.rootcause.ai/core-technologies/ontology) mapping.

***

### Live Data Sync

You can configure automatic syncing to keep your dataset current:

* **Manual** – Only sync when you click "Sync Now"
* **Hourly** – Refresh every hour
* **Daily** – Refresh once per day
* **Weekly** – Refresh once per week

When a sync runs, the container path is re-scanned and data is refreshed.
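
The schedule options reduce to a simple interval lookup; a sketch of the assumed semantics (not the actual scheduler):

```python
from datetime import datetime, timedelta

SYNC_INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def next_sync(last_sync, frequency):
    """Return the next scheduled refresh, or None for manual-only syncing."""
    interval = SYNC_INTERVALS.get(frequency)
    return last_sync + interval if interval else None
```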

***

### Setting Up Service Principal

To create a service principal for RootCause.ai:

1. **Create an App Registration** in Azure AD:
   * Go to Azure Active Directory → App registrations → New registration
   * Name it (e.g., "RootCause.ai Integration")
   * Note the **Application (client) ID** and **Directory (tenant) ID**
2. **Create a Client Secret**:
   * Go to Certificates & secrets → New client secret
   * Copy the **Value** immediately (it won't be shown again)
3. **Assign Storage Role**:
   * Go to your Storage Account → Access control (IAM)
   * Add role assignment → **Storage Blob Data Reader**
   * Select your app registration as the member

***

### Troubleshooting

**Access Denied / AuthorizationPermissionMismatch**

* Verify authentication credentials are correct
* For Service Principal: ensure the app has Storage Blob Data Reader role
* Check that the container name is correct

**Container not found**

* Verify the container name is spelled correctly
* Ensure the storage account has the hierarchical namespace (Data Lake Storage Gen2) enabled

**No files found**

* Verify the path prefix is correct
* Check that files exist in the specified location
* Ensure the selected formats match your file types

**Connection timeout**

* Verify network connectivity to Azure
* Check any firewall or virtual network settings
* Ensure the storage account allows access from your network
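
Transient timeouts often resolve on retry with exponential backoff, a common client-side pattern worth applying around any scripted access to the same container (this sketch is generic, not RootCause.ai's actual retry logic):

```python
import time

def with_retries(operation, attempts: int = 3, base_delay: float = 1.0):
    """Run a flaky network operation, backing off exponentially between tries."""
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)  # e.g. 1s, 2s, 4s, ...
```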
