# Amazon S3

Connect RootCause.ai to Amazon S3 to import files from your S3 buckets.

***

### Prerequisites

Before connecting, ensure you have:

* AWS Access Key ID and Secret Access Key
* S3 bucket name
* AWS region where the bucket is located
* Read access to the bucket and objects you want to import
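
If you want to confirm read access before creating the connector, you can exercise the same credentials with the AWS SDK. The boto3 sketch below uses placeholder credentials and bucket name; it is an illustration, not part of RootCause.ai:

```python
import boto3

# Placeholder credentials, bucket, and region -- substitute your own.
s3 = boto3.client(
    "s3",
    aws_access_key_id="<your-access-key-id>",
    aws_secret_access_key="<your-secret-access-key>",
    region_name="us-east-1",
)

# s3:ListBucket -- can we enumerate objects in the bucket?
resp = s3.list_objects_v2(Bucket="my-data-bucket", MaxKeys=1)

# s3:GetObject -- can we read one of them?
if resp.get("KeyCount", 0) > 0:
    key = resp["Contents"][0]["Key"]
    s3.get_object(Bucket="my-data-bucket", Key=key)
    print(f"Read access confirmed (sample object: {key})")
```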

***

### Setting Up the Connection

1. Navigate to **Data** → **Datasets** in your workspace
2. Click **New Connection** and select **S3**
3. Enter your connection details:

| Field                   | Description                               | Example                              |
| ----------------------- | ----------------------------------------- | ------------------------------------ |
| Access Key ID           | AWS access key                            | `AKIAIOSFODNN7EXAMPLE`               |
| Secret Access Key       | AWS secret key                            | `wJalrXUtnFEMI/K7MDENG/...`          |
| Bucket Name             | S3 bucket name                            | `my-data-bucket`                     |
| Region                  | AWS region                                | `us-east-1`                          |
| Endpoint URL (Optional) | Custom endpoint for S3-compatible storage | `https://s3.us-east-1.amazonaws.com` |

4. Click **Test Connection** to verify your credentials
5. Click **Save** to create the connector
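
Conceptually, **Test Connection** runs a reachability check along the lines of the boto3 sketch below (the exact call RootCause.ai issues is an assumption; this is only an illustration):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    aws_access_key_id="<your-access-key-id>",
    aws_secret_access_key="<your-secret-access-key>",
    region_name="us-east-1",
    # endpoint_url="https://s3.us-east-1.amazonaws.com",  # only for custom endpoints
)

try:
    # head_bucket confirms the bucket exists, the region matches,
    # and the credentials are allowed to reach it.
    s3.head_bucket(Bucket="my-data-bucket")
    print("Connection OK")
except ClientError as err:
    print("Connection failed:", err.response["Error"]["Code"])
```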

***

### Importing Data

Once your connector is saved:

1. Click **Import Data** on the connector
2. Enter the **Path** to your files:
   * Specific file: `/data/sales.parquet`
   * Directory prefix: `/data/2024/`
   * Full S3 URI: `s3://bucket/data/`
3. Enable **Recursive** to include files in subdirectories
4. Select **Formats** to import (Parquet, CSV, JSON)
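
These options map onto a standard S3 prefix listing. A hypothetical boto3 equivalent, with placeholder bucket, prefix, and format filter:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
paginator = s3.get_paginator("list_objects_v2")

prefix = "data/2024/"                    # placeholder path prefix
formats = (".parquet", ".csv", ".json")  # placeholder format filter

# Listing without a Delimiter is recursive: it returns every object
# under the prefix at any depth. Adding Delimiter="/" stops at the
# first level, which corresponds to leaving Recursive disabled.
for page in paginator.paginate(Bucket="my-data-bucket", Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Key"].lower().endswith(formats):
            print(obj["Key"], obj["Size"])
```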

**Supported File Formats:**

| Format  | Description                                             |
| ------- | ------------------------------------------------------- |
| Parquet | Columnar storage format, recommended for large datasets |
| CSV     | Comma-separated values                                  |
| JSON    | JSON or newline-delimited JSON                          |
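
The two JSON variants differ only in framing: a single JSON document versus one record per line. In pandas terms (an illustration; not necessarily how the connector parses files):

```python
import io
import pandas as pd

# Plain JSON: a single document, here an array of records.
plain = '[{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]'
df = pd.read_json(io.StringIO(plain))

# Newline-delimited JSON (NDJSON): one record per line.
ndjson = '{"id": 1, "amount": 9.5}\n{"id": 2, "amount": 3.0}\n'
df_nd = pd.read_json(io.StringIO(ndjson), lines=True)

assert df.equals(df_nd)  # same table, different framing
```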

***

### What Happens When You Import

1. RootCause.ai connects to your S3 bucket
2. Files matching your path and format criteria are listed
3. Files are downloaded and parsed
4. Multiple files are combined into a single dataset
5. Schema is automatically detected from the file content
6. Data is stored in your workspace's data lake

The imported dataset becomes available for use in [Data Views](https://docs.rootcause.ai/user-guide/data-management/data-views) and [Ontology](https://docs.rootcause.ai/core-technologies/ontology) mapping.
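
A rough local analogue of steps 3 through 5, assuming pandas with a Parquet engine such as pyarrow is installed and using placeholder file names:

```python
import pandas as pd

# Placeholder file names standing in for the objects found in step 2.
files = ["sales-2024-01.parquet", "sales-2024-02.parquet"]

# Steps 3-4: parse each file and combine into a single dataset.
frames = [pd.read_parquet(path) for path in files]
dataset = pd.concat(frames, ignore_index=True)

# Step 5: the schema is whatever the parser inferred from the content.
print(dataset.dtypes)
```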

***

### Live Data Sync

You can configure automatic syncing to keep your dataset current:

* **Manual** – Only sync when you click "Sync Now"
* **Hourly** – Refresh every hour
* **Daily** – Refresh once per day
* **Weekly** – Refresh once per week

When a sync runs, the S3 path is re-scanned and the dataset is refreshed with the files currently at that location.

***

### IAM Policy

Create an IAM policy with the minimum required permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```

Attach this policy to a dedicated IAM user for RootCause.ai.
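
If you prefer to script this step, the same policy can be created and attached with boto3. The policy and user names below are hypothetical:

```python
import json
import boto3

iam = boto3.client("iam")

policy_doc = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::your-bucket-name",
            "arn:aws:s3:::your-bucket-name/*",
        ],
    }],
}

# Policy and user names are hypothetical; use your own conventions.
policy = iam.create_policy(
    PolicyName="rootcause-s3-read-only",
    PolicyDocument=json.dumps(policy_doc),
)
iam.attach_user_policy(
    UserName="rootcause-connector",
    PolicyArn=policy["Policy"]["Arn"],
)
```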

***

### S3-Compatible Storage

The S3 connector also works with S3-compatible storage services:

* **MinIO** – Set Endpoint URL to your MinIO server
* **DigitalOcean Spaces** – Use Spaces endpoint (e.g., `https://nyc3.digitaloceanspaces.com`)
* **Backblaze B2** – Use B2 S3-compatible endpoint

In each case, enter your provider's S3-compatible endpoint in the **Endpoint URL** field when creating the connection.
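
In SDK terms, this field plays the same role as boto3's `endpoint_url` parameter. A sketch against a local MinIO server, with placeholder credentials:

```python
import boto3

# Placeholder endpoint and credentials for a local MinIO server.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="<minio-access-key>",
    aws_secret_access_key="<minio-secret-key>",
    region_name="us-east-1",  # many S3-compatible services accept any region
)
print(s3.list_buckets()["Buckets"])
```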

***

### Troubleshooting

**Access Denied**

* Verify the IAM user has `s3:GetObject` and `s3:ListBucket` permissions
* Check that the bucket policy doesn't explicitly deny access
* Ensure the bucket name and region are correct

**Bucket not found**

* Verify the bucket name is spelled correctly
* Check the region matches where the bucket was created

**No files found**

* Verify the path prefix is correct
* Check that files exist in the specified location
* Ensure the selected formats match your file types

**Connection timeout**

* Verify network connectivity to AWS
* Check that any firewall or VPC settings allow outbound HTTPS
* Try a different AWS region endpoint
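
When the error message alone doesn't pinpoint the cause, testing `s3:ListBucket` and `s3:GetObject` separately can narrow it down (the bucket and key below are placeholders):

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", region_name="us-east-1")
bucket = "my-data-bucket"  # placeholder bucket

try:
    s3.list_objects_v2(Bucket=bucket, MaxKeys=1)
    print("s3:ListBucket OK")
except ClientError as err:
    print("s3:ListBucket failed:", err.response["Error"]["Code"])

try:
    s3.get_object(Bucket=bucket, Key="data/sales.parquet")  # placeholder key
    print("s3:GetObject OK")
except ClientError as err:
    print("s3:GetObject failed:", err.response["Error"]["Code"])
```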
