# Amazon S3

Connect RootCause.ai to Amazon S3 to import files from your S3 buckets.

***

### Prerequisites

Before connecting, ensure you have:

* AWS Access Key ID and Secret Access Key
* S3 bucket name
* AWS region where the bucket is located
* Read access to the bucket and the objects you want to import (`s3:ListBucket` and `s3:GetObject`; see IAM Policy below)

***

### Setting Up the Connection

1. Navigate to **Data** → **Datasets** in your workspace
2. Click **New Connection** and select **S3**
3. Enter your connection details:

| Field                   | Description                                            | Example                               |
| ----------------------- | ------------------------------------------------------ | ------------------------------------- |
| Access Key ID           | Access key ID of the IAM user RootCause.ai will use    | `AKIAIOSFODNN7EXAMPLE`                |
| Secret Access Key       | Secret access key paired with the access key ID        | `wJalrXUtnFEMI/K7MDENG/...`           |
| Bucket Name             | Name of the S3 bucket to import from                   | `my-data-bucket`                      |
| Region                  | AWS region where the bucket is located                 | `us-east-1`                           |
| Endpoint URL (Optional) | Custom endpoint, needed only for S3-compatible storage | `https://nyc3.digitaloceanspaces.com` |

4. Click **Test Connection** to verify your credentials
5. Click **Save** to create the connector
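If **Test Connection** fails, it can help to check the same credentials outside RootCause.ai. The sketch below is not the connector's own test, which this page doesn't describe. It assumes you use the AWS SDK for Python (`boto3`) and makes two read calls covered by `s3:ListBucket`: `head_bucket` and a one-key `list_objects_v2`. The helper name `check_s3_access` is hypothetical, and the client is passed in so the logic can be exercised without AWS.

```python
from typing import Any


def check_s3_access(s3_client: Any, bucket: str, prefix: str = "") -> int:
    """Return how many keys (0 or 1) a one-item listing under `prefix` sees.

    Any error raised by the client (for example botocore's ClientError for a
    missing bucket or missing permissions) propagates to the caller.
    """
    # HeadBucket confirms the bucket exists and the credentials may access it.
    s3_client.head_bucket(Bucket=bucket)
    # A one-key listing confirms objects are visible under the prefix.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0)


# Example (requires boto3 and real credentials in your environment):
#   import boto3
#   client = boto3.client("s3", region_name="us-east-1")
#   check_s3_access(client, "my-data-bucket", "data/")
```

A result of `0` means the credentials work but nothing exists under that prefix.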

***

### Importing Data

Once your connector is saved:

1. Click **Import Data** on the connector
2. Enter the **Path** to your files:
   * Specific file: `/data/sales.parquet`
   * Directory prefix: `/data/2024/`
   * Full S3 URI: `s3://bucket/data/`
3. Enable **Recursive** to include files in subdirectories
4. Select **Formats** to import (Parquet, CSV, JSON)
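The three path forms above all reduce to a bucket plus a key prefix. This sketch assumes that a full `s3://` URI carries its own bucket name, that plain paths use the connector's configured bucket, and that the leading `/` is dropped because S3 object keys don't normally start with one. These are assumptions about behavior this page doesn't specify, and `resolve_path` is a hypothetical helper.

```python
from typing import Tuple


def resolve_path(path: str, default_bucket: str) -> Tuple[str, str]:
    """Split an import path into (bucket, key_prefix)."""
    if path.startswith("s3://"):
        # s3://bucket/some/prefix -> ("bucket", "some/prefix")
        bucket, _, key = path[len("s3://"):].partition("/")
        return bucket, key
    # Plain paths are relative to the bucket configured on the connector.
    return default_bucket, path.lstrip("/")


# resolve_path("/data/2024/", "my-data-bucket") gives ("my-data-bucket", "data/2024/")
```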

**Supported File Formats:**

| Format  | Description                                             |
| ------- | ------------------------------------------------------- |
| Parquet | Columnar storage format, recommended for large datasets |
| CSV     | Comma-separated values                                  |
| JSON    | JSON or newline-delimited JSON                          |
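The **Recursive** and **Formats** options can be thought of as filters on the object listing. S3 has no real directories, so a "subdirectory" is just a further `/` in the key after the prefix. The extension-to-format mapping below (including `.jsonl` and `.ndjson` for newline-delimited JSON) is an assumption, since this page doesn't say how the connector recognizes file types, and `select_keys` is a hypothetical name.

```python
import posixpath
from typing import Iterable, List

# Assumed mapping from file extension to the format names in the table above.
EXTENSION_FORMATS = {
    ".parquet": "Parquet",
    ".csv": "CSV",
    ".json": "JSON",
    ".jsonl": "JSON",
    ".ndjson": "JSON",
}


def select_keys(keys: Iterable[str], prefix: str, recursive: bool,
                formats: Iterable[str]) -> List[str]:
    """Filter listed object keys by path prefix, Recursive flag, and Formats."""
    wanted = set(formats)
    selected = []
    for key in keys:
        if not key.startswith(prefix) or key.endswith("/"):
            # Outside the path, or a zero-byte "folder" placeholder object.
            continue
        if not recursive and "/" in key[len(prefix):]:
            continue  # the object sits under a nested prefix ("subdirectory")
        extension = posixpath.splitext(key)[1].lower()
        if EXTENSION_FORMATS.get(extension) in wanted:
            selected.append(key)
    return selected
```

A specific file path works with the same filter, because the key matches the prefix exactly and nothing remains after it.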

***

### What Happens When You Import

1. RootCause.ai connects to your S3 bucket
2. Files matching your path and format criteria are listed
3. Files are downloaded and parsed
4. Multiple files are combined into a single dataset
5. Schema is automatically detected from the file content
6. Data is stored in your workspace's data lake
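Steps 3 to 5 can be pictured with a standard-library sketch for CSV and JSON files (Parquet would need a library such as `pyarrow` and is left out). It assumes rows from all files are appended into one list and that each column gets the widest type seen across files, in the order integer, float, string. RootCause.ai's actual schema detection isn't described on this page, and `combine_files` is a hypothetical name.

```python
import csv
import io
import json
from typing import Dict, List, Optional, Tuple

KINDS = ("integer", "float", "string")  # ordered narrowest to widest


def parse_file(name: str, data: bytes) -> List[dict]:
    """Parse one downloaded CSV or JSON/NDJSON file into a list of row dicts."""
    text = data.decode("utf-8")
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    if name.endswith((".json", ".jsonl", ".ndjson")):
        stripped = text.strip()
        if stripped.startswith("["):  # a single JSON array of objects
            return json.loads(stripped)
        # Otherwise treat it as newline-delimited JSON: one object per line.
        return [json.loads(line) for line in stripped.splitlines() if line.strip()]
    raise ValueError(f"unsupported file type: {name}")


def _kind(value: str) -> int:
    """Index into KINDS of the narrowest type that can hold `value`."""
    for index, cast in enumerate((int, float)):
        try:
            cast(value)
            return index
        except ValueError:
            pass
    return 2


def combine_files(files: Dict[str, bytes]) -> Tuple[List[dict], Dict[str, str]]:
    """Append rows from every file and infer one type per column."""
    rows: List[dict] = []
    for name, data in files.items():
        rows.extend(parse_file(name, data))

    widest: Dict[str, Optional[int]] = {}
    for row in rows:
        for column, value in row.items():
            current = widest.get(column)
            if value is None or value == "":
                widest[column] = current  # record the column, keep its type
                continue
            kind = _kind(str(value))
            widest[column] = kind if current is None else max(current, kind)

    # Columns that never held a value default to string.
    schema = {c: "string" if k is None else KINDS[k] for c, k in widest.items()}
    return rows, schema
```

In this sketch, a column that appears in only some files is simply absent from the other rows, so files imported from one path combine most cleanly when they share a layout.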

The imported dataset becomes available for use in [Data Views](/user-guide/data-views.md) and [Ontology](/core-technologies/ontology.md) mapping.

***

### Live Data Sync

You can configure automatic syncing to keep your dataset current:

* **Manual** – Only sync when you click "Sync Now"
* **Hourly** – Refresh every hour
* **Daily** – Refresh once per day
* **Weekly** – Refresh once per week

When a sync runs, the configured S3 path is re-scanned and the dataset is refreshed from the matching files.
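Each schedule option amounts to a fixed interval after the previous sync. This sketch assumes the interval is measured from the last completed sync; the page doesn't say whether RootCause.ai instead aligns syncs to clock boundaries such as the top of the hour. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Interval per schedule option; "Manual" never syncs on its own.
SYNC_INTERVALS = {
    "Manual": None,
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Weekly": timedelta(weeks=1),
}


def next_sync_time(schedule: str, last_sync: datetime) -> Optional[datetime]:
    """When the next automatic sync would run, or None for Manual."""
    interval = SYNC_INTERVALS[schedule]
    return None if interval is None else last_sync + interval


def sync_due(schedule: str, last_sync: datetime, now: datetime) -> bool:
    """True once the schedule's interval has elapsed since `last_sync`."""
    due = next_sync_time(schedule, last_sync)
    return due is not None and now >= due
```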

***

### IAM Policy

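Create an IAM policy with the minimum required permissions. `s3:ListBucket` applies to the bucket itself and `s3:GetObject` applies to the objects in it, which is why both the bucket ARN and the `/*` object ARN are listed: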

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}
```

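Replace `your-bucket-name` with your bucket's name, attach the policy to a dedicated IAM user for RootCause.ai, and use that user's access keys in the connection settings.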
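To avoid hand-editing the placeholder, the same policy can be generated for a specific bucket. This is a small convenience sketch; the helper names are hypothetical, and the output is the policy shown above with the bucket name filled in.

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build the minimal read policy shown above for a given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket ARN, used by s3:ListBucket
                    f"arn:aws:s3:::{bucket}/*",  # object ARNs, used by s3:GetObject
                ],
            }
        ],
    }


def write_policy(bucket: str, path: str = "policy.json") -> None:
    """Write the policy as pretty-printed JSON to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(read_only_policy(bucket), handle, indent=2)
```

The resulting file can be passed to the AWS CLI, for example `aws iam create-policy --policy-name rootcause-s3-read --policy-document file://policy.json`, followed by `aws iam attach-user-policy --user-name <user> --policy-arn <arn>`; the policy and user names are placeholders.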

***

### S3-Compatible Storage

The S3 connector also works with S3-compatible storage services:

* **MinIO** – Set Endpoint URL to your MinIO server
* **DigitalOcean Spaces** – Use Spaces endpoint (e.g., `https://nyc3.digitaloceanspaces.com`)
* **Backblaze B2** – Use B2 S3-compatible endpoint

In each case, set the **Endpoint URL** to your storage provider's S3-compatible endpoint and use the access key and secret key issued by that provider.
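To try an S3-compatible endpoint before entering it, the connection settings map onto `boto3.client()` keyword arguments (`service_name`, `region_name`, `endpoint_url`). Falling back to `https://s3.<region>.amazonaws.com` when no endpoint is given is an assumption about how the optional field behaves, and `s3_client_kwargs` is a hypothetical helper.

```python
from typing import Dict, Optional


def s3_client_kwargs(region: str, endpoint_url: Optional[str] = None) -> Dict[str, str]:
    """Keyword arguments for boto3.client() against AWS or an S3-compatible service."""
    # With no custom endpoint, use the AWS regional endpoint (the same form
    # as https://s3.us-east-1.amazonaws.com).
    endpoint = endpoint_url.rstrip("/") if endpoint_url else f"https://s3.{region}.amazonaws.com"
    return {"service_name": "s3", "region_name": region, "endpoint_url": endpoint}


# Example (requires boto3 and the provider's keys in your environment):
#   import boto3
#   client = boto3.client(**s3_client_kwargs("nyc3", "https://nyc3.digitaloceanspaces.com"))
```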

***

### Troubleshooting

**Access Denied**

* Verify the IAM user has `s3:GetObject` and `s3:ListBucket` permissions
* Check that the bucket policy doesn't explicitly deny access
* Ensure the bucket name and region are correct

**Bucket not found**

* Verify the bucket name is spelled correctly
* Check that the region matches the one where the bucket was created

**No files found**

* Verify the path prefix is correct
* Check that files exist in the specified location
* Ensure the selected formats match your file types
* If the files sit in nested prefixes (subdirectories), enable **Recursive**

**Connection timeout**

* Verify network connectivity to AWS
* Check that any firewall or VPC settings allow outbound HTTPS
* If you set an **Endpoint URL**, verify that it is correct and reachable from your network
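When you check access yourself with the AWS SDK for Python (`boto3`), the exceptions it raises can be matched to the headings above. This sketch assumes botocore's `ClientError`, which carries the S3 error code in `response["Error"]["Code"]`, and botocore's connection and timeout exception class names. The mapping, including filing wrong-region responses under **Bucket not found**, is an illustration rather than a list of RootCause.ai's own error messages, and `troubleshooting_heading` is hypothetical.

```python
from typing import Optional

# S3 error codes mapped to the headings above. HEAD requests such as
# head_bucket return no body, so botocore reports the HTTP status
# ("403", "404", "301") as the code instead of a named error.
ERROR_CODE_HEADINGS = {
    "AccessDenied": "Access Denied",
    "403": "Access Denied",
    "NoSuchBucket": "Bucket not found",
    "404": "Bucket not found",
    # Wrong-region requests: S3 redirects or rejects the request signature.
    "PermanentRedirect": "Bucket not found",
    "301": "Bucket not found",
    "AuthorizationHeaderMalformed": "Bucket not found",
}

# botocore connection/timeout exception names, plus Python built-ins.
TIMEOUT_ERROR_NAMES = {
    "EndpointConnectionError",
    "ConnectTimeoutError",
    "ReadTimeoutError",
    "ConnectionError",
    "TimeoutError",
}


def troubleshooting_heading(error: Exception) -> Optional[str]:
    """Map an exception from an S3 call to a heading above, or None if unknown."""
    response = getattr(error, "response", None)
    if isinstance(response, dict):
        code = response.get("Error", {}).get("Code")
        return ERROR_CODE_HEADINGS.get(code)
    # Match on class names anywhere in the hierarchy, so botocore need not be imported.
    if TIMEOUT_ERROR_NAMES & {cls.__name__ for cls in type(error).__mro__}:
        return "Connection timeout"
    return None
```

An empty listing raises no exception at all; a `KeyCount` of 0 from `list_objects_v2` corresponds to **No files found**.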

