Amazon S3

Connect RootCause.ai to Amazon S3 to import files from your S3 buckets.


Prerequisites

Before connecting, ensure you have:

  • AWS Access Key ID and Secret Access Key

  • S3 bucket name

  • AWS region where the bucket is located

  • Read access to the bucket and objects you want to import


Setting Up the Connection

  1. Navigate to Data > Datasets in your workspace

  2. Click New Connection and select S3

  3. Enter your connection details:

| Field | Description | Example |
| --- | --- | --- |
| Access Key ID | AWS access key | AKIAIOSFODNN7EXAMPLE |
| Secret Access Key | AWS secret key | wJalrXUtnFEMI/K7MDENG/... |
| Bucket Name | S3 bucket name | my-data-bucket |
| Region | AWS region | us-east-1 |
| Endpoint URL (Optional) | Custom endpoint for S3-compatible storage | https://s3.us-east-1.amazonaws.com |

  4. Click Test Connection to verify your credentials

  5. Click Save to create the connector
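
To check the same credentials outside RootCause.ai before saving, a small script can help. The sketch below is not how Test Connection is implemented. It assumes the third-party boto3 library is installed, and the helper names client_kwargs and can_reach_bucket are hypothetical.

```python
def client_kwargs(access_key_id, secret_access_key, region, endpoint_url=None):
    """Map the connection form fields onto keyword arguments for boto3.client."""
    kwargs = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "region_name": region,
    }
    if endpoint_url:  # only needed for S3-compatible storage
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def can_reach_bucket(bucket, **kwargs):
    """Return True if a HEAD request on the bucket succeeds with these credentials."""
    import boto3  # third-party; imported lazily so the helper above works without it
    from botocore.exceptions import BotoCoreError, ClientError

    s3 = boto3.client("s3", **kwargs)
    try:
        s3.head_bucket(Bucket=bucket)
        return True
    except (BotoCoreError, ClientError):
        return False
```

For example, can_reach_bucket("my-data-bucket", **client_kwargs(key_id, secret, "us-east-1")) returns False for a wrong key, a missing bucket, or an unreachable endpoint.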


Importing Data

Once your connector is saved:

  1. Click Import Data on the connector

  2. Enter the Path to your files:

    • Specific file: /data/sales.parquet

    • Directory prefix: /data/2024/

    • Full S3 URI: s3://bucket/data/

  3. Enable Recursive to include files in subdirectories

  4. Select Formats to import (Parquet, CSV, JSON)
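
The three path forms above all resolve to a bucket plus a key or key prefix. The sketch below shows one plausible normalization. It is an assumption about how such paths can be interpreted, not a description of the connector's internals, and split_path is a hypothetical helper.

```python
from urllib.parse import urlparse


def split_path(path, default_bucket):
    """Return (bucket, key_or_prefix) for a file path, directory prefix, or s3:// URI."""
    if path.startswith("s3://"):
        parsed = urlparse(path)
        # netloc holds the bucket; the rest is the key without its leading slash
        return parsed.netloc, parsed.path.lstrip("/")
    # Plain paths are taken relative to the connector's bucket
    return default_bucket, path.lstrip("/")
```

With the connector bucket set to my-data-bucket, /data/2024/ maps to the prefix data/2024/ in that bucket, while s3://bucket/data/ names its bucket explicitly.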

Supported File Formats:

| Format | Description |
| --- | --- |
| Parquet | Columnar storage format, recommended for large datasets |
| CSV | Comma-separated values |
| JSON | JSON or newline-delimited JSON |
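
To preview which objects a given path, Recursive setting, and format selection would pick up, the filtering can be approximated locally. This is a minimal sketch under stated assumptions: formats are matched by file extension, and the .jsonl and .ndjson extensions for newline-delimited JSON are guesses. The name select_keys is hypothetical.

```python
import posixpath

# Assumed extension-to-format mapping; .jsonl and .ndjson are guesses
EXTENSIONS = {
    ".parquet": "Parquet",
    ".csv": "CSV",
    ".json": "JSON",
    ".jsonl": "JSON",
    ".ndjson": "JSON",
}


def select_keys(keys, prefix, formats, recursive=False):
    """Keep keys under prefix whose extension belongs to one of the chosen formats."""
    selected = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if not recursive and "/" in remainder:
            continue  # the key sits in a subdirectory of the prefix
        extension = posixpath.splitext(key)[1].lower()
        if EXTENSIONS.get(extension) in formats:
            selected.append(key)
    return selected
```

Calling select_keys with recursive=False keeps only objects directly under the prefix, which mirrors leaving Recursive disabled.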


What Happens When You Import

  1. RootCause.ai connects to your S3 bucket

  2. Files matching your path and format criteria are listed

  3. Files are downloaded and parsed

  4. Multiple files are combined into a single dataset

  5. Schema is automatically detected from the file content

  6. Data is stored in your workspace's data lake

The imported dataset becomes available for use in Data Views and Ontology mapping.
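
Steps 3 to 5 can be pictured with a small standard-library example. It is only an illustration of parsing, combining, and schema detection, not RootCause.ai's implementation. It handles CSV and JSON text only (Parquet needs a third-party reader), and it assumes newline-delimited JSON has one complete object per line. All function names are hypothetical.

```python
import csv
import io
import json


def parse_file(name, text):
    """Parse CSV or JSON / newline-delimited JSON text into a list of row dicts."""
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    stripped = text.strip()
    if stripped.startswith("["):
        return json.loads(stripped)  # a single JSON array of objects
    # Otherwise assume one JSON object per line
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


def combine(files):
    """Concatenate the rows of every (name, text) pair into one dataset."""
    rows = []
    for name, text in files:
        rows.extend(parse_file(name, text))
    return rows


def detect_schema(rows):
    """Map each column name to the set of Python type names observed in it."""
    schema = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, set()).add(type(value).__name__)
    return schema
```

A column that reports more than one type, for example str from a CSV file and int from a JSON file, is a sign that the source files disagree and may need cleaning before import.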


Live Data Sync

You can configure automatic syncing to keep your dataset current:

  • Manual – Only sync when you click "Sync Now"

  • Hourly – Refresh every hour

  • Daily – Refresh once per day

  • Weekly – Refresh once per week

When a sync runs, the S3 path is re-scanned and data is refreshed.
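
As a rough mental model of the schedule options, the next refresh time can be computed from the last one. The sketch assumes each interval is measured from the previous sync, which the product may or may not do. INTERVALS and next_sync are hypothetical names.

```python
from datetime import datetime, timedelta

# Interval per sync mode; Manual has no automatic interval
INTERVALS = {
    "Hourly": timedelta(hours=1),
    "Daily": timedelta(days=1),
    "Weekly": timedelta(weeks=1),
}


def next_sync(last_sync, mode):
    """Return when the next automatic sync is due, or None for Manual."""
    if mode == "Manual":
        return None
    return last_sync + INTERVALS[mode]
```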


IAM Policy

Create an IAM policy with minimum required permissions:
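
One way to produce such a policy is sketched here. It assumes the connector needs only s3:ListBucket and s3:GetObject, the two permissions named under Troubleshooting, and it uses the example bucket name my-data-bucket. The function name read_only_policy is hypothetical, so adjust the output to your own bucket before use.

```python
import json


def read_only_policy(bucket):
    """Build a read-only IAM policy document scoped to a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Print the JSON to paste into the IAM console
print(json.dumps(read_only_policy("my-data-bucket"), indent=2))
```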

Attach this policy to a dedicated IAM user for RootCause.ai.


S3-Compatible Storage

The S3 connector also works with S3-compatible storage services:

  • MinIO – Set Endpoint URL to your MinIO server

  • DigitalOcean Spaces – Use Spaces endpoint (e.g., https://nyc3.digitaloceanspaces.com)

  • Backblaze B2 – Use B2 S3-compatible endpoint

Set the Endpoint URL to your storage provider's S3-compatible endpoint.
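
A quick sanity check on the endpoint value can catch a missing scheme before it reaches the form. The sketch below follows the DigitalOcean Spaces URL pattern shown above. The helper names are hypothetical, and http://localhost:9000 in the usage note is only an example of a local MinIO address.

```python
from urllib.parse import urlparse


def spaces_endpoint(region):
    """Build a DigitalOcean Spaces endpoint from a region slug such as nyc3."""
    return f"https://{region}.digitaloceanspaces.com"


def validate_endpoint(url):
    """Return the endpoint without a trailing slash, or raise ValueError if malformed."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Endpoint URL must start with http:// or https://: {url!r}")
    return url.rstrip("/")
```

For instance, validate_endpoint("http://localhost:9000/") returns the URL without the trailing slash, while a bare host name such as nyc3.digitaloceanspaces.com raises ValueError.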


Troubleshooting

Access Denied

  • Verify the IAM user has s3:GetObject and s3:ListBucket permissions

  • Check that the bucket policy doesn't explicitly deny access

  • Ensure the bucket name and region are correct

Bucket not found

  • Verify the bucket name is spelled correctly

  • Check the region matches where the bucket was created

No files found

  • Verify the path prefix is correct

  • Check that files exist in the specified location

  • Ensure the selected formats match your file types

Connection timeout

  • Verify network connectivity to AWS

  • Check that any firewall or VPC settings allow outbound HTTPS

  • Try a different AWS region endpoint
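
When reproducing a failure with your own script, the checklist above can be tied to the error code the S3 API returns. The mapping below is illustrative: AccessDenied and NoSuchBucket are standard S3 error codes, but how RootCause.ai words its own error messages is not covered here, and hint_for is a hypothetical helper.

```python
# First thing to check for each S3 error code, taken from the checklist above
HINTS = {
    "AccessDenied": "Verify the IAM user has s3:GetObject and s3:ListBucket permissions.",
    "NoSuchBucket": "Verify the bucket name is spelled correctly and the region matches.",
}


def hint_for(error_code):
    """Return a troubleshooting hint for an S3 error code, with a generic fallback."""
    return HINTS.get(error_code, "Verify network connectivity and outbound HTTPS access.")
```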
