MotherDuck is a serverless data warehouse built on DuckDB that enables SQL analytics at scale and integrates seamlessly with cloud storage.

You can create a Hevo Pipeline to ingest data from any of the supported Sources and load it in near real-time to an S3 Destination. Hevo loads the ingested data as Parquet or JSONL files to your S3 bucket using the Append Rows on Update mode. Once the data is loaded, you can connect the S3 bucket to MotherDuck using a secret. After the connection is established, data from the S3 bucket can be replicated into a MotherDuck database for analytics and transformations, such as filtering records, performing aggregations, and joining tables.
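For example, once the data is in MotherDuck, a transformation might look like the following sketch. All table and column names (orders, customers, amount, and so on) are hypothetical placeholders, not objects created elsewhere in this guide:

-- Hypothetical example: filter, aggregate, and join two replicated tables.
SELECT
  c.customer_name,
  COUNT(*) AS order_count,      -- aggregation
  SUM(o.amount) AS total_spend  -- aggregation
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id  -- join
WHERE o.status = 'completed'                       -- filter
GROUP BY c.customer_name
ORDER BY total_spend DESC;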


Steps to Replicate Data from Your S3 Bucket to MotherDuck

Perform the following steps to replicate data from your S3 bucket into a MotherDuck database:

Step 1: Creating a Pipeline with an S3 Destination (Optional)

If you do not have an active Pipeline that replicates data from your Source to an S3 Destination, refer to the respective Source documentation along with the S3 Destination documentation to create the Pipeline. However, if you already have a Pipeline with S3 as the Destination, skip to the Creating a Database in MotherDuck section.

Step 2: Creating a Database in MotherDuck (Optional)

If you do not have a database in MotherDuck, refer to the MotherDuck documentation to create one for replicating data from your S3 bucket.
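If you prefer working in SQL, you can also create the database from the MotherDuck SQL query editor with a single statement. A minimal sketch, using the hypothetical database name hevo_db:

-- Creates a new MotherDuck database named hevo_db (hypothetical name)
CREATE DATABASE hevo_db;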

Step 3: Obtaining the Amazon S3 Access Credentials

• If your S3 Destination in the Pipeline is configured using access credentials, skip to the Creating a Secret in MotherDuck section.
• If your S3 Destination in the Pipeline is configured using IAM role-based credentials, you need to generate access credentials to connect your S3 bucket to MotherDuck. Refer to the S3 Destination documentation for the steps to do this. Use these credentials to create a secret in MotherDuck and connect to your S3 bucket.

Step 4: Creating a Secret in MotherDuck

To allow MotherDuck to access and replicate data from your S3 bucket, you need to create a secret that securely stores your S3 access credentials. This secret allows MotherDuck to authenticate with your S3 bucket without exposing sensitive information in queries.

You can create the secret using either of the following methods:

Method 1: Using the MotherDuck UI

Perform the following steps to create a secret using the MotherDuck UI:

1. Log in to your MotherDuck account.
2. In the top left corner of the page, click your profile name, and then click Settings.

MotherDuck settings page

3. In the left navigation pane, under INTEGRATIONS, click Secrets.

MotherDuck secrets page

4. On the Secrets page, click + Add secret.

Add secret in MotherDuck

5. In the Add secret dialog box, do the following:

MotherDuck add secret page

   1. Click the Secret Type drop-down and select Amazon S3.
   2. Specify the following:
      • Secret Name: A unique name for your secret. For example, hevo_secret.
      • Access Key ID: The publicly shareable unique identifier associated with the access key pair created for your IAM user. For example, AKIAIOSFODNN7EAAMMBB.
      • Secret Access Key: The cryptographic key associated with the access key ID generated for your IAM user. For example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYAABBCCDDEE.
      • Region: The AWS region where your S3 bucket is located. For example, us-east-1.
   3. Click Add secret.

The newly created secret is now listed on the Secrets page. It appears as unscoped because the optional Scope field was left blank.

Note: Leaving a secret unscoped does not limit its ability to access your S3 bucket.

Secret creation in MotherDuck

Method 2: Using the SQL query editor

Perform the following steps to create a secret from the SQL query editor:

1. Log in to your MotherDuck account.
2. On the MotherDuck home page, click + Add Cell to open a new SQL query editor.

MotherDuck SQL Editor

3. Run the following command in the SQL query editor to create a secret:

CREATE SECRET <secret_name> (
  TYPE S3,
  KEY_ID '<access_key_ID_created_for_the_IAM_user>',
  SECRET '<secret_access_key_created_for_the_IAM_user>',
  REGION '<aws_region_where_the_S3_bucket_is_located>'
);

Note: Replace the placeholder values in the command above with your own. For example, <aws_region_where_the_S3_bucket_is_located> with us-east-1.
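If you want to limit the secret to a specific bucket rather than leaving it unscoped, DuckDB's CREATE SECRET statement also accepts an optional SCOPE parameter. A minimal sketch, using the hypothetical bucket name hevo-bucket:

CREATE SECRET <secret_name> (
  TYPE S3,
  KEY_ID '<access_key_ID_created_for_the_IAM_user>',
  SECRET '<secret_access_key_created_for_the_IAM_user>',
  REGION '<aws_region_where_the_S3_bucket_is_located>',
  SCOPE 's3://hevo-bucket'  -- restricts this secret to paths under this bucket
);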

Verifying the Secret

Run the following command in the SQL query editor to verify that the secret is configured correctly:

FROM duckdb_secrets();

This command lists all the secrets in your MotherDuck environment, confirming that your S3 secret was created successfully and is available for use.

Sample Output (one row, shown as column: value pairs):

name: hevo_secret
type: s3
provider: config
persistent: true
storage: motherduck
scope: ['s3://', 's3n://', 's3a://']
secret_string: name=hevo_secret;type=s3;provider=config;serializable=true;scope=s3://,s3n://,s3a://;endpoint=s3.amazonaws.com;key_id=AKIAIOSFODNN7EAAMMBB;region=us-east-1;secret=redacted;session_token=redacted;url_compatibility_mode=0;url_style=;use_ssl=1
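If your environment contains several secrets, you can narrow the output to just the columns and secret type you care about. A minimal sketch:

SELECT name, type, scope
FROM duckdb_secrets()
WHERE type = 's3';  -- list only S3 secrets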

Step 5: Loading Data from the S3 Bucket to a Database in MotherDuck

Once you have successfully created the secret, the connection between MotherDuck and your S3 bucket is established. To load data from your S3 bucket into the MotherDuck database, you need to manually create the corresponding tables in the database to store the data in a structured format.

MotherDuck provides the following functions to load data from your S3 bucket into the database. The function to use depends on the format of the files stored in the S3 bucket:

• read_json_auto – Use this function to read data from JSONL files in the S3 bucket and load the data into your database.
• read_parquet – Use this function to read data from Parquet files in the S3 bucket and load the data into your database.

Examples

The following examples demonstrate how to use the above-mentioned functions to read data from your S3 bucket and load it into your database.

Note: Replace the placeholder values in the commands below with your own. For example, <secret_name> with hevo_secret.

Reading JSONL files

Let us suppose you want to read all JSONL files from a specific S3 path and load the data into your database. The sample SQL query would be:

CREATE TABLE <table_name> AS
SELECT *
FROM read_json_auto('s3://<bucket_name>/<file_name>/*.json'(SECRET <secret_name>));
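Before creating the table, you can optionally preview the schema that DuckDB infers from your JSONL files by prefixing the same query with DESCRIBE. A minimal sketch, reusing the placeholder path and secret from the query above:

-- Shows the inferred column names and types without loading any data
DESCRIBE
SELECT *
FROM read_json_auto('s3://<bucket_name>/<file_name>/*.json'(SECRET <secret_name>));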

Reading Parquet files

Let us suppose you want to read all Parquet files from a specific S3 path and load the data into your database. The sample SQL query would be:

CREATE TABLE <table_name> AS
SELECT *
FROM read_parquet('s3://<bucket_name>/<file_name>/*.parquet'(SECRET <secret_name>));
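Because Hevo appends new files to your bucket over time, you may later want to load additional batches into the existing table rather than recreating it. A minimal sketch; the path uses the hypothetical placeholder <new_file_path>, which should match only files that have not been loaded yet, or previously loaded rows will be duplicated:

-- Appends rows from newly arrived Parquet files to an existing table
INSERT INTO <table_name>
SELECT *
FROM read_parquet('s3://<bucket_name>/<new_file_path>/*.parquet'(SECRET <secret_name>));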