Kinesis Stream to S3: A Comprehensive Guide

Data Integration, ETL, Tutorials • February 1st, 2022

Do you want to stream your data to Amazon S3 but find it challenging to load it into your Amazon S3 buckets? If so, you’ve landed at the right place! This step-by-step guide shows you how to set up the Kinesis Stream to S3 and bring in data from a source of your choice in real-time.

By the end of the walkthrough, you will be able to transfer your data to Amazon S3 for real-time analysis and use the same tools & techniques to build a customized ETL pipeline for your organization.

Introduction to Amazon S3

Kinesis Stream to S3- Amazon S3 Logo.
Image Source: dashsdk.com/

Amazon S3 (Amazon Simple Storage Service) is one of the most popular and robust object-based storage services. It allows users to store large volumes of data of various types, such as blogs, application files, code, documents, etc., with ease, and is designed for high data availability and a durability of 99.999999999%.

It offers robust integration support, allowing users to connect it to numerous ETL tools to manage their data needs. Users can also use the Amazon S3 console or the CLI to add, modify, view and manipulate data in their Amazon S3 buckets. It supports various programming languages, such as Python, Java, Scala, etc., and numerous APIs, allowing users to manage, back up and version their data securely.

For further information on Amazon S3, you can check the official website here.

Introduction to Amazon Kinesis

Kinesis Stream to S3- Amazon Kinesis Logo.
Image Source: towardsdatascience.com

Amazon Kinesis is a fully-managed service provided by Amazon that allows users to process data from a diverse set of sources and stream it to various destinations in real-time. Through its Firehose functionality, it supports streaming data to storage destinations such as Amazon S3, Redshift, Elasticsearch and Splunk.

Amazon Kinesis encapsulates the following set of entities that help stream data in real-time:

  • Data Producers: The producers are responsible for generating and transferring data to Amazon Kinesis. Examples include mobile applications, systems producing log files, clickstreams, etc.
  • Records: These represent the data that the Amazon Kinesis Firehose delivery stream receives from the data producer.
  • Buffer Size and Buffer Interval: These configuration settings determine how much data is buffered, and for how long, before it is delivered to the destination.
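To make the producer and record entities more concrete, here is a minimal sketch of how a producer might group its records before sending them. It assumes the producer uses the Firehose PutRecordBatch operation, which accepts at most 500 records per call. The helper name chunkRecords and the log-line records are hypothetical, and the sketch does not check the separate size limit on each call.

```javascript
// Hypothetical helper: split records into batches that respect the
// PutRecordBatch limit of 500 records per call (size limits are not checked).
const MAX_RECORDS_PER_BATCH = 500;

function chunkRecords(records, batchSize = MAX_RECORDS_PER_BATCH) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}

// Illustrative data: 1,200 made-up log lines produced by an application.
const logLines = Array.from({ length: 1200 }, (_, i) => ({ Data: `log line ${i}\n` }));
const batches = chunkRecords(logLines);
console.log(batches.map((batch) => batch.length)); // [ 500, 500, 200 ]
```

Each batch could then be handed to whichever AWS SDK client the producer uses.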

For further information on Amazon Kinesis, you can check the official website here.

Download the Guide on Data Streaming
Learn how you can enable real-time analytics with a Modern Data Stack

Understanding Streaming Data

Streaming data refers to the data an application or a source generates continuously in real-time. Its volume depends entirely on the data source that produces it and therefore varies widely; for example, it can be as high as 20,000+ records per second or as low as one record per second. A wide variety of tools available in the market, such as Amazon Kinesis, Apache Kafka, Apache Spark, etc., help capture and stream this data in real-time.

Some examples of streaming data are as follows:

  • Clickstreams generated by a web application.
  • Log files generated by an application.
  • Stock market data.
  • IoT device data (sensors, performance monitors, etc.).

Simplify Data Streaming with Hevo’s No-code Data Pipelines

Hevo Data, a No-code Data Pipeline, helps you stream data from 100+ sources to Amazon S3 & lets you visualize it in a BI tool. Hevo is fully-managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using various BI tools such as Power BI, Tableau, etc. 

Get Started with Hevo for Free

Check out what makes Hevo amazing:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Prerequisites

  • Working knowledge of Amazon S3.
  • Working knowledge of Amazon Kinesis.
  • An AWS account with access to Amazon S3.

Steps to Set Up the Kinesis Stream to S3

Amazon Kinesis allows users to leverage its Firehose functionality to start streaming data to Amazon S3 from a data producer of their choice in real-time.

Kinesis Stream to S3- Streaming Data to S3 using Firehose.
Image Source: towardsdatascience.com

You can set up the Kinesis Stream to S3 to start streaming your data to Amazon S3 buckets using the following steps:

Step 1: Signing in to the AWS Console for Amazon Kinesis

To start setting up the Kinesis Stream to S3, you first need to log in to the AWS console for Amazon Kinesis. To do this, go to the official website of the AWS console and log in using your credentials, such as your username and password.

Once you’ve logged in, the following page will open up on your screen if you’re using Amazon Kinesis for the very first time. Click on the “Get started” option to begin the setup process.

Kinesis Stream to S3- AWS Kinesis Starting Page.
Image Source: towardsdatascience.com

Once you’ve clicked on it, you will see four different options on your screen to create an Amazon Kinesis stream. Here, you need to select the “Deliver Streaming Data with Kinesis Firehose Delivery Streams” option.

Kinesis Stream to S3- Selecting the Delivery stream option.
Image Source: towardsdatascience.com

Step 2: Configuring the Delivery Stream

With the desired option in place, you can start configuring the delivery stream by first providing a unique name for it and then selecting “Direct PUT or other sources” as the data source.

Kinesis Stream to S3- Configuring the Delivery Stream.
Image Source: towardsdatascience.com

This is how you can configure the delivery stream in the AWS console for Amazon Kinesis.

Step 3: Transforming Records using a Lambda Function

With your data source and delivery stream now configured, you now need to transform the incoming data records. To do this, Amazon Kinesis Firehose can invoke an AWS Lambda function that transforms your data into the desired format before delivering it to a destination of your choice, such as Amazon S3.

While transforming the records, ensure that each record returned by the Lambda function contains the following set of parameters:

  • recordId: The unique identifier that Amazon Kinesis passes to the Lambda function during execution. The transformed record must return the same value.
  • result: The status of the transformed record: Ok, Dropped or ProcessingFailed.
  • data: The transformed data, base64-encoded.
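The three parameters above can be checked with a small sketch like the one below. The function isValidTransformedRecord is a hypothetical helper written for this guide, not part of any AWS SDK, and the record ID and payload are made-up sample values. It assumes that the status must be one of Ok, Dropped or ProcessingFailed and that the data field must be a base64 string.

```javascript
// Hypothetical validation helper for a record returned by a transformation Lambda.
const ALLOWED_RESULTS = new Set(['Ok', 'Dropped', 'ProcessingFailed']);

function isValidTransformedRecord(record) {
  return (
    typeof record === 'object' && record !== null &&
    typeof record.recordId === 'string' && record.recordId.length > 0 &&
    ALLOWED_RESULTS.has(record.result) &&
    typeof record.data === 'string' &&
    // A canonical base64 string survives a decode/encode round trip unchanged
    Buffer.from(record.data, 'base64').toString('base64') === record.data
  );
}

// Made-up sample record for illustration only.
const sampleTransformed = {
  recordId: 'record-1',
  result: 'Ok',
  data: Buffer.from(JSON.stringify({ ticker_symbol: 'ABC' })).toString('base64'),
};

console.log(isValidTransformedRecord(sampleTransformed)); // true
console.log(isValidTransformedRecord({ ...sampleTransformed, result: 'Done' })); // false
```

A check like this is handy in unit tests for the Lambda function, before it is attached to the delivery stream.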

To start transforming your data, you first need to select the “Enabled” option found under the transform source records section.

Kinesis Stream to S3- Enabling the Record Transformations.
Image Source: towardsdatascience.com

Once you’ve enabled it, you can either use an existing Lambda function, choose one from the list of Lambda blueprints, or write a custom Lambda function. Here, you need to select “General Firehose Processing” as your Lambda blueprint.

Kinesis Stream to S3- Selecting the Lambda Blueprint.
Image Source: towardsdatascience.com

Once you’ve selected it, you now need to provide a unique name for your Lambda function and choose an IAM role that has the necessary privileges to access both Amazon S3 and Firehose.

Kinesis Stream to S3- Naming the lambda function and assigning the IAM role.
Image Source: towardsdatascience.com

Now, click on the view policy button, click on the edit option & add the following statement to the policy, replacing your-region, your-aws-account-id and your-stream-name with your AWS region, AWS account ID and delivery stream name:

{
    "Effect": "Allow",
    "Action": [
        "firehose:PutRecordBatch"
    ],
    "Resource": [
        "arn:aws:firehose:your-region:your-aws-account-id:deliverystream/your-stream-name"
    ]
}
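If you would rather generate this statement than edit it by hand, the following sketch fills in the three placeholders. The function name buildFirehosePolicyStatement is hypothetical, and the region, account ID and stream name in the example are made-up values that you would replace with your own.

```javascript
// Hypothetical helper that fills in the placeholders of the policy statement above.
function buildFirehosePolicyStatement(region, accountId, streamName) {
  return {
    Effect: 'Allow',
    Action: ['firehose:PutRecordBatch'],
    Resource: [
      `arn:aws:firehose:${region}:${accountId}:deliverystream/${streamName}`,
    ],
  };
}

// Made-up example values; substitute your own region, account ID and stream name.
const statement = buildFirehosePolicyStatement('us-east-1', '123456789012', 'stock-ticker-stream');
console.log(JSON.stringify(statement, null, 4));
```

The printed JSON can be pasted into the policy editor as one entry of the policy's statement list.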

You will now be able to see the streaming data in the following format:

{"TICKER_SYMBOL":"ABC","SECTOR":"AUTOMOBILE","CHANGE":-0.15,"PRICE":44.89}
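Firehose hands each record to the Lambda function with its data base64-encoded, so the JSON above has to be decoded before it can be read. The sketch below shows that round trip for the sample record. The helper names encodeRecordData and decodeRecordData are hypothetical and exist only for this illustration.

```javascript
// The sample record shown above, as a JavaScript object.
const sampleRecord = { TICKER_SYMBOL: 'ABC', SECTOR: 'AUTOMOBILE', CHANGE: -0.15, PRICE: 44.89 };

// Hypothetical helper: serialize an object the way it appears in record.data.
function encodeRecordData(obj) {
  return Buffer.from(JSON.stringify(obj), 'utf8').toString('base64');
}

// Hypothetical helper: turn record.data back into an object.
function decodeRecordData(base64Data) {
  return JSON.parse(Buffer.from(base64Data, 'base64').toString('utf8'));
}

const encoded = encodeRecordData(sampleRecord);
const decoded = decodeRecordData(encoded);
console.log(decoded.TICKER_SYMBOL); // ABC
console.log(decoded.PRICE); // 44.89
```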

You now need to apply a simple transformation to this data that drops the change attribute while streaming. To do this, copy the following lines of code and paste them into the Lambda function:

'use strict';
console.log('Loading function');

exports.handler = (event, context, callback) => {
    /* Process the list of records and transform them */
    const output = event.records.map((record) => {
        console.log(record.recordId);
        // Firehose delivers each record's data base64-encoded
        const payload = JSON.parse(Buffer.from(record.data, 'base64').toString('utf8'));
        // Keep every attribute except CHANGE; the incoming field names are
        // upper-case, as in the sample record above
        const resultPayload = {
            ticker_symbol: payload.TICKER_SYMBOL,
            sector: payload.SECTOR,
            price: payload.PRICE,
        };

        return {
            recordId: record.recordId,
            result: 'Ok',
            // The transformed data must be base64-encoded again
            data: Buffer.from(JSON.stringify(resultPayload)).toString('base64'),
        };
    });
    console.log(`Processing completed. Successful records ${output.length}.`);
    callback(null, { records: output });
};
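The function above assumes that every record contains valid JSON. As a hedged variant, the sketch below marks records that cannot be parsed as ProcessingFailed instead of letting the whole invocation fail, and it can be run locally with a hand-built event. The helper transformRecord, the record IDs and the test event are hypothetical, and the upper-case input field names are assumed from the sample record shown earlier.

```javascript
'use strict';

// Hypothetical helper: transform one record, or flag it if its data is not valid JSON.
function transformRecord(record) {
  try {
    const payload = JSON.parse(Buffer.from(record.data, 'base64').toString('utf8'));
    // Same projection as the function above: everything except CHANGE
    const resultPayload = {
      ticker_symbol: payload.TICKER_SYMBOL,
      sector: payload.SECTOR,
      price: payload.PRICE,
    };
    return {
      recordId: record.recordId,
      result: 'Ok',
      data: Buffer.from(JSON.stringify(resultPayload)).toString('base64'),
    };
  } catch (err) {
    // Hand the original data back untouched and report the failure
    return { recordId: record.recordId, result: 'ProcessingFailed', data: record.data };
  }
}

// Async handler variant; in Lambda it would be assigned to exports.handler.
const handler = async (event) => ({ records: event.records.map(transformRecord) });

// Local check with a made-up event: one valid record and one broken record.
const testEvent = {
  records: [
    {
      recordId: 'record-1',
      data: Buffer.from('{"TICKER_SYMBOL":"ABC","SECTOR":"AUTOMOBILE","CHANGE":-0.15,"PRICE":44.89}').toString('base64'),
    },
    { recordId: 'record-2', data: Buffer.from('not json').toString('base64') },
  ],
};

handler(testEvent).then(({ records }) => {
  console.log(records.map((r) => r.result)); // [ 'Ok', 'ProcessingFailed' ]
});
```

Whether a failed record should be flagged or silently dropped depends on your pipeline; the sketch flags it so that the problem stays visible.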

Once you’ve made the necessary changes and saved it, you will now be able to see a new lambda function in the delivery stream page.

New Lambda Function.
Image Source: towardsdatascience.com

This is how you can transform your records using the lambda function and further set up the Kinesis Stream to S3.

Step 4: Configuring the Amazon S3 Destination to Enable the Kinesis Stream to S3

With your lambda function in place, you now need to select the desired destination to save your data records, choosing between Amazon S3, Redshift, Elasticsearch and Splunk. Here you need to select Amazon S3 as the desired destination.

Kinesis Stream to S3- Selecting the Destination in Kinesis.
Image Source: towardsdatascience.com

Once you’ve selected Amazon S3 as your destination, you now need to provide the name of your desired Amazon S3 bucket. In case you want to create a backup of your data, you can click on the enabled button and give the desired path.

Kinesis Stream to S3- Configuring the S3 bucket.
Image Source: towardsdatascience.com

With your destination now selected, the configuration page will open up on your screen, where you can configure the buffer size, buffer interval, encryption, S3 compression, and error logging. Here, you need to keep the default values for all of these and then choose an IAM role that allows Amazon Kinesis to access your Amazon S3 buckets.
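To get a feel for how the buffer size and buffer interval interact, here is a rough sketch. Firehose delivers a batch to Amazon S3 as soon as either threshold is reached, so the sketch estimates which one is hit first for a given load. The function estimateFlush is hypothetical, and the defaults of 5 MiB and 300 seconds are assumptions; use the values shown on your own configuration page.

```javascript
// Hypothetical estimator: which buffering threshold triggers delivery first?
// The default thresholds below are assumptions, not values read from AWS.
function estimateFlush({
  recordsPerSecond,
  avgRecordBytes,
  bufferSizeMiB = 5,
  bufferIntervalSeconds = 300,
}) {
  const bytesPerSecond = recordsPerSecond * avgRecordBytes;
  const secondsToFillBuffer = (bufferSizeMiB * 1024 * 1024) / bytesPerSecond;
  return {
    trigger: secondsToFillBuffer < bufferIntervalSeconds ? 'size' : 'interval',
    flushAfterSeconds: Math.min(secondsToFillBuffer, bufferIntervalSeconds),
  };
}

// A busy stream: 20,000 records per second of 100 bytes each fills the buffer quickly.
console.log(estimateFlush({ recordsPerSecond: 20000, avgRecordBytes: 100 }));
// A quiet stream: one record per second waits for the interval to expire.
console.log(estimateFlush({ recordsPerSecond: 1, avgRecordBytes: 100 }));
```

With these assumed defaults, the busy stream is flushed by size after roughly 2.6 seconds, while the quiet stream is flushed by the interval after 300 seconds.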

Kinesis Stream to S3- Reviewing Kinesis Configurations.
Image Source: towardsdatascience.com

With your delivery stream now set up, you can now test the stream by opening the “Test with demo data” node.

Kinesis Stream to S3- Testing Delivery stream.
Image Source: towardsdatascience.com

Once you’ve opened the data node, click on the start button to start transferring data to your Kinesis delivery stream. In case you want to stop the process, you can click on the stop button. You can now verify the data streaming process by opening a file in your Amazon S3 bucket and checking that the streamed records no longer contain the change attribute.
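The check described above can also be scripted once you have downloaded a delivered file. The sketch below assumes that the file holds flat JSON objects written back to back, which is what you get when the Lambda function does not append a newline to each record. The helper names and the file contents are made up for illustration, and the simple split would not cope with nested objects or with string values that contain "}{".

```javascript
// Hypothetical helper: split flat JSON objects that were written back to back.
function splitConcatenatedJson(text) {
  return text
    .replace(/}\s*{/g, '}\n{')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Hypothetical helper: find records that still carry an attribute (case-insensitive).
function findRecordsWithAttribute(records, attribute) {
  const wanted = attribute.toLowerCase();
  return records.filter((record) =>
    Object.keys(record).some((key) => key.toLowerCase() === wanted)
  );
}

// Made-up contents of a delivered file, for illustration only.
const fileContents =
  '{"ticker_symbol":"ABC","sector":"AUTOMOBILE","price":44.89}' +
  '{"ticker_symbol":"XYZ","sector":"ENERGY","price":12.5}';

const deliveredRecords = splitConcatenatedJson(fileContents);
console.log(deliveredRecords.length); // 2
console.log(findRecordsWithAttribute(deliveredRecords, 'change').length); // 0
```

A result of 0 in the last line means that the transformation removed the change attribute from every record in the file.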

Kinesis Stream to S3- Streaming Data to S3.
Image Source: towardsdatascience.com

This is how you can stream your data in real-time by setting up the Kinesis Stream to S3.

Conclusion

This article teaches you how to set up the Kinesis Stream to S3 with ease. It provides in-depth knowledge about the concepts behind every step to help you understand and implement them efficiently. These methods, however, can be challenging especially for a beginner & this is where Hevo saves the day.

Visit our Website to Explore Hevo

Hevo Data, a No-code Data Pipeline, helps you transfer data from a source of your choice in a fully-automated and secure manner without having to write code repeatedly. Hevo, with its strong integration with 100+ sources & BI tools, allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.

Tell us about your experience of setting up the Kinesis Stream to S3! Share your thoughts in the comments section below!
