This article describes the process of setting up Amazon Kinesis S3 Integration. Amazon Kinesis is a service that can stream data to Amazon S3 from numerous data sources, stores, and tools, while Amazon S3 is an Amazon Web Services offering that provides object storage through a web interface.
Initially, the article provides an overview of the two services and briefly explores a few of the benefits that make them handy for your workflow. Next, it delves deeper into setting up Amazon Kinesis S3 Integration from the AWS Management Console.
The article then wraps up with the challenges involved in Amazon Kinesis S3 Integration and viable alternatives for building an efficient data pipeline that sends data to Amazon S3, Amazon Redshift, and other destinations.
Note: Currently, Hevo Data doesn’t support Amazon Kinesis as a source and S3 as a destination.
What is Amazon Kinesis Firehose?
Amazon Kinesis Firehose allows you to stream data into data lakes, data stores, and analytics services. It is fully managed and scales automatically to match the throughput of your data without any manual intervention. Amazon Kinesis can deliver data to destinations such as Amazon S3, Amazon Redshift, Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), and Splunk. It can also deliver data to custom HTTP endpoints owned by third-party service providers such as Datadog and MongoDB.
Amazon Kinesis Data Firehose is part of the Amazon Kinesis data platform, which also includes Amazon Kinesis Data Streams, Amazon Kinesis Video Streams, and Amazon Kinesis Data Analytics. To send data through Amazon Kinesis Data Firehose, you configure your data producers to write to it, and it then delivers the data to the destination of your choice.
Key Features of Amazon Kinesis Data Firehose:
- Easy to use.
- Integrates with AWS services and numerous service providers.
- Serverless data transformation.
- Near-real-time data loading.
- Pay only for what you use.
- No ongoing administration needed.
What is Amazon S3?
Amazon S3, or Amazon Simple Storage Service, is an object storage service that offers top-notch scalability, security, data availability, and performance. Customers of all sizes and industries can therefore use it to store and protect any volume of data for use cases such as data lakes, websites, mobile applications, enterprise applications, backup and restore, and IoT devices.
Key features of Amazon S3:
- Wide range of cost-effective storage classes.
- Robust security, compliance, and audit capabilities.
- Industry-leading scalability, performance, availability, and durability.
- Query in-place services for analytics.
- Easily manage data and access controls.
- Widely supported cloud storage service across partner tools and AWS services.
Know more about Amazon S3 from our other article: Understanding AWS S3 Amazon: 3 Critical Aspects.
A fully managed No-code Data Pipeline platform like Hevo Data helps you load data from 150+ different sources (including 40+ free sources) to a destination of your choice in real time, effortlessly. With its minimal learning curve, Hevo can be set up in just a few minutes, allowing users to load data without compromising performance.
A few salient features of Hevo are as follows:
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-Time Data Transfer: Hevo provides real-time data migration, so you always have analysis-ready data.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!
Prerequisites
A few concepts to keep in mind as the article progresses further:
- Kinesis Data Firehose Delivery Stream: The underlying entity of Amazon Kinesis Data Firehose. You create a Delivery Stream in Amazon Kinesis Data Firehose and send data to it.
- Record: The data of interest that a data producer sends to an Amazon Kinesis Data Firehose Delivery Stream; a single record can be as large as 1,000 KB.
- Data Producer: Any application that sends records to the Delivery Stream, e.g., a web server sending log data (see the sketch after this list). Alternatively, the Delivery Stream can be configured to read data automatically from an existing Amazon Kinesis Data Stream and load it into the destinations.
- Buffer Size and Buffer Interval: Amazon Kinesis Data Firehose buffers incoming data up to a certain size (in MBs) or for a specified period (in seconds) before delivering it to the destinations.
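To ground these terms, here is a minimal sketch of a data producer written in Python with boto3. It assumes a Delivery Stream already exists; the stream name is a placeholder:

import json
import boto3

firehose = boto3.client("firehose")  # uses your default AWS credentials and region

# A record is just a blob of bytes that Firehose adds to its buffer.
# A trailing newline keeps records separable once they land in S3.
log_event = {"level": "INFO", "message": "user signed in", "user_id": 42}
record = {"Data": (json.dumps(log_event) + "\n").encode("utf-8")}

response = firehose.put_record(
    DeliveryStreamName="my-delivery-stream",  # placeholder stream name
    Record=record,
)
print("Record accepted, ID:", response["RecordId"])

Firehose holds such records in its buffer until either the Buffer Size or the Buffer Interval threshold is reached, at which point the batch is written to the destination.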
Steps to Set Up Amazon Kinesis S3 Integration
You can integrate Amazon S3 with Amazon Kinesis Firehose by implementing the following steps:
Step 1: Creating & Configuring a Delivery Stream
Begin by creating an Amazon Kinesis Data Firehose Delivery Stream for the chosen destination, which is Amazon S3 in this case, using the AWS Management Console or an AWS SDK. Sign in to the AWS Management Console and open the Kinesis Console. Next, choose Data Firehose from the navigation pane and choose the Create Delivery Stream option.
Then choose a name for the Delivery Stream by entering a value in the Delivery Stream Name field. Under Source, select Direct PUT or other sources; this creates a Delivery Stream that producer applications can write to directly.
If Amazon Kinesis Stream is selected instead, the Delivery Stream will use an existing Amazon Kinesis Data Stream as its data source; Amazon Kinesis Data Firehose will then read data from that stream and load it into the destination.
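If you prefer to script this step, the same configuration can be expressed through the CreateDeliveryStream API. Here is a minimal boto3 sketch, assuming an existing Kinesis Data Stream as the source; the stream name and every ARN are placeholders you would replace with your own:

import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="my-delivery-stream",      # placeholder
    DeliveryStreamType="KinesisStreamAsSource",   # use "DirectPut" for producers that write directly
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:111122223333:stream/my-stream",  # placeholder
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-source-role",               # placeholder
    },
    ExtendedS3DestinationConfiguration={
        "BucketARN": "arn:aws:s3:::my-destination-bucket",                    # placeholder
        "RoleARN": "arn:aws:iam::111122223333:role/firehose-delivery-role",   # placeholder
    },
)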
Step 2: Configuring Data Transformations
In the Transform Source Records with AWS Lambda section, a value for Record Transformation needs to be provided. Two choices are available: Enabled and Disabled. To create a Delivery Stream that doesn't transform incoming data, choose Disabled. Choose Enabled when an AWS Lambda function should transform the incoming data before delivery.
A new AWS Lambda function can be configured using one of the AWS Lambda blueprints, or an existing Lambda function can be used. The Convert Record Format section provides values for the Record Format Conversion field. If the Delivery Stream shouldn't convert the format of incoming records, pick Disabled; if it should, choose Enabled and specify the target format.
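If you do write your own function, a Firehose transformation Lambda follows a simple contract: it receives a batch of base64-encoded records and must return each record with its recordId echoed back, a result status, and the re-encoded data. Here is a minimal Python sketch; the uppercase transformation is purely illustrative:

import base64

def lambda_handler(event, context):
    # Firehose invokes this with a batch of records to transform.
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")

        # Illustrative transformation: replace with your own logic.
        transformed = payload.upper()

        output.append({
            "recordId": record["recordId"],  # must match the incoming record's ID
            "result": "Ok",                  # "Ok", "Dropped", or "ProcessingFailed"
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}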
Step 3: Choosing the Destination
You now need to choose the destination to which your Amazon Kinesis Delivery Stream will load data. To select Amazon S3 as the destination, choose Amazon S3 in the Destination field.
Apart from the Destination field, you will also have to choose an S3 bucket where the streaming data should be delivered. This can either be an existing Amazon S3 bucket or a new one.
The Prefix and Error Prefix fields are optional. The Prefix field can be left blank to use the default UTC-based prefix for delivered Amazon S3 objects, or a custom prefix can be provided instead.
An error prefix can be specified for Amazon Kinesis Data Firehose to use when delivering data to Amazon S3 under error conditions. For more information, refer to Amazon S3 Object Name Format and Custom Prefixes for Amazon S3 Objects.
The CloudFormation template below uses the ExtendedS3DestinationConfiguration property to specify an Amazon S3 destination for the Delivery Stream. Note that it references a Lambda function resource named myLambda, which is not shown and would need to be defined alongside these resources.
Here's the template in JSON:
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "Stack for Firehose DeliveryStream S3 Destination.",
  "Resources": {
    "deliverystream": {
      "DependsOn": ["deliveryPolicy"],
      "Type": "AWS::KinesisFirehose::DeliveryStream",
      "Properties": {
        "ExtendedS3DestinationConfiguration": {
          "BucketARN": {"Fn::Join": ["", ["arn:aws:s3:::", {"Ref": "s3bucket"}]]},
          "BufferingHints": {
            "IntervalInSeconds": "60",
            "SizeInMBs": "50"
          },
          "CompressionFormat": "UNCOMPRESSED",
          "Prefix": "firehose/",
          "RoleARN": {"Fn::GetAtt": ["deliveryRole", "Arn"]},
          "ProcessingConfiguration": {
            "Enabled": "true",
            "Processors": [
              {
                "Parameters": [
                  {
                    "ParameterName": "LambdaArn",
                    "ParameterValue": {"Fn::GetAtt": ["myLambda", "Arn"]}
                  }
                ],
                "Type": "Lambda"
              }
            ]
          }
        }
      }
    },
    "s3bucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "VersioningConfiguration": {
          "Status": "Enabled"
        }
      }
    },
    "deliveryRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": {
                "Service": "firehose.amazonaws.com"
              },
              "Action": "sts:AssumeRole",
              "Condition": {
                "StringEquals": {
                  "sts:ExternalId": {"Ref": "AWS::AccountId"}
                }
              }
            }
          ]
        }
      }
    },
    "deliveryPolicy": {
      "Type": "AWS::IAM::Policy",
      "Properties": {
        "PolicyName": "firehose_delivery_policy",
        "PolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject"
              ],
              "Resource": [
                {"Fn::Join": ["", ["arn:aws:s3:::", {"Ref": "s3bucket"}]]},
                {"Fn::Join": ["", ["arn:aws:s3:::", {"Ref": "s3bucket"}, "/*"]]}
              ]
            }
          ]
        },
        "Roles": [{"Ref": "deliveryRole"}]
      }
    }
  }
}
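If you want to launch this template without the console, one option is the CloudFormation API. A brief boto3 sketch, where the file name and stack name are placeholders; CAPABILITY_IAM is required because the template creates an IAM role and policy, and the stack will only create successfully once the myLambda function the template references is also defined:

import boto3

cloudformation = boto3.client("cloudformation")

with open("template.json") as f:  # the template shown above, saved locally
    template_body = f.read()

cloudformation.create_stack(
    StackName="firehose-s3-demo",     # placeholder stack name
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],  # needed because the template creates IAM resources
)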
Step 4: Configuring Settings for Amazon Kinesis Firehose
On the Configure Settings page, provide values for the Buffer Size and Buffer Interval fields for the destination. Amazon Kinesis Data Firehose buffers incoming data before delivering it to the specified destination. For Amazon S3, a buffer size of 1-128 MiB and a buffer interval of 60-900 seconds can be chosen. (The sketch after this list shows how these settings map onto API fields.)
- Buffer Size & Buffer Interval: Amazon S3 is used by Amazon Kinesis Firehose to backup all data or only failed data to the destination. If the data delivery to the destination falls behind data writing to the delivery stream, Amazon Kinesis will dynamically raise the buffer size to catch up. This ensures all data is delivered to the destination.
- Amazon S3 Compression: Choose GZIP, Snappy, Zip, or Hadoop-compatible Snappy data compression, or no compression at all.
- Encryption: Amazon Kinesis Data Firehose supports Amazon S3 server-side encryption with AWS Key Management Service (AWS KMS) to encrypt delivered data in Amazon S3. One can choose to either not encrypt the data or encrypt the data with a key from the list of AWS KMS keys owned.
- Error Logging: If data transformation is enabled, Amazon Kinesis Firehose can log the AWS Lambda invocation and send data delivery errors to CloudWatch Logs, so the relevant error logs can be viewed if the AWS Lambda invocation or data delivery fails.
- IAM Role: You can create a new role, to which the required permissions are assigned automatically, or choose an existing role created for Amazon Kinesis Firehose. This role is used to grant Firehose access to your Amazon S3 bucket, AWS KMS key, and AWS Lambda function.
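For orientation, here is a hedged sketch of how these console settings correspond to fields of ExtendedS3DestinationConfiguration when a stream is created through the API; every ARN and name below is a placeholder:

# Destination settings as they appear in the CreateDeliveryStream API.
extended_s3_config = {
    "BucketARN": "arn:aws:s3:::my-destination-bucket",
    "RoleARN": "arn:aws:iam::111122223333:role/firehose-delivery-role",  # the IAM Role setting
    "BufferingHints": {
        "SizeInMBs": 5,            # Buffer Size: 1-128 MiB for Amazon S3
        "IntervalInSeconds": 300,  # Buffer Interval: 60-900 seconds
    },
    "CompressionFormat": "GZIP",   # or "Snappy", "ZIP", "HADOOP_SNAPPY", "UNCOMPRESSED"
    "EncryptionConfiguration": {   # the Encryption setting: SSE with an AWS KMS key
        "KMSEncryptionConfig": {
            "AWSKMSKeyARN": "arn:aws:kms:us-east-1:111122223333:key/placeholder-key-id"
        }
    },
    "CloudWatchLoggingOptions": {  # the Error Logging setting
        "Enabled": True,
        "LogGroupName": "/aws/kinesisfirehose/my-delivery-stream",
        "LogStreamName": "S3Delivery",
    },
}
# Pass this dict as ExtendedS3DestinationConfiguration to create_delivery_stream,
# as in the Step 1 sketch.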
After reviewing properly, choose the Create Delivery Stream option to make your Delivery Stream available. Once the Delivery Stream is in an Active state, data can be sent from the producer to the Amazon Kinesis Firehose Delivery Stream.
Once the Delivery Stream has been created, it should be tested before being put to actual use. Testing from the console is carried out as follows (a programmatic alternative is sketched after the list):
- Open the Amazon Kinesis Data Firehose Console to begin the process.
- Select the Delivery Stream.
- Under Test with Demo Data, choose Start Sending Demo Data to generate sample stock ticker data.
- The on-screen instructions should be followed to verify that the data is being delivered to the chosen Amazon S3 bucket.
- After the test is complete, choose Stop Sending Demo Data to stop incurring usage charges.
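If you'd rather verify delivery from code, here is a rough sketch along the same lines: send a few test records, wait at least one buffer interval, then list the destination bucket. The stream name, bucket name, and prefix are placeholders:

import json
import time
import boto3

firehose = boto3.client("firehose")
s3 = boto3.client("s3")

# Send a handful of test records.
for i in range(10):
    firehose.put_record(
        DeliveryStreamName="my-delivery-stream",  # placeholder
        Record={"Data": (json.dumps({"test_record": i}) + "\n").encode("utf-8")},
    )

# Wait at least one buffer interval before objects can appear in S3.
time.sleep(120)

# List the delivered objects.
listing = s3.list_objects_v2(Bucket="my-destination-bucket", Prefix="firehose/")
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])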
This process of setting up Amazon Kinesis S3 Integration is not without its challenges. A major challenge encountered while undertaking Amazon Kinesis S3 Integration is the data not being delivered to Amazon S3. In the next section, a few ways of tackling this problem are discussed.
Overcoming Challenges of Setting Up Kinesis S3 Integration
If the data doesn’t get delivered to your Amazon S3 bucket, you can check the following:
- Check Amazon Kinesis Data Firehose's IncomingBytes and IncomingRecords metrics to make sure that data has been sent to the Amazon Kinesis Data Firehose Delivery Stream successfully (a sketch for pulling these metrics follows this list).
- Error logging should be enabled and error logs should be checked for delivery failure.
- Ensure that the Amazon S3 bucket specified in the Amazon Kinesis Firehose Delivery Stream still exists.
- If the data transformation is enabled, ensure that the AWS Lambda function specified in the Delivery Stream still exists.
- If data transformation is used, make sure that the AWS Lambda function never returns responses whose payload size exceeds 6 MB.
- Ensure that the IAM role specified in the Delivery Stream has access to the Amazon S3 bucket and the AWS Lambda function.
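For the first check in the list above, here is a minimal sketch that pulls the IncomingBytes and IncomingRecords metrics from CloudWatch for the last hour; the stream name is a placeholder:

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

for metric in ("IncomingBytes", "IncomingRecords"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Firehose",
        MetricName=metric,
        Dimensions=[{"Name": "DeliveryStreamName", "Value": "my-delivery-stream"}],  # placeholder
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,  # 5-minute buckets
        Statistics=["Sum"],
    )
    total = sum(dp["Sum"] for dp in stats["Datapoints"])
    print(metric, "over the last hour:", total)

If both totals are zero, the problem is on the producer side; if they are non-zero but nothing reaches S3, the issue lies in delivery, so work through the remaining checks.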
Conclusion
This article teaches you how to set up Amazon Kinesis S3 Integration with ease. It first provides a brief overview of the two services before diving into the procedure for setting up an Amazon Kinesis S3 connection, and it ends with ways to troubleshoot a common problem that may occur during setup.
A simpler way to transfer data from a source to a Data Warehouse would be to leverage Hevo as a part of your workflow. Hevo Data, a No-code Data Pipeline helps to transfer data from 150+ data sources (including 60+ free sources) to a destination of your choice in a fully automated and secure manner without having to dabble with code.
Frequently Asked Questions
1. Does Kinesis use S3?
Amazon Kinesis can be integrated with S3 to store and archive data, but it does not inherently use S3. You can configure Kinesis Firehose to deliver data directly to S3.
2. What is AWS Kinesis used for?
AWS Kinesis is used for real-time processing and analyzing streaming data. It allows you to collect, process, and analyze data from various sources in real-time, such as application logs, event streams, and IoT devices.
3. How to save Kinesis data stream to S3?
- Set up an Amazon Kinesis Firehose delivery stream.
- Configure the delivery stream to receive data from the Kinesis Data Stream.
- Set Amazon S3 as the destination for your Firehose stream.
- Firehose will automatically deliver and store the stream data in the specified S3 bucket.
Amit is a Content Marketing Manager at Hevo Data. He is passionate about writing for SaaS products and modern data platforms. His portfolio of more than 200 articles shows his extraordinary talent for crafting engaging content that clearly conveys the advantages and complexity of cutting-edge data technologies. Amit’s extensive knowledge of the SaaS market and modern data solutions enables him to write insightful and informative pieces that engage and educate audiences, making him a thought leader in the sector.