AWS Kinesis PutRecord Command: Made Easy 101

on Amazon Kinesis, AWS, AWS Kinesis, Data Streaming, PutRecord Command • May 17th, 2022

AWS Kinesis is a set of managed services that allows you to collect and process streaming data in real time. Big Data is fueled by streaming data processing platforms. Streaming data can be collected from website click-throughs, marketing, and financial information, social media feeds, Internet of Things sensors, and operations logs.

In this article, you will learn about AWS Kinesis. You will also gain a holistic understanding of streaming data, the different components of Kinesis, streams, shards, and records, and the usage of the Kinesis PutRecord command.

Read along to find out in-depth information about the AWS Kinesis PutRecord command.

What is Streaming Data?

The phrase “streaming data” refers to data that multiple sources generate continuously, sending data records in real time (in a stream). Because the flow never ends, data sent in streams is called unbounded data, as opposed to batch (or batch-processed) data.

Some of the different kinds of data that can be sent in streams are as follows:

  • Data generated by customers’ mobile devices or web applications.
  • The activity of customers on e-commerce sites.
  • The activity of in-game players.
  • Social media feeds, such as Facebook and Twitter.
  • Financial market data in real-time.
  • Geospatial feeds of location information.
  • Telemetry data collected from connected devices.

A streaming dataset can be handled one record at a time, sequentially and progressively, or in huge batches aggregated across a sliding time window.
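
As a rough illustration of these two processing styles, the sketch below handles records one at a time while also keeping an aggregate over a sliding time window. The class name SlidingWindowSum, the (timestamp, value) record shape, and the 60-second window are assumptions made for this example and are not part of any Kinesis API. It also assumes that records arrive in timestamp order.

```python
from collections import deque


class SlidingWindowSum:
    """Running sum of the values seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0

    def add(self, timestamp, value):
        """Process one record and return the sum over the current window."""
        self._events.append((timestamp, value))
        self._total += value
        # Evict records that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total


# Hypothetical records: (timestamp in seconds, value).
window = SlidingWindowSum(window_seconds=60)
totals = [window.add(ts, value) for ts, value in [(0, 5), (30, 7), (61, 2), (95, 1)]]
print(totals)
```

Each call to add() is the "one record at a time" path, while the returned total is the windowed aggregate at that moment.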

What is AWS Kinesis?

Kinesis is a service provided by Amazon Web Services that collects, processes, and analyzes streaming data. With Kinesis, you can get timely insights into your data, whether it arrives on a schedule or is event-based, and act on those insights.

AWS Kinesis offers four specialized services categorized roughly by types and stages of streaming data processing. Those are as follows:

1) Kinesis Data Streams (KDS)

With Kinesis Data Streams, a variety of streaming data sources can be captured in real-time. Data is written to the Kinesis Data Stream by producer applications and then read by consumer applications for various types of processing.

2) Kinesis Data Firehose (KDF)

Kinesis Data Firehose eliminates the need to write applications and manage resources. The data producers submit the data to Kinesis Data Firehose, and the data is automatically delivered to the destination. Additionally, Kinesis Data Firehose can also be configured to transform the data before sending it.

3) Kinesis Data Analytics (KDA)

Streaming data can be processed and analyzed with Kinesis Data Analytics. It uses the Apache Flink framework, which provides useful operators for querying streaming data, including map, filter, aggregate, and window, and makes it possible to run applications efficiently and with flexible scalability.

4) Kinesis Video Streams (KVS)

The Kinesis Video Streams service allows you to stream live video to the AWS Cloud from video and audio-capturing devices or build applications for online video analysis or real-time video processing.

Understanding Streams, Shards, and Records

Kinesis Data Streams (KDS) is designed so that you can set up a data stream and build producer applications that push data to it and consumer applications that read and process that data.

A Kinesis data stream is made up of shards, which carry the data. Each shard provides a fixed amount of capacity, so the number of shards determines the data capacity of the stream: the total capacity of the stream is the sum of the capacities of its shards.

A data record is the unit of data stored in a stream, and a shard is a sequence of data records. Kinesis Data Streams assigns a sequence number to each data record.

AWS Kinesis PutRecord Command

The Kinesis PutRecord command writes a single data record into an Amazon Kinesis data stream. It enables real-time ingestion and subsequent processing of records as they arrive, one at a time. Each shard supports writes of up to 1,000 records per second, up to a maximum total of 1 MB per second.
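
The per-shard write limits translate into simple arithmetic for sizing a stream. The following is a minimal sketch of that calculation, not an official sizing tool. The function names are hypothetical, the 1 MB limit is treated as 1 MiB (an assumption), and the estimate assumes that partition keys spread the load evenly across shards.

```python
import math

RECORDS_PER_SECOND_PER_SHARD = 1_000
BYTES_PER_SECOND_PER_SHARD = 1024 * 1024  # assumption: "1 MB" taken as 1 MiB


def stream_write_capacity(shard_count):
    """Return (records/s, bytes/s) write capacity for `shard_count` shards."""
    if shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    return (
        shard_count * RECORDS_PER_SECOND_PER_SHARD,
        shard_count * BYTES_PER_SECOND_PER_SHARD,
    )


def shards_needed(records_per_second, average_record_bytes):
    """Estimate the minimum shard count for a write workload."""
    by_count = math.ceil(records_per_second / RECORDS_PER_SECOND_PER_SHARD)
    by_bytes = math.ceil(
        records_per_second * average_record_bytes / BYTES_PER_SECOND_PER_SHARD
    )
    # Whichever limit is hit first decides the shard count.
    return max(1, by_count, by_bytes)


print(stream_write_capacity(4))
print(shards_needed(2500, 200))
```

With many small records the records-per-second limit dominates, and with fewer large records the bytes-per-second limit does, which is why the estimate takes the larger of the two.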

Along with the Kinesis PutRecord command, you must specify the stream name, a partition key, and the data blob itself. A data blob can be any type of data, for example, a segment from a log file, geographic or location data, or website clickstream data.

Kinesis Data Streams uses the partition key associated with each data record to distribute records across shards: the partition key determines which shard a given data record belongs to.

Partition keys are Unicode strings with a maximum length of 256 characters each. An MD5 hash function maps partition keys to 128-bit integer values, and maps the associated data records to shards using the hash key ranges of the shards. You can override this hashing by explicitly specifying a hash value with the ExplicitHashKey parameter, which then determines the shard directly. PutRecord returns the shard ID of the shard where the data record was placed and the sequence number assigned to the record.
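
The mapping from partition key to shard can be sketched in a few lines. This is an illustration under stated assumptions, not the service implementation: it assumes the key is hashed as UTF-8 bytes and the digest is read as a big-endian integer, and it invents evenly split hash key ranges (a real stream reports the ranges of its shards). All function names here are hypothetical.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1


def partition_key_to_hash_key(partition_key):
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_hash_key_ranges(shard_count):
    """Split the 128-bit key space into contiguous ranges (illustrative only)."""
    size = 2**128 // shard_count
    return [
        (i * size, MAX_HASH_KEY if i == shard_count - 1 else (i + 1) * size - 1)
        for i in range(shard_count)
    ]


def pick_shard(partition_key, ranges, explicit_hash_key=None):
    """Return the index of the range that owns the hash key of the record."""
    if explicit_hash_key is not None:
        hash_key = int(explicit_hash_key)  # overrides hashing the partition key
    else:
        hash_key = partition_key_to_hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key is outside the 128-bit key space")


ranges = even_hash_key_ranges(4)
shard_a = pick_shard("a", ranges)
shard_forced = pick_shard("a", ranges, explicit_hash_key=str(MAX_HASH_KEY))
print(shard_a, shard_forced)
```

Because the hash is deterministic, the same partition key always lands in the same range, while an explicit hash key lets a producer target a range directly.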

The points to consider when using the AWS Kinesis PutRecord command are as follows:

1) Request Syntax

{
   "Data": blob,
   "ExplicitHashKey": "string",
   "PartitionKey": "string",
   "SequenceNumberForOrdering": "string",
   "StreamName": "string"
}

2) Request Parameters

The request accepts the following data in JSON format.

A) Kinesis PutRecord Command: Data

The data blob to put into the record. It is base64-encoded when the blob is serialized. The size of the data blob (the payload before base64 encoding) plus the size of the partition key must not exceed the maximum record size (1 MiB).

  • Type: Binary data object encoded in Base64
  • Length Constraints: The minimum length is 0 while the maximum length is 1048576.
  • Required: Yes
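
The size rule and the encoding step can be sketched as follows. This is a minimal illustration: the helper name encode_data_blob is hypothetical, and measuring the partition key as UTF-8 bytes is an assumption. AWS SDKs typically perform the base64 step for you when you pass raw bytes, so a helper like this mainly matters when building the JSON body by hand.

```python
import base64

MAX_RECORD_BYTES = 1024 * 1024  # 1 MiB, the limit quoted above


def encode_data_blob(data, partition_key):
    """Check the size limit, then return the base64 text for the JSON body."""
    # The limit applies to the raw payload (before base64) plus the key.
    total = len(data) + len(partition_key.encode("utf-8"))
    if total > MAX_RECORD_BYTES:
        raise ValueError(
            f"record is {total} bytes; the limit is {MAX_RECORD_BYTES} bytes"
        )
    return base64.b64encode(data).decode("ascii")


encoded = encode_data_blob(b"_<data>_1", "partitionKey")
print(encoded)
```

Note that the check runs on the raw bytes: base64 inflates the payload by roughly a third, but that overhead does not count against the record size.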

B) Kinesis PutRecord Command: ExplicitHashKey

The hash value used to explicitly determine the shard a data record is assigned to, overriding the partition key hash.

  • Type: String
  • Pattern: 0|([1-9]\d{0,38})
  • Required: No

C) Kinesis PutRecord Command: PartitionKey

The partition key determines which shard in the stream the data record is assigned to. Partition keys are Unicode strings with a maximum length of 256 characters each. Amazon Kinesis Data Streams uses the partition key as input to a hash function that maps the partition key and its associated data to a specific shard.

Specifically, an MD5 hash function maps partition keys to 128-bit integer values, and the associated data records are mapped to shards according to those values. As a result of this hashing mechanism, all data records with the same partition key map to the same shard within the stream.

  • Type: String
  • Length Constraints: The minimum length is 1. The maximum length is 256.
  • Required: Yes

D) Kinesis PutRecord Command: SequenceNumberForOrdering

Guarantees strictly increasing sequence numbers for puts from the same client and to the same partition key. To use it, set the SequenceNumberForOrdering of record n to the sequence number of record n-1 (as returned when record n-1 was put). If this parameter is not set, records are coarsely ordered based on arrival time.

  • Type: String
  • Pattern: 0|([1-9]\d{0,128})
  • Required: No
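
The chaining described for SequenceNumberForOrdering might look like the sketch below. The helper put_in_order is hypothetical, and it assumes a client object whose put_record method takes the request parameters as keyword arguments and returns a dict, as the boto3 Kinesis client does. To keep the example runnable without AWS credentials, it uses a stand-in client that only records the calls and invents sequence numbers.

```python
def put_in_order(client, stream_name, partition_key, payloads):
    """Put payloads one by one, passing each sequence number to the next put."""
    previous_sequence_number = None
    responses = []
    for payload in payloads:
        kwargs = {
            "StreamName": stream_name,
            "Data": payload,
            "PartitionKey": partition_key,
        }
        if previous_sequence_number is not None:
            # Record n carries the sequence number returned for record n-1.
            kwargs["SequenceNumberForOrdering"] = previous_sequence_number
        response = client.put_record(**kwargs)
        previous_sequence_number = response["SequenceNumber"]
        responses.append(response)
    return responses


class RecordingClient:
    """Stand-in for a real client; the sequence numbers are made up."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        return {
            "ShardId": "shardId-000000000000",
            "SequenceNumber": str(len(self.calls)),
        }


client = RecordingClient()
responses = put_in_order(
    client, "exampleStreamName", "partitionKey", [b"first", b"second", b"third"]
)
print([call.get("SequenceNumberForOrdering") for call in client.calls])
```

The first put omits the parameter because there is no earlier record to refer to. Each put must wait for the previous response, so this pattern trades throughput for ordering.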

E) Kinesis PutRecord Command: StreamName

This is the name of the stream where the data record should go.

  • Type: String
  • Length Constraints: The minimum length is 1. The maximum length is 128.
  • Pattern: [a-zA-Z0-9_.-]+
  • Required: Yes
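
The constraints listed for the request parameters can be checked on the client side before a request is sent. The sketch below is one way to do that, not an official validator: the function name is hypothetical, and the regular expressions are written with \d (any digit) for the numeric patterns. The service remains the final authority on what it accepts.

```python
import re

# re.ASCII keeps \d limited to 0-9 rather than any Unicode digit.
EXPLICIT_HASH_KEY_RE = re.compile(r"0|([1-9]\d{0,38})", re.ASCII)
SEQUENCE_NUMBER_RE = re.compile(r"0|([1-9]\d{0,128})", re.ASCII)
STREAM_NAME_RE = re.compile(r"[a-zA-Z0-9_.-]+")
MAX_DATA_BYTES = 1048576


def validate_put_record_params(stream_name, data, partition_key,
                               explicit_hash_key=None,
                               sequence_number_for_ordering=None):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not (1 <= len(stream_name) <= 128) or not STREAM_NAME_RE.fullmatch(stream_name):
        problems.append("StreamName must be 1-128 characters from [a-zA-Z0-9_.-]")
    if len(data) > MAX_DATA_BYTES:
        problems.append("Data must be at most 1048576 bytes")
    if not (1 <= len(partition_key) <= 256):
        problems.append("PartitionKey must be 1-256 characters")
    if explicit_hash_key is not None and not EXPLICIT_HASH_KEY_RE.fullmatch(explicit_hash_key):
        problems.append("ExplicitHashKey must match 0|([1-9]\\d{0,38})")
    if (sequence_number_for_ordering is not None
            and not SEQUENCE_NUMBER_RE.fullmatch(sequence_number_for_ordering)):
        problems.append("SequenceNumberForOrdering must match 0|([1-9]\\d{0,128})")
    return problems


print(validate_put_record_params("exampleStreamName", b"payload", "partitionKey"))
print(validate_put_record_params("bad name!", b"payload", "", explicit_hash_key="007"))
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.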

3) Response Syntax

{
   "EncryptionType": "string",
   "SequenceNumber": "string",
   "ShardId": "string"
}

4) Response Elements

Upon successful completion of the action, the service returns an HTTP 200 response.

The service returns the following data in JSON format.

1) EncryptionType

The encryption type used on the record. It can be one of the following values:

  • NONE: The records in the stream are not encrypted.
  • KMS: The records in the stream are encrypted on the server side using a customer-managed AWS KMS key.

  • Type: String
  • Valid Values: NONE | KMS

2) SequenceNumber

The sequence number assigned to the data record that was put into the stream. Each record in a stream is assigned a sequence number, and that number is unique across all records in the stream.

  • Type: String
  • Pattern: 0|([1-9]\d{0,128})

3) ShardId

This is the shard ID for the shard containing the data record.

  • Type: String
  • Pattern: shardId-[0-9]{12}

Examples

Adding data to a stream

The following JSON example adds data to a stream.

Sample Request

POST / HTTP/1.1
Host: kinesis.<region>.<domain>
Content-Length: <PayloadSizeBytes>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Authorization: <AuthParams>
Connection: Keep-Alive 
X-Amz-Date: <Date>
X-Amz-Target: Kinesis_20131202.PutRecord

{
  "StreamName": "exampleStreamName",
  "Data": "XzxkYXRhPl8x",
  "PartitionKey": "partitionKey"
}
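
To see what the Data field in the sample request carries, the JSON body can be parsed and the value base64-decoded. The sketch below works only on the body shown above and sends nothing over the network. The variable names are illustrative.

```python
import base64
import json

# The JSON body from the sample request.
sample_body = """
{
  "StreamName": "exampleStreamName",
  "Data": "XzxkYXRhPl8x",
  "PartitionKey": "partitionKey"
}
"""

request = json.loads(sample_body)

# Decoding recovers the raw payload bytes that the producer wrote.
raw_payload = base64.b64decode(request["Data"])

# Encoding the raw bytes again yields the value sent on the wire.
round_trip = base64.b64encode(raw_payload).decode("ascii")

print(raw_payload)
print(round_trip == request["Data"])
```

The decoded payload is the nine-byte placeholder string _<data>_1, which confirms that the Data field holds the encoded payload rather than readable text.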

Sample Response

HTTP/1.1 200 OK
x-amzn-RequestId: <RequestId>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Date: <Date>

{
  "SequenceNumber": "21269319989653637946712965403778482177",
  "ShardId": "shardId-000000000001"
}
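
A consumer of this response usually wants the fields in typed form. The following is a minimal sketch of parsing the sample response body. The PutRecordResult type and the function name are hypothetical, and EncryptionType is treated as optional because the sample response does not include it.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

SHARD_ID_RE = re.compile(r"shardId-[0-9]{12}")


@dataclass(frozen=True)
class PutRecordResult:
    shard_id: str
    sequence_number: int
    encryption_type: Optional[str]


def parse_put_record_response(body_text):
    """Parse a PutRecord response body into a typed result."""
    payload = json.loads(body_text)
    shard_id = payload["ShardId"]
    if not SHARD_ID_RE.fullmatch(shard_id):
        raise ValueError(f"unexpected shard ID: {shard_id!r}")
    return PutRecordResult(
        shard_id=shard_id,
        # Sequence numbers arrive as strings and can exceed 64 bits.
        sequence_number=int(payload["SequenceNumber"]),
        encryption_type=payload.get("EncryptionType"),
    )


sample_response = """
{
  "SequenceNumber": "21269319989653637946712965403778482177",
  "ShardId": "shardId-000000000001"
}
"""
result = parse_put_record_response(sample_response)
print(result)
```

Python integers have arbitrary precision, so converting the sequence number is safe here. In languages with fixed-width integers, it is safer to keep it as a string.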

Conclusion

In this article, you have learned about AWS Kinesis PutRecord Command. This article also provided information on AWS Kinesis, its components and streaming data.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.

Visit our Website to Explore Hevo

Hevo Data with its strong integration with 100+ Data Sources (including 40+ Free Sources) allows you to not only export data from your desired data sources & load it to the destination of your choice but also transform & enrich your data to make it analysis-ready. Hevo also allows integrating data from non-native sources using Hevo’s in-built REST API & Webhooks Connector. You can then focus on your key business needs and perform insightful analysis using BI tools. 

Want to give Hevo a try? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing price, which will assist you in selecting the best plan for your requirements.

Share your experience of understanding the AWS Kinesis PutRecord Command in the comment section below! We would love to hear your thoughts.