Understanding Amazon Kinesis: 4 Important Components

June 3rd, 2021


Data generation and storage have become central to the day-to-day operations of many business organizations. Managing data streams from many different sources becomes harder as deployments grow. Amazon Kinesis is a fully managed, cloud-based service that lets you process data from a diverse set of sources and stream it to various destinations in real time.

In this article, you will be introduced to Amazon Kinesis and its key features. You will learn about the different components of Amazon Kinesis and Kinesis Agent for Windows. 


Introduction to Amazon Kinesis 


Amazon Kinesis is a fully managed, scalable, cloud-based service from Amazon Web Services (AWS) that lets users process large volumes of streaming data from a diverse set of sources in real time. It can capture, store, and process data from large, distributed streams such as event logs and social media feeds. Its capacity can be scaled up or down to match your specific data requirements.

The platform can deliver the same data to multiple consumers simultaneously, and users can build custom streaming applications for their own requirements. It also enables real-time analytics on data that has traditionally been analyzed with batch processing. For example, you can use Kinesis Data Firehose to continuously load streaming data into an Amazon S3 data lake or into analytics services.

Figure: Data analytics pipeline with Amazon Kinesis

The official Amazon Kinesis documentation is available on the AWS website.

Key Features of Amazon Kinesis

  • Ease of Use: Users can set up custom streams, define their requirements, deploy data pipelines, and start streaming quickly.
  • No Server Administration Required: There is no infrastructure to manage, and many operations are automated, so deployments need no ongoing administration.
  • Stream from Millions of Devices: The SDKs (Software Development Kits) provided with Amazon Kinesis Video Streams let devices stream media to AWS for playback, storage, analytics, machine learning, and other processing.
  • Cost Efficient: The platform uses a pay-as-you-go model, so organizations pay only for what they use.
  • High Scalability: Built on AWS, it can rapidly scale up and down according to the user's requirements.

Use Hevo Data for Seamless Data Migration to Destination of Your Choice

Hevo is a No-code Data Pipeline that offers a fully managed solution to set up data integration from 100+ data sources (including 30+ Free Data Sources) and lets you load data directly into a Data Warehouse and visualize it in a BI tool of your choice. It automates your data flow in minutes without requiring you to write any code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides a truly efficient and fully automated solution to manage data in real time and always have analysis-ready data.


Check out what makes Hevo amazing:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with minimal latency.
  • Incremental Data Load: Hevo transfers only the data that has been modified, in real time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Understanding Components of Amazon Kinesis 

Amazon Kinesis has four main components, each suited to a different streaming task.

Kinesis Firehose

Figure: Kinesis Data Firehose data pipeline

Kinesis Firehose (Amazon Kinesis Data Firehose) loads streaming data into AWS data stores. It can transform incoming data and then deliver it to destinations that are primarily used for analytics, such as Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service; a delivery stream can also serve as an input to Kinesis Data Analytics. It requires no ongoing administration and scales automatically to match the data throughput.
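To send data to a delivery stream efficiently, producers usually group records and write each group with the Firehose PutRecordBatch API. The PowerShell function below is a minimal sketch of that grouping step only; it does not call AWS. It assumes the documented PutRecordBatch limits (500 records and 4 MiB per call, 1,000 KiB per record) and measures record size as UTF-8 bytes. The function name Split-FirehoseBatch is hypothetical.

```powershell
# Sketch: group records into PutRecordBatch-sized batches for Kinesis Data Firehose.
# Assumed limits: at most 500 records and 4 MiB per call, 1,000 KiB per record.
function Split-FirehoseBatch {
    param(
        [string[]] $Records = @(),
        [int] $MaxRecordsPerBatch = 500,
        [long] $MaxBytesPerBatch = 4MB,
        [long] $MaxBytesPerRecord = 1000KB
    )

    $batches = [System.Collections.Generic.List[object]]::new()
    $current = [System.Collections.Generic.List[string]]::new()
    $currentBytes = 0

    foreach ($record in $Records) {
        # Record size is measured as UTF-8 bytes (an assumption of this sketch).
        $size = [System.Text.Encoding]::UTF8.GetByteCount($record)
        if ($size -gt $MaxBytesPerRecord) {
            throw "Record of $size bytes exceeds the per-record limit."
        }

        # Close the current batch if adding this record would break either limit.
        $isFull = $current.Count -ge $MaxRecordsPerBatch
        $isTooBig = ($currentBytes + $size) -gt $MaxBytesPerBatch
        if ($current.Count -gt 0 -and ($isFull -or $isTooBig)) {
            $batches.Add($current.ToArray())
            $current = [System.Collections.Generic.List[string]]::new()
            $currentBytes = 0
        }

        $current.Add($record)
        $currentBytes += $size
    }

    if ($current.Count -gt 0) {
        $batches.Add($current.ToArray())
    }

    # The leading comma stops PowerShell from unrolling the array of batches.
    return , $batches.ToArray()
}

# Example: 1,200 small records become three batches of 500, 500, and 200 records.
$batches = Split-FirehoseBatch -Records (1..1200 | ForEach-Object { "event-$_" })
```

Each batch would then be sent with one PutRecordBatch call, for example through an AWS SDK. Closing a batch as soon as either limit would be exceeded keeps every call within both the record-count and the byte limit.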

Kinesis Data Analytics

Figure: Kinesis Data Analytics data pipeline

Kinesis Data Analytics lets you analyze and process real-time streaming data using standard SQL. It is mainly used to analyze data ingested through Kinesis Firehose and Kinesis Data Streams. It recognizes standard data formats, parses the incoming data automatically, and recommends a schema that you can customize in the interactive schema editor. The illustration below shows where Kinesis Data Analytics fits in a data pipeline.

Figure: AWS data analytics stack
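Streaming SQL analytics of this kind commonly aggregate rows over tumbling windows, that is, fixed, non-overlapping time intervals. The function below is not Kinesis Data Analytics code and does not use any AWS API; it is a hypothetical plain-PowerShell simulation (Get-TumblingWindowCount) of what a one-minute tumbling-window count over a stream would return, assuming event times are given in seconds.

```powershell
# Sketch: count events per fixed, non-overlapping (tumbling) time window.
# NOT Kinesis Data Analytics code; it only simulates what a streaming SQL
# query that groups rows into one-minute windows would compute.
function Get-TumblingWindowCount {
    param(
        [double[]] $EventTimesSeconds = @(),
        [int] $WindowSeconds = 60
    )

    # SortedDictionary keeps the window start times in ascending order.
    $counts = [System.Collections.Generic.SortedDictionary[long, int]]::new()

    foreach ($t in $EventTimesSeconds) {
        # Each event belongs to the window that starts at floor(t / w) * w.
        $windowStart = [long]([math]::Floor($t / $WindowSeconds) * $WindowSeconds)
        if ($counts.ContainsKey($windowStart)) {
            $counts[$windowStart] = $counts[$windowStart] + 1
        } else {
            $counts[$windowStart] = 1
        }
    }

    # The leading comma returns the dictionary as a single object.
    return , $counts
}

# Example: events at 0 s, 10 s, 59.9 s, 60 s, and 125 s.
$windows = Get-TumblingWindowCount -EventTimesSeconds 0, 10, 59.9, 60, 125
$windows.GetEnumerator() | ForEach-Object { "window starting at $($_.Key)s: $($_.Value) events" }
```

For the sample input, the windows starting at 0 s, 60 s, and 120 s hold 3, 1, and 1 events. A real streaming engine emits each window's result once the window closes instead of after the whole input has been read.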

Kinesis Data Streams

Figure: Kinesis Data Streams data pipeline

Kinesis Data Streams provides a platform for continuously processing real-time streaming data. It can be used, for example, to collect log events from servers and mobile devices. The platform places a strong focus on security: users can encrypt sensitive data with server-side encryption using AWS KMS master keys. Producers can write data into a stream easily with the Kinesis Producer Library (KPL).
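Every record written to a Kinesis data stream carries a partition key. Kinesis Data Streams hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value, so records with the same key always land on the same shard. The sketch below reproduces that mapping locally. The function name is hypothetical, it assumes the key is encoded as UTF-8, and it assumes N shards that split the hash key range evenly; a real stream's ranges can differ after resharding, and the ListShards API reports the actual ranges.

```powershell
# Sketch: predict which shard a partition key maps to, assuming N shards that
# evenly split the 128-bit hash key space [0, 2^128).
function Get-ShardIndexForPartitionKey {
    param(
        [Parameter(Mandatory)] [string] $PartitionKey,
        [Parameter(Mandatory)] [int] $ShardCount
    )

    # Hash the partition key with MD5 (UTF-8 encoding is an assumption).
    $md5 = [System.Security.Cryptography.MD5]::Create()
    try {
        $digest = $md5.ComputeHash([System.Text.Encoding]::UTF8.GetBytes($PartitionKey))
    } finally {
        $md5.Dispose()
    }

    # Read the 16-byte digest as an unsigned big-endian 128-bit integer.
    # The leading '0' keeps BigInteger.Parse from treating it as negative.
    $hex = '0' + [System.BitConverter]::ToString($digest).Replace('-', '')
    $hashKey = [System.Numerics.BigInteger]::Parse($hex, [System.Globalization.NumberStyles]::HexNumber)

    # With even ranges, shard i owns [i * 2^128 / N, (i + 1) * 2^128 / N).
    $keySpace = [System.Numerics.BigInteger]::Pow(2, 128)
    $index = [System.Numerics.BigInteger]::Divide(
        [System.Numerics.BigInteger]::Multiply($hashKey, $ShardCount), $keySpace)
    return [int]$index
}

# Example: where would some hypothetical device IDs land in a 4-shard stream?
'device-1', 'device-2', 'device-3' | ForEach-Object {
    "$_ -> shard $(Get-ShardIndexForPartitionKey -PartitionKey $_ -ShardCount 4)"
}
```

Because the mapping depends only on the key, choosing keys with many distinct values (such as device IDs) spreads writes across shards, while a single hot key sends all of its traffic to one shard.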

Kinesis Video Streams

Figure: Kinesis Video Streams data pipeline

Kinesis Video Streams is a purpose-built service for streaming video from camera-equipped devices to AWS. It suits use cases such as video streaming over the internet or storing security-camera footage in the cloud. The service also supports WebRTC, an open-source project that enables real-time media streaming and interaction between web browsers, mobile applications, and connected devices through simple APIs (Application Programming Interfaces).

Prerequisites for Installing Kinesis Agent for Windows

  • Working knowledge of Amazon Kinesis concepts.
  • An active Amazon Web Services (AWS) account with access to the required AWS services.
  • Access to the Kinesis data streams to which you want to send data using Kinesis Agent.
  • Access to the Firehose delivery streams to which you want to send data using Kinesis Agent.
  • Microsoft .NET Framework 4.6 or later installed on every desktop or server where Kinesis Agent for Windows will be deployed.

If you are not sure which .NET Framework version is installed, run the following command in PowerShell:

    [System.Version](
     (Get-ChildItem 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP' -Recurse `
     | Get-ItemProperty -Name Version -ErrorAction SilentlyContinue `
     | Where-Object { ($_.PSChildName -match 'Full') } `
     | Select-Object Version | Sort-Object -Property Version -Descending)[0]).Version

Understanding Kinesis Agent for Windows

Amazon Kinesis Agent for Microsoft Windows (Kinesis Agent for Windows) is a configurable, extensible agent that collects data on Windows desktops and servers and sends it to Kinesis and other AWS services. It must be installed on each computer or server from which you want to collect data. The agent efficiently connects to cloud resources and parses, transforms, and streams logs, events, and metrics to the AWS services of your choice. The figure below illustrates the steps for streaming log files with Kinesis Data Streams.

Figure: How Kinesis Agent for Windows works

The agent can also feed custom real-time data pipelines built with stream-processing frameworks to suit the user's requirements. Below is an illustration of one such pipeline.

Figure: Sample data streaming pipeline setup

Installing Kinesis Agent using Command Line Interface 

Installing Kinesis Agent for Windows from the command line is fairly simple. First, use a text editor to save the following script as a PowerShell script file.

In this example, the script is saved as InstallKinesisAgent.ps1.

Param(
    [ValidateSet("prod", "beta", "test")]
    [string] $environment = 'prod',
    [string] $version,
    [string] $baseurl
)

# Self-elevate the script if required.
if (-Not ([Security.Principal.WindowsPrincipal] [Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole([Security.Principal.WindowsBuiltInRole] 'Administrator')) {
    if ([int](Get-CimInstance -Class Win32_OperatingSystem | Select-Object -ExpandProperty BuildNumber) -ge 6000) {
        $CommandLine = '-File "' + $MyInvocation.MyCommand.Path + '" ' + $MyInvocation.UnboundArguments
        Start-Process -FilePath PowerShell.exe -Verb Runas -ArgumentList $CommandLine
        Exit
    }
}
# Allow the base URL to be overridden (useful for testing).
if ($baseurl) {
    if (!$baseUrl.EndsWith("/")) {
        throw "Invalid baseurl param value. Must end with a trailing forward slash ('/')"
    }

    $kinesistapBaseUrl = $baseurl
} else {
    $kinesistapBaseUrl = "https://s3-us-west-2.amazonaws.com/kinesis-agent-windows/downloads/"
}

Write-Host "Using $kinesistapBaseUrl as base url"

$webClient = New-Object System.Net.WebClient

try {
    $packageJson = $webClient.DownloadString($kinesistapBaseUrl + 'packages.json' + '?_t=' + [System.DateTime]::Now.Ticks) | ConvertFrom-Json
} catch {
    throw "Downloading package list failed."
}


if ($version) {
    $kinesistapPackage = $packageJson.packages | Where-Object { $_.packageName -eq "AWSKinesisTap.$version.nupkg" }

    if ($null -eq $kinesistapPackage) {
        throw "No package found matching input version $version"
    }
} else {
    $packageJson = $packageJson.packages | Where-Object { $_.packageName -match ".nupkg" }
    $kinesistapPackage = $packageJson[0]
}

$packageName = $kinesistapPackage.packageName
$checksum = $kinesistapPackage.checksum

# Create %TEMP%\kinesistap if it does not exist.
$kinesistapTempDir = Join-Path $env:TEMP 'kinesistap'
if (![System.IO.Directory]::Exists($kinesistapTempDir)) {[void][System.IO.Directory]::CreateDirectory($kinesistapTempDir)}

#Download KinesisTap.x.x.x.x.nupkg package
$kinesistapNupkgPath = Join-Path $kinesistapTempDir $packageName
$webClient.DownloadFile($kinesistapBaseUrl + $packageName, $kinesistapNupkgPath)
$kinesistapUnzipPath = $kinesistapNupkgPath.Replace('.nupkg', '')

# Calculates hash of downloaded file. Downlevel compatible using .Net hashing on PS < 4
if ($PSVersionTable.PSVersion.Major -ge 4) {
    $calculatedHash = Get-FileHash $kinesistapNupkgPath -Algorithm SHA256
    $hashAsString = $calculatedHash.Hash.ToLower()
} else {
    $sha256 = New-Object System.Security.Cryptography.SHA256CryptoServiceProvider
    $calculatedHash = [System.BitConverter]::ToString($sha256.ComputeHash([System.IO.File]::ReadAllBytes($kinesistapNupkgPath)))
    $hashAsString = $calculatedHash.Replace("-", "").ToLower()
}

if ($checksum -eq $hashAsString) {
	Write-Host 'Local file hash matches checksum.' -ForegroundColor Green
} else {
	throw ("Get-FileHash does not match! Package may be corrupted.")
}

#Delete Unzip path if not empty
if ([System.IO.Directory]::Exists($kinesistapUnzipPath)) {Remove-Item -Path $kinesistapUnzipPath -Recurse -Force}

#Unzip KinesisTap.x.x.x.x.nupkg package
Add-Type -AssemblyName System.IO.Compression.FileSystem
[System.IO.Compression.ZipFile]::ExtractToDirectory($kinesistapNupkgPath, $kinesistapUnzipPath)

# Execute tools\chocolateyInstall.ps1 from the package and wait for completion.
$installScript = Join-Path $kinesistapUnzipPath 'tools\chocolateyInstall.ps1'
& $installScript

# Verify service installed.
$serviceName = 'AWSKinesisTap'
$service = Get-Service -Name $serviceName -ErrorAction Ignore
if ($null -eq $service) {
    throw ("Service not installed correctly.")
} else {
    Write-Host "Kinesis Tap Installed." -ForegroundColor Green
    Write-Host "After configuring run the following to start the service: Start-Service -Name $serviceName." -ForegroundColor Green
}

After saving the script, open a command prompt window, navigate to the directory that contains the file, and run the following command:

PowerShell.exe -File ".\InstallKinesisAgent.ps1"

To install a specific version, add the -version parameter to the command and replace "version" with the version number you want:

PowerShell.exe -File ".\InstallKinesisAgent.ps1" -version "version"
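Once installed, the agent reads its configuration from an appsettings.json file, which the AWS documentation places at %PROGRAMFILES%\Amazon\AWSKinesisTap\appsettings.json. A configuration connects Sources (where data comes from) to Sinks (where it goes) through Pipes. The function below is a hypothetical helper that builds a minimal configuration that tails *.log files in one directory and sends each line to a Kinesis data stream. The Id values, region, and stream name are placeholders, and the property names follow the examples in the AWS documentation, so check them against the agent version you install.

```powershell
# Sketch: build a minimal Kinesis Agent for Windows configuration (appsettings.json).
# Property names follow the AWS documentation's examples; Ids are placeholders.
function New-KinesisAgentConfig {
    param(
        [Parameter(Mandatory)] [string] $LogDirectory,
        [Parameter(Mandatory)] [string] $StreamName,
        [string] $Region = 'us-east-1',
        [string] $FileNameFilter = '*.log'
    )

    $config = [ordered]@{
        # Source: read new lines from files in a directory, one record per line.
        Sources = @(
            [ordered]@{
                Id             = 'ApplicationLogSource'
                SourceType     = 'DirectorySource'
                Directory      = $LogDirectory
                FileNameFilter = $FileNameFilter
                RecordParser   = 'SingleLine'
            }
        )
        # Sink: write records to a Kinesis data stream.
        Sinks = @(
            [ordered]@{
                Id         = 'ApplicationLogKinesisStreamSink'
                SinkType   = 'KinesisStream'
                StreamName = $StreamName
                Region     = $Region
            }
        )
        # Pipe: connect the source to the sink by their Ids.
        Pipes = @(
            [ordered]@{
                Id        = 'ApplicationLogSourceToKinesisStreamSink'
                SourceRef = 'ApplicationLogSource'
                SinkRef   = 'ApplicationLogKinesisStreamSink'
            }
        )
    }

    return ($config | ConvertTo-Json -Depth 5)
}

# Example usage (uncomment; writing under Program Files requires an elevated prompt):
# $json = New-KinesisAgentConfig -LogDirectory 'C:\LogSource\' -StreamName 'MyLogStream'
# $configPath = Join-Path $env:ProgramFiles 'Amazon\AWSKinesisTap\appsettings.json'
# Set-Content -Path $configPath -Value $json
# Start-Service -Name AWSKinesisTap
```

Generating the file from a hashtable instead of editing JSON by hand makes it harder to leave a pipe pointing at a source or sink Id that does not exist.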

Limitations of Amazon Kinesis 

Here are some limitations to keep in mind when managing data with Amazon Kinesis Data Streams:

  • By default, records in a stream are stored and accessible for 24 hours. Extended data retention raises this to up to 7 days, and long-term data retention to up to 365 days.
  • The maximum size of the data blob (the data payload before Base64 encoding) in a single record is 1 MB.
  • One shard (the base throughput unit of a Kinesis data stream) supports writes of up to 1,000 records per second, up to a total of 1 MB per second.
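For writes, each shard accepts up to 1,000 records per second and up to 1 MB of data per second, so a stream needs enough shards to stay under whichever limit its traffic hits first. The hypothetical function below is a back-of-the-envelope sketch of that calculation. It treats 1 MB as 1 MiB (1,024 KiB), ignores read-side limits and traffic spikes, and relies on an average record size that you supply.

```powershell
# Sketch: estimate how many shards a stream needs for a given write load.
# Assumed per-shard write limits: 1,000 records/s and 1 MiB (1,024 KiB)/s.
function Get-RequiredShardCount {
    param(
        [Parameter(Mandatory)] [double] $RecordsPerSecond,
        [Parameter(Mandatory)] [double] $AverageRecordSizeKiB
    )

    # An average above 1 MiB means some records exceed the per-record limit.
    if ($AverageRecordSizeKiB -gt 1024) {
        throw "Records larger than 1 MiB cannot be written to a Kinesis data stream."
    }

    $shardsForRecords = [math]::Ceiling($RecordsPerSecond / 1000)
    $shardsForBytes = [math]::Ceiling(($RecordsPerSecond * $AverageRecordSizeKiB) / 1024)

    # Enough shards for the tighter of the two limits, and at least one shard.
    return [int][math]::Max(1.0, [math]::Max($shardsForRecords, $shardsForBytes))
}

# 500 records/s of 5 KiB each is about 2.44 MiB/s, so bandwidth decides: 3 shards.
Get-RequiredShardCount -RecordsPerSecond 500 -AverageRecordSizeKiB 5
```

Conversely, 5,000 tiny records per second would be limited by the record count rather than by bandwidth and would need 5 shards.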

More information about Kinesis quotas and limits is available in the Amazon Kinesis Data Streams documentation.

Conclusion 

In this article, you learned about Amazon Kinesis, its key features, and its four components: Kinesis Firehose, Kinesis Data Analytics, Kinesis Data Streams, and Kinesis Video Streams. You were also introduced to Kinesis Agent for Windows and the steps for installing it from the command line.

If you want to learn how to connect a Kinesis stream to Amazon S3, you can find the guide here.


Integrating and analyzing data from a huge set of diverse sources can be challenging; this is where Hevo comes into the picture. Hevo Data, a No-code Data Pipeline, helps you transfer data from a source of your choice in a fully automated and secure manner without having to write code repeatedly. With its strong integration with 100+ sources and BI tools, Hevo lets you not only export and load data but also transform and enrich it, making it analysis-ready in a jiffy.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your understanding of Amazon Kinesis in the comments section!
