Amazon Redshift is a leading data warehouse in the market, and many organizations opt for AWS services for their day-to-day analytics. While selecting Redshift, it is essential to know which compute nodes Redshift offers and which one best suits your requirements. In this blog, we will provide an in-depth look at Redshift, Redshift Node Types, and pricing.
What is Redshift Architecture?
AWS Redshift has a very simple architecture. It contains a leader node and a cluster of compute nodes that perform analytics on the data. The diagram below depicts the schematics of the AWS Redshift architecture:
AWS Redshift offers JDBC and ODBC drivers, so client applications written in major programming languages like Python, Scala, Java, and Ruby can interact with it.
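As a minimal sketch of such a client connection, the example below assumes the third-party `redshift_connector` package (AWS's Python driver) and uses placeholder host and credential values — substitute your own cluster's details:

```python
# Illustrative sketch: querying Redshift from Python.
# Assumes `pip install redshift-connector`; host and credentials below are placeholders.

CONN_PARAMS = {
    "host": "my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",  # hypothetical endpoint
    "database": "dev",
    "port": 5439,          # Redshift's default port
    "user": "awsuser",
    "password": "my_password",
}

QUERY = "SELECT current_database(), version();"

def run_query(sql: str = QUERY) -> list:
    """Open a connection, run one query, and return the rows."""
    import redshift_connector  # imported lazily so the sketch is importable without the driver

    conn = redshift_connector.connect(**CONN_PARAMS)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()

# Calling run_query() requires network access to a live cluster.
```

The leader node receives this query, plans it, and fans the work out to the compute nodes, as described below.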
Hevo offers a faster way to move data from databases or SaaS applications into your data warehouse such as Amazon Redshift, to be visualized in a BI tool. Hevo is fully automated and hence does not require you to code.
Get Started with Hevo for Free
Check out some of the cool features of Hevo:
- Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
- Real-time Data Transfer: Hevo provides real-time data migration, so you always have analysis-ready data.
- 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
- Scalable Infrastructure: Hevo has in-built integrations for 100+ sources that can help you scale your data infrastructure as required.
- 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Simplify your Data Analysis with Hevo today!
Sign up here for a 14-Day Free Trial!
What are Redshift Node Types?
As the architecture above shows, a Redshift cluster consists of leader and compute nodes that perform parallel computing. The leader node acts as the master: it receives queries from the client, parses them, and develops the execution plans. Once the execution plans are created, it coordinates with the compute nodes for parallel execution of the queries and aggregates the intermediate results from the nodes. Finally, it returns the results to the client applications.
Download the Cheatsheet on How to Set Up High-performance ETL to Redshift
Learn the best practices and considerations for setting up high-performance ETL to Redshift
Redshift Node Types Details
Redshift Node Types are classified by the parameters below. While choosing a node, you need to examine these parameters carefully and pick the one that best suits your requirements. Let's have a detailed look at these parameters:
- vCPU: the number of virtual CPUs for each node.
- RAM: the amount of memory in gibibytes (GiB) for each node.
- Slices per Node: the number of slices each node is partitioned into when the cluster is created or resized.
- Storage: the capacity and type of storage for each node.
- Node Range: the minimum and maximum number of nodes that Amazon Redshift supports for the node type and size.
A typical node configuration table looks like this:
Each node comes with pre-configured vCPU, RAM, and storage, and Amazon Redshift publishes a complete list of nodes along with their configurations so you can choose the best-suited one.
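As an illustrative sketch, such a configuration table can be represented in code. The figures below are approximate published specs at the time of writing and may change, so verify them against the current AWS documentation:

```python
# Approximate per-node specs (illustrative; verify with the current AWS docs).
NODE_TYPES = {
    "ra3.4xlarge":  {"vcpu": 12, "ram_gib": 96,  "slices": 4,  "storage": "managed (S3-backed)"},
    "ra3.16xlarge": {"vcpu": 48, "ram_gib": 384, "slices": 16, "storage": "managed (S3-backed)"},
    "dc2.large":    {"vcpu": 2,  "ram_gib": 15,  "slices": 2,  "storage": "160 GB SSD"},
    "dc2.8xlarge":  {"vcpu": 32, "ram_gib": 244, "slices": 16, "storage": "2.56 TB SSD"},
    "ds2.xlarge":   {"vcpu": 4,  "ram_gib": 31,  "slices": 2,  "storage": "2 TB HDD"},
    "ds2.8xlarge":  {"vcpu": 36, "ram_gib": 244, "slices": 16, "storage": "16 TB HDD"},
}

def total_slices(node_type: str, node_count: int) -> int:
    """Slices available for parallel work = slices per node x number of nodes."""
    return NODE_TYPES[node_type]["slices"] * node_count

print(total_slices("dc2.large", 4))  # -> 8: a 4-node dc2.large cluster has 8 slices
```

The slice count matters because the leader node distributes work to slices, so more slices generally means more parallelism for a given query.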
Redshift Node Types
Redshift offers three different node types, and you can choose the best one based on your requirements. Let's dive deep into each of the node types and their usage.
1) RA3 Node
AWS introduced the RA3 node in late 2019; it is the 3rd-generation instance type for the Redshift family. RA3 features high-speed caching, managed storage, and high-bandwidth networking.
In the RA3 generation, Redshift stores permanent data in S3 and uses the local disk for caching. The data in S3 can be retrieved on demand, so RA3 instances split the cost between compute and storage: you pay for compute per node-hour and for managed storage per GB.
As shown in the table, the RA3 node comes with two options of 12 and 48 vCPU cores with pre-configured RAM, Slices, and storage quotas with a minimum of two instances.
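As a hedged sketch of provisioning an RA3 cluster programmatically, the example below assumes the `boto3` AWS SDK; the `create_cluster` API call is real, but the cluster identifier and credentials are hypothetical placeholders, and you should verify the parameters against the boto3 Redshift reference:

```python
# Illustrative sketch: launching a 2-node RA3 cluster with boto3.
# All identifiers and credentials below are placeholders.

CLUSTER_SPEC = {
    "ClusterIdentifier": "analytics-cluster",  # hypothetical name
    "NodeType": "ra3.4xlarge",                 # the 12-vCPU RA3 option
    "NumberOfNodes": 2,                        # RA3 requires a minimum of two nodes
    "DBName": "dev",
    "MasterUsername": "awsuser",
    "MasterUserPassword": "ReplaceMe123",      # placeholder; use a secret manager in practice
}

def launch_cluster(spec: dict = CLUSTER_SPEC) -> dict:
    """Create the cluster and return the API response."""
    import boto3  # third-party AWS SDK, imported lazily

    client = boto3.client("redshift", region_name="us-east-1")
    return client.create_cluster(**spec)

# Calling launch_cluster() requires valid AWS credentials and will incur charges.
```

Resizing later (for example, to 4 nodes) uses the analogous `modify_cluster` call rather than recreating the cluster.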
2) Dense Compute Node (DC2)
Dense Compute (DC2) nodes are optimized for compute-intensive data warehouses and use SSDs for local storage. They allow you to choose the number of nodes based on your data size and performance requirements. As a rule of thumb, if you have less than 500 GB of data, it is advisable to go for the DC2 instance type, as it provides excellent computation power and SSD-backed storage.
DC2 stores the data locally for high performance, and it allows you to add more compute nodes if you need extra space.
3) Dense Storage Node (DS2)
DS2 nodes allow you to have a storage-intensive data warehouse, with vCPU and RAM included for computation. DS2 nodes use HDDs (Hard Disk Drives) for storage, and as a rule of thumb, if you have more than 500 GB of data, it is advisable to go for DS2 instances.
Redshift Node Types Sizing
Redshift Node Types sizing is an important aspect to consider when you're opting for Redshift for your migration and ETL activities. Nodes are the backbone of computation, and a sufficient number of nodes will help you migrate the data with good performance.
Redshift provides a storage-centric sizing approach for migrating up to approximately one petabyte of uncompressed data.
With the simple-sizing approach, the data volume is the key input. Redshift typically achieves 3x-4x data compression, which means the stored data occupies roughly a third to a quarter of the original data volume.
You must also account for free capacity in the cluster, typically around 20% of the total size.
The equation below represents the simple-sizing approach:

Number of nodes = (Uncompressed data volume ÷ Compression ratio) ÷ (1 − Free capacity fraction) ÷ Storage per node
This equation is appropriate for typical data migrations, but note that suboptimal data modeling practices (for example, poorly compressible column encodings) can lower the achieved compression ratio and leave you short on storage capacity.
Hence, for 100 TB of uncompressed data, you need approximately 21 DS2.xlarge nodes (2 TB of storage each) for optimal storage and computation: 100 TB ÷ 3 ≈ 33 TB compressed, ÷ 0.8 for free capacity ≈ 42 TB, ÷ 2 TB per node ≈ 21 nodes. However, other factors like replication and data-processing layers might affect this equation and need to be addressed separately.
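The arithmetic above can be sketched as a small calculator, assuming a 3x compression ratio, 20% free capacity, and 2 TB of storage per DS2.xlarge node (all of which you should tune to your own workload):

```python
import math

def nodes_needed(uncompressed_tb: float,
                 compression_ratio: float = 3.0,
                 free_capacity: float = 0.20,
                 storage_per_node_tb: float = 2.0) -> int:
    """Simple-sizing estimate: compress the data, pad for free capacity,
    then divide by per-node storage and round up."""
    compressed = uncompressed_tb / compression_ratio   # e.g. 100 TB -> ~33.3 TB
    required = compressed / (1 - free_capacity)        # keep ~20% headroom
    return math.ceil(required / storage_per_node_tb)

print(nodes_needed(100))  # -> 21 DS2.xlarge nodes
```

Changing the assumed compression ratio to 4x drops the estimate, which is why measuring compression on a sample of your real data is worthwhile before committing to a cluster size.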
Redshift Node Types Pricing
Redshift works on two pricing models: on-demand and reserved-instance pricing. Both depend on the type of nodes you have selected, the number of nodes, RAM, and vCPUs. Below are the pricing tables for nodes in the US East region; however, different regions have different pricing. For more information, you can have a look at the official documentation here.
Factors that Affect Redshift Node Types Price
The prime factors that affect the Redshift Node Types price are as below:
- Pay Model: Redshift has two payment options: pay by the hour (on-demand) or by contract (reserved instances). For obvious reasons, on-demand pricing is higher than reserved-instance pricing. Based on your requirements, you can choose the pricing option and plan your architecture effectively.
- Node Type: As the table above shows, pricing differs with node type. Check whether you need a Dense Compute (DC2), Dense Storage (DS2), or new-generation RA3 instance for your project. Carefully chosen nodes can effectively reduce your overall costs.
- Node Size: Nodes come in Large and Extra Large sizes, and you can select one based on the computing power, memory, and I/O speed you require.
- Number of Nodes: The minimum number of nodes you can have is 1, and a cluster can grow as large as 128 nodes. Depending on your computation requirements, you can select the number of nodes needed.
- AWS Region: The AWS Region can add significant cost to your project, as nodes in different regions are priced differently. Choose the storage and compute region carefully to optimize these prices.
- Additional Costs: There are always additional costs that depend on how the pipeline is architected; you might also need more storage, compute, or other services from AWS.
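To see how these factors combine, here is a rough on-demand cost sketch. The hourly rates below are illustrative placeholders, not current AWS prices; always check the Redshift pricing page for your region:

```python
# Rough on-demand monthly cost sketch. Rates are ILLUSTRATIVE placeholders only,
# loosely modeled on US East-style pricing; real prices vary by region and over time.
HOURLY_RATES_USD = {
    "dc2.large": 0.25,
    "ds2.xlarge": 0.85,
    "ra3.4xlarge": 3.26,
}

HOURS_PER_MONTH = 730  # average hours in a month

def monthly_on_demand_cost(node_type: str, node_count: int) -> float:
    """On-demand cost = hourly rate x number of nodes x hours, with no reserved-instance discount."""
    return HOURLY_RATES_USD[node_type] * node_count * HOURS_PER_MONTH

print(monthly_on_demand_cost("dc2.large", 4))  # -> 730.0 (USD/month for a 4-node DC2 cluster)
```

Reserved instances discount the hourly rate in exchange for a 1- or 3-year commitment, so the same function with a lower rate models that case.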
In this blog post, we discussed AWS Redshift Node Types, sizing, pricing, and the factors affecting node price. AWS Redshift has exceptional capabilities to process petabytes of data and generate in-depth insights.
AWS Redshift provides out-of-the-box capabilities to process a huge volume of data and generate insights. However, if you're looking for an easier solution, we recommend trying Hevo Data.
Visit our Website to Explore Hevo
Hevo Data is a No-Code Data Pipeline that helps you transfer data from a source of your choice in a fully automated and secure manner without having to write code repeatedly. Hevo, with its strong integration with 100+ sources & BI tools, allows you to export, load, transform & enrich your data & make it analysis-ready in a jiffy.
Want to take Hevo for a spin?
Sign Up and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
Share your thoughts on Redshift Node Types in the comments below!