AWS Redshift is a pioneer when it comes to completely managed data warehouse services. With its ability to scale on-demand, a comprehensive Postgres compatible querying engine, and multitudes of AWS tools to augment the core capabilities, Redshift provides everything a customer needs to use as the sole data warehouse solution. And with these many capabilities, one would expect Redshift pricing would fall too heavy, but it’s not the case.
In fact, all of these features come at reasonable, competitive pricing. However, the process of understanding Redshift pricing is not straightforward. AWS offers a wide variety of pricing options to choose from, depending upon your use case and budget constraints.
In this post, we will explore the different Redshift pricing options available. Additionally, we will also explore some of the best practices that can help you optimize your organization’s data warehousing costs, too.
Table of Contents
- What is Redshift
- Factors that affect Amazon Redshift Pricing
- Effect of Node Type on Redshift Pricing
- Effect of Regions on Redshift Pricing
- On-demand vs Reserved Instance Pricing
- Redshift’s Pricing for Additional Features
- Tools for keeping your Redshift’s Spending Under Control
- Optimizing Redshift ETL Cost
What is Redshift
Amazon Redshift is a fully-managed petabyte-scale cloud-based data warehouse, designed to store large-scale data sets and perform insightful analysis on them in real-time.
It is highly column-oriented & designed to connect with SQL-based clients and business intelligence tools, making data available to users in real-time. Supporting PostgreSQL 8, Redshift delivers exceptional performance and efficient querying. Each Amazon Redshift data warehouse contains a collection of computing resources (nodes) organized in a cluster, each having an engine of its own and a database to it.
For further information on Amazon Redshift, you can check the official site here.
Hevo Data: A Smart Alternative for Redshift ETL
Hevo is fully-managed and completely automates the process of not only exporting data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using BI tools such as Tableau and many more.Get Started with Hevo for Free
Check out some amazing features of Hevo:
- Completely Managed Platform: Hevo is fully managed. You need not invest time and effort to maintain or monitor the infrastructure involved in executing codes.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to export.
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
Factors that affect Amazon Redshift Pricing
Amazon Redshift Pricing is broadly affected by four factors:
- The node type that the customer chooses to build his cluster.
- The region where the cluster is deployed.
- Billing strategy – on-demand billing or a reserved pricing strategy.
- Use of Redshift Spectrum.
You can learn about these factors in-depth, in the following sections:
- Effect of Node Type on Redshift Pricing
- Effect of Regions on Redshift Pricing
- On-demand vs Reserved Instance Pricing
- Amazon Redshift Spectrum Pricing
Effect of Node Type on Redshift Pricing
Redshift follows a cluster-based architecture with multiple nodes allowing it to massively parallel process data. (You can read more on Redshift architecture here). This means Redshift performance is directly correlated to the specification and number of nodes that form the cluster. It offers multiple kinds of nodes from which the customers can choose based on the computing and storage requirements.
- Dense compute nodes: These nodes are optimized for computing and offer SSDs up to 2.5 TB and physical memory up to 244 GB. Redshift pricing will also depend on the region in which your cluster will be located. The price of the lowest spec dc2.large instance varies from .25 to .37 $ per hour depending on the region. There is also a higher spec version available which is called dc2.8xlarge which can cost anywhere from 4.8 to 7 $ per hour depending on region.
- Dense storage nodes: These nodes offer higher storage capacity per node, but the storage hardware will be HDDs.Dense storage nodes also allow two versions – a basic version called ds2.large which offers HDDs up to 2 TB and a higher spec version that offers HDDs up to 16 TB per node. Price can vary from .85 to 1.4 $ per hour for the basic version and 6 to 11 $ per hour for the ds2.8xlarge version.
As mentioned in the above sections, Redshift pricing varies on a wide range depending on the node types. One another critical constraint is that your cluster can be formed only using the same type of nodes. So you would need to find the most optimum node type based on specific use cases.
As a thumb rule, AWS itself recommends a dense compute type node for use cases with less than 500 GB of data. There is a possibility of using previous generation nodes for a further decrease in price, but we will not recommend them since they miss out on the critical elastic resize feature. This means scaling could go into hours when using such nodes.
Effect of Regions on Redshift Pricing
Since Redshift pricing varies, from costs for running their data centers in different parts of the world to the pricing of nodes depending on the region where the cluster is to be deployed.
Let’s deliberate on some of the factors that may affect the decision of which region to deploy the cluster.
- While choosing regions, it may not be sensible to choose the regions with the cheapest price, because the data transfer time can vary according to the distance at which the clusters are located from their data source or targets. It is best to choose a location that is nearest to your data source.
- In specific cases, this decision may be further complicated by the mandates to follow data storage compliance, which requires the data to be kept in specific country boundaries.
- AWS deploys its features in different regions in a phased manner. While choosing regions, it would be worthwhile to ensure that the AWS features that you intend to use outside of Redshift are available in your preferred region.
In general, US-based regions offer the cheapest price while Asia-based regions are the most expensive ones.
On-demand vs Reserved Instance Pricing
Amazon offers discounts on Redshift pricing based on its usual rates if the customer is able to commit to a longer duration of using the clusters. Usually, this duration is in terms of years. Amazon claims a saving of up to 75 percent if a customer uses reserved instance pricing.
When you choose reserved pricing, irrespective of whether a cluster is active or not for the particular time period, you still have to pay the predefined amount. Redshift currently offers three types of reserved pricing strategies:
- No upfront: This is offered only for a one-year duration. The customer gets a 20 percent discount over existing on-demand prices.
- Partial upfront: The customer needs to pay half of the money up front and the rest in monthly installments. Amazon assures up to 41 % discount on on-demand prices for one year and 71% over 3 years. This can be purchased for a one to three-year duration.
- Full payment upfront: Amazon claims a 42 % discount over a year period and a 75 % discount over three years if the customer chooses to go with this option.
Even though the on-demand strategy offers the most flexibility — in terms of Redshift pricing — a customer may be able to save quite a lot of money if they are sure that the cluster will be engaged over a longer period of time.
Redshift’s concurrency scaling is charged at on-demand rates on a per-second basis for every transient cluster that is used. AWS provides 1 hour of free credit for concurrency scaling for every 24 hours that a cluster remains active. The free credit is calculated on a per-hour basis.
Amazon Redshift Spectrum Pricing
Redshift Spectrum is a querying engine service offered by AWS allowing customers to use only the computing capability of Redshift clusters on data available in S3 in different formats. This feature enables customers to add external tables to Redshift clusters and run complex read queries over them without actually loading or copying data to Redshift.
Pricing of Redshift Spectrum is based on the amount of data scanned by each query and is fixed at 5$ per TB of data scanned. The cost is calculated in terms of the nearest megabyte with each megabyte costing .05 $. There is a minimum limit of 10 MB per query. Only the read queries are charged and the table creation and other DDL queries are not charged.
Redshift Pricing for Additional Features
Redshift offers a variety of optional functionalities if you have more complex requirements. Here are a handful of the most commonly used Redshift settings to consider adding to your configuration. They may be a little more expensive, but they could save you time, hassle, and unforeseen budget overruns.
1) RedShift Spectrum and Federated Query
One of the most inconvenient aspects of creating a Data Warehouse is that you must import all of the data you intend to utilize, even if you will only use it seldom. However, if you keep a lot of your data on AWS, Redshift can query it without having to import it:
- Redshift Spectrum: Redshift may query data in Amazon S3 for a fee of $5 per terabyte of data scanned, plus certain additional fees (for example, when you make a request against one of your S3 buckets).
- Federated Query: Redshift can query data from Amazon RDS and Aurora PostgreSQL databases via federated queries. Beyond the fees for using Redshift and these databases, there are no additional charges for using Federated Query.
2) Concurrency Scaling
Concurrency Scaling allows you to build up your data warehouse to automatically grab extra resources as your needs spike, and then release them when they are no longer required.
Concurrency Scaling price on AWS Redshift is a little complicated. Every day of typical usage awards each Amazon Redshift cluster one hour of free Concurrency Scaling, and each cluster can accumulate up to 30 hours of free Concurrency Scaling usage. You’ll be charged for the additional cluster(s) for every second you utilize them if you go over your free credits.
3) Redshift Backups
Your data warehouse is automatically backed up by Amazon Redshift for free. However, taking a snapshot of your data at a specific point in time can be valuable at times. This additional backup storage will be charged at usual Amazon S3 prices for clusters using RA3 nodes. Any manual backup storage that takes up space beyond the amount specified in the rates for your DC nodes will be paid in clusters employing DC nodes.
4) Reserve Instance
Redshift offers Reserve Instances in addition to on-demand prices, which offer a significant reduction if you commit to a one- or three-year term. “Customers often purchase Reserved Instances after completing experiments and proofs-of-concept to validate production configurations,” according to the Amazon Redshift pricing page, which is a wise strategy to take with any long-term Data Warehouse contracts.
Tools for keeping your Redshift’s Spending Under Control
Since many aspects of AWS Redshift pricing are dynamic, there’s always the possibility that your expenses will increase. This is especially important if you want your Redshift Data Warehouse to be as self-service as feasible. If one department goes overboard in terms of how aggressively they attack the Data Warehouse, your budget could be blown.
Fortunately, Amazon has added a range of features and tools over the last year to help you put a lid on prices and spot surges in usage before they spiral out of control. Listed below are a few examples:
- You can limit the use of Concurrency Scaling and Redshift Spectrum in a cluster on a daily, weekly, and/or monthly basis. And you can set it up so that when the cluster reaches those restrictions, it either disables the feature momentarily, issues an alarm, or logs the alert to a system table.
- Redshift pricing now includes Query Monitoring, which makes it simple to see which queries are consuming the most CPU time. This enables you to address possible issues before they spiral out of control. For Example, rewriting a CPU-intensive query to make it more efficient.
- Schemas, which are a way for constructing a collection of Database Objects, can have storage restrictions imposed. Yelp, for example, introduced the ‘tmp’ schema to allow staff to prototype Database tables. Yelp used to have a problem where staff experimentation would use up so much storage that the entire Data Warehouse would slow down. Yelp used these controls to solve the problem after Redshift added controls for Defining Schema Storage Limitations.
Optimizing Redshift ETL Cost
Now that we have seen the factors that broadly affect the Redshift pricing let’s look into some of the best practices that can be followed to keep the total cost of ownership down.
- Data Transfer Charges: Amazon charges for data transfer also and these charges can put a serious dent to your resources if not careful enough. Data transfer charges are applicable for intra-region transfer and every transfer involving data movement from or to the locations outside AWS. It is best to keep all your deployment and data in one region as much as possible. That said this is not always practical and customers need to factor in data transfer costs while finalizing the budget
- Tools: In most cases, Redshift will be used with the AWS Data pipeline for data transfer. AWS data pipeline only works for AWS-specific data sources and for external sources you may have to use other ETL tools which may also cost money. As a best practice, it is better to use a fuss-free ETL tool like Hevo Data for all your ETL data transfer rather than separate tools to deal with different sources. This can help save some budget and offer a clean solution.
- Vacuuming Tables: Redshift needs some housekeeping activities like VACUUM to be executed periodically for claiming the data back after deleting. Even though it is possible to automate this to execute on a fixed schedule, it is a good practice to run it after large queries that use delete markers. This can save space and thereby cost.
- Archival Strategy: Follow a proper archival strategy that removes less used data into a cheaper storage mechanism like S3. Make use of the Redshift spectrum feature in rare cases where this data is required.
- Data Backup: Redshift offers backup in the form of snapshots. Storage is free for backups up to 100 percent of the Redshift cluster data volume and using the automated incremental snapshots, customers can create finely-tuned backup strategies.
- Data Volume: While fixing node types, it is great to have a clear idea of the total data volume right from the start itself. dc2.8xlarge systems generally offer better performance than a cluster of eight dc2.xlarge nodes.
- Encoding Columns: AWS recommends customers use data compression as much as possible. Encoding the columns can not only make a difference to space but also can improve performance.
In this article, we discussed, in detail, about Redshift pricing model and some of the best practices to lower your overall cost of running processes in Amazon Redshift. Hence, let’s conclude by proving an extra council to control costs and increase the bottom line.
- Always use reserved instances, think for the long-term, and try to predict your needs and instances where saving over on-demand is high.
- Manage your snapshots well by deleting orphaned snapshots like any other backup.
- Make sure to schedule your Redshift clusters and define on/off timings because they are not needed 24×7.
That said, Amazon Redshift is great for setting up Data Warehouses without spending a load amount of money on infrastructure and its maintenance.
And, if your organization has multiple data sources, wants a straightforward solution for Amazon Redshift ETL use cases, Hevo Data can help. Hevo is a code-free ETL tool that can take care of your data transfer needs from most standard sources to Amazon Redshift and beyond with a few clicks.Visit our Website to Explore Hevo
Hevo Data with its strong integration with 100+ data sources (including 40+ free sources) allows you to not only export data from your desired data sources & load it to the destination of your choice such as Amazon Redshift, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.
Also, why don’t you share your reading experience of our in-detail blog and how it helped you choose or optimize Redshift pricing for your organization? We would love to hear from you!