Every Amazon Redshift cluster has a finite number of nodes and query slots. While that capacity is sufficient for running a considerable number of queries or hosting users, it could be inadequate for growing organizations that run many queries at once.
If you submit more queries than a cluster can run at once, Amazon Redshift will process as many as its queues allow and put the rest in a queue. This may increase the time the system takes to provide query responses. But what if you want your Amazon Redshift cluster to handle a burst of queries in the same response time as a regular workload? Then, you’ll need to activate the Amazon Redshift Concurrency Scaling feature.
The Amazon Redshift Concurrency Scaling feature lets you run queries that have outgrown your main cluster's capacity on concurrent clusters. In this article, you will get insight into the Amazon Redshift Concurrency Scaling feature, how you can enable it from your AWS console, and where it can and cannot be used.
Introduction to Amazon Redshift
Amazon Redshift is Amazon’s cloud-based Data Warehouse for Big Data. This storage service offers data clusters where users can easily hold and query their data. This Data Warehouse is notable for its fast performance with regard to data queries. It uses techniques like Columnar Data Storage and Massive Parallel Processing (MPP) to generate query responses quickly.
Columnar Data Storage stores table data column by column rather than row by row. Values in the same column compress well, which reduces the storage space that a database occupies, and a query reads only the columns it needs, which cuts disk I/O. On the other hand, Massive Parallel Processing shares the data query workload across the nodes in a cluster. As a result, all the nodes work simultaneously and return the query response quickly.
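To make the two ideas concrete, here is a toy sketch in plain Python. It is only an illustration: the table, the column names, and the helper functions are made up for this example, and real Redshift storage and query distribution are far more sophisticated.

```python
# Hypothetical sample table, held as a list of row records.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 210.0},
]


def total_amount_row_store(records):
    """Row layout: the query walks every whole record to reach one field."""
    return sum(record["amount"] for record in records)


# Column layout: each column's values are kept together in their own list.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def total_amount_column_store(cols):
    """Column layout: only the 'amount' column is read; the others are skipped."""
    return sum(cols["amount"])


def total_amount_parallel(cols, slices=2):
    """MPP in miniature: split the column into slices, sum each, then combine."""
    amounts = cols["amount"]
    partial_sums = [sum(amounts[i::slices]) for i in range(slices)]
    return sum(partial_sums)


print(total_amount_row_store(rows))
print(total_amount_column_store(columns))
print(total_amount_parallel(columns))
```

All three functions return the same total. The difference lies in how much data each one has to touch and how the work could be divided among nodes.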
Apart from Columnar Data Storage and Massive Parallel Processing, there is another feature in Amazon Redshift that facilitates quick query response. This feature is Concurrency Scaling, and it offers extra clusters for running queries that exceed the main cluster’s concurrent capacity.
Key Features of Amazon Redshift
A few features of Amazon Redshift are listed below:
- Fault-Tolerant: Amazon Redshift continuously monitors the health of the cluster. It automatically replicates data regularly to avoid any data loss at the time of disaster.
- Data Sharing: Amazon Redshift allows users to save cost and improve performance by sharing data from a single cluster with multiple clusters.
- Redshift ML: Amazon Redshift allows Data Scientists, Data Analysts, Developers, and Business Professionals to create, test, and deploy models using SQL.
To learn more about Amazon Redshift, click here.
Introduction to Concurrency Scaling
Concurrency Scaling is an Amazon Redshift feature that offers extra clusters to users whose main clusters are insufficient for their query workload. Usually, when users submit more queries than a cluster can run at once, Amazon Redshift holds the excess queries in a queue. As such, a burst of queries takes longer to complete than a regular workload.
However, a Redshift account holder may want to run a large number of queries within the regular response time. This is where concurrency scaling comes in. When you activate concurrency scaling, the system will provide you with concurrent clusters. You can use these extra clusters to run excess queries at the same time as the queries in your main cluster.
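The effect of extra clusters on queueing can be sketched with a deliberately simplified model. It assumes every query takes the same time and that each concurrent cluster offers the same number of slots as the main one. Neither assumption comes from Redshift itself, and the function names are hypothetical.

```python
import math


def rounds_needed(num_queries, slots_per_cluster, extra_clusters=0):
    """Sequential rounds needed when all slots are filled in each round."""
    total_slots = slots_per_cluster * (1 + extra_clusters)
    return math.ceil(num_queries / total_slots)


def estimated_wall_time(num_queries, slots_per_cluster, seconds_per_query,
                        extra_clusters=0):
    """Rough wall-clock time for the whole burst under the toy model."""
    rounds = rounds_needed(num_queries, slots_per_cluster, extra_clusters)
    return rounds * seconds_per_query


# 20 queries of 30 seconds each on a 5-slot queue.
print(estimated_wall_time(20, 5, 30))                    # main cluster only
print(estimated_wall_time(20, 5, 30, extra_clusters=1))  # one concurrent cluster
```

In this model the burst needs four rounds on the main cluster alone and two rounds with one concurrent cluster, which is the queueing delay that concurrency scaling is meant to remove.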
Activating the concurrency scaling feature can add to your Amazon Redshift bill. Nevertheless, the system will only charge you for the period in which your concurrent clusters were actively running queries.
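As a rough worked example of usage-based billing, the sketch below multiplies the time a concurrent cluster was active by an hourly rate. The rate is a made-up placeholder rather than an AWS price, the function name is hypothetical, and the sketch ignores any free usage credits that may apply to your account.

```python
def concurrency_scaling_charge(active_seconds, hourly_rate_usd):
    """Charge for the seconds a concurrent cluster actively ran queries.

    hourly_rate_usd is a placeholder; look up the real rate for your
    node type and region before relying on the result.
    """
    if active_seconds < 0 or hourly_rate_usd < 0:
        raise ValueError("active_seconds and hourly_rate_usd must be non-negative")
    return round(active_seconds * hourly_rate_usd / 3600, 2)


# 15 minutes of active concurrent-cluster time at a hypothetical 4 USD/hour.
print(concurrency_scaling_charge(15 * 60, 4.0))
```

Idle time does not enter the calculation at all, which mirrors the point that you pay only while the concurrent clusters are running queries.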
In addition, Amazon Redshift gives you control over the concurrency scaling feature through WLM (Workload Management) queues. With the WLM queues, you can determine which queries are allowed to run on the concurrent clusters.
Hevo Data, a No-code Data Pipeline, helps to load data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 100+ data sources and is a 3-step process: select the data source, provide valid credentials, and choose the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.
Its completely automated pipeline delivers data in real time from source to destination without any loss. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.
Check out why Hevo is the Best:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, E-Mail, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Cluster Requirements for Concurrency Scaling
As noted by intermix.io, your Redshift cluster will only support concurrency scaling if it fulfills these three requirements:
- It must have at least two nodes. The maximum node limit for clusters that support concurrency scaling is 32.
- The cluster’s node type must be either ra3.16xlarge, ds2.8xlarge, ds2.xlarge, dc2.large, ra3.4xlarge, or dc2.8xlarge.
- The cluster must be on the EC2-VPC platform: This means that only users who run their clusters in a virtual private cloud (VPC) can use concurrency scaling. An EC2-VPC platform is a private cloud network that is isolated to a single AWS account. It must be distinguished from the EC2-Classic platform, where an AWS user runs resources on a shared network.
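The three requirements above can be expressed as a small pre-flight check. This is a minimal sketch: the function name and its arguments are hypothetical, the node types use the standard AWS spellings, and the limits are the ones quoted in this article rather than values read from AWS.

```python
# Node types and limits as listed in this article (an assumption, not an API lookup).
SUPPORTED_NODE_TYPES = {
    "dc2.large", "dc2.8xlarge",
    "ds2.xlarge", "ds2.8xlarge",
    "ra3.4xlarge", "ra3.16xlarge",
}
MIN_NODES = 2
MAX_NODES = 32


def concurrency_scaling_problems(node_type, node_count, in_vpc):
    """Return a list of reasons the cluster would not qualify (empty if none)."""
    problems = []
    if node_type not in SUPPORTED_NODE_TYPES:
        problems.append(f"node type {node_type!r} is not in the supported list")
    if not MIN_NODES <= node_count <= MAX_NODES:
        problems.append(
            f"node count {node_count} is outside {MIN_NODES}-{MAX_NODES}"
        )
    if not in_vpc:
        problems.append("cluster is not on the EC2-VPC platform")
    return problems


print(concurrency_scaling_problems("ra3.4xlarge", 4, True))
print(concurrency_scaling_problems("dc2.large", 1, False))
```

Returning a list of reasons rather than a single boolean makes it easier to see which requirement a given cluster fails.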
How to Enable Amazon Redshift Concurrency Scaling
Follow these steps to activate Amazon Redshift Concurrency Scaling on a cluster:
- Log on to the AWS Redshift Console. Then, select ‘Workload Management’ from the navigation bar on the left side of the screen.
- You’ll see a drop-down menu containing the WLM parameter groups for the different clusters you’ve created. Select the parameter group of the cluster you are currently using.
- Once you select the cluster’s WLM parameter group, a new column titled ‘Concurrency Scaling Mode’ will appear under each queue in the group.
- Change the mode from ‘Off’ to ‘Auto’.
- With the mode set to ‘Auto’, Amazon Redshift can send eligible queries from that queue to concurrent clusters whenever they would otherwise wait in the queue.
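The same setting can also be described as a WLM configuration in JSON instead of being clicked through in the console. The sketch below builds such a document with Python's standard library. The helper name and its input shape are hypothetical, and the property names (`query_group`, `query_concurrency`, `concurrency_scaling`) reflect the WLM JSON format as I understand it, so please verify them against the current AWS documentation before applying anything.

```python
import json


def build_wlm_config(queues, default_slots=5):
    """Build a WLM JSON string from a list of simple queue descriptions.

    Each item in `queues` is a dict with the hypothetical keys
    'query_group' (str), 'slots' (int) and 'scale' (bool).
    """
    config = []
    for queue in queues:
        config.append({
            "query_group": [queue["query_group"]],
            "query_concurrency": queue["slots"],
            # 'auto' lets the queue use concurrent clusters; 'off' keeps it local.
            "concurrency_scaling": "auto" if queue["scale"] else "off",
        })
    # The last entry acts as the default queue for everything else.
    config.append({"query_concurrency": default_slots})
    return json.dumps(config)


wlm_json = build_wlm_config([
    {"query_group": "dashboards", "slots": 10, "scale": True},
    {"query_group": "etl", "slots": 3, "scale": False},
])
print(wlm_json)
```

The resulting string is the kind of value that would go into the `wlm_json_configuration` parameter of the cluster's parameter group. Here only the hypothetical `dashboards` queue is allowed to spill over to concurrent clusters.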
How to Monitor Amazon Redshift Concurrency Scaling
While concurrency scaling is active, you may want to check whether a query ran on the main cluster or on a concurrent cluster. Take the steps below to view your queries and monitor Amazon Redshift Concurrency Scaling:
- Go to the Amazon Redshift Console.
- Click on ‘Cluster’ on the navigation bar and choose an active cluster.
- Next, open the ‘Queries’ tab under the chosen cluster. You’ll see a column titled ‘Executed on’.
- Browse the values in the ‘Executed on’ column to see which cluster each of your queries ran on.
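If you prefer SQL to the console, the same information can be pulled from the query log. The sketch below assumes the `stl_query` system table and its `concurrency_scaling_status` column, where a value of 1 is taken to mean that the query ran on a concurrent cluster. The summarizing helper is hypothetical and works on rows you have already fetched with your own database client.

```python
# SQL to run against the cluster with whichever client you already use.
RECENT_QUERIES_SQL = """
SELECT query, starttime, concurrency_scaling_status
FROM stl_query
WHERE starttime > dateadd(hour, -1, getdate())
ORDER BY starttime DESC;
"""


def summarize_execution(rows):
    """Count queries per execution target.

    `rows` is an iterable of (query_id, starttime, concurrency_scaling_status)
    tuples. Status 1 is counted as a concurrent cluster; any other value is
    counted as the main cluster (an assumption of this sketch).
    """
    summary = {"main": 0, "concurrency_scaling": 0}
    for _query_id, _starttime, status in rows:
        if status == 1:
            summary["concurrency_scaling"] += 1
        else:
            summary["main"] += 1
    return summary


# Made-up sample rows standing in for a real result set.
sample_rows = [
    (101, "2024-01-01 10:00:00", 1),
    (102, "2024-01-01 10:00:05", 0),
    (103, "2024-01-01 10:00:09", 1),
]
print(summarize_execution(sample_rows))
```

The sample rows are invented purely to show the shape of the input. In practice, you would pass in the rows returned by `RECENT_QUERIES_SQL`.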
Geographical Locations for Amazon Redshift Concurrency Scaling
Amazon Redshift Concurrency Scaling is only supported in the following AWS regions:
- Paris Region (eu-west-3)
- Canada Central Region (ca-central-1)
- Ireland Region (eu-west-1)
- Ohio Region (us-east-2)
- Singapore Region (ap-southeast-1)
- Oregon Region (us-west-2)
- Mumbai Region (ap-south-1)
- North Virginia Region (us-east-1)
- Sydney Region (ap-southeast-2)
- Tokyo Region (ap-northeast-1)
- Frankfurt Region (eu-central-1)
- North California Region (us-west-1)
- Seoul Region (ap-northeast-2)
- Sao Paulo Region (sa-east-1)
- London Region (eu-west-2)
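For scripts that need to check a region before relying on the feature, the list above can be kept as a simple lookup. This is a snapshot of the regions named in this article, with Mumbai written under its actual region code, ap-south-1. AWS may have added regions since, so treat the set and the hypothetical helper below as illustrative only.

```python
# Region codes taken from the list in this article (may be out of date).
CONCURRENCY_SCALING_REGIONS = {
    "eu-west-3", "ca-central-1", "eu-west-1", "us-east-2", "ap-southeast-1",
    "us-west-2", "ap-south-1", "us-east-1", "ap-southeast-2", "ap-northeast-1",
    "eu-central-1", "us-west-1", "ap-northeast-2", "sa-east-1", "eu-west-2",
}


def supports_concurrency_scaling(region_code):
    """True if the region code appears in this article's list."""
    return region_code.strip().lower() in CONCURRENCY_SCALING_REGIONS


print(supports_concurrency_scaling("us-east-1"))
print(supports_concurrency_scaling("af-south-1"))
```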
Amazon Redshift Concurrency Scaling Compatibility
Amazon Redshift Concurrency Scaling is compatible with the following operations:
- Read queries such as dashboard queries.
- Basic write queries, like INSERT, COPY, UPDATE, and DELETE.
Amazon Redshift Concurrency Scaling Limitations
Amazon Redshift Concurrency Scaling does not work with the following operations:
- Queries on Temporary Tables: A temporary table is created only for usage during an active session. Once the session is over, the temporary table will no longer be visible.
- Queries on Tables with Interleaved Sort Keys: An interleaved sort key gives equal weight to each column in the sort key, so that no single column is more important than the others.
- ANALYZE operations that run as part of a COPY command.
- Running write queries on tables that contain identity columns.
- Running write queries on tables where DISTSTYLE is set to ALL.
- DDL operations, like ALTER TABLE and CREATE TABLE.
- COPY queries from Amazon EMR or Amazon Redshift Spectrum.
- Queries for PostgreSQL catalog tables, system tables, and no-backup tables.
- Queries with Python User-Defined Functions (UDFs).
- Queries that require access to external resources secured by a VPC.
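Some of these limitations can be spotted from the SQL text alone. The sketch below is a rough heuristic built on regular expressions, with hypothetical names throughout. It only catches DDL statements, explicit temporary-table syntax, and references to tables with common catalog or system prefixes. It cannot tell whether a table has an interleaved sort key, an identity column, or a particular distribution style.

```python
import re

# (pattern, reason) pairs; deliberately simple and far from exhaustive.
_RULES = [
    (re.compile(r"^\s*(create|alter|drop)\b", re.IGNORECASE),
     "DDL statement"),
    (re.compile(r"\b(temp|temporary)\s+table\b", re.IGNORECASE),
     "temporary table"),
    (re.compile(r"\b(pg|stl|stv|svl|svv)_\w+", re.IGNORECASE),
     "catalog or system table"),
]


def scaling_blockers(sql):
    """Return the reasons (if any) this statement is unlikely to be scaled out."""
    return [reason for pattern, reason in _RULES if pattern.search(sql)]


print(scaling_blockers("SELECT count(*) FROM sales"))
print(scaling_blockers("CREATE TEMP TABLE t AS SELECT 1"))
print(scaling_blockers("select * from stl_query"))
```

An empty list does not guarantee that a query will run on a concurrent cluster. It only means that none of these simple text checks matched.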
Conclusion
In this article, you learnt about Amazon Redshift Concurrency Scaling – how and where it works, what it is used for as well as its limitations. Next time you want to run massive queries, don’t waste your time waiting for the system to run the queries in queues. Instead, activate the Amazon Redshift Concurrency Scaling feature, and complete your queries in record time. Amazon Redshift is widely used by organizations to satisfy their business requirements and engage with data-driven business solutions.
Amazon Redshift stores data from multiple sources and every source follows a different schema. Instead of manually writing scripts for every source to perform the ETL (Extract Transform Load) process, one can automate the whole process. Hevo Data is a No-code Data pipeline solution that can help you transfer data from 100+ sources to Amazon Redshift or other Data Warehouse of your choice. Its fault-tolerant and user-friendly architecture fully automates the process of loading and transforming data to destination without writing a single line of code.
Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.
Share your experience of learning about the Amazon Redshift Concurrency Scaling in the comments section below!