
Redshift Architecture and Its Components


Amazon Redshift is a fully managed, highly scalable data warehouse service in AWS. You can start using Redshift with just a few gigabytes of data and scale it to petabytes or more. In this article, we will discuss the Amazon Redshift architecture and its components at a high level. Here is the list of topics covered:

  1. Leader Node
  2. Compute Node
  3. Node Slices
  4. Massively Parallel Processing
  5. Columnar Data Storage
  6. Data Compression
  7. Query Optimizer

Redshift Architecture

Redshift is designed to operate as a cluster. A typical Redshift cluster has two or more Compute Nodes that are coordinated by a Leader Node. All client applications communicate with the cluster only through the Leader Node.

#1 – Leader Node

The Leader Node in an Amazon Redshift cluster manages all external and internal communication. It is responsible for preparing query execution plans whenever a query is submitted to the cluster. Once the query execution plan is ready, the Leader Node distributes the query execution code to the compute nodes and assigns slices of data to each compute node for computation of results.

The Leader Node distributes query load to the compute nodes only when the query involves accessing data stored on them. Otherwise, the query is executed on the Leader Node itself. There are several functions in the Redshift architecture that always execute on the Leader Node. You can read SQL Functions Supported on the Leader Node for more information on these functions.

#2 – Compute Nodes

Compute Nodes are responsible for the actual execution of queries and store the data. They execute queries and return intermediate results to the Leader Node, which aggregates them into the final result.

There are two types of Compute Nodes available in Redshift architecture:

  • Dense Storage (DS) – Dense Storage nodes allow you to create large data warehouses using Hard Disk Drives (HDDs) at a low price point.
  • Dense Compute (DC) – Dense Compute nodes allow you to create high-performance data warehouses using Solid-State Drives (SSDs).

A more detailed view of how responsibilities are divided between the Leader and Compute Nodes is depicted in the diagram below:

Redshift Architecture - Leader and Compute Nodes


#3 – Node slices

A compute node consists of slices. Each slice is allocated a portion of the compute node's memory and disk, where it performs query operations. The Leader Node is responsible for assigning query code and data to a slice for execution. Once assigned their share of the query load, slices work in parallel to generate the results.

Data is distributed among the Slices on the basis of Distribution Style and Distribution Key of a particular table. An even distribution of data enables Redshift to assign workload evenly to slices and maximizes the benefit of parallel processing.

The number of slices per compute node depends on the node type. You can find more information in Clusters and Nodes.
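The key-based distribution described above can be sketched as a toy model. The slice count, hash function, and row shapes below are illustrative assumptions for the sketch, not Redshift's actual internals:

```python
import hashlib

# Toy sketch of KEY-style distribution: rows are assigned to slices by
# hashing the distribution key, so rows sharing a key land on the same
# slice. The slice count and hash function here are simplified.
NUM_SLICES = 4  # e.g. 2 compute nodes x 2 slices each (assumed for the sketch)

def slice_for(dist_key_value):
    """Map a distribution-key value to a slice number (illustrative hash)."""
    digest = hashlib.md5(str(dist_key_value).encode()).hexdigest()
    return int(digest, 16) % NUM_SLICES

# Hypothetical table with customer_id as the distribution key.
rows = [{"order_id": i, "customer_id": i % 7} for i in range(20)]

slices = {n: [] for n in range(NUM_SLICES)}
for row in rows:
    slices[slice_for(row["customer_id"])].append(row)

# All rows with the same customer_id end up on the same slice, so joins
# on that key need no data movement between slices.
for n, assigned in slices.items():
    print(n, [r["order_id"] for r in assigned])
```

Choosing a distribution key whose values spread evenly across slices is what keeps the per-slice workloads balanced, as the paragraph above notes.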

#4 – Massively parallel processing (MPP)

The Amazon Redshift architecture uses Massively Parallel Processing (MPP) for fast processing of even the most complex queries over very large datasets. Multiple compute nodes execute the same query code on their own portions of the data to maximize parallelism.
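The scatter-gather pattern behind MPP can be illustrated with a minimal sketch. Real Redshift parallelizes across separate machines and slices; the threads and the four-way split below are stand-ins chosen only to keep the example small:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative scatter-gather: each "compute node" runs the same
# aggregation over its own portion of the data, and a "leader" step
# combines the partial results into the final answer.
data = list(range(1, 101))
portions = [data[i::4] for i in range(4)]  # data divided across 4 nodes

def node_aggregate(portion):
    """The query fragment every node executes on its own data."""
    return sum(portion)

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(node_aggregate, portions))

leader_result = sum(partials)  # leader combines intermediate results
print(leader_result)  # 5050 - same answer as a single-node sum
```

The key property is that each partition is processed independently, so adding nodes shortens the scan phase while the leader's final combine stays cheap.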

#5 – Columnar Data Storage

Data in an Amazon Redshift data warehouse is stored in a columnar fashion, which drastically reduces disk I/O. Columnar storage reduces the number of disk I/O requests and minimizes the amount of data loaded into memory to execute a query. Less I/O speeds up query execution, and loading less data means Redshift can perform more in-memory processing.

Redshift uses Sort Keys to sort columns and filter out chunks of data while executing queries. You can read more about Sort Keys in our post on Choosing the Best Sort Keys.
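A toy model makes the combined effect of columnar storage and sort keys concrete. The table shape, block size, and min/max "zone map" below are simplified assumptions (Redshift's real blocks are 1 MB and its metadata is richer):

```python
# Toy columnar layout: each column is stored (and scanned) separately,
# so a query touching one column never reads the others. A per-block
# min/max zone map on the sort key lets whole blocks be skipped.
table = {
    "event_date": list(range(1000)),   # sort key, stored in order
    "user_id":    [i % 50 for i in range(1000)],
    "payload":    ["x" * 100] * 1000,  # wide column a row store would drag in
}
BLOCK = 100  # rows per block (illustrative; real blocks are sized in bytes)

# Zone map: (min, max) of the sort key for each block.
zone_map = [
    (min(table["event_date"][i:i + BLOCK]), max(table["event_date"][i:i + BLOCK]))
    for i in range(0, 1000, BLOCK)
]

def count_in_range(lo, hi):
    """Like SELECT count(*) ... WHERE event_date BETWEEN lo AND hi:
    scans one column, and only blocks whose range overlaps [lo, hi]."""
    scanned_blocks = 0
    hits = 0
    for b, (bmin, bmax) in enumerate(zone_map):
        if bmax < lo or bmin > hi:
            continue  # zone map proves this block has no matching rows
        scanned_blocks += 1
        block = table["event_date"][b * BLOCK:(b + 1) * BLOCK]
        hits += sum(lo <= v <= hi for v in block)
    return hits, scanned_blocks

print(count_in_range(250, 349))  # (100, 2): only 2 of 10 blocks are read
```

Because the data is sorted on `event_date`, a range predicate touches only the few blocks that can contain matches, and the wide `payload` column is never read at all.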

#6 – Data Compression

Data compression is one of the important factors in ensuring query performance. It reduces the storage footprint and enables large amounts of data to be loaded into memory quickly. Owing to columnar storage, Redshift can apply adaptive compression encodings suited to each column's data type. Read more about using compression encodings in Compression Encodings in Redshift.
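Why columnar storage compresses so well can be shown with one of the simplest encodings, run-length encoding. This is a minimal sketch of the idea, not Redshift's implementation; Redshift chooses among several encodings (e.g. runlength, bytedict, zstd) per column:

```python
# Minimal run-length encoding sketch: columnar storage keeps similar
# values adjacent, so runs of equal values are common in sorted,
# low-cardinality columns and collapse to almost nothing.
def rle_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original column."""
    return [v for v, n in runs for _ in range(n)]

# Hypothetical sorted country-code column.
column = ["US"] * 500 + ["DE"] * 300 + ["FR"] * 200
encoded = rle_encode(column)
print(len(column), "->", len(encoded), "runs")  # 1000 -> 3 runs
assert rle_decode(encoded) == column
```

The same column stored row-by-row would interleave these values with other fields, breaking up the runs and losing most of the compression benefit.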

#7 – Query Optimizer

Redshift’s Query Optimizer generates query plans that are MPP-aware and take advantage of columnar data storage. The optimizer uses statistics gathered about tables to generate efficient query plans for execution. Read more about ANALYZE to learn how to get the best out of the Query Optimizer.

We, at Hevo (explore our 14-day free trial), provide an ETL solution which can help bring your data from various sources to Redshift in real-time. You can reach out to us if you need help in setting up your Redshift clusters or connecting your data sources to Amazon Redshift data warehouse. Read more about Redshift benefits and evaluate its pros and cons.

  • Shobhit Singh

    We have a large team of analysts that will simultaneously use Redshift. I was wondering how many queries we can run on a Redshift cluster? Will more people querying Redshift slow down the queries?

    • Sourabh

      Thanks Shobhit for the question. There are two parts to your question.

      How many queries can we run on a Redshift cluster?

      Queries in Amazon Redshift are routed to query queues. Each query queue contains a number of query slots and can run up to 50 queries concurrently.
      By default, Amazon Redshift allocates an equal, fixed share of available memory to each queue, and an equal, fixed share of a queue’s memory to each query slot in the queue.
      As a best practice, AWS recommends using a concurrency level of 15 or lower.

      Will more people querying Redshift slow down the queries?

      If you are running small queries that have to wait behind long-running queries, it is advisable to create a separate queue with a higher concurrency level and assign the smaller queries to that queue.
      A queue with a higher concurrency level has less memory allocated to each query slot, but the smaller queries require less memory.

      If you have multiple queries that each access data on a single slice, you should set up a separate WLM queue to execute those queries concurrently. Amazon Redshift assigns concurrent queries to separate slices, which allows multiple queries to execute in parallel on multiple slices. You can read more about it in the official AWS doc here –

  • Lalit Prakash

    Hi Sourabh, we are looking to use Redshift for taking daily snapshots of our transaction database by copying the tables through a nightly job. Does Redshift automatically reclaim space when we delete or update old rows? Or will we have to run VACUUM after every dump?