It is commonplace for businesses and organizations to look for ways to reduce operating costs and manage resources. One such way is to scale your data warehouse capacity up or down as requirements change, which saves man-hours and money and frees resources for other purposes. Enter Redshift Elastic Resize.
Data can be stored on-premises or in the cloud, and the primary difference between the two is elasticity. On-premises storage cannot easily be scaled down from hundreds of nodes to a few, scaled back up again, or scaled on a schedule. This is why many organizations are migrating data to the cloud, where resizing is possible and helps keep the business budget under control.
This piece looks at how you can use Amazon Redshift, a cloud-based data warehousing platform, to rescale your clusters through elastic resize. On Amazon Redshift, a node is a collection of computing resources, and nodes are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.
This article also covers the key considerations to make before working with Redshift Elastic Resize and the best practices involved, and ultimately shows that elastically resizing clusters can result in significant cost savings.
Table of Contents
- What is Amazon Redshift?
- Overview of Redshift Elastic Resize: Elastic, Classic, and Snapshot, Restore, Resize.
- Working with Redshift Elastic Resize
- Conclusion
What is Amazon Redshift?
Amazon Redshift is a cloud-based data warehouse where your data can be stored and managed. It is one of the most popular data-related services and is typically used for high-volume data aggregations. Redshift can be described as a columnar data warehouse service on the AWS cloud that can scale to petabytes of storage, and the infrastructure hosting the warehouse is fully managed by AWS.
Redshift uses a clustered model with a leader node and multiple compute (worker) nodes, like other clustered or distributed databases. It is based on Postgres, so it shares many similarities with it, including a query language that is nearly identical to standard Structured Query Language (SQL). It also supports the creation of almost all the major database objects, such as databases, tables, views, and even stored procedures.
Redshift lets you query and combine exabytes of structured and semi-structured data across your data warehouses, operational databases, and data lakes. It also lets you save query results to your S3 data lake using open formats like Apache Parquet, where additional analysis can be done with EMR, Athena, and SageMaker.
Overview of Redshift Elastic Resize: Elastic, Classic, and Snapshot, Restore, Resize.
Resizing your Redshift clusters becomes important as your needs change or grow, so that you can manage your data warehousing capacity and make good use of the compute and storage that Amazon Redshift offers.
You can resize your cluster by using any of the three methods listed and explained below:
- Redshift Elastic Resize: This is used to change the node type, the number of nodes, or both. It works by redistributing data slices across nodes and therefore requires fewer resources. This article focuses mainly on elastic resize, so it is discussed further in the next section.
- Classic Resize: This option is mostly used when you need a configuration that is not available through elastic resize. It can also change the node type, the number of nodes, or both, but it typically takes longer than elastic resize because it involves provisioning a new cluster and copying data blocks to it. A classic resize can take two hours or more, and in some cases several days, depending on the size of your data. During the operation, the source cluster is in read-only mode.
- Snapshot, Restore, and Resize: This method is used when you want to keep your cluster available during a classic resize. It is done by taking a snapshot of the data, restoring it to create a copy of the existing cluster, and then resizing the new cluster. Any data written to the source cluster after the snapshot is taken must be copied manually to the target cluster after the switch.
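The choice between the three methods can be sketched as a small decision function. This is only an illustration of the rules described in this article, not an AWS API: the `ResizeRequest` fields and the `choose_resize_method` name are hypothetical, and a real decision would also weigh data size and downtime tolerance.

```python
from dataclasses import dataclass


@dataclass
class ResizeRequest:
    """Hypothetical description of a planned resize (not an AWS type)."""
    current_nodes: int        # number of nodes in the existing cluster
    elastic_supported: bool   # target configuration is offered by elastic resize
    must_stay_writable: bool  # cluster has to keep accepting writes during the resize


def choose_resize_method(req: ResizeRequest) -> str:
    """Pick a resize method using the rules of thumb from this article."""
    # Elastic resize is the fastest option, but per this article it is not
    # available for single-node clusters or for every target configuration.
    if req.current_nodes > 1 and req.elastic_supported:
        return "elastic"
    # A classic resize puts the source cluster in read-only mode, so use the
    # snapshot, restore, and resize approach when writes must continue.
    if req.must_stay_writable:
        return "snapshot-restore-resize"
    return "classic"


if __name__ == "__main__":
    print(choose_resize_method(ResizeRequest(4, True, False)))
```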
Working with Redshift Elastic Resize
As stated in the previous section, this part of the write-up dives further into Redshift Elastic Resize. It defines what elastic resize means, describes the stages involved in resizing, lists key points to consider before carrying out the operation, and finally walks through the step-by-step process.
Meaning and Definition
Redshift Elastic Resize is the fastest method for resizing clusters in Amazon Redshift. It can add or remove nodes and change node types, typically within minutes. Elastic resize behaves in one of two ways, depending on whether the target node type is the same as the existing node type or different from it.
When the target node type is the same as the existing one, elastic resize does not need to create a new cluster. It automatically redistributes the data to the new nodes, so the resize operation completes quickly.
An elastic resize to the same node type occurs in the following stages:
- Elastic resize takes a cluster snapshot that does not include backup tables.
- The cluster becomes temporarily unavailable while cluster metadata is migrated. During this stage, Amazon Redshift holds session connections and queues queries.
- Session connections are reinstated and queries resume.
- Data is redistributed to the node slices in the background, after which the cluster becomes available for read and write operations.
When elastic resize changes the node type, a snapshot is created and a new cluster is provisioned for you with the latest data from that snapshot. The original cluster is temporarily unavailable for writes while data is transferred to the new cluster. The new cluster is populated in the background, after which query performance should return to optimal levels.
Amazon Redshift sends you an event notification, after which you can connect to your new cluster and resume running read and write queries. The following steps describe the resize process when the target cluster has a different node type from the source:
- When the resize process is initiated, Amazon Redshift sends an event notification that acknowledges the resize request and starts to provision the new target cluster.
- After provisioning the target cluster, Amazon Redshift sends an event notification that the resize has commenced, then restarts the existing cluster in read-only mode and terminates all existing connections to the cluster.
- Copying of data starts from the source cluster to the new cluster.
- When the resize operation nears completion, the endpoint of the new cluster is updated and all connections to the original cluster are terminated by Amazon Redshift.
- Amazon Redshift sends an event notification to indicate that the resize process is complete, so you can connect to the new target cluster and begin running read and write queries.
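Besides waiting for the event notification, you can poll the resize status yourself. The sketch below assumes a boto3 Redshift client (for example `boto3.client("redshift")`) is passed in as `client`; `describe_resize` and its `Status` field are part of that API, while the `wait_for_resize` helper, its polling interval, and its retry limit are illustrative choices rather than recommended values.

```python
import time


def wait_for_resize(client, cluster_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll describe_resize until the resize reaches a terminal status.

    `client` is assumed to be a boto3 Redshift client. The `sleep` argument
    exists only so the wait can be replaced in tests.
    """
    for _ in range(max_polls):
        status = client.describe_resize(ClusterIdentifier=cluster_id)["Status"]
        # SUCCEEDED and FAILED are terminal; other values such as IN_PROGRESS
        # mean the resize is still being worked on.
        if status in ("SUCCEEDED", "FAILED"):
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"resize of {cluster_id} did not finish after {max_polls} polls")
```

A caller would typically reconnect to the cluster endpoint only after this returns `SUCCEEDED`.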
Key Considerations Before Carrying Out Elastic Resize
- Redshift Elastic resize usually requires less time to complete than the classic resize.
- Redshift Elastic resize cannot be used on a single-node cluster. If you are resizing a single-node cluster, perform a classic resize instead.
- Redshift Elastic resize does not sort tables or reclaim disk space. Run VACUUM to sort tables and reclaim disk space.
- Once an elastic resize has started in Amazon Redshift, the operation cannot be canceled, so you have to wait for it to complete before starting another resize operation or a cluster reboot.
- The new or target node configuration must have enough storage for existing data.
- Perform a classic resize when you observe data skew in your Amazon Redshift cluster, because an elastic resize can cause data to skew between nodes through an uneven distribution of slices.
- Elastic resize is only available for the Amazon Redshift clusters that use the EC2-VPC platform.
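Two of the considerations above, the single-node restriction and the storage requirement, are easy to check before you request a resize. The following is a minimal sketch under stated assumptions: `elastic_resize_blockers` is a hypothetical helper, and the storage figures are values you supply yourself, not numbers looked up from AWS.

```python
def elastic_resize_blockers(current_nodes, used_storage_gb,
                            target_nodes, storage_per_target_node_gb):
    """Return a list of reasons an elastic resize should not be attempted.

    All arguments are caller-supplied; an empty list means neither of the
    two checked considerations blocks the resize.
    """
    blockers = []
    # Per this article, elastic resize is not available for single-node clusters.
    if current_nodes < 2:
        blockers.append("single-node cluster: use classic resize")
    # The target configuration must have enough storage for the existing data.
    if target_nodes * storage_per_target_node_gb < used_storage_gb:
        blockers.append("target configuration cannot hold existing data")
    return blockers


if __name__ == "__main__":
    # Hypothetical numbers: 500 GB in use, shrinking to 2 nodes of 160 GB each.
    print(elastic_resize_blockers(4, 500, 2, 160))
```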
Best Practices for Redshift Elastic Resize
The following are best practices for Redshift Elastic Resize operation:
- Take a manual snapshot before starting an elastic resize, especially when you are resizing a newly created cluster. Amazon Redshift also takes periodic automated snapshots, but it deletes them at the end of a retention period, whereas a manual snapshot is retained indefinitely, even after the cluster is deleted.
- Get the possible node configurations for the resize operation by using the describe-node-configuration-options AWS CLI command. This is best done with the most recent version of the CLI.
- It is important to VACUUM the cluster before resizing as elastic resize does not automatically delete rows that are marked for deletion or sort tables.
- Node configuration changes are easily specified using the resize-cluster command.
- Snapshots can be managed using the console or Amazon Redshift CLI and API.
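The snapshot, configuration lookup, and resize steps above can also be driven from the API. The sketch below assumes `client` is a boto3 Redshift client; `describe_node_configuration_options`, `create_cluster_snapshot`, and `resize_cluster` are calls from that API, while the helper name, the identifiers passed to it, and the decision to treat the returned option list as the set of acceptable targets are assumptions made for illustration. It reads only the first page of options and does not run VACUUM, which has to be issued over a SQL connection.

```python
def snapshot_then_elastic_resize(client, cluster_id, snapshot_id,
                                 node_type, number_of_nodes):
    """Validate the target, take a manual snapshot, then request an elastic resize.

    `client` is assumed to be a boto3 Redshift client.
    """
    # 1. Ask Redshift which node configurations it offers for a resize.
    #    Only the first page of results is inspected in this sketch.
    options = client.describe_node_configuration_options(
        ActionType="resize-cluster",
        ClusterIdentifier=cluster_id,
    )["NodeConfigurationOptionList"]
    allowed = {(o["NodeType"], o["NumberOfNodes"]) for o in options}
    if (node_type, number_of_nodes) not in allowed:
        raise ValueError(f"{number_of_nodes} x {node_type} is not an offered configuration")

    # 2. Take a manual snapshot, which is retained until you delete it.
    #    Snapshot creation is asynchronous; real code would likely wait for it
    #    (boto3 provides a "snapshot_available" waiter) before continuing.
    client.create_cluster_snapshot(
        SnapshotIdentifier=snapshot_id,
        ClusterIdentifier=cluster_id,
    )

    # 3. Request the resize. Classic=False asks for an elastic resize.
    return client.resize_cluster(
        ClusterIdentifier=cluster_id,
        NodeType=node_type,
        NumberOfNodes=number_of_nodes,
        Classic=False,
    )
```

Validating before taking the snapshot is a design choice here: it avoids creating a snapshot for a resize that would be rejected anyway.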
Steps Involved in Setting up Redshift Elastic Resize
When you resize a cluster, you specify the number of nodes, and the node type if it differs from the existing one, as explained in the preceding sections. An elastic resize can be done in Amazon Redshift’s New console or its Original console, depending on which one you have. The New console is usually the default, and its steps for setting up an elastic resize are listed below:
- Sign in to the AWS Management Console and open the Amazon Redshift console using https://console.aws.amazon.com/redshift/
- Choose CLUSTERS found on the navigation menu.
- Select the cluster to be resized.
- For Actions, select Resize to bring up the Resize cluster page.
- Select Elastic resize and follow the instructions stated on the page to resize your cluster to your needed specifications.
- You can choose to resize the cluster now, once at a later time, or increase and decrease the size of your cluster on a recurring schedule.
- Choose Resize now or Schedule resize based on your needs.
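The Schedule resize option has an API counterpart in scheduled actions. The sketch below only builds the request parameters; it assumes you would pass them to a boto3 Redshift client as `client.create_scheduled_action(**params)`. The helper name, the action names, and the IAM role ARN are placeholders, and the role must be one that Redshift is allowed to assume for scheduling.

```python
def scheduled_resize_params(name, cluster_id, number_of_nodes, cron_expr, iam_role_arn):
    """Build keyword arguments for a scheduled elastic resize.

    `cron_expr` uses the six-field form
    "Minutes Hours Day-of-month Month Day-of-week Year".
    """
    return {
        "ScheduledActionName": name,
        "TargetAction": {
            "ResizeCluster": {
                "ClusterIdentifier": cluster_id,
                "NumberOfNodes": number_of_nodes,
                "Classic": False,  # request an elastic rather than classic resize
            }
        },
        "Schedule": f"cron({cron_expr})",
        "IamRole": iam_role_arn,
    }


if __name__ == "__main__":
    # Hypothetical pair of actions: shrink on Friday evening, grow on Monday morning.
    role = "arn:aws:iam::123456789012:role/example-scheduler-role"  # placeholder ARN
    down = scheduled_resize_params("scale-down", "demo-cluster", 2, "0 18 ? * FRI *", role)
    up = scheduled_resize_params("scale-up", "demo-cluster", 4, "0 6 ? * MON *", role)
    print(down["Schedule"], up["Schedule"])
```

Pairing a scale-down and a scale-up action like this is one way to get the recurring increase and decrease described in the steps above.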
If you are using the Original console, do the following:
- Sign in to the AWS Management Console and open the Amazon Redshift console using https://console.aws.amazon.com/redshift/
- Choose Clusters in the navigation pane and then select the cluster to be resized.
- Choose Cluster on the Configuration tab of the cluster details page, then choose Resize.
- In the Resize Clusters window, set up the resize parameters such as Node Type, Cluster Type, and Number of Nodes, then select Resize.
Conclusion
In this article, you have learned how to scale your data warehouse by elastically resizing Amazon Redshift clusters. The article also covered the other options for resizing clusters on Redshift: classic resize and snapshot, restore, and resize.
It went further to show that elastic resize can be done on clusters with the same or different node types, either on demand or on a schedule, as a one-time resize at a later time or as recurring resize events.
Alternatively, to manage your data effectively by scaling it to what is required, Hevo Data is an efficient platform that can help you manage and scale your data without much hassle and also provide support throughout your data lifecycle.
Hevo Data is a no-code data pipeline platform that helps new-age businesses integrate their data from multiple source systems into a data warehouse and plug this unified data into any BI tool or business application. The platform provides 100+ ready-to-use integrations with a range of data sources and is trusted by hundreds of data-driven organizations from 30+ countries.
Visit our Website to Explore Hevo
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at the unbeatable pricing that will help you choose the right plan for your business needs.
And, don’t forget to share your take on the Redshift Elastic Resize in the comments section below!
Ofem is a freelance writer specializing in data-related topics, with expertise in translating complex concepts and a focus on data science, analytics, and emerging technologies.