AuroraDB is a relational database engine that comes as one of the options in the AWS Relational Database as a service. Amazon Redshift, on the other hand, is another completely managed database service from Amazon that can scale up to petabytes of data. Even though the ultimate aim of both these services is to let customer store and query data without getting involved in the infrastructure aspect, these two services are different in a number of ways. In this post, we will explore Amazon Redshift Vs Aurora – how these two databases compare with each other in case of various elements and which one would be the ideal choice in different kinds of use cases.
Understanding Amazon Redshift and Aurora
AWS Redshift Features
Redshift is a completely managed database service that follows a columnar data storage structure. Redshift offers ultra-fast querying performance over millions of rows and is tailor-made for complex queries over petabytes of data. Redshift’s querying language is similar to Postgres with a smaller set of datatype collection.
With Redshift, customers can choose from multiple types of instances that are optimized for performance and storage. Redshift can scale automatically in a matter of minutes in the case of the newer generation nodes. Automatic scaling is achieved by adding more nodes. A cluster can only be created using the same kind of nodes. All the administrative duties are automated with little intervention from the customer needed. You can read more on Redshift Architecture here.
Redshift uses a multi-node architecture with one of the nodes being designated as a leader node. The leader node handles client communication, assigning work to other nodes, query planning, and query optimization. Redshift’s pricing combines storage and computing with the customers and does not have the pure serverless capability. Redshift offers a unique feature called Redshift spectrum which basically allows the customers to use the computing power of Redshift cluster on data stored in S3 by creating external tables.
AWS Aurora Features
AuroraDB is a MySQL and Postgres compatible database engine; which means if you are an organization that uses either of these database engines, you can port your database to Aurora without changing a line of code. Aurora is enterprise-grade when it comes to performance and availability. All the traditional database administration tasks like hardware provisioning, backing up data, installing updates and the likes are completely automated.
Aurora can scale up to a maximum of 64 TB. It offers replication across multiple availability zones through what Amazon calls as multiAZ deployment. Customers can choose from multiple types of hardware specifications for their instances depending on the use cases. Aurora also offers a serverless feature that enables a completely on-demand experience where the database will scale down automatically in case of lower loads and vice-versa. In this mode, customers only need to pay for the time the database is active, but it comes at the cost of a slight delay in response to requests that comes during the time database is completely scaled down.
Amazon offers a replication feature through its multiAZ deployment strategy. This means your data is going to be replicated across multiple regions automatically and in case of a problem with your master instance, Amazon will switch to one among the replicas without affecting any loads.
Aurora architecture works on the basis of a cluster volume that manages the data for all the database instances in that particular cluster. A cluster volume spans across multiple availability zones and is effectively virtual database storage. The underlying storage volume is on top of multiple cluster nodes which are distributed across different availability zones. Separate from this, Aurora database can also have read-replicas. Only one instance usually serves as the primary instance and it supports reads as well as writes. The rest of the instances serve as read-replicas and load balancing needs to be handled by the user. This is different from the multiAZ deployment, where instances are located across the availability zone and support automatic failover.
Amazon Redshift Vs Aurora – Comparison
Amazon Redshift Vs Aurora – Scaling
Redshift offer scaling by adding more number of nodes or upgrading the nodes. Redshift scaling can be done automatically, but the downtime in case of Redshift is more than that of Aurora. Redshift’s concurrency scaling feature deserves a mention here. This feature is priced separately and allows a virtually unlimited number of concurrent users with the same performance if the budget is not a problem.
Aurora enables scaling vertically or horizontally. Vertical scaling is through upgrading instance types and in case of multiAZ deployment, there is minimal downtime associated with this. Otherwise, the scaling can be scheduled during the maintenance time window of the database. Aurora horizontal scaling is through read-replicas and an aurora database can have at most 15 read-replicas at the same time. Aurora compute scaling is different from storage scaling and what we mentioned above is only about compute scaling. Aurora storage scaling is done by changing the maximum allocated storage space or storage hardware type like SSD or HDD.
Amazon Redshift Vs Aurora – Storage Capacity
Redshift can practically scale to petabytes of data and run complex queries out of them. Redshift can support up to 60 user-defined databases per cluster. Aurora, on the other hand, has a hard limit at 64 TB and the number of database instances is limited at 40.
Amazon Redshift Vs Aurora – Data Loading
Redshift ETL also supports the COPY command for inserting data. It is recommended to insert data split into similar sized chunks for better performance. In the case of data already existing in Redshift, you may need to use temporary tables since Redshift does not ensure unique key constraints. A detailed account of how to do ETL on Redshift can be found here.
Data loading in Aurora will depend on the type of instance type that is being used. In the case of MySQL compatible instances, you would need to use the mysqlimport command or LOAD DATA IN FILE command depending on whether the data is from a MySQL table or file. Aurora with Postgres can load data with the COPY command.
An alternative to this custom script based ETL is to use a hassle-free Data Pipeline Platform like Hevo which can offer a very smooth experience implementing ETL on Redshift or Aurora with support for real-time data sync, in-flight data transformations and much more.
Hevo – A Simple Method to Load Data into Redshift or Aurora
An easy and hassle-free method to move data from any data source to Redshift or Aurora would be to use a Data Pipeline Platform like Hevo Data (14-Day Free Trial).
Hevo can help you move data from 100s of data sources, in real-time, without writing any code.
Hevo’s complex AI-based algorithms and modern architecture make it the most reliable platform to move data to Aurora or Redshift in a secure and consistent fashion.
Amazon Redshift Vs Aurora – Data Structure
Aurora follows row-oriented storage and supports the complete data types in both MySQL and Postgres instance types. Aurora is also ACID complaint. Redshift uses columnar storage structure and is optimized for column level processing than complete row level processing.
Redshift’s Postgres like querying layer misses out on many data types which are supported by Aurora’s Postgres instance type. Redshift does not support consistency among the ACID properties and only exhibits eventual consistency. It does not ensure referential integrity and unique key constraints.
Amazon Redshift Vs Aurora – Performance
Redshift offers fast read performance and over a larger amount of data when compared to Aurora. Redshift excels specifically in the case of complicated queries spanning millions of rows.
Aurora offers better performance than a traditional MySQL or Postgres instance. Aurora’s architecture disables the InnoDB change buffer for distributed storage leading to poor performance in case of write-heavy operations. If your use case updates heavy, it may be sensible to use traditional databases like MySQL or Postgres than Aurora.
Both the services offer performance optimizations using sharding and key distribution mechanisms. Redshift’s SORT KEY and DIST KEY needs to be configured here for improvements in complex queries involving JOINs.
Aurora is optimized for OLTP workloads and Redshift is preferred in case OLAP workloads. Transactional workloads are not recommended in Redshift since it supports only eventual consistency.
Amazon Redshift Vs Aurora – Security
When it comes to Security, there is nothing much to differentiate between the two services. With both being part of the AWS portfolio, they offer the complete set of security requirements and compliance. Data is ensured to be encrypted at rest and motion. There are provisions to establish virtual private clouds and restrict usage based on Amazon’s Identity and Access management. Other than these, customers can also use the specific security features that are part of Postgres and MySQL instance types with Aurora.
Amazon Redshift Vs Aurora – Maintenance
Both Aurora and Redshift are completely managed services and required very little maintenance. Redshift because of its delete marker-based architecture needs the VACUUM command to be executed periodically to reclaim the space after entries are deleted. These can be scheduled periodically, but it is a recommended practice to execute this command in case of heavy updates and delete workload. Redshift also needs the ANALYZE command to be executed periodically to keep the metadata up to data for query planning.
Amazon Redshift Vs Aurora – Pricing
Redshift pricing is including storage and compute power. Redshift starts at .25$ per hour for the dense compute instance types per node. Dense compute is the recommended instance type for up to 500 GB of data. For the higher-spec dense storage instance types, pricing starts at .85$. It is to be noted that these two services are designed for the different use case and pricing can not be compared independent of the customer use cases.
Aurora MySQL starts with .041$ per hour for its lowest spec instance type. Aurora Postgres starts at .082$ per hour for the same type of instance. The memory-optimized instance types with higher performance start for .29$ for both MySQL and Postgres instance types. Aurora’s serverless instances are charged based on ACU hours and start at .06$ per ACU hour. Storage and IO are charged separately for Aurora. It costs .1 $ per GB per month and .2$ per 1 million requests. Aurora storage pricing is based on the maximum storage ever used by the cluster and it is not possible to reclaim space after deleted without re instantiating the database.
An obvious question after such a long comparison is about how to decide when to use Redshift and Aurora for your requirement. The following section summarizes the scenarios in which using one of them may be beneficial over the other.
Amazon Redshift Vs Aurora – Use cases
When to use Amazon Redshift?
- The requirement is an Online analytical processing workload and not transactional.
- You have a high analytical workload and running on your transactional database will hurt the performance.
- Your data volume is in hundreds of TBs and you anticipate more data coming in.
- You are willing to let go of the consistency compliance and will ensure the uniqueness of your keys on your own.
- You are ready to put your head into designing SORT KEYS and DIST KEYS to extract the maximum performance.
When to use Aurora?
- You want to relieve yourself of the administrative tasks of managing a database but want to stick with MySQL or Postgres compatible querying layer.
- You want to stay with traditional databases like MySQL or Postgres but want better read performance at the cost of slightly lower write and update performance.
- Your storage requirements are only in the TBs and do not anticipate 100s of TBs of data in the near future.
- You have an online transactional processing use case and want quick results with a smaller amount of data.
- Your OLTP workloads are not interrupted by analytical workloads
- Your analytical workloads do not need to process millions of rows of data.
What use case are you evaluating these platforms for? Let us know in the comments. We would be happy to help solve your dilemma.