Fully managed databases offered as cloud services have transformed application design to the extent that even the smallest organizations can now enjoy scalability and reliability without being bound to costly licenses and infrastructure maintenance. In this post, we compare two popular databases offered as cloud services within the AWS suite – Amazon Redshift Vs DynamoDB.
Amazon Redshift is a fully managed data warehouse service with a Postgres-compatible querying layer. DynamoDB is a NoSQL database offered as a service with a proprietary query language. Now that we have established that the only common attribute between these two services is the ‘database’ part, let’s go into more detail on the differences between Redshift and DynamoDB.
Amazon Redshift Vs DynamoDB – Features
AWS DynamoDB Features
DynamoDB is a key-value and document database. Every record in a DynamoDB table can be considered a map of attributes and values, with a primary key used to retrieve the record. As with most NoSQL databases, the table does not mandate any particular structure for the data. A non-key attribute that is present in one document could be absent in the next and the database will not complain, which means DynamoDB can readily accommodate semi-structured data.
Amazon provides a proprietary query language that can retrieve rows based on the primary key and other key-value pairs in the data. DynamoDB can autoscale by dynamically changing the provisioned capacity without affecting the query load. This, coupled with pricing based on the number of requests and occupied storage, makes it a very economical option for NoSQL use cases.
Architecturally, DynamoDB is organized into nodes and slices of data, with each node handling a range of primary keys. When a request arrives, only the capacity of the node handling that primary key range is used, which makes DynamoDB well suited to workloads distributed uniformly across primary keys. DynamoDB also offers a capability called streams, which provides an ordered log of change events in a table. This can be used as a trigger for any related data processing.
Amazon Redshift Features
Amazon Redshift is a data warehouse offered as a service. A data warehouse differs from a database in that it is optimized for analytical queries rather than transactional queries. Redshift has a PostgreSQL-compatible querying layer that can handle very complex queries and return results quickly even for scans spanning millions of rows. Redshift lets its customers choose between instances optimized for performance and instances optimized for storage, and its pricing combines both storage and compute resources. You can read more about Redshift pricing here.
Redshift architecture involves a cluster of nodes with one of them being designated as a leader node. The leader node handles all query optimization, client communication, execution plan creation and task assignment to individual nodes. A detailed note on Redshift architecture can be found here.
Like DynamoDB, Redshift can also scale on demand, and it offers a feature called elastic resize on its newer-generation nodes. Elastic resize lets customers scale their cluster in a matter of minutes by adding more nodes. Scaling can also be accomplished by upgrading the existing nodes.
Amazon Redshift Vs DynamoDB – Scaling
DynamoDB offers two modes of operation for its customers.
- On-demand mode, where pricing is based on the actual read and write requests. In on-demand mode, scaling happens seamlessly, with DynamoDB automatically ramping resources up and down.
- Provisioned capacity mode, where customers specify the request rate they need according to their utilization. Autoscaling works in this mode as well, but within the minimum and maximum range specified by the administrator.
For Redshift, scaling can be done by upgrading the nodes, adding more nodes, or both. Redshift’s elastic resize feature can accomplish this in a matter of minutes. Redshift also has a concurrency scaling feature which, if enabled, can automatically scale resources as needed, up to a maximum cluster size limit specified by the user.
Amazon Redshift Vs DynamoDB – Storage capacity
Redshift’s storage capacity limit is 2 PB when the ds2.8xlarge storage-type instance is used.
For DynamoDB, Amazon does not officially specify any limit on the maximum table size. There is, however, a limit of 400 KB for each item in the table. The item size includes both the attribute name sizes and the attribute value sizes, since DynamoDB is based on a key-value structure.
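To make the item size rule concrete, here is a rough Python sketch that adds up attribute names and values and compares the total with the 400 KB limit. It is only an approximation under stated assumptions: it handles string attributes only, counts UTF-8 bytes, and treats 400 KB as 400 * 1024 bytes. The function names and the sample item are hypothetical, and DynamoDB’s own accounting for other data types is more detailed than this.

```python
# Assumption: the 400 KB item limit is treated as 400 * 1024 bytes.
ITEM_SIZE_LIMIT_BYTES = 400 * 1024


def estimate_item_size(item: dict) -> int:
    """Rough item size in bytes: every attribute name plus its string value."""
    return sum(
        len(name.encode("utf-8")) + len(value.encode("utf-8"))
        for name, value in item.items()
    )


def fits_in_item_limit(item: dict) -> bool:
    """True if the estimated size stays within the per-item limit."""
    return estimate_item_size(item) <= ITEM_SIZE_LIMIT_BYTES


# Hypothetical item with two short string attributes.
order = {"order_id": "A-1001", "status": "shipped"}
print(estimate_item_size(order))  # 27 bytes: names count as well as values
print(fits_in_item_limit(order))  # True
```

Because names are counted for every item, short attribute names can noticeably reduce storage for tables with many small items.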
Amazon Redshift Vs DynamoDB – Data replication
Data loading to Redshift is done by first copying the data to S3 and then using the COPY command to load it into tables. This manual way of loading could pose problems in case the target table already has data in it. Using a staging table can mitigate this.
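The staging-table approach can be sketched as a sequence of SQL statements. The Python helper below only builds the statements as strings; it does not connect to Redshift. The table name, key column, S3 path and IAM role are hypothetical placeholders, the CSV format is an assumption, and the helper name `staging_load_statements` is ours. Treat it as one possible way to arrange the load, not the only one.

```python
def staging_load_statements(target: str, key: str, s3_path: str, iam_role: str) -> list:
    """Build SQL for loading S3 data through a staging table (illustrative only).

    Values are interpolated directly, so this is only suitable for trusted,
    hard-coded identifiers.
    """
    staging = f"{target}_staging"
    return [
        # Temporary table with the same columns as the target.
        f"CREATE TEMP TABLE {staging} (LIKE {target});",
        # Load the S3 files into the staging table, not the target.
        f"COPY {staging} FROM '{s3_path}' IAM_ROLE '{iam_role}' FORMAT AS CSV;",
        "BEGIN;",
        # Remove target rows that are about to be replaced by staged rows.
        f"DELETE FROM {target} USING {staging} WHERE {target}.{key} = {staging}.{key};",
        # Insert the fresh rows.
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "COMMIT;",
        f"DROP TABLE {staging};",
    ]


# Hypothetical values for illustration.
for statement in staging_load_statements(
    "orders",
    "order_id",
    "s3://example-bucket/orders/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
):
    print(statement)
```

Deleting matching rows before inserting is what keeps existing data in the target table from being duplicated by the new load.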
DynamoDB can also load data in the form of JSON from S3. AWS Data Pipeline offers built-in templates for loading data to DynamoDB as well. AWS Database Migration Service is another option that can be considered.
AWS also provides services for loading and updating data in these databases. However, these add-on services are designed with AWS source systems in mind.
A customer with various source systems spanning the cloud ecosphere may not have the best experience with these services. Using a data integration platform like Hevo, which can integrate with a multitude of data sources and target destinations, can be a better option.
Hevo Data – An Easy and Reliable way to Move Data to Redshift
Hevo can load data from any source into Amazon Redshift in real-time, without having to write any code.
Hevo’s AI-powered architecture seamlessly does all the grunt-work right from mapping complex schema to handling data type conversions – ensuring a smooth and secure data load experience for you. This will empower you to focus on delivering insights to your team faster than ever.
Sign up for Hevo’s 14-day free trial to experience a hassle-free data migration to Amazon Redshift.
Amazon Redshift Vs DynamoDB – Pricing
The difference in structure and design between these database services also extends to the pricing model. Redshift pricing is defined in terms of instances and hourly usage, while DynamoDB pricing is defined in terms of requests and capacity units.
AWS Redshift Pricing
Redshift starts at $0.25 per hour for the lowest-specification current-generation dense compute instance, which uses SSDs. It offers a second type of instance, called dense storage, which uses HDDs and starts at $0.85 per hour. Redshift offers one free hour of concurrency scaling for every 24 hours that the cluster stays operational. Redshift Spectrum, which lets customers use only the compute engine of Redshift, is priced on a per-query basis with a standard rate of $5 per TB of data scanned.
You can read more about Amazon Redshift pricing here.
AWS DynamoDB Pricing
DynamoDB offers two types of pricing strategies. In the on-demand model, pricing is defined in terms of the number of requests, with write requests priced at $1.25 per million requests and read requests priced at $0.25 per million requests.
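As a quick worked example of the on-demand rates, the Python sketch below estimates the request charges for a given number of reads and writes. The rates are the ones quoted in this post and may differ by region or change over time. The function name and the sample volumes are hypothetical, and storage charges are left out.

```python
# Rates quoted in this post; check current AWS pricing before relying on them.
WRITE_PRICE_PER_MILLION = 1.25
READ_PRICE_PER_MILLION = 0.25


def on_demand_request_cost(write_requests: int, read_requests: int) -> float:
    """Estimated on-demand request charges in USD (storage not included)."""
    write_cost = (write_requests / 1_000_000) * WRITE_PRICE_PER_MILLION
    read_cost = (read_requests / 1_000_000) * READ_PRICE_PER_MILLION
    return write_cost + read_cost


# Hypothetical month: 10 million writes and 40 million reads.
monthly_cost = on_demand_request_cost(10_000_000, 40_000_000)
print(f"${monthly_cost:.2f}")  # $22.50
```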
In the provisioned capacity mode, pricing is defined in terms of Read and Write Capacity Units (RCU and WCU). DynamoDB has different kinds of reads: strongly consistent, eventually consistent and transactional. Each type of read requires a different number of RCUs. For an item of up to 4 KB, a strongly consistent read requires 1 RCU, an eventually consistent read requires half an RCU, and a transactional read requires 2 RCUs. One WCU covers a write of up to 1 KB. WCUs are priced at $0.00065 and RCUs at $0.00013 per unit per hour.
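The capacity unit arithmetic can be sketched in a few lines of Python. The sketch assumes that item sizes are rounded up to the next 4 KB block for reads and the next 1 KB block for writes, and it covers standard writes only. The function and constant names are ours, chosen for illustration.

```python
import math

RCU_BLOCK_KB = 4  # one RCU covers a read of up to 4 KB
WCU_BLOCK_KB = 1  # one WCU covers a write of up to 1 KB

# RCUs consumed per 4 KB block for each kind of read.
READ_MULTIPLIER = {"strong": 1.0, "eventual": 0.5, "transactional": 2.0}


def read_capacity_units(item_size_kb: float, read_type: str = "strong") -> float:
    """RCUs needed for one read per second of an item of the given size."""
    blocks = math.ceil(item_size_kb / RCU_BLOCK_KB)
    return blocks * READ_MULTIPLIER[read_type]


def write_capacity_units(item_size_kb: float) -> int:
    """WCUs needed for one standard write per second of an item of the given size."""
    return math.ceil(item_size_kb / WCU_BLOCK_KB)


# A 10 KB item spans three 4 KB blocks.
print(read_capacity_units(10, "strong"))         # 3.0
print(read_capacity_units(10, "eventual"))       # 1.5
print(read_capacity_units(10, "transactional"))  # 6.0
print(write_capacity_units(2.5))                 # 3
```

Multiplying these per-request figures by the expected requests per second gives a starting point for the capacity to provision.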
A challenge with the provisioned mode is that the capacity units provisioned are shared across all the nodes. Since DynamoDB works on the basis of nodes and primary key partitions, if one of your nodes has a primary key with very high demand, the capacity has to be increased for all nodes. This can result in high costs in the on-demand and auto-scaling modes.
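The effect of a hot key can be illustrated with a deliberately simplified model in Python. It assumes, as described above, that the provisioned capacity is split evenly across nodes, so the busiest node dictates how much must be provisioned for the whole table. The function names and demand figures are hypothetical, and the real service may smooth out some of this imbalance.

```python
def required_table_capacity(per_node_demand: list) -> int:
    """Capacity to provision when it is split evenly across nodes.

    Every node gets the same share, so the share must cover the busiest node.
    """
    return max(per_node_demand) * len(per_node_demand)


def unused_capacity(per_node_demand: list) -> int:
    """Provisioned capacity that is paid for but never consumed."""
    return required_table_capacity(per_node_demand) - sum(per_node_demand)


uniform = [100, 100, 100, 100]  # demand spread evenly across four nodes
skewed = [370, 10, 10, 10]      # same total demand, one hot node

print(required_table_capacity(uniform), unused_capacity(uniform))  # 400 0
print(required_table_capacity(skewed), unused_capacity(skewed))    # 1480 1080
```

Both workloads request 400 units in total, but under this model the skewed one needs 1480 units provisioned to keep the hot node from throttling.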
Amazon Redshift Vs DynamoDB – Performance
Since the two databases are designed for different kinds of storage, comparing performance is not a straightforward job. Redshift offers great performance when it comes to complex queries scanning millions of rows. Redshift performance can be further optimized by using SORT KEYS and DIST KEYS.
DynamoDB is limited when it comes to complex queries: there is no scope for executing queries that contain multiple search criteria or sort on different columns. But for simple queries spanning a large number of rows, DynamoDB offers good performance, with the ability to handle up to 20 million requests per second. DynamoDB allows the use of a PRIMARY KEY that combines a PARTITION KEY and a SORT KEY to optimize read request latency.
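To show what a read against a partition key and sort key looks like, the Python sketch below builds the parameters for a DynamoDB Query request. It only constructs a dictionary and does not call AWS. The table name, attribute names and values are hypothetical, both keys are assumed to be strings, and the helper name `build_query_params` is ours. With boto3, a dictionary of this shape could be passed to the low-level client as `client.query(**params)`.

```python
def build_query_params(
    table: str,
    partition_key: str,
    partition_value: str,
    sort_key: str,
    sort_prefix: str,
) -> dict:
    """Parameters for a Query on one partition, narrowed by a sort key prefix."""
    return {
        "TableName": table,
        # Exact match on the partition key, prefix match on the sort key.
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :sk)",
        "ExpressionAttributeNames": {"#pk": partition_key, "#sk": sort_key},
        # Low-level attribute value format: {"S": ...} marks a string.
        "ExpressionAttributeValues": {
            ":pk": {"S": partition_value},
            ":sk": {"S": sort_prefix},
        },
    }


# Hypothetical table: orders keyed by customer, sorted by order date.
params = build_query_params("Orders", "customer_id", "C-42", "order_date", "2023-")
print(params["KeyConditionExpression"])
```

The query touches only the partition for one customer and uses the sort key to narrow the range, which is why this access pattern keeps latency low.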
Amazon Redshift Vs DynamoDB – Data structure
Redshift is a relational data warehouse service that uses columnar storage to optimize the analytical workloads where queries involve selection and aggregation based on columns. Even though Redshift is known to be a relational database, it lacks the ability to enforce unique key constraints.
DynamoDB is a NoSQL database, which means data is referred to in terms of records that do not need to conform to any structure other than having the primary key value. This difference in structure also means DynamoDB does not have the ability to execute JOIN queries. DynamoDB also struggles with queries that involve sorting on fields that are not designated as SORT KEYs.
Amazon Redshift Vs DynamoDB – Use cases
DynamoDB and Redshift use entirely different data structures and are optimized for different kinds of applications. The following section lists the use cases where, in our experience, one fits better than the other.
Amazon Redshift Use cases
- You want a petabyte-scale data warehouse and do not want to spend time and effort on maintaining an elaborate infrastructure.
- The use case is an online analytical processing workload involving complex queries that span across a large number of rows.
- The analytical workload is heavy enough that running it in your OLTP database would cause problems in transaction processing, so it needs to be separated.
- Maintaining unique key constraints can be done at the application level and database level validation is not required.
- The customer is well versed with DIST KEYs and SORT KEYs to extract maximum performance out of Redshift.
AWS DynamoDB Use cases
- The use case is an online transaction processing workload.
- The use case does not need a structured database or in other words, the customer is fine with the overhead of storing keys with every value in records.
- The use case does not involve complex queries or the customer is ready to implement logic in the application layer to refine the query results.
- The customer is willing to spend time and effort on carefully designing a primary key with emphasis on partitioning and sorting according to the use case.
- The application involves a primary key whose demand for access is uniformly distributed. Primary keys with high access demand in specific ranges can increase the workload of specific nodes resulting in high capacity unit usage.
We hope this guide gives you the right inputs to choose between AWS Redshift and DynamoDB. Before signing up for one of these, do compare the alternatives: Redshift Vs Snowflake and Redshift Vs BigQuery.
Are there any other factors that you would like to compare between the two? Let us know in the comments.