At this point in the evolution of cloud infrastructures, every company is contemplating whether moving to a completely managed database service makes sense instead of using an on-premise database. The argument in favour of completely managed services is the obvious cost savings and relief from offloading all the nitty-gritty related to maintaining a highly reliable database service. Here, we will compare two of the most popular databases as a service from Amazon – AWS Redshift vs RDS, to see how they stack up to each other. Towards the end, we also in detail mention under what circumstances/use cases you should opt for one of these two.
Understanding Redshift and RDS
AWS RDS Features
AWS RDS offers a fully managed relational database as a service. It allows a customer to choose from six different database engines – MySQL, MariaDB, PostgreSQL, AWS Aurora, Oracle Database, and SQL Server. Customers can also choose between different types of hardware through AWS instance types – optimized for performance, memory or IO. All the typical administrative tasks related to running a database – Hardware provisioning, database setup, patching updates, backing up data, etc. are completely automated and the customers can focus only on their mission-critical business logic.
The full suite of security and compliance comes built-in with RDS along with encryption. AWS also offers a high availability option in the form of multiAZ deployment. This means a replica of your database is automatically maintained in another region and AWS will manage a completely seamless switch in the unfortunate case of something going wrong with your database.
Architecturally, RDS works on top of virtualized instances. There is no concept of cluster or nodes when it comes to RDS and these individual virtualized instances can be scaled for performance or storage with just a few clicks. The scaling normally takes a few minutes and it can go up to a maximum capacity of 32 vCPUs and 244GB of RAM. Storage scaling will depend on the type of database engine that is being used and the maximum it can go up is up to 64 TB for AWS Aurora database engine.
AWS Redshift Features
Redshift is a completely managed data warehouse as a service and can scale up to petabytes of data while offering lightning-fast querying performance. The querying engine is PostgreSQL complaint with small differences in data types and the data structure is columnar. Redshift allows the customers to choose from different types of instances optimized for performance or storage. Scaling in the case of newer generation instances can happen in a matter of minutes using the elastic resize feature. While for older generation instances that do not support elastic resize, scaling can only happen in a few hours. Nevertheless, all the administrative tasks are automated here as well and customers can focus only on their core business logic.
Redshift manages this optimum mix of scalability and performance through a cluster-based architecture with multiple nodes. Scaling is accomplished by upgrading the nodes, adding more nodes or both. Among the nodes, one of the nodes is designated as a leader node and this node is responsible for client communication, query optimization, execution plan creation, and sending tasks to individual nodes for execution. Except for some administration queries, nothing gets executed on the leader node and the real work is delegated to member nodes.
Like RDS, Redshift pricing is also including storage and compute resources and customers can choose to pay only for what they use. Redshift is mainly optimized for large complex analytical workloads spanning across millions of rows, but can also support OLTP workloads if necessary; though it is not the recommended practice. A detailed review of Redshift and its architecture can be found in one of our previous blogs here.
Redshift Vs RDS – Comparison
We have now covered the basics of AWS Relational database service and Redshift. Let’snow compare these managed services considering the most critical factors based on which a data architect will choose one of these.
Redshift Vs RDS: Scaling
The ability to scale is one of the most critical factors to consider when making a choice between different databases. Both Redshift and RDS allows the customers to scale as per their budget and performance requirements. Since RDS is based on virtualized instances, its scaling is done by reconfiguring the virtual instance capabilities. Scaling takes only a few minutes and can be done in a few clicks in the AWS console. Redshift is based on a more complex architecture and it means scaling is not as seamless as it is in RDS. Redshift instances with support for elastic resize can do it in a few minutes, but the database unavailable time window is certainly higher than that of RDS. That said, the limit of scaling is higher for Redshift when it comes to storage. Redshift also has an option called concurrency scaling which can be used to support a virtually unlimited number of concurrent users without a drop in querying performance.
Redshift Vs RDS: Storage Capacity
The biggest differentiator between Redshift and RDS is the storage capacity and the limit to which it can be scaled. With Redshift, the storage can be scaled up to petabytes of data. The limit of AWS Redshift is 2 PB with its ds2.8xlarge type instance. With RDS, since it works with individual virtualized instances, the storage limit is in the range of TBs and will vary according to the chosen database engine. For SQL server the storage capacity is limited at 16 TB, while the aurora engine can scale up to 64 TB. All the other engine types can scale up to 32 TB of data.
Redshift Vs RDS: Data Replication
A major workload in any ETL will be the data load from different sources. Since these services have entirely different architecture, the procedure to load is also different. With RDS, this is closely tied to the underlying database engine that is being used. Importing the data will use the engine specific commands. Similarly, the tools for exporting will also depend on the source and target engine types like mysqldump for MySQL or pg_dump for Postgres.
For Redshift, importing data will involve copying the complete data to S3 and loading it using the COPY command. You may require temporary tables if your Redshift tables already contain data. A detailed blog on how to do ETL in Redshift can be found here.
What you should note while moving data to Redshift or RDS:
First thing, you should remember that you are looking to move data into Amazon Redshift or RDS for key business processes and insights. Hence, it becomes crucial to set up a robust system that can send data to Redshift/RDS in an accurate, reliable and secure manner.
A Reliable Method to Move Data to Redshift or RDS
Using Hevo, an enterprise-grade data pipelines as a service, you can move data in real-time from any of your data sources to any destination without writing any code. Built to handle large scale of data, Hevo ensures that your data is moved to the destination in a reliable, accurate and consistent manner.
Explore a 14-day free trial to experience a hassle-free data moving experience from any source to RDS or Redshift.
Redshift Vs RDS: Pricing
Pricing in both RDS and Redshift is including both storage and compute. Both the services allow paying only for your use. AWS RDS is obviously cheaper because of its simplicity and lower scaling capabilities. RDs starts as low .017 $ per hour for MySQL, Postgres and Maria DB instances. Oracle has its license fees included with the pricing and comes at 0.038 $ per hour for the smallest instance possible. AWS also has a bring your own license policy for Oracle with lower rates, in case the customer wants to use his already existing Oracle databases.
With Redshift, pricing is a bit more steep with the lowest current generation dense compute dc2.large starting at .25 $ per hour. Redshift also offers another type of instance called dense storage that comes with higher storage capacity using HDDs instead of SSDs. The pricing is higher than dense compute with the lowest per hour price at .85 $. With Redshift, customers typically use multiple nodes and the price will go up with the number of nodes.
Redshift Vs RDS: Performance
In a nutshell, RDS offers better performance when it comes to queries that do not test its limits; To be specific, queries that do not span across millions of rows offer better performance in RDS. The primary reason for this is that Redshift has a very sophisticated query optimizer and execution planner at work before the actual query execution. For simpler or low data scan queries, this is an overkill since query optimization in most cases takes more time than the execution.
The story changes when it comes to queries that need to scan millions of rows and aggregate them. Redshift is designed for scenarios like this and excels here, offering a comparable or even better performance in such cases. So the argument of performance to differentiate between them is tied to the actual storage use case and should not be considered independently.
Additionally both the databases offer performance improvement through key distribution mechanisms.RDS offers sharding capability and with carefully designed keys, customers can extract more performance. Redshift has options for SORT KEY and DIST KEY, which if used correctly can aid in performance improvements in joins and complex queries.
Redshift Vs RDS: Maintenance
RDS is low on maintenance compared to Redshift because of its simpler architecture. All the administrative tasks are automated and there is nothing much the end-users need to do to maintain it.
Redshift needs some administrative tasks to be executed manually by the cluster administrator. It uses delete markers for DELETE and UPDATE queries. This means there needs to be an archival process for the actual deletions and this is to be done using the VACUUM command. This command should be executed by the cluster administrator. Redshift also recommends executing the ANALYZE command periodically to ensure all metadata and table statistics are kept updated.
Redshift Vs RDS: Data Structure
Since RDS is basically a relational data store, it follows a row-oriented structure. Redshift, on the other hand, has a columnar structure and is optimized for fast retrieval of columns. RDS querying may vary according to the engine used and Redshift conforms to Postgres standard.
Redshift does not do a good job when it comes to enforcing unique constraints in insertion keys and it is expected that the end-users will manage it themselves. RDS offers support for unique key constraints in all the database engines.
Redshift Vs RDS: Security
Both RDS and Redshift offers the full suite of security and compliance. They ensure data is encrypted at rest and in transit. It is also possible to isolate the instances using a virtual private cloud network. AWS identity and access management allow close control of permissions enabling the customers to decide who can do what in the instance types. Support for SSL is also standard in both cases.
For RDS, the database engine may provide additional security capabilities other than standard AWS features and it is the user’s responsibility to manage them. The case in point here is additional security settings like Oracle native network encryption and Oracle transparent data encryption, which are valid only for oracle database engines.
Redshift Vs RDS – Use cases
Even though both RDS and Redshift offer database as a service, they are different in many ways as depicted in the earlier sections. Both are designed for different use cases and excels at the said use cases. We will try to enumerate the different scenarios in which these services work best in the below sections.
When to use Redshift?
- You want a petabyte-scale data warehouse and is not happy with traditional database engines
- Your analytical and reporting workload is heavy and can interfere with your OLTP database.
- Your queries span across millions of rows and you anticipate even more complex queries
- You anticipate a constant query workload and your cluster will be running for the most part of the day.
- You are ready to manage the uniqueness of your insertion keys yourselves and do not expect the database to ensure it.
- You have a willing team to put their head into DIST KEYS and SORT KEYS and structure data so that best performance is extracted.
When to use RDS?
- You want to use traditional databases in the cloud and the only requirement is to offload the database management.
- Your data volume is in TBs and you do not anticipate a large increase in the near future. RDS hits its storage limits at 64 TB.
- You have an online transaction processing use case and want instant results with lesser data.
- You don’t have queries that span across millions of rows and the query complexity is limited.
- Your reporting and analytical workloads are minimal and do not interfere with your OLTP workloads.
- Your budget is tighter and you have no intention to spend money anticipating future astronomical workloads.
Now that you have a fair idea of the strengths and limitations of both Redshift and RDS, what do you think suits your use case better? Let us know in the comments?