As analytics in your company graduates from a MySQL/PostgreSQL/SQL Server, a pertinent question that you need to answer is which data warehouse is best suited for you.
At Hevo, we make it easier for our customers to bring all their data to the data warehouse of their choice. Naturally, our customers come to us seeking our recommendations on choosing a data warehouse. Our customers want to know which data warehouse will give them faster query times, how much data will it be able to handle and what will it cost. The answer depends on various inputs like the size of data, the nature of use and the technical capability of users managing the warehouse.
In this post, we are going to talk about the two most popular data warehouses: Amazon Redshift and Google BigQuery. Honestly, in the Redshift vs BigQuery comparison, similarities are greater than the differences. Still, there are nuanced differences that you need to be aware of while making a choice.
On many head-to-head tests, Redshift has proved to show better query times when configured and tweaked correctly. There are several benchmarks available over the internet.
Manageability and Usability
Redshift gives you a lot more flexibility on how you want to manage your resources. This means that you get more control at the cost of some management overhead. To operate a decently sized Redshift cluster efficiently, you need a deep understanding and skill-set around warehousing concepts. For example, Redshift will expect you know about how to distribute your data across nodes and will require you to do vacuuming operations on a periodic basis.
BigQuery, on the other hand, does not expect you to manage your resources. It abstracts away the details of the underlying hardware, database, and all configurations. It mostly works out of the box.
In the case of Redshift, you need to predetermine the size of your cluster. That means you are billed irrespective of whether you query your data on not. Shutting down clusters when not needed is left to the user. Billing is done on an hourly usage of the cluster. This makes Redshift more costly when your query volumes are low. But, if your query volumes are higher, predictable and uniformly distributed over time Redshift may turn out to be a lot cheaper. Also, the costs are more predictable because you always know the size of your cluster.
BigQuery, on the other hand, has segregated compute resources from storage. Thus, you are only charged when you are running queries. Billing is done on the amount of data processed during queries. On the surface this pricing might seem to be cheaper but, this approach makes costs for BigQuery unpredictable and it will turn out to be more expensive than Redshift when query volumes are high.
Ecosystems around both Amazon Redshift and Google BigQuery are buzzing. They are being actively promoted by their respective companies and both the products work as marketed. You wouldn’t be too wrong for choosing either of them. Still, we recommend one over the other in the following scenarios:
- Redshift. When you are okay spending some time optimizing your data for fast queries- when your resource utilization is going to be fairly distributed across time and a large proportion of data being actually queried rather than just sitting in the database.
- BigQuery. When you want something that just works and don’t want to spend time tuning the database when you are okay having query response times of a few minutes and you have a lot of data that is being queried rarely.