This post aims to discuss the differences between the above two Google Cloud solutions that is, Google Cloud Spanner vs BigQuery.
We will also discuss their similarities towards the end and assist you in discerning which solution fits your needs. So that you get a good eagle’s eye view of how things stand between the two.
What is Google Cloud Spanner?
Google Cloud Spanner is a relational DBMS that follows NewSQL in its approach. It is well suited for OLTP (online transaction processing) and guarantees ACID (atomicity, consistency, isolation, and durability) compliance. Still, it’s highly scalable like NoSQL systems and offers scale-out architecture. In scale-out architecture, new nodes can be easily added to an existing cluster to spread the storage+compute of data, thereby achieving scalability. Google Cloud Spanner offers the benefits of both NoSQL as well as NewSQL.
At the core of Google Spanner’s consistency is Google’s globally synchronized clock – TrueTime. TrueTime is a highly available and distributed global clock, developed by Google, which is available to all Google cloud offerings/servers. TrueTime guarantees that a new timestamp generated, let’s say T1, will always be greater than any timestamp T2, as long as T2’s generation finished before T1’s generation started. This gives True-time the ability to generate monotonically increasing timestamps, i.e. these timestamps will always be increasing throughout their entire domain.
This, in turn, allows applications to assign unique increasing timestamps to their progressively occurring transactions. Using TrueTime, Google Cloud Spanner assigns a timestamp to every data that is written to it, this timestamp is globally consistent.
So every transaction stored in Spanner will have a unique timestamp attached to it, this timestamp stores the moment in time when Spanner processed this transaction. Using these timestamps, Google Cloud Spanner can provide consistent reads across global locations, without blocking any writes. Also, Google Cloud Spanner can provide stringent concurrency-control guarantees for all its transactions.
For an external user, all transactions seem as if they occurred sequentially, one after the other, even though Google Cloud Spanner might have executed( and/or replicated) them across different geographical locations. So, for the external user, Google Cloud Spanner performs like a single machine database.
Google Cloud Spanner provides a global timestamp order, which the transactions can use for further operations/queries. In earlier database solutions, users had to choose between either high performance+weaker guarantees OR slow performance+stronger guarantees. But, Google Cloud Spanner can provide strong guarantees and high transaction fidelity with faster performance.
Developers need not worry about any clash/race between their different transactions and need to focus only on the correctness of each of their transactions, and their application logic.
What is Google BigQuery?
Google BigQuery is an enterprise-level, petabyte-scale data warehouse that allows storing and querying huge volumes of data.
It eases the storage of nested and aggregated data in records while maintaining complex relationships. Google BigQuery also allows for querying the data and creating meaningful visualizations from it.
Google Bigquery allows the creation of real-time and predictive analytics, using its built-in Machine Language and Artificial Intelligence integrations.
Google Cloud Spanner vs Google BigQuery Key Differences
Here are the 9 major differences that must be noted!
Use Cases
Google BigQuery supports many different data formats and allows you to analyze the data and create visualizations. Hence, Google BigQuery is more suited for Big Data, Social networking applications, and the likes of IoT. Google BigQuery does not support transactions and does not allow updating of existing records.
On the other hand, Google Cloud Spanner supports OLTP along with scalability and high availability. Hence, Cloud Spanner is more suited for E-commerce systems, Core Banking, Gaming, Telecom, etc.
It allows updating to existing records and appending data to existing tables.
Architecture
Google BigQuery separates storage and computes in such a way that users need to pay for computing only when they are running their analytics. Google BigQuery Compute is realized by using Dremel ( a query system that combines columnar data layout with multi-level execution trees).
Google Cloud Spanner uses the Paxos consensus algorithm to split databases across many nodes, for sharding(partitioning) data. To ensure global consistency, it uses GPS and Atomic clocks, provided by TrueTime.
Access Control
Google BigQuery arranges data in datasets, and permissions are given to own/read/write a dataset(s). It also allows for access controls at column and row levels.
Cloud Spanner on the other hand gives you fine-grained access control. It uses Google Cloud Identity and Access Management (IAM) settings, to provide differential access to individual tables.
Way of Approach
Cloud Spanner and Google Bigquery attempt to solve different problems.
Since Google Bigquery is a data warehouse with analytics and visualization capabilities, it integrates well with other offerings like Data Studio, Looker, BI engine, Apache big data ecosystem, etc.
Google Cloud Spanner, on the other hand, is a versatile and powerful DBMS, hence it integrates well with offerings like Cloud Storage, Cloud Run, Terraforms Liquibase, etc.
Location/Geographical Placement
Cloud Spanner is known to replicate your data across multiple geographical locations, and still allow for consistent reads, without blocking any writes. Google Cloud Spanner is offered as a globally distributed DBMS that follows the NewSQL paradigm.
In Google BigQuery, a dataset is created in a geographical location and always exists there only, you can only create a copy in another location. Within a location, Google BigQuery could internally duplicate some data, but that is not something that is divulged or advertised to users.
Pricing
Since both software solves a different set of problems, they are priced differently.
Google Cloud Spanner will charge you for Network usage, Compute Capacity ( or many node-hours), and some cost for storage. Compute capacity depends on the number of nodes used or a fraction of it ( e.g. 1300 processing units), multiplied by the number of hours it was used. The minimum chargeable time unit is 1 hour, after the initial 1 hour, the billing for compute capacity is prorated.
So, an instance with computes capacity of let’s say 400 processing units that ran for 3.5 hours would be charged as (400 processing units × 3.5 hours) ÷ 1000 processing units, which calculates to 1.4 node hours.
Cost per node per hour is least for US regions (0.90-0.99) and appreciably higher for south America-east ($1.35) and Asia-south($1.26 ). Storage charges are calculated using the volume of storage used by your database+backups, multiplied by the monthly rate. The storage unit here is 1 GB or 2^30 bytes, charged per month, ranging from $0.33-$0.70 per GB per month.
There are no Network or bandwidth charges for Google Cloud Spanner replication or network ingress traffic, but you will be charged for some egress traffic as the network reads. Egress charges between regions within the US (per GB) are $0.01.
Google BigQuery will charge you some amount for storage but the main expenditure is on analysis, i.e. querying, DDL/DML, scripts, etc. that scan the data tables. This charging is based on the total data processed in the columns selected, which in turn depends on the type of data ( blob or integer etc.) in that column. There are some charges for streaming inserts and using their Google BigQuery Storage API. Failed queries and those that return data from cache are not charged.
For querying, you can opt for one of the following 2 charging streams:-
- On-demand pricing: Here, the first 1 TB of query data processed per month is free. After that, the charges depend on the number of bytes processed by each query.
These range from US$ 5.00-9.00 USD per tebibyte, for 1 tebibyte and above, per month per account
- Flat rate pricing: Let’s you buy compute capacity in units of virtual CPUs/slots, with annual/monthly/per the minute commitment.
The commitments are priced starting from $2000-$2920 per 100 slots monthly and reduced to $1700-$2420 per 100 slots yearly.
* Above charges are as advertised by Google, at the time of writing this post.
Streaming inserts are charged depending on the region, ranging from the US $0.01-0.0136 per 200 mebibytes. Storage pricing ranges from US $0.020-0.23 per GB for Active storage and the US $0.010-0.016 per GB for Long-term storage.
Return on Investment
Return on Investment on such products can be variable, depending on your data and processing needs, the region where services were availed, and how well your analytics succeeded in achieving desired results, like an increase in sales.
Also, there will be a break-even period, only after which you start experiencing tangible results/improvements in your business processes. Based on some veritable experiences shared by actual users, ROI on Google Bigquery could range from 30 percent in the short term, to more than 300 percent in the long term( 3 years or more).
For Google Cloud Spanner, ROI comes from the fact that it scales faster than most other counterparts, and still provides ACID compliance. Also, everything is managed by Google, so Spanner’s total cost of ownership(TCO) is almost 78% lower than on-premises solutions and 37% lower than other cloud SAAS offerings.
Despite the many variables in its pricing, if Google Cloud Spanner suits your needs specifically, the ROI could range from 45% in short term, to more than 200% in the longish term.
Replication Methods
Replicas of data are created to ensure data availability, serve data via the nearest geographical location, ease of development of Apps and provide a single database experience.
Google Cloud Spanner replicas can be of 3 types:-
- Read-Write Replicas: These can be read and written upon, so they maintain a full copy of your data.
They can vote, can become a leader, and can serve reads.
- Read Replicas maintain a full copy of your data, can only serve reads, cannot vote for the leader, and cannot commit reads.
Read-only replicas can exist only in multi-region locations.
- Witness Replicas do not support reads but participate in voting to commit reads( choose a leader), but they cannot become a leader as there are no writes on them.
Witness replicas exist only in multi-region locations.
Google BigQuery replicates data on its own and remembers a week of change history. Your Google Bigquery dataset can either be in a single geographical location or multi-region locations.
In both cases, Google Bigquery will replicate your datasets in two different cloud zones.
Not only storage but Google Bigquery will also replicate your compute slots across multiple availability zones. This two-pronged replication allows Google Bigquery to ensure High Availability and Durability. In case of Machine-Level Failure, Google Bigquery can continue processing your queries with just a few milliseconds of delay. A soft regional failure will not result in loss of your data, but a hard regional failure could result in loss of data.
Hence, Google Bigquery allows you to create cross-region dataset copies , to protect your data from hard regional failures.
Programming Languages
Google BigQuery has API client libraries for C#, Go, Java, Node.js, PHP, Python, and Ruby. Also, there are third-party libraries available for Python, R, and Scala.
Cloud Spanner also has APIs for C++, C#, Go, Java, Node.js, PHP, Python, and Ruby. But the PostgreSQL interface is not supported by C#, Node.js, and Ruby.
Some Common Characteristics
Though both the solutions are quite different, they still have some common characteristics.
Cloud-based SAAS offerings Updates and Upgrades to the bare metal infrastructure running them, are transparent and done behind the scenes.
Use modern technology to provide Scalability, Automatics Sharding, Increased Availability through replication, and automatic failure recovery. Both enable Fast Transactions and Fast Querying.
Both use Colossus, the advanced successor to the Google File System (GFS) for storage.
Conclusion
To conclude, Google Cloud Spanner and Google BigQuery are two different cloud SAAS offerings that aim to solve a very different set of problems, and are largely as diverse as apples and oranges.
Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations such as Google BigQuery with a few clicks.
Visit our Website to Explore Hevo
Hevo Data with its strong integration with 150+ Data Sources (including 40+ Free Sources) allows you to not only export data from your desired data sources & load it to the destination of your choice such as Google BigQuery, but also transform & enrich your data to make it analysis-ready so that you can focus on your key business needs and perform insightful analysis using BI tools.
Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at the amazing pricing, which will assist you in selecting the best plan for your requirements.
Share your experience of understanding Google Cloud Spanner vs BigQuery in the comment section below! We would love to hear your thoughts.
Pratik Dwivedi is a seasoned expert in data analytics, machine learning, AI, big data, and business intelligence. With over 18 years of experience in system analysis, design, and implementation, including 8 years in a Techno-Managerial role, he has successfully managed international clients and led teams on various projects. Pratik is passionate about creating engaging content that educates and inspires, leveraging his extensive technical and managerial expertise.