BigQuery is a Cloud Storage Platform developed by Google to store data, analyze, and create reports. There are many benefits of using Google Bigquery, such as providing Accelerated Insights, Scaling Seamlessly, Running Queries Faster, etc. It also uses mechanisms that help keep the data secure and safe in the cloud. One can access the stored in Google Bigquery at any time or place. Thus, the BigQuery Data Warehouse provides improved access to its users. Also, the platform helps convert data into tables (rows and columns), charts, graphs, and other visual formats for easy understanding.
When using Google BigQuery, you can get summarised data via simple Aggregate Functions. One such important BigQuery Count Unique values Aggregate function is Distinct Count. This simple BigQuery Count Unique values function lets you quickly get the number of Distinct Values in the selected Data.
In this article, you will learn how to efficiently use the BigQuery Count Unique Value function i.e. COUNT DISTINCT.
Table of Contents
What is Google BigQuery?
Image Source
Google BigQuery is a Multi-Cloud Data Warehousing Solution developed by Google to help businesses function faster and make better decisions. It has a Serverless Architecture and uses Google’s Infrastructure processing powers. The in-built Machine Learning functionality of Google BigQuery helps users understand the data in a better way.
You can import data to Google BigQuery, process it, and export it for further analysis. Also, users can run interactive queries with Google Queries. It has become one of the trusted Data Warehousing Solutions for various reasons. Google Bigquery is a great tool for Real-Time Data Analytics, Creating Visually Stunning Dashboards and Insightful Reports.
Key Features of Google BigQuery
- Google BigQuery ML: This feature helps create, guide, and execute Machine Learning models to users via Standard SQL.
- Integrations: Google BigQuery supports integration with various Google Products. It makes it easier for developers to use API for creating integration.
- Fault-tolerant Structure: This feature helps protect your data and delivers Real-Time Logs for errors during the ETL procedure.
- Automatic high availability: Google Bigquery provides Automatic Storage Scalability without any additional charge or setup.
- Backup and Restores Automatically: This Data Warehousing Solution creates Backup and Restores Data automatically. Thus, with BigQuery, all your data stays secure.
What are BigQuery Aggregate Functions?
An Aggregate Function uses the SELECT statement and SQL clauses to compute a set of values and returns only one value as the result. It requires users to add multiple rows of data to run a query and return a single value. Aggregate Functions often avoid null values.
There are various Aggregate Functions supported by Google BigQuery that help in better understanding the stored data. They help Summarize the Data in Multiple Rows into a single value. Google BigQuery supports a wide range of Aggregate Functions. ANY_VALUE, ARRAY_AGG, AVG, BIT_AND, COUNT, MIN, MAX are a few common Aggregate Functions used in the Google BigQuery.
What is the use of the DISTINCT Clause?
Image Source
The DISTINCT clause is used to Eliminate any Duplicate Rows in the table which helps return a single value from the remaining rows. Also, the term DISTINCT is often used with the SELECT keyword.
Using the DISTINCT clause enables users to fetch only the unique values from the table. However, in some cases, the query will return unique combinations when multiple expressions are provided to the DISTINCT clause.
Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up Data Integration for 100+ Data Sources (Including 40+ Free Sources) and will let you directly Load Data to a Data Warehouse like Google BigQuery or a destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data.
Get Started with Hevo for Free
Let’s look at some of the salient features of Hevo:
- Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
- Schema Management: Hevo can automatically detect the schema of the incoming data and maps it to the destination schema.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. You need to edit the event object’s properties received in the transform method as a parameter to carry out the transformation. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
- Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Firebolt, Snowflake Data Warehouses; Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within pipelines.
- Live Support: Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Sign up here for a 14-Day Free Trial!
How to perform BigQuery Count Unique Values Job using COUNT DISTINCT?
Google Bigquery involves massive quantities of data input. The purpose of using the DISTINCT clause is to return value from an exact number of DISTINCT items which will help provide better performance and scalability. However, the returned value is not guaranteed to be exact in BigQuery’s implementation. Let’s have a look at the following aspects to understand how to perform BigQuery Count Unique Values Job using COUNT DISTINCT.
1) BigQuery Count Unique Values Job: COUNT DISTINCT Syntax
Count Function Syntax:
COUNT(*) [OVER (...)]
This function helps receive a single value from the number of rows in the input.
But, if you want to perform the BigQuery Count Unique Values Job using COUNT DISTINCT., follow the below-shared syntax:
COUNT(
[DISTINCT]
expression
)
[OVER (...)]
The COUNT DISTINCT syntax for performing the BigQuery Count Unique Values Job helps receive a single value from the number of rows with expression (any data type) evaluated from values apart from NULL.
The use of the COUNT function in Bigquery is to return a single value from the number of rows in the input. The DISTINCT clause with COUNT is used only to eliminate any duplicate row in the table. Apart from the DISTINCT clause, one can also use the OVER clause, which is optional and states a window.
2) BigQuery Count Unique Values Job: COUNT DISTINCT Example Queries
The COUNT function helps return a single value from the number of rows in the table or distinct values of any data type. Check out the below-listed examples to better understand how to perform BigQuery Count Unique Values Job using COUNT DISTINCT.
- BigQuery Count Unique Values Job Example 1:
SELECT
COUNT(*) AS count_star,
COUNT(DISTINCT x) AS count_dist_x
FROM UNNEST([1, 4, 4, 5]) AS x;
OUTPUT:
+----------------+------------------+
| count_star | count_dist_x |
+----------------+------------------+
| 4 | 3 |
+---------------+-------------------+
- BigQuery Count Unique Values Job Example 2:
SELECT
x,
COUNT(*) OVER (PARTITION BY MOD(x, 3)) AS count_star,
COUNT(DISTINCT x) OVER (PARTITION BY MOD(x, 3)) AS count_dist_x
FROM UNNEST([1, 4, 4, 5]) AS x;
OUTPUT:
+--------+------------+--------------+
| x | count_star | count_dist_x |
+--------+------------+--------------+
| 1 | 3 | 2 |
| 4 | 3 | 2 |
| 4 | 3 | 2 |
| 5 | 1 | 1 |
+--------+------------+--------------+
- BigQuery Count Unique Values Job Example 3:
SELECT
x,
COUNT(*) OVER (PARTITION BY MOD(x, 3)) AS count_star,
COUNT(x) OVER (PARTITION BY MOD(x, 3)) AS count_x
FROM UNNEST([1, 4, NULL, 4, 5]) AS x;
OUTPUT:
+-----------+----------------+-----------+
| x | count_star | count_x |
+-----------+----------------+-----------+
| NULL | 1 | 0 |
| 1 | 3 | 3 |
| 4 | 3 | 3 |
| 4 | 3 | 3 |
| 5 | 1 | 1 |
+----------+-----------------+------------+
For counting distinct values of any data type (x):
- BigQuery Count Unique Values Job Example 4:
SELECT COUNT(DISTINCT IF(x > 0, x, NULL)) AS distinct_positive
FROM UNNEST([1, -2, 4, 1, -5, 4, 1, 3, -6, 1]) AS x;
+------------------------+
| distinct_positive |
+------------------------+
| 3 |
+------------------------+
- BigQuery Count Unique Values Job Example 5:
This example demonstrates how to count distinct dates of a specific event.
WITH Events AS (
SELECT DATE '2021-01-01' AS event_date, 'SUCCESS' AS event_type
UNION ALL
SELECT DATE '2021-01-02' AS event_date, 'SUCCESS' AS event_type
UNION ALL
SELECT DATE '2021-01-02' AS event_date, 'FAILURE' AS event_type
UNION ALL
SELECT DATE '2021-01-03' AS event_date, 'SUCCESS' AS event_type
UNION ALL
SELECT DATE '2021-01-04' AS event_date, 'FAILURE' AS event_type
UNION ALL
SELECT DATE '2021-01-04' AS event_date, 'FAILURE' AS event_type
)
SELECT
COUNT(DISTINCT IF(event_type = 'FAILURE', event_date, NULL))
AS distinct_dates_with_failures
FROM Events;
OUTPUT:
+---------------------------------------+
| distinct_dates_with_failures |
+---------------------------------------+
| 2 |
+---------------------------------------+
How to use the EXACT_COUNT_DISTINCT Function in Google BigQuery?
Apart from the COUNT DISTINCT function used for performing the BigQuery Count Unique Values Job, you can use the EXACT_COUNT_DISTINCT function to Compute Exact Distinct Values. There are cases when you do not have to worry about the query performance or the processing time rather ensure that exact quantities are returned from the input. In such cases, make sure to use the EXACT_COUNT_DISTINCT function.
For example, let’s aggregate the total DISTINCT authors, publishers, and titles from all books between 1920 and 1929 and use the EXACT_COUNT_DISTINCT function.
SELECT
EXACT_COUNT_DISTINCT(BookMeta_Author) AS authors,
EXACT_COUNT_DISTINCT(BookMeta_Publisher) AS publishers,
EXACT_COUNT_DISTINCT(BookMeta_Title) AS titles
FROM
TABLE_QUERY([gdelt-bq:hathitrustbooks],
'REGEXP_MATCH(table_id , r"^192[d]")')
In this query, we have not manually used any threshold value. Thus, the expected result will match the query and deliver the exact value. We are sure that the return value will be identical.
OUTPUT:
[
{
"authors": "48642",
"publishers": "35140",
"titles": "77155"
}
]
Conclusion
In this article, you learned about Google BigQuery, its key features, and Bigquery Count Unique Values Aggregate Functions. Google BigQuery, the Data Warehousing Solution, is a cost-effective, easy to install and use solution for businesses. It helps users to import data, process it, and create dashboards and reports for further analysis. The Data Warehousing solution assists businesses to understand their data, analyze and make better decisions. BigQuery Aggregate Functions help save time and run queries faster. The purpose to use Aggregate functions in Bigquery is to run queries for massive datasets and return a single value with a meaning. You can quickly carry out the BigQuery Count Unique Values job by using the Distinct clause in the Count function.
As your business grows, the data associated with your customers, products, and services are generated at an astonishing rate. BigQuery’s Scalability and Best-in-Class Performance help in handling all types of varying workloads. Though, regularly updating and transferring all the data from various applications used across your business to your BigQuery Data Warehouse is a tedious task. You will be required to invest a part of your engineering bandwidth to Integrate, Clean, Transform and Load your data to BigQuery for further analysis. This can be easily automated by using a Cloud-Based ETL Tool like Hevo Data.
Visit our Website to Explore Hevo
Hevo Data will effectively transfer your data, allowing you to focus on important aspects of your business like Analytics, Customer Management, etc. This platform allows you to seamlessly transfer data from a vast sea of sources to a Data Warehouse like Google BigQuery or a destination of your choice to be visualised in a BI Tool. It is a reliable, secure, and fully automated service that doesn’t require you to write any code!
If you are using Google Big Query as a Data Warehousing and Analysis platform for your business and looking for a No-fuss alternative to Manual Data Integration, then Hevo can efficiently automate this for you. Hevo, with its strong integration with 100+ sources & BI tools, allows you to not only export & load data but also transform & enrich your data & make it analysis-ready in a jiffy.
Want to take Hevo for a ride? Sign Up for a 14-day free trial and simplify your Data Integration process. Do check out the pricing details to understand which plan meets all your business needs.
Share with us your experience of performing the BigQuery Count Unique Values Job using COUNT DISTINCT function. Let us know in the comments section below!