BigQuery is a Cloud Storage Platform developed by Google to store data, analyze, and create reports. There are many benefits of using Google Bigquery, such as providing Accelerated Insights, Scaling Seamlessly, Running Queries Faster, etc. It also uses mechanisms that help keep the data secure and safe in the cloud. One can access the stored in Google Bigquery at any time or place. Thus, the BigQuery Data Warehouse provides improved access to its users. Also, the platform helps convert data into tables (rows and columns), charts, graphs, and other visual formats for easy understanding. 

When using Google BigQuery, you can get summarised data via simple Aggregate Functions. One such important BigQuery Count Unique values Aggregate function is Distinct Count. This simple BigQuery Count Unique values function lets you quickly get the number of Distinct Values in the selected Data.

In this article, you will learn how to efficiently use the BigQuery Count Unique Value function i.e. COUNT DISTINCT.

What are BigQuery Aggregate Functions?

An Aggregate Function uses the SELECT statement and SQL clauses to compute a set of values and returns only one value as the result. It requires users to add multiple rows of data to run a query and return a single value. Aggregate Functions often avoid null values.

There are various Aggregate Functions supported by Google BigQuery that help in better understanding the stored data. They help Summarize the Data in Multiple Rows into a single value. Google BigQuery supports a wide range of Aggregate Functions. ANY_VALUE, ARRAY_AGG, AVG, BIT_AND, COUNT, MIN, MAX are a few common Aggregate Functions used in the Google BigQuery. 

Simplify BigQuery ETL and Analysis Using Hevo’s No-code Data Pipeline

Looking for a solution to centralize your data to BigQuery? Hevo Data is a No-code Data Pipeline that offers a fully managed data pipeline platform to set up Data Integration for 150+ Data Sources (Including 40+ Free Sources) and will let you directly Load Data to a Data Warehouse like Google BigQuery or a destination of your choice.

Get Started with Hevo for Free

What is the use of the DISTINCT Clause?

BigQuery Count Unique - Distinct Keyword
Image Source

The DISTINCT clause is used to Eliminate any Duplicate Rows in the table which helps return a single value from the remaining rows. Also, the term DISTINCT is often used with the SELECT keyword. 

Using the DISTINCT clause enables users to fetch only the unique values from the table. However, in some cases, the query will return unique combinations when multiple expressions are provided to the DISTINCT clause.

How to perform BigQuery Count Unique Values Job using COUNT DISTINCT?

Google Bigquery involves massive quantities of data input. The purpose of using the DISTINCT clause is to return value from an exact number of DISTINCT items which will help provide better performance and scalability. However, the returned value is not guaranteed to be exact in BigQuery’s implementation. Let’s have a look at the following aspects to understand how to perform BigQuery Count Unique Values Job using COUNT DISTINCT.

1) BigQuery Count Unique Values Job:  COUNT DISTINCT Syntax

Count Function Syntax:

COUNT(*) [OVER (...)]

This function helps receive a single value from the number of rows in the input.

But, if you want to perform the BigQuery Count Unique Values Job using COUNT DISTINCT., follow the below-shared syntax:

COUNT(
  [DISTINCT]
  expression
)
[OVER (...)]

The COUNT DISTINCT syntax for performing the BigQuery Count Unique Values Job helps receive a single value from the number of rows with expression (any data type) evaluated from values apart from NULL.

The use of the COUNT function in Bigquery is to return a single value from the number of rows in the input. The DISTINCT clause with COUNT is used only to eliminate any duplicate row in the table. Apart from the DISTINCT clause, one can also use the OVER clause, which is optional and states a window.

2) BigQuery Count Unique Values Job: COUNT DISTINCT Example Queries

The COUNT function helps return a single value from the number of rows in the table or distinct values of any data type. Check out the below-listed examples to better understand how to perform BigQuery Count Unique Values Job using COUNT DISTINCT

  • BigQuery Count Unique Values Job Example 1:
SELECT
 COUNT(*) AS count_star,
 COUNT(DISTINCT x) AS count_dist_x
FROM UNNEST([1, 4, 4, 5]) AS x;

OUTPUT:

+----------------+------------------+
| count_star     | count_dist_x     |
+----------------+------------------+
| 4              | 3                |
+---------------+-------------------+
  • BigQuery Count Unique Values Job Example 2:
SELECT
  x,
  COUNT(*) OVER (PARTITION BY MOD(x, 3)) AS count_star,
  COUNT(DISTINCT x) OVER (PARTITION BY MOD(x, 3)) AS count_dist_x
FROM UNNEST([1, 4, 4, 5]) AS x;

OUTPUT:

+--------+------------+--------------+
| x      | count_star | count_dist_x |
+--------+------------+--------------+
| 1      | 3          | 2            |
| 4      | 3          | 2            |
| 4      | 3          | 2            |
| 5      | 1          | 1            |
+--------+------------+--------------+
  • BigQuery Count Unique Values Job Example 3:
SELECT
  x,
  COUNT(*) OVER (PARTITION BY MOD(x, 3)) AS count_star,
  COUNT(x) OVER (PARTITION BY MOD(x, 3)) AS count_x
FROM UNNEST([1, 4, NULL, 4, 5]) AS x;

OUTPUT:

+-----------+----------------+-----------+
| x         | count_star     | count_x   |
+-----------+----------------+-----------+
| NULL      | 1              | 0          |
| 1         | 3              | 3          |
| 4         | 3              | 3          |
| 4         | 3              | 3          |
| 5         | 1              | 1          |
+----------+-----------------+------------+

For counting distinct values of any data type (x):

  • BigQuery Count Unique Values Job Example 4:
SELECT COUNT(DISTINCT IF(x > 0, x, NULL)) AS distinct_positive
FROM UNNEST([1, -2, 4, 1, -5, 4, 1, 3, -6, 1]) AS x;
+------------------------+
| distinct_positive      |
+------------------------+
| 3                      |
+------------------------+
  • BigQuery Count Unique Values Job Example 5:

This example demonstrates how to count distinct dates of a specific event.

WITH Events AS (
  SELECT DATE '2021-01-01' AS event_date, 'SUCCESS' AS event_type
  UNION ALL
  SELECT DATE '2021-01-02' AS event_date, 'SUCCESS' AS event_type
  UNION ALL
  SELECT DATE '2021-01-02' AS event_date, 'FAILURE' AS event_type
  UNION ALL
  SELECT DATE '2021-01-03' AS event_date, 'SUCCESS' AS event_type
  UNION ALL
  SELECT DATE '2021-01-04' AS event_date, 'FAILURE' AS event_type
  UNION ALL
  SELECT DATE '2021-01-04' AS event_date, 'FAILURE' AS event_type
)
SELECT
  COUNT(DISTINCT IF(event_type = 'FAILURE', event_date, NULL))
    AS distinct_dates_with_failures
FROM Events;

OUTPUT:


+---------------------------------------+
| distinct_dates_with_failures          |
+---------------------------------------+
| 2                                     |
+---------------------------------------+

Also learn about Google BigQuery COUNT IF and COUNT Functions: A Comprehensive Guide 101

How do you Increase the Distinct Approximation Threshold? 

To increase the approximation threshold for DISTINCT queries, use the second numeric parameter (n) in the COUNT(DISTINCT [field], n) function call. This will force BigQuery to return an exact count for any number at or below that threshold

For example, let’s create a query with a threshold of 50,000:

SELECT
  COUNT(DISTINCT BookMeta_Author, 50000) AS authors,
  COUNT(DISTINCT BookMeta_Publisher, 50000) AS publishers,
  COUNT(DISTINCT BookMeta_Title, 50000) AS titles
FROM
  TABLE_QUERY([gdelt-bq:hathitrustbooks],
              'REGEXP_MATCH(table_id , r"^192[\d]")')

We assume the first two values, authors and publishers, to have accurate counts because they are below our threshold:

[
  {
    "authors": "48642",
    "publishers": "35140",
    "titles": "76938"
  }
]

To confirm that the threshold is working, we can do one final test, increasing the threshold once more to surpass all three amounts, this time to 80,000:

SELECT
  COUNT(DISTINCT BookMeta_Author, 80000) AS authors,
  COUNT(DISTINCT BookMeta_Publisher, 80000) AS publishers,
  COUNT(DISTINCT BookMeta_Title, 80000) AS titles
FROM
  TABLE_QUERY([gdelt-bq:hathitrustbooks],
              'REGEXP_MATCH(table_id , r"^192[\d]")')

We expected the first two numbers for authors and publishers to be identical to the results of our 50,000 threshold query, and they are.

As a result, we may conclude that these three figures represent accurate counts of the DISTINCT quantities for each field across all 1920s tables in the dataset.

[
  {
    "authors": "48642",
    "publishers": "35140",
    "titles": "77155"
  }
]

How to use the EXACT_COUNT_DISTINCT Function in Google BigQuery?

Apart from the COUNT DISTINCT function used for performing the BigQuery Count Unique Values Job, you can use the EXACT_COUNT_DISTINCT function to Compute Exact Distinct Values. There are cases when you do not have to worry about the query performance or the processing time rather ensure that exact quantities are returned from the input. In such cases, make sure to use the EXACT_COUNT_DISTINCT function. 

For example, let’s aggregate the total DISTINCT authors, publishers, and titles from all books between 1920 and 1929 and use the EXACT_COUNT_DISTINCT function.

SELECT
  EXACT_COUNT_DISTINCT(BookMeta_Author) AS authors,
  EXACT_COUNT_DISTINCT(BookMeta_Publisher) AS publishers,
  EXACT_COUNT_DISTINCT(BookMeta_Title) AS titles
FROM
  TABLE_QUERY([gdelt-bq:hathitrustbooks],
              'REGEXP_MATCH(table_id , r"^192[d]")')

In this query, we have not manually used any threshold value. Thus, the expected result will match the query and deliver the exact value. We are sure that the return value will be identical.

OUTPUT:

[
  {
    "authors": "48642",
    "publishers": "35140",
    "titles": "77155"
  }
]

Conclusion

In this article, you learned about Google BigQuery, its key features, and Bigquery Count Unique Values Aggregate Functions. Google BigQuery, the Data Warehousing Solution, is a cost-effective, easy to install and use solution for businesses. It helps users to import data, process it, and create dashboards and reports for further analysis.

BigQuery Aggregate Functions help save time and run queries faster. The purpose to use Aggregate functions in Bigquery is to run queries for massive datasets and return a single value with a meaning. You can quickly carry out the BigQuery Count Unique Values job by using the Distinct clause in the Count function. 

As the data volume increases, you will be required to invest a part of your engineering bandwidth to Integrate, Clean, Transform and Load your data to BigQuery for further analysis. This can be easily automated by using a Cloud-Based ETL Tool like Hevo Data.

Visit our Website to Explore Hevo

Want to take Hevo for a ride? Sign Up for a 14-day free trial and simplify your Data Integration process. Do check out the pricing details to understand which plan meets all your business needs.

Share with us your experience of performing the BigQuery Count Unique Values Job using COUNT DISTINCT function. Let us know in the comments section below! 

Hitesh Jethva
Freelance Technical Content Writer, Hevo Data

Hitesh is skilled in freelance writing within the data industry. He creates engaging and informative content on various subjects like data analytics, machine learning, AI, big data, and business intelligence byusing his analytical thinking and problem solving ability.

No-code Data Pipeline for Google BigQuery