BigQuery is a Cloud Storage Platform developed by Google to store data, analyze, and create reports. There are many benefits of using Google Bigquery, such as providing Accelerated Insights, Scaling Seamlessly, Running Queries Faster, etc. It also uses mechanisms that help keep the data secure and safe in the cloud. One can access the stored in Google Bigquery at any time or place. Thus, the BigQuery Data Warehouse provides improved access to its users. Also, the platform helps convert data into tables (rows and columns), charts, graphs, and other visual formats for easy understanding. 

When using Google BigQuery, you can get summarised data via simple Aggregate Functions. One such important BigQuery Count Unique values Aggregate function is Distinct Count. This simple BigQuery Count Unique values function lets you quickly get the number of Distinct Values in the selected Data.

In this article, you will learn how to efficiently use the BigQuery Count Unique Value function i.e. COUNT DISTINCT.

What are BigQuery Aggregate Functions?

An Aggregate Function uses the SELECT statement and SQL clauses to compute a set of values and returns only one value as the result. It requires users to add multiple rows of data to run a query and return a single value. Aggregate Functions often avoid null values.

There are various Aggregate Functions supported by Google BigQuery that help in better understanding the stored data. They help Summarize the Data in Multiple Rows into a single value. Google BigQuery supports a wide range of Aggregate Functions. ANY_VALUE, ARRAY_AGG, AVG, BIT_AND, COUNT, MIN, MAX are a few common Aggregate Functions used in the Google BigQuery. 

Simplify Google BigQuery ETL and Analysis with Hevo!

Hevo is a fully managed, no-code data pipeline platform that effortlessly integrates data from various sources into a data warehouse such as BigQuery. With its minimal learning curve, Hevo can be set up in just a few minutes. Its features include: 

  • Connectors: Hevo supports 150+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations, including Google BigQuery, Amazon Redshift, and Snowflake.
  • Transformations: A simple Python-based drag-and-drop data transformation technique that allows you to transform your data for analysis.
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can always have analysis-ready data.
  • 24/7 Live Support: The Hevo team is available 24/7 to provide exceptional support through chat, email, and support calls.

Try Hevo today to experience seamless data transformation and migration.

Get Started with Hevo for Free

What is the use of the DISTINCT Clause?

The DISTINCT clause is used to Eliminate any Duplicate Rows in the table which helps return a single value from the remaining rows. Also, the term DISTINCT is often used with the SELECT keyword. 

Using the DISTINCT clause enables users to fetch only the unique values from the table. However, in some cases, the query will return unique combinations when multiple expressions are provided to the DISTINCT clause.

How to perform BigQuery Count Unique Values Job using COUNT DISTINCT?

Google Bigquery involves massive quantities of data input. The purpose of using the DISTINCT clause is to return value from an exact number of DISTINCT items which will help provide better performance and scalability. However, the returned value is not guaranteed to be exact in BigQuery’s implementation. Let’s have a look at the following aspects to understand how to perform BigQuery Count Unique Values Job using COUNT DISTINCT.

1) BigQuery Count Unique Values Job:  COUNT DISTINCT Syntax

Count Function Syntax:

COUNT(*) [OVER (...)]

This function helps receive a single value from the number of rows in the input.

But, if you want to perform the BigQuery Count Unique Values Job using COUNT DISTINCT., follow the below-shared syntax:

COUNT(
  [DISTINCT]
  expression
)
[OVER (...)]

The COUNT DISTINCT syntax for performing the BigQuery Count Unique Values Job helps receive a single value from the number of rows with expression (any data type) evaluated from values apart from NULL.

The use of the COUNT function in Bigquery is to return a single value from the number of rows in the input. The DISTINCT clause with COUNT is used only to eliminate any duplicate row in the table. Apart from the DISTINCT clause, one can also use the OVER clause, which is optional and states a window.

Integrate Mailchimp to BigQuery
Integrate MariaDB to BigQuery
Integrate Linkedin Ads to BigQuery

2) BigQuery Count Unique Values Job: COUNT DISTINCT Example Queries

The COUNT function helps return a single value from the number of rows in the table or distinct values of any data type. Check out the below-listed examples to better understand how to perform BigQuery Count Unique Values Job using COUNT DISTINCT

  • BigQuery Count Unique Values Job Example 1:
SELECT
 COUNT(*) AS count_star,
 COUNT(DISTINCT x) AS count_dist_x
FROM UNNEST([1, 4, 4, 5]) AS x;

OUTPUT:

+----------------+------------------+
| count_star     | count_dist_x     |
+----------------+------------------+
| 4              | 3                |
+---------------+-------------------+
  • BigQuery Count Unique Values Job Example 2:
SELECT
  x,
  COUNT(*) OVER (PARTITION BY MOD(x, 3)) AS count_star,
  COUNT(DISTINCT x) OVER (PARTITION BY MOD(x, 3)) AS count_dist_x
FROM UNNEST([1, 4, 4, 5]) AS x;

OUTPUT:

+--------+------------+--------------+
| x      | count_star | count_dist_x |
+--------+------------+--------------+
| 1      | 3          | 2            |
| 4      | 3          | 2            |
| 4      | 3          | 2            |
| 5      | 1          | 1            |
+--------+------------+--------------+
  • BigQuery Count Unique Values Job Example 3:
SELECT
  x,
  COUNT(*) OVER (PARTITION BY MOD(x, 3)) AS count_star,
  COUNT(x) OVER (PARTITION BY MOD(x, 3)) AS count_x
FROM UNNEST([1, 4, NULL, 4, 5]) AS x;

OUTPUT:

+-----------+----------------+-----------+
| x         | count_star     | count_x   |
+-----------+----------------+-----------+
| NULL      | 1              | 0          |
| 1         | 3              | 3          |
| 4         | 3              | 3          |
| 4         | 3              | 3          |
| 5         | 1              | 1          |
+----------+-----------------+------------+

For counting distinct values of any data type (x):

  • BigQuery Count Unique Values Job Example 4:
SELECT COUNT(DISTINCT IF(x > 0, x, NULL)) AS distinct_positive
FROM UNNEST([1, -2, 4, 1, -5, 4, 1, 3, -6, 1]) AS x;
+------------------------+
| distinct_positive      |
+------------------------+
| 3                      |
+------------------------+
  • BigQuery Count Unique Values Job Example 5:

This example demonstrates how to count distinct dates of a specific event.

WITH Events AS (
  SELECT DATE '2021-01-01' AS event_date, 'SUCCESS' AS event_type
  UNION ALL
  SELECT DATE '2021-01-02' AS event_date, 'SUCCESS' AS event_type
  UNION ALL
  SELECT DATE '2021-01-02' AS event_date, 'FAILURE' AS event_type
  UNION ALL
  SELECT DATE '2021-01-03' AS event_date, 'SUCCESS' AS event_type
  UNION ALL
  SELECT DATE '2021-01-04' AS event_date, 'FAILURE' AS event_type
  UNION ALL
  SELECT DATE '2021-01-04' AS event_date, 'FAILURE' AS event_type
)
SELECT
  COUNT(DISTINCT IF(event_type = 'FAILURE', event_date, NULL))
    AS distinct_dates_with_failures
FROM Events;

OUTPUT:


+---------------------------------------+
| distinct_dates_with_failures          |
+---------------------------------------+
| 2                                     |
+---------------------------------------+

Also learn about Google BigQuery COUNT IF and COUNT Functions: A Comprehensive Guide 101

How do you Increase the Distinct Approximation Threshold? 

To increase the approximation threshold for DISTINCT queries, use the second numeric parameter (n) in the COUNT(DISTINCT [field], n) function call. This will force BigQuery to return an exact count for any number at or below that threshold

For example, let’s create a query with a threshold of 50,000:

SELECT
  COUNT(DISTINCT BookMeta_Author, 50000) AS authors,
  COUNT(DISTINCT BookMeta_Publisher, 50000) AS publishers,
  COUNT(DISTINCT BookMeta_Title, 50000) AS titles
FROM
  TABLE_QUERY([gdelt-bq:hathitrustbooks],
              'REGEXP_MATCH(table_id , r"^192[\d]")')

We assume the first two values, authors and publishers, to have accurate counts because they are below our threshold:

[
  {
    "authors": "48642",
    "publishers": "35140",
    "titles": "76938"
  }
]

To confirm that the threshold is working, we can do one final test, increasing the threshold once more to surpass all three amounts, this time to 80,000:

SELECT
  COUNT(DISTINCT BookMeta_Author, 80000) AS authors,
  COUNT(DISTINCT BookMeta_Publisher, 80000) AS publishers,
  COUNT(DISTINCT BookMeta_Title, 80000) AS titles
FROM
  TABLE_QUERY([gdelt-bq:hathitrustbooks],
              'REGEXP_MATCH(table_id , r"^192[\d]")')

We expected the first two numbers for authors and publishers to be identical to the results of our 50,000 threshold query, and they are.

As a result, we may conclude that these three figures represent accurate counts of the DISTINCT quantities for each field across all 1920s tables in the dataset.

[
  {
    "authors": "48642",
    "publishers": "35140",
    "titles": "77155"
  }
]

How to use the EXACT_COUNT_DISTINCT Function in Google BigQuery?

Apart from the COUNT DISTINCT function used for performing the BigQuery Count Unique Values Job, you can use the EXACT_COUNT_DISTINCT function to Compute Exact Distinct Values. There are cases when you do not have to worry about the query performance or the processing time rather ensure that exact quantities are returned from the input. In such cases, make sure to use the EXACT_COUNT_DISTINCT function. 

For example, let’s aggregate the total DISTINCT authors, publishers, and titles from all books between 1920 and 1929 and use the EXACT_COUNT_DISTINCT function.

SELECT
  EXACT_COUNT_DISTINCT(BookMeta_Author) AS authors,
  EXACT_COUNT_DISTINCT(BookMeta_Publisher) AS publishers,
  EXACT_COUNT_DISTINCT(BookMeta_Title) AS titles
FROM
  TABLE_QUERY([gdelt-bq:hathitrustbooks],
              'REGEXP_MATCH(table_id , r"^192[d]")')

In this query, we have not manually used any threshold value. Thus, the expected result will match the query and deliver the exact value. We are sure that the return value will be identical.

OUTPUT:

[
  {
    "authors": "48642",
    "publishers": "35140",
    "titles": "77155"
  }
]

Conclusion

In this article, you learned about Google BigQuery, its key features, and Bigquery Count Unique Values Aggregate Functions. Google BigQuery, the Data Warehousing Solution, is a cost-effective, easy to install and use solution for businesses. It helps users to import data, process it, and create dashboards and reports for further analysis.

BigQuery Aggregate Functions help save time and run queries faster. The purpose to use Aggregate functions in Bigquery is to run queries for massive datasets and return a single value with a meaning. You can quickly carry out the BigQuery Count Unique Values job by using the Distinct clause in the Count function. 

As the data volume increases, you will be required to invest a part of your engineering bandwidth to Integrate, Clean, Transform and Load your data to BigQuery for further analysis. This can be easily automated by using a Cloud-Based ETL Tool like Hevo Data. Sign up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.

Frequently Asked Questions

1. Does distinct work with count () in SQL?

Yes, DISTINCT works with COUNT() in SQL.

2. How do you count distinct data?

To count distinct data in SQL, you can use the COUNT(DISTINCT column_name) function.

3. How to use count function in BigQuery?

The COUNT function in BigQuery is versatile, allowing you to count all rows, count distinct values, or apply conditions using COUNTIF().

Hitesh Jethva
Technical Content Writer, Hevo Data

Hitesh is a skilled freelance writer in the data industry, known for his engaging content on data analytics, machine learning, AI, big data, and business intelligence. With a robust Linux and Cloud Computing background, he combines analytical thinking and problem-solving prowess to deliver cutting-edge insights. Hitesh leverages his Docker, Kubernetes, AWS, and Azure expertise to architect scalable data solutions that drive business growth.