In a business, data created and gathered can cascade to huge volumes in no time. In the last decade, businesses have used the collected data to obtain value with analytics tools. Today, data is being utilized to forecast the future as well as report on the past. With all of the Big Data projects now underway in businesses, data has become the focal point of all major digital transformations. However, traditional On-Premise Data Warehouse systems are becoming increasingly difficult to expand for a variety of reasons, including frequent hardware updates.

As the amount of data captured and housed in most businesses is expanding at an exponential rate, managing infrastructure fees are rising in conjunction with upgrades and operations, which may be a costly affair. As a result, businesses are leaning toward Cloud Data Warehouses like Google BigQuery that support SQL queries like Google BigQuery COUNT IF and COUNT Function, etc. With Cloud Data Warehouse, you don’t have to worry about managing infrastructure, and can focus on finding valuable insights using basic SQL without the help of a database administrator.

Managed Cloud Data Warehouses are also cost-effective because you pay for what you consume in terms of processing and storage. Google BigQuery is one of the best options for businesses of all sizes searching for cost-effective Data Warehouse Solutions to hold their data for analysis and reporting. This article will introduce you to Google BigQuery and provide you a detailed overview of 2 of the most important Aggregated Analytic Functions viz. Google BigQuery COUNT IF and COUNT Function.

Table of Contents

Prerequisites

Listed below are the prerequisites of using Google BigQuery COUNT IF and COUNT Functions:

  • A Google BigQuery account.
  • Basic understanding of SQL.
  • Basic understanding of databases.

What is Google BigQuery?

Google BigQuery logo
Image Source

Google BigQuery is a highly scalable Data Warehouse with a reputation for quickly storing and querying data. A Data Warehouse gathers data from many sources and runs analytics on it in order to provide value to company operations through insights. It enables organizations to be situationally aware of and react to real-time business events by providing quicker insights. In a Cloud-first era, Google BigQuery meets similar demands of Data-Driven companies. It also distributes computing resources automatically as required, thus you do not need to create instances or Virtual Machines to use it.

Download the Cheatsheet on How to Set Up High-performance ETL to BigQuery
Download the Cheatsheet on How to Set Up High-performance ETL to BigQuery
Download the Cheatsheet on How to Set Up High-performance ETL to BigQuery
Learn the best practices and considerations for setting up high-performance ETL to BigQuery

Based on Dremel, Google BigQuery has the Massively Parallel Processing (MPP) architecture, which allows it to query data by reading 1000s of rows in a single second. In this design, data is stored in replicated, distributed units and processed by Compute Clusters made up of Nodes in the Shared-Nothing nomenclature. As a result, the Google BigQuery framework is highly versatile, allowing users to move their data to a Data Warehouse and begin analyzing it using normal SQL queries.

It is compatible with Google Cloud Storage (GCS) and provides RESTful Web Services. Data Management, Data Querying, Access Control, and Machine Learning are just a few of its features.

Key Features of Google BigQuery

Listed below are the key features of Google BigQuery:

  • Management: Google BigQuery takes care of complex maintenance and setup procedures like Server/VM Administration, Server/VM Sizing, Memory Management, and more by providing serverless execution.
  • Data Loading: Google BigQuery utilizes classic ELT/ETL Batch Data Loading methods with normal SQL dialect and Data Streaming to load data row by row with Streaming APIs.
  • Backup: Data is automatically replicated using Google BigQuery, so even if your data is lost, you’ll always have a backup. Users can query point-in-time snapshots from 7 days of data changes using Google BigQuery’s data backup and disaster recovery techniques.
  • Federated Query: This is a method of sending a query statement to an external database and receiving the response as a temporary table. If your data is in Bigtable, GCS, or Google Drive, you may query that data directly from Google BigQuery by using the Google BigQuery Connection API to create a connection with the external database.
  • Detailed Insights: Many popular analytics tools, such as Looker and Google Data Studio can be seamlessly integrated with BigQuery. This makes it simple to comprehend your information.
  • Storage: Google BigQuery takes advantage of Colossus, Google’s global storage system, to store and optimize your data for free and with no downtime. Google BigQuery employs the opinionated Capacitor format in Colossus to house data where Capacitor performs several improvements behind the scenes, burning a significant amount of CPU/RAM in the process yet without impacting query performance or placing a bill limitation.

For more information on Google BigQuery, click here.

Simplify ETL Process using Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline helps to Load Data from any data source such as Databases, SaaS applications, Cloud Storage, SDKs, and Streaming Services and simplifies the ETL process. It supports 150+ data sources and is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo loads the data onto the desired Data Warehouse such as Google BigQuery, etc., enriches the data, and transforms it into an analysis-ready form without writing a single line of code.

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensure that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different Business Intelligence (BI) tools as well.

Get Started with Hevo for Free

Check out why Hevo is the Best:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Analytic Functions in Google BigQuery

Analytic Functions in Google BigQuery image
Image Source

An Analytic Function in Google BigQuery delivers a single output for each row after computing values over a set of rows. This differs from an Aggregate Function, which gives a single result for a set of rows.

An Analytic Function contains an OVER Clause that specifies a window of rows surrounding the row being evaluated. The output of the Analytic Function is computed for each row, using the specified window of rows as input and potentially aggregation. The Analytic Functions allow you to do things like compute moving averages, rank objects, calculate cumulative sums, and do a lot more. Given below is the syntax of the Analytic Function:

analytic_function_name ( [ argument_list ] ) OVER over_clause

over_clause:
  { named_window | ( [ window_specification ] ) }

window_specification:
  [ named_window ]
  [ PARTITION BY partition_expression [, ...] ]
  [ ORDER BY expression [ { ASC | DESC }  ] [, ...] ]
  [ window_frame_clause ]

window_frame_clause:
  { rows_range } { frame_start | frame_between }

rows_range:
  { ROWS | RANGE }

The parameters used in an Analytic Function are described below:

  • analytic_function_name: The function that performs an Analytic Operation is called the analytic_function_name. 
  • argument_list: Arguments unique to the Analytic Function are listed in an argument_list. Some functions include them, whereas others do not.
  • OVER: Prior to the OVER Clause, the keyword OVER is necessary for the Analytic Function syntax.
  • over_clause: The OVER Clause refers to a window that defines a set of rows in a table for which an Analytic Function should be applied.
  • window_specification: Defines the window’s specifications.
  • window_frame_clause: Defines the window’s window frame.
  • rows_range: A window frame’s physical rows or logical range is defined by rows_range.

NOTE: Here are certain things to keep in mind (they are true unless explicitly indicated otherwise) while working with Analytic Functions:

  • If one operand is a floating-point operand and the other operand is another numeric type, both operands are transformed to FLOAT64 before the function is evaluated for functions that take numeric types.
  • With the exception of the IS operator, if an operand is NULL, the result is NULL.
  • If no time zone is given for functions that are time zone sensitive (as mentioned in the function description), the default time zone, UTC, is enforced.

Aggregate Analytic Functions in Google BigQuery

A Function that performs a computation on a set of values is known as an Aggregate Analytic Function. Analytic Functions can employ most Aggregate Functions. Here, the OVER clause is attached to the Aggregate Function call with aggregate analytic functions; the function call syntax is otherwise unchanged. These Analytic Functions, like their Aggregate Function equivalents, execute aggregations, but only across the applicable window frame for each row. These Analytic Functions’ outcome data types are the same as those of their Aggregate Function equivalents.

At present, Google BigQuery supports the following Aggregate Functions as Analytic Functions:

  • ANY_VALUE: Returns expression for some row chosen from the group. 
  • ARRAY_AGG: Returns an array of expression values.
  • AVG: Gives the average of the non-NULL input values.
  • CORR: Returns the Pearson coefficient of correlation for a set of number pairs.
  • COUNT: Returns the number of [distinct] elements in expression.
  • BigQuery COUNT IF: Google BigQuery COUNT IF Function returns the count of True values for expression.
  • COVAR_POPCOVAR_POP: Returns the population covariance of a set of numbers.
  • COVAR_SAMP: Returns the sample covariance of a set of numbers.
  • MAX: Returns the maximum non-NULL value of an expression.
  • MIN: Returns the minimum non-NULL value of an expression.
  • ST_CUSTERDBSCAN: Performs DBSCAN clustering on a column of geographies.
  • STDEV_POP: Returns population standard deviation of values.
  • STDEV_SAMP: Returns sample standard deviation of values.
  • STRING_AGG: Returns a value obtained by concatenating all the non-null values.
  • SUM: Returns the sum of all non-null values.
  • VAR_POP: Returns the population variance of results.
  • VAR_SAMP: Returns the sample variance of results.

Let’s walk through some of the most important Aggregated Analytic Functions:

1) Google BigQuery COUNT Function

Google BigQuery COUNT Function image
Image Source

The Google BigQuery COUNT Function returns the number of rows in the input. Given below is the standard syntax of Google BigQuery COUNT Function:

COUNT(*) [OVER (...)]

There is another syntax of Google BigQuery COUNT Function that returns the number of rows with expression evaluated to any value other than NULL.

sql COUNT( [DISTINCT] expression ) [OVER (...)]

Here, expression can be any data type (Nullable, Orderable, Groupable, Comparable). If DISTINCT is present, expression can only be a data type that is groupable. For example,

For a comprehensive overview of Google BigQuery COUNT IF Function, keep reading!

2) Google BigQuery COUNT IF Function

Google BigQuery COUNT IF Function image
Image Source

The COUNT IF BigQuery Function returns the number of TRUE values for expression. If there are no input rows, or if the expression evaluates to FALSE or NULL for all rows, Google BigQuery COUNT IF Function returns 0. Given below is the syntax of Google BigQuery COUNT IF Function:

COUNTIF(
  expression
)
[OVER (...)]

Here is an example of Google BigQuery COUNT IF Function:

The form of Google BigQuery COUNT IF(DISTINCT…) is not supported since the expression must be a BOOL. This would be useless because TRUE has a single unique value. When Google BigQuery COUNT IF and DISTINCT are used together, the goal is usually to count the number of distinct values of an expression that satisfy a specific condition of Google BigQuery COUNT IF Function.

One way to achieve this is by using the COUNT Function instead of Google BigQuery COUNT IF Function (as illustrated in the previous section)

COUNT(DISTINCT IF(condition, expression, NULL))

For more information on Aggregated Analytic Functions like Google BigQuery COUNT IF and COUNT function.

Conclusion

This article introduced you to Google BigQuery and its key features. It explained several key ideas related to Aggregate Analytic Functions, with an emphasis on the Google BigQuery COUNT IF and COUNT Functions. To assist you in grasping more detailed information about these functions, the article provided syntax, explanation, and examples of both Google BigQuery COUNT IF and COUNT Function. In case, you want to transfer your data from multiple data sources to Google BigQuery and use SQL queries like Google BigQuery COUNT IF Function, etc., then you can explore Hevo Data.

Visit our Website to Explore Hevo

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of understanding Google BigQuery COUNT IF and COUNT Functions in the comments section below!

Preetipadma Khandavilli
Freelance Technical Content Writer, Hevo Data

Preetipadma is passionate about freelance writing within the data industry, expertly delivering informative and engaging content on data science by incorporating her problem-solving skills.

Get Started with Hevo