Working With BigQuery Parameterized Queries: Made Easy 101

on Data Integration, Data Warehouse, ETL Tutorials, Google BigQuery, SQL • February 15th, 2022 • Write for Hevo

Google BigQuery Parameterized Queries - Featured Image

As a Data-Centric Organization, there is the constant need to seek methods in simplifying your SQL scripts to run them smoothly without obstructions, if not, you could end up running unwanted scripts with the associated waste of resources or even malicious scripts.

This write-up is going to look at Google BigQuery Parameterized Queries, explaining how it works to show that using parameters in your query will help eliminate the stress of executing unwanted scripts by ensuring the input is cleaned of any unwanted characters.

The way Google BigQuery Parameterized Queries work is that whenever the SQL query is sent the database knows exactly what this query will do and only then will it insert the username and passwords merely as values thereby not affecting the query. 

Before diving into it fully, Google BigQuery will be defined and a general overview of Google BigQuery Parameterized Queries will be touched on. 

Table of Contents

What is Google BigQuery?

BigQuery Parameterized Queries - BigQuery Logo
Image Source

Google BigQuery is a cloud-based data warehouse that offers a big data analytic web service for processing very large datasets over petabytes of data. It is a serverless data warehouse and supports the querying of data using ANSI SQL designed for analyzing data on a large scale ranging to billions of rows.

Google BigQuery automatically allocates computing resources whenever needed so you do not need to provide instances or virtual machines to use BigQuery and it is built to process read-only data. Google BigQuery is becoming increasingly popular, and many businesses, including Twitter, use it to forecast the exact volume of packages for their various offerings.

The platform utilizes columnar storage that makes Data Querying and Aggregation of results easier and more efficient, therefore, leading to an Agile Business Model.

Key Features of Google BigQuery

BigQuery Parameterized Queries - Google BigQuery Features
Image Source

Here are some of the most notable key features of Google BigQuery:

  • Architecture that is scalable: Google BigQuery has a scalable architecture and a petabyte-scale system that users can scale up and down based on load.
  • Processing time is reduced: Google BigQuery can process petabytes of data in less time than many traditional systems due to its scalable architecture. Users can use Google BigQuery to analyze millions of rows of data without worrying about scalability.
  • Completely Managed: Because Google BigQuery is a product of Google Cloud Platform, it offers fully managed and serverless systems.
  • Security: Google BigQuery provides the highest level of security, safeguarding data both at rest and in transit.
  • Ingestion of Real-Time Data: Google BigQuery has become well-known across all IoT and Transaction platforms due to its ability to perform real-time data analysis.
  • Tolerance for Mistakes: You can replicate data across multiple zones or regions using Google BigQuery. When a region/zone goes down, it ensures that data is always available.
  • Auto-Backup: Google BigQuery automatically creates backup and recovery options to secure data.

Simplify Google BigQuery ETL & Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ Different Sources (including 40+ Free Sources) to a Data Warehouse such as Google BigQuery or Destination of your choice in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. Its strong integration with umpteenth sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Get Started with Hevo for Free

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. Hevo also offers drag and drop transformations like Date and Control Functions, JSON, and Event Manipulation to name a few. These can be configured and tested before putting them to use.
  • Connectors: Hevo supports 100+ integrations to SaaS platforms, files, Databases, analytics, and BI tools. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (including 40+ free sources) that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

What is Parameterized Queries?

A Parameterized Query is a query in which placeholders are used for parameters and the parameters’ values are supplied at the execution time, it is one of the simplest and most useful advanced queries you can create. This allows you to place parameters in an SQL query instead of putting in a constant or fixed value. 

A parameter takes a value only when the query is executed making it possible for the query to be used at any time with different values and for different purposes. The benefit of this is, you can prepare parameterized queries ahead of time and use them when needed as well as use them for similar applications without having to create distinct SQL queries every time. 

Google BigQuery Parameterized Queries are also useful with dynamic data where the values are not known until the statement is executed as it will automatically pass the parameter data along with the named query when executing the statement.

The most important reason that Parameterized Queries are used is, it prevents SQL Injection by carrying out a proper substitution of arguments before running the SQL query. If the input contains unnecessary SQL, using parameterized queries results in the removal of these unwanted SQL inputs. These inputs can alter the meaning of the query if not removed and the SQL will not be executed as it will not be injected into the resulting statement.

Working With Google BigQuery Parameterized Queries

BigQuery Parameterized Queries - BigQuery Parameterized Queries
Image Source

Google BigQuery supports query parameters to help prevent SQL Injection that could be used to attack data-driven applications where malicious SQL statements are inserted into entry fields for execution when queries are constructed using user input.

With Google, BigQuery Parameterized Queries can be used as substitutes for arbitrary expressions though they can not be used as substitutes for identifiers, column names, tables names, or other parts of the query. Google BigQuery Parameterized Queries are not supported by the Google Cloud Console and it is only available with standard SQL syntax. Here are the three applications on working with Google BigQuery Parameterized Queries

A parameter query can either be a named or positional query but not both at once. To specify a named parameter on Google BigQuery, you must use the @ character followed by the identifier which are names associated with objects such as column, table, and other database objects like @param_name.

For example, the code below shows a simple syntax of a Parameter Query that contains a parameter for the collector name.

SELECT
  *
FROM
  ihtags
WHERE
  collectorname=?
ORDER BY
  tagname

For positional parameters, use the placeholder value? to specify a positional parameter.

Google BigQuery Parameterized Queries can be used in the following ways:

  • The bq command-line tool, bq query command.
  • The API
  • The client libraries of C#, GO, Java, Node.js, and Python.

In this piece, we will be looking at the bq query command and API command only. For the client libraries and other information about Google BigQuery parameterized queries, visit Google’s official documentation here.

Google BigQuery Parameterized Queries: Passing Parameter Values to a Parameterized Query 

The examples below show how to pass parameter values to a parameterized query using both the bq query command and API command. 

For bq query, use –parameter to provide values for parameters in the form name:type:value. An empty name produces a positional parameter. The type may be omitted to assume STRING.

The –parameter flag must be used in conjunction with the flag –use_legacy_sql=false to specify standard SQL syntax. You can also specify your location using the –location flag though it is optional. The example below shows how to pass a parameter to count words in a sample Shakespeare book Romeo and Juliet.

bq query 
--use_legacy_sql=false 
--parameter=corpus::romeoandjuliet 
--parameter=min_word_count:INT64:250 
'SELECT
  word, word_count
FROM
  `bigquery-public-data.samples.shakespeare`
WHERE
  corpus = @corpus
AND
  word_count >= @min_word_count
ORDER BY
  word_count DESC'

For API, use named parameters to set the parameterMode to NAMED in the query job configuration. Populate queryParameters with the list of parameters in the query job configuration. Set the name of each parameter with the @param_name used in the query and enable standard SQL syntax by setting useLegacySql to false.

The example below also shows how to pass a parameter to count words in a sample Shakespeare book Romeo and Juliet using API.

{
  "query": "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` WHERE corpus = @corpus AND word_count >= @min_word_count ORDER BY word_count DESC;",
  "queryParameters": [
    {
      "parameterType": {
        "type": "STRING"
      },
      "parameterValue": {
        "value": "romeoandjuliet"
      },
      "name": "corpus"
    },
    {
      "parameterType": {
        "type": "INT64"
      },
      "parameterValue": {
        "value": "250"
      },
      "name": "min_word_count"
    }
  ],
  "useLegacySql": false,
  "parameterMode": "NAMED"
}

To use positional parameters, set the parameterMode to POSITIONAL in the query job configuration.

Google BigQuery Parameterized Queries: Using Arrays in Parameterized Queries

To use array type for Google BigQuery Parameterized Queries, you will have to set the type to ARRAY<T>. T is defined as the type of elements you want in the array and the array is constructed as a list of elements enclosed in square brackets separated by commas like [1, 2, 3].

For the bq query command, the example below shows the selection of the most popular names for baby boys starting with the letter E from a public data set in the US.

bq query 
--use_legacy_sql=false 
--parameter='gender::M' 
--parameter='states:ARRAY<STRING>:["EA", "EI", "EV", "EY"]' 
'SELECT
  name,
  SUM(number) AS count
FROM
  `bigquery-public-data.usa_names.usa_1910_2022`
WHERE
  gender = @gender
  AND state IN UNNEST(@states)
GROUP BY
  name
ORDER BY
  count DESC
LIMIT
  10'

To use an array-valued parameter set the parameterType to ARRAY in the query job configuration. If the array values are scalars, set the parameterType to the type of the values, such as STRING. If the array values are structures, set this to STRUCT and add the needed field definitions to structTypes

For example, the query below selects the most popular names for baby boys born in US states starting with the letter W.

{
 "query": "SELECT name, sum(number) as countnFROM `bigquery-public-data.usa_names.usa_1910_2022`nWHERE gender = @gendernAND state IN UNNEST(@states)nGROUP BY namenORDER BY count DESCnLIMIT 10;",
 "queryParameters": [
  {
   "parameterType": {
    "type": "STRING"
   },
   "parameterValue": {
    "value": "M"
   },
   "name": "gender"
  },
  {
   "parameterType": {
    "type": "ARRAY",
    "arrayType": {
     "type": "STRING"
    }
   },
   "parameterValue": {
    "arrayValues": [
     {
      "value": "WA"
     },
     {
      "value": "WI"
     },
     {
      "value": "WV"
     },
     {
      "value": "WY"
     }
    ]
   },
   "name": "states"
  }
 ],
 "useLegacySql": false,
 "parameterMode": "NAMED"
}

Google BigQuery Parameterized Queries: Using Timestamps in Parameterized Queries

To use a timestamp in a query parameter, the value type of the REST API is TIMESTAMP in the format YYYY-MM-DD HH:MM:SS.DDDDDD time_zone

The query example below shows how to add time intervals to the timestamp parameter value. It adds an hour to the timestamp.

 bq query 
--use_legacy_sql=false 
--parameter='ts_value:TIMESTAMP:2022-2-07 08:00:00' 
'SELECT
  TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR)'

For API, to use a timestamp parameter you will have to set the parameterType to TIMESTAMP in the query job configuration. Just like the example in the bq query, the query below adds an hour to the timestamp parameter value using API.

{
  "query": "SELECT TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR);",
  "queryParameters": [
    {
      "name": "ts_value",
      "parameterType": {
        "type": "TIMESTAMP"
      },
      "parameterValue": {
        "value": "2022-2-07 08:00:00"
      }
    }
  ],
  "useLegacySql": false,
  "parameterMode": "NAMED"
}

Conclusion

This article has given you an overview of Parameterized Queries in Google BigQuery. It explained briefly what a Google BigQuery Parameterized Queries is all about before elaborating on how it can be used in Google BigQuery to pass parameters, construct arrays, and set timestamps with examples using bq query and API commands.

It also stated that you can work with Parameterized Queries in Google BigQuery using libraries written in C#, Node.js, GO, Java, and Python.

Now, to run SQL queries or perform Data Analytics on your MySQL data, you first need to export this data to a Data Warehouse. This will require you to custom code complex scripts to develop the ETL processes. Hevo Data can automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc.

Visit our Website to Explore Hevo

Companies store valuable data from multiple data sources into Google BigQuery. The manual process to transfer data from source to destination is a tedious task. Hevo Data is a No-code Data Pipeline that can help you transfer data from any data source to desired Google BigQuery. It fully automates the process to load and transform data from 100+ sources to a destination of your choice without writing a single line of code. 

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of Working With Google BigQuery Parameterized Queries in the comments section below!

No-code Data Pipeline for Google BigQuery