As a Data-Centric Organization, there is the constant need to seek methods in simplifying your SQL scripts to run them smoothly without obstructions, if not, you could end up running unwanted scripts with the associated waste of resources or even malicious scripts.
This write-up is going to look at Google BigQuery Parameterized Queries, explaining how it works to show that using parameters in your query will help eliminate the stress of executing unwanted scripts by ensuring the input is cleaned of any unwanted characters.
The way Google BigQuery Parameterized Queries work is that whenever the SQL query is sent the database knows exactly what this query will do and only then will it insert the username and passwords merely as values thereby not affecting the query.
Before diving into it fully, Google BigQuery will be defined and a general overview of Google BigQuery Parameterized Queries will be touched on.
What is Google BigQuery?
Google BigQuery is a cloud-based data warehouse that offers a big data analytic web service for processing very large datasets over petabytes of data. It is a serverless data warehouse and supports the querying of data using ANSI SQL designed for analyzing data on a large scale ranging to billions of rows.
Google BigQuery automatically allocates computing resources whenever needed so you do not need to provide instances or virtual machines to use BigQuery and it is built to process read-only data. Google BigQuery is becoming increasingly popular, and many businesses, including Twitter, use it to forecast the exact volume of packages for their various offerings.
The platform utilizes columnar storage that makes Data Querying and Aggregation of results easier and more efficient, therefore, leading to an Agile Business Model.
Key Features of Google BigQuery
Here are some of the most notable key features of Google BigQuery:
- Architecture that is scalable: Google BigQuery has a scalable architecture and a petabyte-scale system that users can scale up and down based on load.
- Processing time is reduced: Google BigQuery can process petabytes of data in less time than many traditional systems due to its scalable architecture. Users can use Google BigQuery to analyze millions of rows of data without worrying about scalability.
- Completely Managed: Because Google BigQuery is a product of Google Cloud Platform, it offers fully managed and serverless systems.
- Security: Google BigQuery provides the highest level of security, safeguarding data both at rest and in transit.
- Ingestion of Real-Time Data: Google BigQuery has become well-known across all IoT and Transaction platforms due to its ability to perform real-time data analysis.
- Tolerance for Mistakes: You can replicate data across multiple zones or regions using Google BigQuery. When a region/zone goes down, it ensures that data is always available.
- Auto-Backup: Google BigQuery automatically creates backup and recovery options to secure data.
Hevo makes BigQuery ETL effortless with its no-code data pipelines. Automate data ingestion, transformation, and loading seamlessly while enjoying real-time updates and robust data integrity. Transform your ETL process and focus on insights, not complexities.
Here’s why you should choose Hevo:
- Plug-and-play transformations
- Real-time data transfer
- 24/5 Live Support
What is Parameterized Queries?
A Parameterized Query is a query in which placeholders are used for parameters and the parameters’ values are supplied at the execution time, it is one of the simplest and most useful advanced queries you can create. This allows you to place parameters in an SQL query instead of putting in a constant or fixed value.
A parameter takes a value only when the query is executed making it possible for the query to be used at any time with different values and for different purposes. The benefit of this is, you can prepare parameterized queries ahead of time and use them when needed as well as use them for similar applications without having to create distinct SQL queries every time.
Google BigQuery Parameterized Queries are also useful with dynamic data where the values are not known until the statement is executed as it will automatically pass the parameter data along with the named query when executing the statement.
The most important reason that Parameterized Queries are used is, it prevents SQL Injection by carrying out a proper substitution of arguments before running the SQL query. If the input contains unnecessary SQL, using parameterized queries results in the removal of these unwanted SQL inputs. These inputs can alter the meaning of the query if not removed and the SQL will not be executed as it will not be injected into the resulting statement.
Working With Google BigQuery Parameterized Queries
Google BigQuery supports query parameters to help prevent SQL Injection that could be used to attack data-driven applications where malicious SQL statements are inserted into entry fields for execution when queries are constructed using user input.
With Google, BigQuery Parameterized Queries can be used as substitutes for arbitrary expressions though they can not be used as substitutes for identifiers, column names, tables names, or other parts of the query. Google BigQuery Parameterized Queries are not supported by the Google Cloud Console and it is only available with standard SQL syntax. Here are the three applications on working with Google BigQuery Parameterized Queries
- BigQuery Parameterized Queries: Passing Parameter Values to a Parameterized Query
- BigQuery Parameterized Queries: Using Arrays in Parameterized Queries
- BigQuery Parameterized Queries: Using Timestamps in Parameterized Queries
A parameter query can either be a named or positional query but not both at once. To specify a named parameter on Google BigQuery, you must use the @ character followed by the identifier which are names associated with objects such as column, table, and other database objects like @param_name.
For example, the code below shows a simple syntax of a Parameter Query that contains a parameter for the collector name.
SELECT
*
FROM
ihtags
WHERE
collectorname=?
ORDER BY
tagname
For positional parameters, use the placeholder value? to specify a positional parameter.
Google BigQuery Parameterized Queries can be used in the following ways:
- The
bq
command-line tool,bq query
command. - The API
- The client libraries of C#, GO, Java, Node.js, and Python.
In this piece, we will be looking at the bq query command and API command only. For the client libraries and other information about Google BigQuery parameterized queries, visit Google’s official documentation here.
Google BigQuery Parameterized Queries: Passing Parameter Values to a Parameterized Query
The examples below show how to pass parameter values to a parameterized query using both the bq query command and API command.
For bq query, use –parameter to provide values for parameters in the form name:type:value. An empty name produces a positional parameter. The type may be omitted to assume STRING.
The –parameter flag must be used in conjunction with the flag –use_legacy_sql=false to specify standard SQL syntax. You can also specify your location using the –location flag though it is optional. The example below shows how to pass a parameter to count words in a sample Shakespeare book Romeo and Juliet.
bq query
--use_legacy_sql=false
--parameter=corpus::romeoandjuliet
--parameter=min_word_count:INT64:250
'SELECT
word, word_count
FROM
`bigquery-public-data.samples.shakespeare`
WHERE
corpus = @corpus
AND
word_count >= @min_word_count
ORDER BY
word_count DESC'
For API, use named parameters to set the parameterMode to NAMED in the query job configuration. Populate queryParameters with the list of parameters in the query job configuration. Set the name of each parameter with the @param_name used in the query and enable standard SQL syntax by setting useLegacySql to false.
The example below also shows how to pass a parameter to count words in a sample Shakespeare book Romeo and Juliet using API.
{
"query": "SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` WHERE corpus = @corpus AND word_count >= @min_word_count ORDER BY word_count DESC;",
"queryParameters": [
{
"parameterType": {
"type": "STRING"
},
"parameterValue": {
"value": "romeoandjuliet"
},
"name": "corpus"
},
{
"parameterType": {
"type": "INT64"
},
"parameterValue": {
"value": "250"
},
"name": "min_word_count"
}
],
"useLegacySql": false,
"parameterMode": "NAMED"
}
To use positional parameters, set the parameterMode to POSITIONAL in the query job configuration.
Google BigQuery Parameterized Queries: Using Arrays in Parameterized Queries
To use array type for Google BigQuery Parameterized Queries, you will have to set the type to ARRAY<T>. T is defined as the type of elements you want in the array and the array is constructed as a list of elements enclosed in square brackets separated by commas like [1, 2, 3].
For the bq query command, the example below shows the selection of the most popular names for baby boys starting with the letter E from a public data set in the US.
bq query
--use_legacy_sql=false
--parameter='gender::M'
--parameter='states:ARRAY<STRING>:["EA", "EI", "EV", "EY"]'
'SELECT
name,
SUM(number) AS count
FROM
`bigquery-public-data.usa_names.usa_1910_2022`
WHERE
gender = @gender
AND state IN UNNEST(@states)
GROUP BY
name
ORDER BY
count DESC
LIMIT
10'
To use an array-valued parameter set the parameterType to ARRAY in the query job configuration. If the array values are scalars, set the parameterType to the type of the values, such as STRING. If the array values are structures, set this to STRUCT and add the needed field definitions to structTypes.
For example, the query below selects the most popular names for baby boys born in US states starting with the letter W.
{
"query": "SELECT name, sum(number) as countnFROM `bigquery-public-data.usa_names.usa_1910_2022`nWHERE gender = @gendernAND state IN UNNEST(@states)nGROUP BY namenORDER BY count DESCnLIMIT 10;",
"queryParameters": [
{
"parameterType": {
"type": "STRING"
},
"parameterValue": {
"value": "M"
},
"name": "gender"
},
{
"parameterType": {
"type": "ARRAY",
"arrayType": {
"type": "STRING"
}
},
"parameterValue": {
"arrayValues": [
{
"value": "WA"
},
{
"value": "WI"
},
{
"value": "WV"
},
{
"value": "WY"
}
]
},
"name": "states"
}
],
"useLegacySql": false,
"parameterMode": "NAMED"
}
Google BigQuery Parameterized Queries: Using Timestamps in Parameterized Queries
To use a timestamp in a query parameter, the value type of the REST API is TIMESTAMP in the format YYYY-MM-DD HH:MM:SS.DDDDDD time_zone.
The query example below shows how to add time intervals to the timestamp parameter value. It adds an hour to the timestamp.
bq query
--use_legacy_sql=false
--parameter='ts_value:TIMESTAMP:2022-2-07 08:00:00'
'SELECT
TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR)'
For API, to use a timestamp parameter you will have to set the parameterType to TIMESTAMP in the query job configuration. Just like the example in the bq query, the query below adds an hour to the timestamp parameter value using API.
{
"query": "SELECT TIMESTAMP_ADD(@ts_value, INTERVAL 1 HOUR);",
"queryParameters": [
{
"name": "ts_value",
"parameterType": {
"type": "TIMESTAMP"
},
"parameterValue": {
"value": "2022-2-07 08:00:00"
}
}
],
"useLegacySql": false,
"parameterMode": "NAMED"
}
Conclusion
This article has given you an overview of Parameterized Queries in Google BigQuery. It explained briefly what a Google BigQuery Parameterized Queries is all about before elaborating on how it can be used in Google BigQuery to pass parameters, construct arrays, and set timestamps with examples using bq query and API commands.
It also stated that you can work with Parameterized Queries in Google BigQuery using libraries written in C#, Node.js, GO, Java, and Python.
Now, to run SQL queries or perform Data Analytics on your MySQL data, you first need to export this data to a Data Warehouse. This will require you to custom code complex scripts to develop the ETL processes. Hevo Data can automate your data transfer process, hence allowing you to focus on other aspects of your business like Analytics, Customer Management, etc.
Companies store valuable data from multiple data sources into Google BigQuery. The manual process to transfer data from source to destination is a tedious task. Hevo Data is a No-code Data Pipeline that can help you transfer data from any data source to desired Google BigQuery. It fully automates the process to load and transform data from 150+ sources to a destination of your choice without writing a single line of code.
Want to take Hevo for a spin? Explore Hevo’s 14-day free trial and experience the feature-rich Hevo suite firsthand.
Share your experience of Working With Google BigQuery Parameterized Queries in the comments section below!
FAQ on BigQuery Parameterized Queries
Can we create parameterized queries in BigQuery?
Yes, BigQuery supports parameterized queries using query parameters
How do you parameterize a query?
Parameterizing a query in BigQuery involves using placeholders (‘@parameter_name’) for values you want to inject dynamically into the query.
Can we pass parameter to view in BigQuery?
BigQuery does not natively support passing parameters directly to views.
What is the difference between parameterized and dynamic queries?
Parameterized Queries are SQL queries where parameters (placeholders) pass values into the query at runtime, while dynamic queries are constructed dynamically at runtime, often concatenating strings and values to form SQL statements.
How to use parameters in select query?
DECLARE my_param INT64 DEFAULT 123;
SELECT *
FROM project.dataset.table
WHERE column = @my_param;