BigQuery Jobs 101: Key Types, Functions & Easy Steps

on BigQuery Functions, Data Warehouse, Google BigQuery • March 10th, 2022


BigQuery is a fully managed, serverless data warehouse that allows for scalable data analysis across petabytes. It’s a Platform as a Service (PaaS) that allows for ANSI SQL querying. Machine learning capabilities are also built-in. BigQuery gives you the most flexibility by separating the compute engine that analyzes your data from the storage options you choose. BigQuery can be used to store and analyze data or to assess data regardless of where it is stored. Federated queries allow you to read data from multiple sources, while streaming allows you to keep your data updated in real-time. Data can be analyzed and understood using powerful tools like BigQuery ML and BI Engine.

This article discusses BigQuery Jobs, how they are run, and how they can be managed. It also gives an overview of BigQuery and its features.


What is BigQuery?

Google BigQuery is a Cloud-based Data Warehouse that provides a Big Data Analytic Web Service for processing petabytes of data. It is intended for analyzing data on a large scale. It consists of two distinct components: Storage and Query Processing. It employs the Dremel Query Engine to process queries and is built on the Colossus File System for storage. These two components are decoupled and can be scaled independently and on-demand.

Google BigQuery is fully managed by Cloud service providers. You don’t need to deploy any resources, such as disks or virtual machines. It is designed to process read-only data. Dremel and Google BigQuery use Columnar Storage for quick data scanning, as well as a tree architecture for executing queries using ANSI SQL and aggregating results across massive computer clusters. Furthermore, owing to its short deployment cycle and on-demand pricing, Google BigQuery is serverless and extremely scalable.

Key Features of BigQuery 

  • Scalable Architecture: BigQuery has a scalable architecture and offers a petabyte scalable system that users can scale up and down as per load.
  • Faster Processing: Thanks to this scalable architecture, BigQuery can process petabytes of data quickly, far faster than many conventional systems. BigQuery allows users to run analyses over millions of rows without worrying about scalability.
  • Fully-Managed: BigQuery is a product of Google Cloud Platform, and thus it offers fully managed and serverless systems.
  • Security: BigQuery provides strong security controls that protect your data both at rest and in flight.
  • Real-time Data Ingestion: BigQuery can perform real-time data analysis, thereby making it famous across all the IoT and Transaction platforms.
  • Fault Tolerance: BigQuery replicates data across multiple zones or regions, ensuring that data remains available even when a zone or region goes down.
  • Pricing Models: Google BigQuery is available in both on-demand and flat-rate pricing models. Data storage and querying are charged, while exporting, loading, and copying data are free. Because BigQuery separates computational resources from storage resources, with on-demand pricing you are only charged when you run queries, billed by the quantity of data those queries process.
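To make the on-demand model concrete, here is a back-of-the-envelope cost sketch. The $5-per-TiB rate and 1 TiB monthly free tier mirror Google's published on-demand pricing around the time of writing, but treat both numbers as illustrative assumptions rather than a billing reference:

```python
def on_demand_cost(bytes_processed, price_per_tib=5.0, free_tib=1.0):
    """Estimate on-demand query cost from bytes processed in a month."""
    tib = bytes_processed / 2**40          # bytes -> TiB
    billable = max(tib - free_tib, 0.0)    # monthly free tier comes off first
    return billable * price_per_tib

# Queries scanning 3 TiB in a month with 1 TiB free: 2 TiB billed.
print(on_demand_cost(3 * 2**40))  # 10.0
```

Note that only bytes processed by queries enter the formula; storage is billed separately, and loading, exporting, and copying incur no query charge at all.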

Simplify the Google BigQuery ETL & Analysis with Hevo’s No-code Data Pipeline

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate and load data from 100+ Data Sources (including 40+ Free Sources) to a Data Warehouse/Destination of your choice such as Google BigQuery in real-time in an effortless manner. Hevo with its minimal learning curve can be set up in just a few minutes allowing the users to load data without having to compromise performance. In addition to the 100+ data sources, Hevo also allows integrating data from non-native sources using Hevo’s in-built Webhooks Connector.

Get Started with Hevo for Free

Its strong integration with a wide range of sources allows users to bring in data of different kinds in a smooth fashion without having to code a single line. 

Check out some of the cool features of Hevo:

  • Completely Automated: The Hevo platform can be set up in just a few minutes and requires minimal maintenance.
  • Connectors: Hevo supports 100+ Integrations to SaaS platforms, Files, Databases, BI tools such as Tableau, and Native REST API & Webhooks Connectors. It supports various destinations including Google BigQuery, Amazon Redshift, Snowflake Data Warehouses; Amazon S3 Data Lakes; MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL Databases to name a few.  
  • Real-Time Data Transfer: Hevo provides real-time data migration, so you can have analysis-ready data always.
  • 100% Complete & Accurate Data Transfer: Hevo’s robust infrastructure ensures reliable data transfer with zero data loss.
  • Scalable Infrastructure: Hevo has in-built integrations for 100+ sources (Including 40+ Free Sources) that can help you scale your data infrastructure as required.
  • 24/7 Live Support: The Hevo team is available round the clock to extend exceptional support to you through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

What are BigQuery Jobs?

BigQuery Jobs are actions BigQuery automatically performs on your behalf. These actions include Loading, Exporting, Querying, or Copying Data.

When you load, export, query, or copy data using the Cloud Console or the bq command-line tool, BigQuery automatically creates, schedules, and runs a job resource for you. A load, export, query, or copy job can also be created programmatically; BigQuery then schedules and runs the job on your behalf.

Because BigQuery Jobs can potentially take a long time to complete, they run asynchronously, and you can poll them to track their progress. Actions with a shorter duration, such as listing resources or obtaining metadata, are not managed by a job resource.

The Permissions Required

You need the bigquery.jobs.create IAM permission to run BigQuery jobs, whether you create them programmatically or let BigQuery create them for you through the Cloud Console or the bq tool.

Each of the following predefined IAM roles includes the bigquery.jobs.create permission and therefore lets you run BigQuery jobs:

  • roles/bigquery.user
  • roles/bigquery.jobUser
  • roles/bigquery.admin

In addition, when you create a job, you are automatically granted the following permissions on that job:

  • bigquery.jobs.get
  • bigquery.jobs.update

How To Run BigQuery Jobs Programmatically? 

Using the REST API or client libraries, you can run BigQuery jobs programmatically through these stages:

  1. Call the jobs.insert method.
  2. Periodically request the job resource and examine its status property to learn when the job is complete.
  3. Verify that the job finished successfully.

Running Jobs

To execute BigQuery jobs programmatically, follow these steps:

  • Step 1: Call the jobs.insert method to start the job. You need to include a job resource representation when you call the jobs.insert method.
  • Step 2: In the job resource’s configuration section, add a child property that specifies the job type: load, query, extract, or copy.
  • Step 3: Check the job status by calling jobs.get with the job ID and location returned by jobs.insert. Examine the status.state value to find out the job’s current state. A DONE state means only that the job is no longer running; it does not mean the job completed successfully.
  • Step 4: Check the success of the job. If there is an errorResult property, the job has failed. The status.errorResult property contains information about what went wrong during a failed job. In the absence of status.errorResult, the job completed successfully, although there may have been a few non-fatal errors, such as issues importing a few rows in a load job. Nonfatal errors are displayed in the status.errors list of the job.
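The steps above can be sketched without any client library at all, since jobs.insert accepts a JSON job resource and jobs.get returns a status object. Below is a minimal sketch in Python; the project ID and query are placeholders, and build_query_job and interpret_status are hypothetical helper names, not BigQuery APIs:

```python
def build_query_job(project_id, sql, location="US"):
    """Job resource for jobs.insert: the child property under
    "configuration" ("query" here) names the job type (Step 2)."""
    return {
        "jobReference": {"projectId": project_id, "location": location},
        "configuration": {"query": {"query": sql, "useLegacySql": False}},
    }

def interpret_status(status):
    """Interpret the status object returned by jobs.get (Steps 3 and 4).
    DONE only means the job stopped running; errorResult marks failure."""
    if status.get("state") != "DONE":
        return "still running"
    if "errorResult" in status:
        return "failed: " + status["errorResult"].get("message", "unknown")
    if status.get("errors"):
        return "succeeded with non-fatal errors"  # e.g. a few bad rows in a load
    return "succeeded"

job = build_query_job("my-project", "SELECT 1")
print(interpret_status({"state": "DONE"}))  # succeeded
```

The key point the sketch encodes is the asymmetry between DONE and success: DONE is a terminal state, and only the presence or absence of errorResult distinguishes failure from success.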

Using Client Libraries To Run Jobs

Here is an example of how to create and run BigQuery jobs using the Cloud Client Libraries for BigQuery:

using Google.Cloud.BigQuery.V2;
using System;
using System.Collections.Generic;
public class BigQueryCreateJob
{
    public BigQueryJob CreateJob(string projectId = "your-project-id")
    {
        string query = @"
            SELECT country_name FROM `bigquery-public-data.utility_us.country_code_iso`";
        // Create a client that will be used to send requests to the server.
        BigQueryClient client = BigQueryClient.Create(projectId);
        QueryOptions queryOptions = new QueryOptions
        {
            JobLocation = "us",
            JobIdPrefix = "code_sample_",
            Labels = new Dictionary<string, string>
            {
                ["example-label"] = "example-value"
            },
            MaximumBytesBilled = 1000000
        };
        BigQueryJob queryJob = client.CreateQueryJob(
            sql: query,
            parameters: null,
            options: queryOptions);
        Console.WriteLine($"Started job: {queryJob.Reference.JobId}");
        return queryJob;
    }
}

How To Run Interactive And Batch Query BigQuery Jobs?

The results of a query are saved in either a temporary or permanent table. Depending on your needs, you can either append or overwrite data in an existing table or create a new one if none exists with the same name.

Running Interactive Queries

BigQuery defaults to running interactive query jobs, which means the query is processed as quickly as possible. These queries count towards your Concurrent Rate Limit and Daily Limit.

The following steps will guide you through creating an interactive query that writes to a temporary table:

  • Step 1: Navigate to the BigQuery page in the Cloud Console.
  • Step 2: Then click Compose New Query.
  • Step 3: A valid BigQuery SQL query should be entered in the Query editor text area.
  • Step 4: To change the data processing location, click More and then Query settings. Under Processing location, click Auto-select and choose your data’s location, then click Save to apply the query settings.
  • Step 5: Click the Run button.

In this case, the output of the BigQuery jobs is stored in a temporary table.

Running Batch Queries

Batch Queries are also available in BigQuery. You can queue batch queries, and BigQuery runs each one on your behalf as soon as idle resources become available in BigQuery’s shared resource pool, which typically happens within a few minutes. If BigQuery hasn’t started the query within 24 hours, it changes the priority of the job to interactive.

A batch query does not count against your concurrent rate limit, so you can run many of them at once. Batch queries use the same resources as interactive (on-demand) queries. 

The steps for running a batch query are as follows:

  • Step 1: Navigate to the BigQuery page in the Cloud Console.
  • Step 2: Then click the Compose New Query button.
  • Step 3: In the Query editor text area, enter a valid SQL query.
  • Step 4: Click More, then Query settings.
  • Step 5: In the Job Priority section, select the Batch option.
  • Step 6: This step is optional. For Processing Location, click Unspecified and select the location of your data.
  • Step 7: Click Save to save the query settings.
  • Step 8: Then click Run.
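When going through the API rather than the Console, the interactive/batch choice from Step 5 reduces to a single field in the job resource: the query configuration's priority defaults to INTERACTIVE and can be set to BATCH. A sketch (query_job_config is a hypothetical helper, not a BigQuery API):

```python
def query_job_config(sql, batch=False):
    """Build a query job configuration; "priority" selects batch mode."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "useLegacySql": False,
                "priority": "BATCH" if batch else "INTERACTIVE",
            }
        }
    }

cfg = query_job_config("SELECT 1", batch=True)
print(cfg["configuration"]["query"]["priority"])  # BATCH
```

Everything else about the job resource stays the same, which is why batch and interactive queries consume the same resources once they start.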

How to Manage BigQuery Jobs? 

After you submit a BigQuery job, you can view its details, list jobs, cancel a job, repeat a job, or delete its metadata.

The following states may exist depending on the state of the BigQuery jobs at the time they are submitted:

  • PENDING: The job has been scheduled and is awaiting execution.
  • RUNNING: Work is in progress.
  • DONE: The job has finished. If it completed without errors, this status is reported as SUCCESS; if it encountered an error, BigQuery marks the job as FAILURE.
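A client waiting on a job cycles through exactly these states, so the polling described in the programmatic section reduces to a few lines. In this sketch, get_state is a stand-in for a real jobs.get call:

```python
import time

def wait_for_job(get_state, poll_interval=0.0):
    """Poll until the job leaves PENDING/RUNNING; DONE is the only terminal state."""
    while True:
        state = get_state()
        if state == "DONE":
            return state
        assert state in ("PENDING", "RUNNING"), f"unexpected state: {state}"
        time.sleep(poll_interval)

# Simulate a job that is scheduled, runs, then finishes.
states = iter(["PENDING", "RUNNING", "RUNNING", "DONE"])
print(wait_for_job(lambda: next(states)))  # DONE
```

Remember that DONE is terminal but neutral: after the loop returns, you still need to inspect status.errorResult to tell SUCCESS from FAILURE.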

Viewing Job Details

Using the Cloud Console, bq command-line tool, or API, you can view details about BigQuery jobs. Details include data and metadata, including the type and state of the job and the user who created it.

You can view job details by following these steps:

  • Step 1: Go to the BigQuery page in the Cloud Console.
  • Step 2: Choose the type of job you want to view:
    • To see load jobs, export jobs, or copy jobs, go to Job history.
    • Click Query history to view query jobs.
  • Step 3: Choose the type of job history you would like to view:
    • To see your job history, go to Personal history.
    • To see all of the jobs in the project, go to Project history.
  • Step 4: A job’s details can be viewed by clicking on it.

How To List Jobs In A Project?

BigQuery retains the job history for all of a project’s jobs for six months.

The following methods will allow you to view the BigQuery jobs history:

  • Using the Google Cloud Console.
  • Using the bq ls command.
  • Using the jobs.list API method.
  • Using the client libraries.

You can list BigQuery jobs in any location.

To list jobs in a project, follow these steps:

  • Step 1: Go to the BigQuery page in the Cloud Console.
  • Step 2: Choose the type of job you want to list:
    • To see a list of load jobs, export jobs, and copy jobs, go to Job history.
    • Click Query history to list query jobs.
  • Step 3: Choose the type of job history you want to view:
    • To see your previous jobs, click Personal history.
    • To see all of the jobs in the project, go to Project history. You might not have permission to view all jobs for a project if you are not the project owner. The most recent jobs appear first.

How to Cancel Jobs?

You can cancel RUNNING or PENDING BigQuery jobs by following these steps:

  • Step 1: Go to the BigQuery page in the Cloud Console.
  • Step 2: Choose the type of job you want to view:
    • Click Job history to view load jobs, export jobs, and copy jobs.
    • Click Query history to view query jobs.
  • Step 3: Choose the type of job history you want to view:
    • To see your previous jobs, click Personal history.
    • To see all of the jobs in the project, go to Project history.
  • Step 4: If you want to cancel a job, click the job and click Cancel job.

How to Delete Job Metadata?

The bq command-line tool and the BigQuery API allow you to delete the metadata for a specific job. Since BigQuery preserves a history of jobs performed in the last six months, this is useful when a query statement contains sensitive information you want removed. Job metadata can be deleted only after the job has finished. When you delete a parent job, any child jobs it created are deleted along with it; child jobs cannot be deleted directly, only through their parent (top-level) job.

Use the bq rm command with the -j flag and a job ID.

  • Step 1: Either long or short form can be used when supplying the job ID. An example of a fully qualified job ID is a job ID listed in the Cloud Console, which includes the project and location:
my-project-1234:US.bquijob_123x456_123y123z123c
  • Step 2: The short form is used for listing job IDs in the command-line tool. It does not include the project ID or location:
bquijob_123x456_123y123z123c
  • Step 3: Provide the --location flag and set its value to the location of the job. If you use a fully qualified job ID, this flag is not needed; when both the --location flag and a fully qualified job ID are supplied, the flag is ignored.
  • Step 4: BigQuery jobs can be deleted with the following command:
bq --location=location \
--project_id=project_id \
rm -j job_id

Conclusion 

In this article, you have seen what BigQuery Jobs are, how to run them programmatically, and how to manage them. To learn more, visit the Google Cloud blog.

Hevo Data, a No-code Data Pipeline provides you with a consistent and reliable solution to manage data transfer between a variety of sources and a wide variety of Desired Destinations with a few clicks.

Visit our Website to Explore Hevo

Hevo Data with its strong integration with 100+ Data Sources (including 40+ Free Sources) allows you to not only export data from your desired data sources & load it to the destination of your choice such as Google BigQuery but also transform & enrich your data to make it analysis-ready. Hevo also allows integrating data from non-native sources using Hevo’s in-built REST API & Webhooks Connector. You can then focus on your key business needs and perform insightful analysis using BI tools. 

Want to give Hevo a try? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You may also have a look at our pricing, which will assist you in selecting the best plan for your requirements.

No-code Data Pipeline for BigQuery