BigQuery is a low-cost, serverless data warehouse provided by Google. With BigQuery, you pay only for the resources you use rather than through a subscription-based model, which helps enterprises save on storage and compute. BigQuery lets you store up to petabytes of data and query it without managing infrastructure or employing a database administrator. These features have led many enterprises to adopt BigQuery for data storage and analytics.
When using BigQuery, you will need to load data from different sources. The sources can be other database management systems, CSV files, Excel files, and more. This should be done with a high degree of accuracy. Luckily, the bq command-line tool provides the bq load command, which can assist you with this. You can use it to load data from different sources and in different formats. In this article, we will be discussing the bq load command.
What is Google BigQuery?
Google BigQuery is a highly scalable, serverless data warehouse with a built-in query engine. Developed by Google, it leverages the processing power of Google’s infrastructure. The query engine can run SQL queries on terabytes of data within seconds, and on petabytes within minutes. BigQuery delivers this performance without requiring you to maintain infrastructure or build and rebuild indexes.
BigQuery’s speed and scalability make it suitable for use in processing huge datasets. It also comes with built-in machine learning capabilities that can help you to understand your data better.
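To get a feel for this speed, you can run a SQL query against one of BigQuery’s public datasets straight from the bq command-line tool. The sketch below assumes you have the Google Cloud SDK installed and authenticated; it queries Google’s public USA names dataset:

```shell
# Query a BigQuery public dataset from the command line.
# Aggregates millions of rows and returns the five most common names.
bq query --use_legacy_sql=false \
'SELECT name, SUM(number) AS total
 FROM `bigquery-public-data.usa_names.usa_1910_2013`
 GROUP BY name
 ORDER BY total DESC
 LIMIT 5'
```

Note that running this query requires an active Google Cloud project with the BigQuery API enabled.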
With BigQuery, you can achieve the following:
- Democratize insights with a scalable and secure platform that comes with machine learning features.
- Improve business decision-making from data using a multi-cloud and flexible analytics solution.
- Adapt to data of any size, from bytes to petabytes, with no operational overhead.
- Run large-scale analytics.
BigQuery also allows you to create dashboards and reports that you can use to analyze your data and gain meaningful insights from it.
It is also a powerful tool for real-time data analytics.
Hevo Data is a No-code Data Pipeline that offers a fully managed solution to set up data integration from 100+ Data Sources (including 40+ Free Data Sources) and will let you directly load data to a Data Warehouse like Google BigQuery or the destination of your choice. It will automate your data flow in minutes without writing any line of code. Its fault-tolerant architecture makes sure that your data is secure and consistent. Hevo provides you with a truly efficient and fully automated solution to manage data in real-time and always have analysis-ready data.
Get Started with Hevo for free
Let’s look at some of the salient features of Hevo:
- Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
- Transformations: Hevo provides preload transformations through Python code. It also allows you to run transformation code for each event in the Data Pipelines you set up. To carry out a transformation, you edit the properties of the event object that the transform method receives as a parameter. Hevo also offers drag-and-drop transformations like Date and Control Functions, JSON, and Event Manipulation, to name a few. These can be configured and tested before putting them to use.
- Connectors: Hevo supports 100+ Integrations to SaaS platforms such as files, databases, analytics, and BI tools. It supports various destinations including Amazon Redshift, Firebolt, Google BigQuery, and Snowflake Data Warehouses; Databricks, Amazon S3 Data Lakes; and MySQL, SQL Server, TokuDB, DynamoDB, PostgreSQL databases to name a few.
- Schema Management: Hevo can automatically detect the schema of the incoming data and map it to the destination schema.
- Scalable Infrastructure: Hevo has in-built integrations for hundreds of sources that can help you scale your data infrastructure as required.
- Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within Data Pipelines.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
Understanding the BQ Command Line Tool Setup
You can use the bq load command to load data into BigQuery from the command line. The tool expects you to specify the values of various parameters, and it will load the data into BigQuery tables. In this section, we will discuss how to load a local data file into BigQuery using the bq load command.
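Before walking through the steps, it helps to see the general shape of the command. In the sketch below, the flag names are real bq flags, but the dataset, table, file, and schema are placeholders:

```shell
# General shape of a bq load invocation:
#   bq load [--flags] <dataset>.<table> <source_file> <schema>
# For example, a CSV load that skips a header row might look like:
bq load \
  --source_format=CSV \
  --skip_leading_rows=1 \
  my_dataset.my_table \
  ./my_data.csv \
  col_a:STRING,col_b:INTEGER
```

The schema can be given inline as comma-separated `name:type` pairs, as shown, or via a JSON schema file.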
Step 1: Setting Up BQ Command Line Tool
You should enable BigQuery before doing anything else. You will need a Google account for this. Do the following:
Step 1: Sign in to the Google Cloud console and open BigQuery. You can also paste the following URL into your web browser:
Step 2: Accept their terms of service.
Step 3: Create a new project by following the on-screen prompts. Note that the name of the project should be unique across all Google Cloud projects.
Next, we should activate Cloud Shell, a command-line environment that runs in Google Cloud. Follow the steps given below:
Step 1: Click the “Activate Cloud Shell” icon in the Cloud console.
Step 2: If you have never opened Cloud Shell before, you will be presented with an intermediate screen describing what it is. In that case, click the “Continue” button.
Cloud Shell will connect after a few moments.
Step 3: Run the following command in Cloud Shell to check whether you are authenticated:
gcloud auth list
The command will return the details of the active account. If prompted to authorize, click the “Authorize” button.
Step 4: Use the following command to check whether the gcloud command is aware of your project:
gcloud config list project
The command will return the currently configured project. If it is not your target project, set it using the following command:
gcloud config set project <PROJECT_ID>
Step 2: Creating a Dataset for BQ Load
A dataset contains tables. Use the bq mk command to create a new dataset named “bq_load_dataset”:
bq mk bq_load_dataset
We will later use the bq load command to load data from a CSV file into this dataset.
You can confirm whether the dataset was created successfully by viewing its properties. Use the bq show command as shown below:
bq show bq_load_dataset
Step 3: Creating the Data File for BQ Load
Although the bq load command can load data from different data formats, we will be loading data from a CSV file. Let’s use the touch command to create the CSV file:
touch data.csv
Let’s open the file in an editor window:
cloudshell edit data.csv
The file will open in a code editor window. Enter your data values into the editor, separating them with commas.
Click “File” and then “Save” to save the file.
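For illustration, a small customer file might look like the following (the columns and rows here are purely hypothetical sample data):

```
name,age,city
Alice,34,Nairobi
Brian,28,Mombasa
Cynthia,41,Kisumu
```

The first row is a header row, which we will skip when loading the file.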
Step 4: Loading the Data for BQ Load
We want to load the data from the CSV file into a BigQuery table. We will use the bq load command for this as shown below:
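A complete invocation might look like the following sketch. The column names (name, age, city) are illustrative assumptions; match the inline schema to your own CSV header:

```shell
# Load data.csv into the customer_data table, skipping the header row.
# The inline schema (name:STRING,age:INTEGER,city:STRING) is illustrative.
bq load \
  --source_format=CSV \
  --skip_leading_rows=1 \
  bq_load_dataset.customer_data \
  ./data.csv \
  name:STRING,age:INTEGER,city:STRING
```
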
The “--source_format” flag specifies the data format, which is CSV (Comma-Separated Values).
The “--skip_leading_rows” flag skips the first row, which is a header row.
We then specify the file from which the data should be loaded, followed by the schema: the names and data types of the columns in the file.
Now that the data has been loaded into the table, let us view the table details:
bq show bq_load_dataset.customer_data
The command will return the details of the table including when it was last modified, schema, total rows, and total bytes.
You have now successfully loaded your CSV file into a BigQuery table using the bq load command.
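If you want to double-check beyond bq show, you can run a quick row count against the new table (this assumes the table name used in the steps above):

```shell
# Count the rows that were loaded, using standard SQL.
bq query --use_legacy_sql=false \
'SELECT COUNT(*) AS row_count FROM bq_load_dataset.customer_data'
```

If the count matches the number of data rows in your CSV file (excluding the header), the load completed as expected.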
In this article, you’ve learned more about Google BigQuery and the steps required to load data from a CSV file into a BigQuery table using the bq load command-line tool.
Visit our Website to Explore Hevo
Hevo Data provides its users with a simpler platform for integrating data from 100+ sources (including 40+ Free Sources) for Analysis. It is a No-code Data Pipeline that can help you combine data from multiple sources. You can use it to transfer data from multiple data sources into your Data Warehouse, Database, or a destination of your choice like Google BigQuery. It provides you with a consistent and reliable solution for managing data in real-time, ensuring that you always have Analysis-ready data in your desired destination.
Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand. You can also have a look at our unbeatable pricing that will help you choose the right plan for your business needs!