Most databases can import and export CSV files. BigQuery, Google’s data warehouse as a service, combines data storage and analytics in one package and lets you run real-time SQL queries on billions of records. Stakeholders are always looking for faster and better ways to get data from all their sources into BigQuery.
CSV is a very popular and simple data format. It can store data from databases, clickstreams, browsing trails, social media interactions, page views, and many other sources. In this article, you will see 4 ways to move data from CSV to BigQuery. Read along to select the method that works best for your business!
What is BigQuery?
Google BigQuery is a fully managed, serverless cloud data warehouse designed for high performance and scalability. It uses Dremel technology, which optimizes SQL queries through a tree structure and stores data in a columnar format for efficient processing. This architecture allows users to analyze massive datasets quickly without managing infrastructure.
BigQuery supports ANSI SQL, making it easy for users to write queries and integrate with BI tools for data visualization. With BigQuery ML, users can train and run machine learning models using SQL. Its ability to handle petabytes of data makes it ideal for analytics and enterprise-level data processing.
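The benefit of the columnar format mentioned above can be illustrated with a toy example. The sketch below is not how Dremel or BigQuery is implemented; it only contrasts a row-oriented and a column-oriented layout of the same made-up records to show why an aggregate over one column needs to read less data in the columnar form.

```python
# Made-up records, stored row by row.
rows = [
    {"id": 1, "name": "a", "age": 30},
    {"id": 2, "name": "b", "age": 40},
    {"id": 3, "name": "c", "age": 50},
]

# The same data stored column by column: one list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: every whole record is visited to read a single field.
avg_from_rows = sum(row["age"] for row in rows) / len(rows)

# Columnar layout: only the "age" column is read.
ages = columns["age"]
avg_from_columns = sum(ages) / len(ages)

print(avg_from_rows, avg_from_columns)
```

Both layouts give the same answer; the difference is how much data has to be scanned to get it.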
Why move data from CSV to BigQuery?
- Enhances overall efficiency: Uploading CSV files to BigQuery simplifies data management and enhances the efficiency of your analytical workflows, making it easier to handle and analyze large datasets.
- Performance: BigQuery is designed to handle massive volumes of data efficiently, offering quick query execution that reduces the time needed to gain insights from your data.
- Advanced Analytics: BigQuery provides advanced analytics tools, such as ML and spatial data analysis, which deliver deeper insights to inform wise decision-making.
- Cost-Effective: BigQuery’s pay-as-you-go pricing approach ensures you only pay for the storage and queries you use, eliminating the need for expensive hardware or software.
- Scalability: Features like real-time analytics, on-demand storage scaling, BigQuery ML, and optimization tools make it easier to manage and scale your data analysis processes as needed.
Seamlessly migrate your data to BigQuery. Hevo elevates your data migration game with its no-code platform. Ensure seamless data migration using features like:
- Seamless integration with your desired data warehouse, such as BigQuery.
- Transform and map data easily with drag-and-drop features.
- Real-time data migration to leverage AI/ML features of BigQuery.
Still not sure? See how Postman, the world’s leading API platform, used Hevo to save 30-40 hours of developer efforts monthly and found a one-stop solution for all its data integration needs.
What Are The Methods To Load Data From CSV to BigQuery?
Method 1: Using Hevo Data
Step 1: Configure Google Sheets as your Source.
Step 2: Configure BigQuery as your Destination.
After the pipeline has finished ingesting, you can preview the loaded table in BigQuery.
Method 2: Using Command Line Interface
Step 1: Google Cloud Setup
Step 1. a) Install the gcloud command line interface
Step 1. b) Authenticate yourself to Google Cloud. For that, run a command:
gcloud auth login
Step 1. c) Sign in with your account and give all permissions to Google Cloud SDK.
After you complete all these steps, this window should appear:
Step 2: Command Prompt Configurations
Step 2. a) Run command:
bq
This runs bq, the BigQuery command-line tool that is installed with the gcloud CLI.
Step 2. b) To list the datasets in your project, run the command:
bq ls <project_id>:
Step 2. c) Now, we need to create a new dataset. In order to do so, you can run the command:
bq mk <new_dataset_name>
Step 3: Load the data into the dataset
Step 3. a) First, go to your cloud SDK directory and add the CSV file that you want to upload. In my case, this is the directory.
Step 3. b) Now, to load the file, go back to your command line and run the following command:
bq load --source_format=CSV example_cli.bank demo.csv ID:integer,NAME:string,AGE:integer
The output of this command will be Upload Complete.
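If you script this step from Python instead of typing it by hand, one option is to assemble the same bq load arguments programmatically, which avoids stray spaces in the inline schema. The sketch below is a minimal illustration rather than part of the original walkthrough: the helper names inline_schema and build_bq_load_args are hypothetical, and actually running the command assumes an authenticated bq installation on your PATH.

```python
import subprocess


def inline_schema(columns):
    """Join (name, type) pairs into bq's inline schema form, e.g. ID:integer,NAME:string."""
    parts = []
    for name, col_type in columns:
        name, col_type = name.strip(), col_type.strip()
        if not name or not col_type:
            raise ValueError("column name and type must be non-empty")
        parts.append(f"{name}:{col_type}")
    return ",".join(parts)


def build_bq_load_args(table, csv_path, columns):
    """Return the argument list for: bq load --source_format=CSV <table> <file> <schema>."""
    return ["bq", "load", "--source_format=CSV", table, csv_path, inline_schema(columns)]


args = build_bq_load_args(
    "example_cli.bank",
    "demo.csv",
    [("ID", "integer"), ("NAME", "string"), ("AGE", "integer")],
)
print(" ".join(args))

# Uncomment to run the load (requires an authenticated bq installation):
# subprocess.run(args, check=True)
```

Passing the arguments as a list rather than a single shell string keeps the schema as one argument, so it cannot be split by accidental whitespace.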
Step 4: Preview of the data
Step 4. a) To check whether the table has been created, you can use the following command:
bq ls example_cli
Step 4. b) If you want to see the schema of the table, you can do so by using the command:
bq show <dataset_name>.<table_name>
Step 4. c) To preview your table, go back to BigQuery, go to tables, and click on preview.
Method 3: Using BigQuery Web
Step 1: Create a new Dataset
To create a new dataset in BigQuery, go to your BigQuery studio, click on the three dots beside your project ID, and click Create Dataset.
Step 2: Create a Table
To create a new table, click on the three dots next to your dataset name and click on Create Table.
Note:
- Keep the file format as CSV.
- Turn on auto-detect. This will automatically detect the incoming table schema and generate a schema accordingly.
- Set "Create table from" to Upload, then upload the file that you want to load into BigQuery.
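To get a feel for what schema auto-detect does, the sketch below infers a type for each column from sample values in a CSV file. It is only an approximation written for illustration: BigQuery's actual detection rules are not described in this article, and the function names infer_type and infer_schema are hypothetical.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest of INTEGER, FLOAT, STRING that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"
    for type_name, caster in (("INTEGER", int), ("FLOAT", float)):
        try:
            for v in non_empty:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "STRING"


def infer_schema(csv_text, sample_rows=100):
    """Read the header plus up to sample_rows rows and guess a type per column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*rows)) if rows else [()] * len(header)
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


# Made-up sample data with the same columns as the bq load example.
sample = "ID,NAME,AGE\n1,Asha,34\n2,Ben,41\n"
schema = infer_schema(sample)
print(schema)
```

Because detection works from a sample, a column that looks numeric in the first rows can still contain text further down; that is one reason to review the generated schema after the table is created.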
Step 3: Preview Table
Step 3. a) You can edit the schema you just created. To do so, click on the table name and click on the schema tab. You can make changes to the schema as you like from here, and finally, click on edit schema.
Step 3. b) To preview the table, click on the preview tab beside the details tab.
Method 4: Using Web API
Step 1: Configure BigQuery API
Go to Google Cloud Console and look for APIs and services. Search for BigQuery and click on BigQuery API.
Note: Make sure that the service is enabled.
Step 2: Configuring the Python script
Open your code editor and type the given Python script.
import os
import time

from google.cloud import bigquery

# Path to your service account's JSON key file.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = ""

client = bigquery.Client()

# Skip the header row and let BigQuery detect the schema.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

# Name of the destination table, for example project.dataset.table.
table_id = ""

with open(r'demo.csv', "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)

# Poll until the load job finishes.
while job.state != 'DONE':
    time.sleep(2)
    job.reload()
    print(job.state)

# result() raises an exception if the job failed.
print(job.result())

table = client.get_table(table_id)
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)
Note:
- In GOOGLE_APPLICATION_CREDENTIALS, provide the path to your service account’s JSON key file.
- In table_id, give the name of the new table that you want to create.
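The polling loop in the script waits indefinitely if the job never reaches DONE. A possible refinement, sketched below under the assumption that the job object exposes state and reload() as used in the script, adds a timeout. The helper name wait_for_job and the FakeJob stand-in are hypothetical; FakeJob exists only so the example runs without a Google Cloud project.

```python
import time


def wait_for_job(job, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll job.state until it is 'DONE', or raise TimeoutError after timeout_seconds."""
    deadline = time.monotonic() + timeout_seconds
    while job.state != "DONE":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.state} after {timeout_seconds}s")
        time.sleep(poll_seconds)
        job.reload()
    return job


class FakeJob:
    """Stand-in for a load job: reports PENDING, then RUNNING, then DONE."""

    def __init__(self):
        self._states = ["PENDING", "RUNNING", "DONE"]
        self.state = self._states.pop(0)
        self.reloads = 0

    def reload(self):
        self.reloads += 1
        if self._states:
            self.state = self._states.pop(0)


job = wait_for_job(FakeJob(), poll_seconds=0.01, timeout_seconds=1.0)
print(job.state, job.reloads)
```

With a real load job you would pass the object returned by load_table_from_file in place of FakeJob and pick a timeout that suits the size of your file.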
Step 4: Preview the table
You can now preview the table you created by following the preview steps described in the earlier methods.
Limitations of Moving Data from CSV to BigQuery
- Nested and repeated data are not supported in CSV files.
- BOM (byte order mark) characters should be removed. They can cause unexpected issues.
- BigQuery cannot read the data in parallel if you use gzip compression. Importing compressed CSV data into BigQuery takes longer than loading uncompressed data. See Loading compressed and uncompressed data for further information.
- You can’t use the same load job to load compressed and uncompressed files.
- A gzip file can be up to 4 GB in size.
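Some of the limitations above can be checked locally before you start a load. The sketch below is one possible pre-flight check, not an official tool: the helper names are hypothetical, only the UTF-8 byte order mark is handled, and the 4 GB gzip limit is assumed to mean 4 * 1024**3 bytes.

```python
import codecs
import os
import tempfile

# Assumption: "4 GB" is treated here as 4 * 1024**3 bytes.
GZIP_LIMIT_BYTES = 4 * 1024**3


def has_utf8_bom(path):
    """Return True if the file starts with a UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(path):
    """Rewrite the file without its leading UTF-8 BOM. Reads the whole file into memory."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True


def gzip_within_limit(path, limit=GZIP_LIMIT_BYTES):
    """Return True if the file on disk is no larger than the limit."""
    return os.path.getsize(path) <= limit


# Demo on a throwaway file with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"ID,NAME,AGE\n1,Asha,34\n")
    bom_before = has_utf8_bom(path)
    stripped = strip_utf8_bom(path)
    bom_after = has_utf8_bom(path)
    size_ok = gzip_within_limit(path)

print(bom_before, stripped, bom_after, size_ok)
```

For very large files, a streaming copy would be a better fit than reading the whole file into memory as strip_utf8_bom does here.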
Conclusion
Loading CSV data into BigQuery is a powerful way to enhance your data management and analytical capabilities. Whether you are dealing with large datasets, need real-time analytics, or want to use advanced analytics tools like BigQuery ML, importing your CSV data into BigQuery can significantly streamline your workflows. This article provided a step-by-step guide to loading CSV data into BigQuery using four different methods.
The last three methods are manual and can become time-consuming. Also, writing custom scripts requires coding knowledge, which not everyone may have. To avoid the manual part and automate the entire process, you can always look up Hevo Data and sign up for a 14-day free trial.
FAQ
How do I append a CSV file to a BigQuery table?
You can append additional data to an existing table by performing a load-append operation.
What is the fastest way to load data into BigQuery?
Bulk Insert into BigQuery is the fastest way to load data.
How to connect data to BigQuery?
On your computer, open a spreadsheet in Google Sheets.
In the menu at the top, click Data > Data connectors > Connect to BigQuery.
Choose a project.
Click Connect.
How do I export CSV to storage in BigQuery?
Open the BigQuery page in the Google Cloud console. In the Explorer panel, expand your project and dataset, then select the table. In the details panel, click Export and select Export to Cloud Storage.
Pratik Dwivedi is a seasoned expert in data analytics, machine learning, AI, big data, and business intelligence. With over 18 years of experience in system analysis, design, and implementation, including 8 years in a Techno-Managerial role, he has successfully managed international clients and led teams on various projects. Pratik is passionate about creating engaging content that educates and inspires, leveraging his extensive technical and managerial expertise.