Do you use Google BigQuery to store your data? Do you want to export your data from Excel to BigQuery? If yes, then this blog will answer all your queries. Excel is one of the most widely used spreadsheet applications, supporting mathematical and statistical calculations, graphs, charts, tables, and more.
You can load data from multiple sources in Google BigQuery to gain better insights. In this blog, you will learn about Excel, Google BigQuery, and different approaches to connect Excel to BigQuery. You will also go through the limitations of these methods.
Introduction to Excel
Excel has been the world’s most popular spreadsheet software for decades. It supports complex mathematical and statistical calculations, graphs and tables, custom programming, and integration with other popular software.
Many companies and individuals have huge volumes of data, accumulated over the years, in Excel. As most of them would like to generate insightful analytics from that data, they need to load all that data into BigQuery.
Introduction to Google BigQuery
Google BigQuery is an enterprise data warehouse that allows storing and querying large volumes of data. It analyzes petabytes of data using ANSI SQL at blazing-fast speeds. It can store data from databases, clickstreams, browsing trails, social media interactions, page views, and a multitude of sources.
BigQuery takes advantage of advancements in storage and processing power by storing nested and repeated data inside records themselves, which preserves complex relationships and avoids joins. This design makes BigQuery very popular, and people are keen to find faster and better ways of loading their data from multiple sources into BigQuery. Several such methods are covered in this blog.
Did you know that 75-90% of data sources you will ever need to build pipelines for are already available off-the-shelf with No-Code Data Pipeline Platforms like Hevo?
Ambitious data engineers who want to stay relevant for the future automate repetitive ELT work and save more than 50% of their time that would otherwise be spent on maintaining pipelines. Instead, they use that time to focus on higher-value work like optimizing core data infrastructure, scripting non-SQL transformations for training algorithms, and more.
Step off the hamster wheel and opt for an automated data pipeline like Hevo. With a no-code intuitive UI, Hevo lets you set up pipelines in minutes. Its fault-tolerant architecture ensures zero maintenance. Moreover, data replication happens in near real-time from 150+ sources to the destination of your choice including Snowflake, BigQuery, Redshift, Databricks, and Firebolt.
Start saving those 20 hours with Hevo today.
Get started for Free with Hevo!
Method 1: Load Data from Excel to BigQuery Using CSV
Download the Cheatsheet on How to Set Up High-performance ETL to BigQuery
Learn the best practices and considerations for setting up high-performance ETL to BigQuery
First, save your Excel worksheet as a CSV file (File > Save As in Excel). The steps to upload that CSV file to BigQuery are listed below:
- Go to the BigQuery Web console, click “Create table”, and then “Create table from”.
- Next, you can specify the CSV file, which will act as a source for your new table.
- The “Source” dropdown will let you select amongst various sources like Cloud storage.
- In “File format”, select CSV.
- Select a database and give your table a name.
- You can either upload a sample JSON to specify the schema or leave the schema definition to “auto-detect”.
- Other configurable parameters include the field delimiter, the number of header rows to skip, the number of errors allowed, and whether to accept jagged rows.
- Clicking on “Create Table” will now fetch your CSV, ascertain the schema, create the table, and populate it with the CSV data.
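The same console flow can also be scripted with the BigQuery Python client library (`pip install google-cloud-bigquery`). The sketch below is illustrative, not the only way to do it; it assumes application-default credentials are already configured, and the project, dataset, and table names are placeholders.

```python
# Hypothetical sketch: the console steps above, scripted with the
# google-cloud-bigquery Python client. Project/dataset/table names
# below are placeholders.

def console_options_to_config(skip_header: bool = True, autodetect: bool = True,
                              field_delimiter: str = ",") -> dict:
    """Mirror the console's CSV options as plain keyword arguments."""
    return {
        "source_format": "CSV",
        "skip_leading_rows": 1 if skip_header else 0,  # skip the header row
        "autodetect": autodetect,                      # "auto-detect" schema option
        "field_delimiter": field_delimiter,
    }

def load_local_csv(csv_path: str, table_id: str) -> None:
    # Deferred import so the helper above works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(**console_options_to_config())
    with open(csv_path, "rb") as f:
        job = client.load_table_from_file(f, table_id, job_config=job_config)
    job.result()  # wait for the load job to finish

# Example (placeholder names):
# load_local_csv("sales.csv", "my-project.my_dataset.sales")
```

This performs the same "fetch CSV, detect schema, create and populate table" sequence as the console, but can be called from a script or a scheduler.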
Limitations
Some limitations of using CSV to connect Excel to Google BigQuery are listed below:
- Files must be loaded individually; wildcards and multiple selections are not allowed when you load files from a local data source.
- Excel files loaded from a local data source must be 10 MB or less and must contain fewer than 16,000 rows.
Method 2: Upload XLSX to BigQuery Using BigQuery API
The BigQuery API allows you to store data in the cloud from various sources, including Excel. It supports uploading files via the “multipart method”, which is a good fit for smaller files; an unsuccessful multipart upload must restart from the beginning.
For larger files, the “resumable upload” method can be used. It lets you create an upload session and resume a partially completed upload from where it was interrupted. The downside of this strategy is that it requires developer resources, and you will need to maintain and adjust your programs in the future.
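The key idea behind a resumable upload is that the server tracks how many bytes it has acknowledged, so after an interruption the client resumes from that offset instead of byte zero. The class below is a pure-Python simulation of that bookkeeping, purely for illustration; a real implementation would use the Google API client's resumable-upload session, and here the "server" is just a byte buffer.

```python
# Illustrative only: simulates the resume-from-offset bookkeeping of a
# resumable upload. The "server" is a local byte buffer, not BigQuery.
class ResumableUploadSession:
    def __init__(self, data: bytes, chunk_size: int = 256 * 1024):
        self.data = data
        self.chunk_size = chunk_size
        self.received = bytearray()   # bytes the "server" has acknowledged

    @property
    def offset(self) -> int:
        """Where the next chunk should start (what the server would report)."""
        return len(self.received)

    def send_chunk(self, fail: bool = False) -> bool:
        """Send the next chunk; a failed send leaves the offset unchanged."""
        if fail:
            return False  # interrupted: nothing new acknowledged
        start = self.offset
        self.received.extend(self.data[start:start + self.chunk_size])
        return True

    def done(self) -> bool:
        return self.offset == len(self.data)
```

Contrast this with the multipart method: there, a failure throws away all progress, which is why multipart suits small files and resumable uploads suit large ones.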
Method 3: Load Data from Excel to BigQuery Using Hevo Data
Hevo is a No-code Data Pipeline. It supports pre-built integrations with 150+ data sources. It allows you to load Excel/CSV files from various sources, such as S3, FTP/SFTP, Google Drive, and Box, into BigQuery. With Hevo, you can easily create an ETL pipeline using multiple Excel sheets and load that data into BigQuery or any warehouse of your choice. Hevo also allows you to perform transformations and preprocess your Excel data before loading it into BigQuery.
Many limitations listed in the other methods do not exist or are managed easily by Hevo.
GET STARTED WITH HEVO FOR FREE
Some of the key features of Hevo Data are listed below:
- Fully Managed: It requires no management and maintenance as Hevo is a fully automated platform.
- Data Transformation: It provides a simple interface to perfect, modify, and enrich the data you want to transfer.
- Real-Time: Hevo offers real-time data migration. So, your data is always ready for analysis.
- Schema Management: Hevo automatically detects the schema of the incoming data and maps it to the destination schema.
- Live Monitoring: Advanced monitoring gives you a one-stop view to watch all the activities that occur within pipelines.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
SIGN UP HERE FOR A 14-DAY FREE TRIAL!
Method 4: Load Data from Excel to BigQuery Using DataPrep
DataPrep is a cloud-based data service to visually explore, prune, pre-process, and prepare data for analysis and machine learning. Cloud Dataprep can directly import Excel workbooks and folders containing workbooks, as datasets.
Steps to load data from Excel to BigQuery using DataPrep are listed below:
- Step 1: Import Data in DataPrep
- Step 2: Load Data from DataPrep to BigQuery
Step 1: Import Data in DataPrep
- On the DataPrep console, inside the Library page, click “Import Data”.
- Select the Excel Workbook that you need to import.
- By default, all worksheets in the workbook are imported as individual datasets. You can change how data imports from the EDIT option (e.g. you can select to import all worksheets to a single dataset).
- After you add the dataset, you can edit the name and description information for them.
Step 2: Load Data from DataPrep to BigQuery
- If you want any data transformations, define a recipe in DataPrep that works on the newly created datasets.
- In the left navigation bar, click the Jobs icon.
- This opens the Job Definitions page; click a job identifier to open it on the Job Details page.
- Click the Output Destinations tab and select BigQuery as the destination.
Limitations
Some limitations of using DataPrep for loading data from Excel to BigQuery are listed below:
- If your data in Excel cells contains quotes, make sure each one has a matching terminating quote; otherwise, the results are undefined.
- Compressed and password-protected files are not supported.
- Object and Array data types in DataPrep are written back to BigQuery as string values, so avoid nested structures in the source Excel files.
- BigQuery does not support destinations with a dot (.) in the name.
Method 5: Load Data from Excel to BigQuery Using Cloud Storage
You can export your Excel data as CSV and upload it to Google Cloud Storage. From there, you can load it into BigQuery in any of the following ways:
- Using the “bq load” command, via the command line.
- Using the Cloud console or WebUI (you can specify a JSON file containing the desired schema definition).
- Using the jobs.insert API method.
- Using client libraries (custom programming) for Java/Python/C#/NodeJS etc.
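As a sketch of the last option, the Python client library can load a CSV that is already in Cloud Storage with a single job. This is illustrative only; the bucket, object, and table names are placeholders, and running it requires the `google-cloud-bigquery` package plus configured credentials.

```python
# Hypothetical sketch: load a CSV already in Cloud Storage into BigQuery
# with the Python client library. All names below are placeholders.

def gcs_uri(bucket: str, blob: str) -> str:
    """Build the gs:// URI that load_table_from_uri expects."""
    return f"gs://{bucket}/{blob}"

def load_from_gcs(bucket: str, blob: str, table_id: str) -> None:
    # Deferred import so gcs_uri works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(source_format="CSV",
                                        skip_leading_rows=1,
                                        autodetect=True)
    job = client.load_table_from_uri(gcs_uri(bucket, blob), table_id,
                                     job_config=job_config)
    job.result()  # wait for the load job to complete

# Example (placeholder names):
# load_from_gcs("my-bucket", "exports/sales.csv", "my-project.my_dataset.sales")
```

Unlike local loads, Cloud Storage sources can use wildcard URIs to load many files in one job.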
Limitations
Some limitations to keep in mind:
- All your values must be flat; nested or repeated data is not supported in CSV loads.
- You cannot mix uncompressed and compressed files in a single load job.
- Your DATE columns must use the “-” separator, and the only supported format is YYYY-MM-DD (year-month-day).
- The same applies to TIMESTAMP columns; additionally, the hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon (:) separator.
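The date and timestamp formats described above can be validated before a load job with the standard library alone. The helpers below check only the formats stated in this post; they are a pre-flight sanity check, not an exhaustive reimplementation of BigQuery's parser.

```python
# Sanity checks for the DATE/TIMESTAMP CSV formats described above:
# YYYY-MM-DD for dates and "YYYY-MM-DD hh:mm:ss" for timestamps.
from datetime import datetime

def is_valid_bq_date(value: str) -> bool:
    """True if value matches YYYY-MM-DD with '-' separators."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def is_valid_bq_timestamp(value: str) -> bool:
    """True if value matches 'YYYY-MM-DD hh:mm:ss' with ':' time separators."""
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        return True
    except ValueError:
        return False
```

Running checks like these over a CSV before loading catches formatting problems early, instead of burning them against the load job's allowed-error budget.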
Conclusion
In this blog, you have learned about Excel and Google BigQuery. You also learned about five different approaches to load data from Excel to BigQuery. You can use any of the stated methods according to your requirements and business needs. Each method has its own limitations, so if you are looking for a fully automated solution to load data from Excel to BigQuery, then try Hevo.
Hevo is a No-code Data Pipeline. It supports pre-built integrations with 150+ data sources and provides you with a completely automated solution within minutes.
VISIT OUR WEBSITE TO EXPLORE HEVO
Want to take Hevo for a spin?
SIGN UP and experience the feature-rich Hevo suite first hand.
Tell us about your experience of loading data from Excel to BigQuery in the comment section below.