Today, companies generate, store, and manage huge volumes of data. Storing and querying data at that scale can be costly and time-consuming, especially for a company that lacks the appropriate Infrastructure. To overcome this hurdle, Google introduced Google BigQuery, an enterprise Data Warehouse that leverages the processing power of Google’s Infrastructure to run fast SQL queries over large datasets. You can move data from your database into Google BigQuery to take advantage of that performance.
As JavaScript has shaped software trends over the last decade, JSON has become one of the most widely used data exchange formats. As a result, much of the data that companies store today is in JSON format, and you may often need to migrate data from JSON to BigQuery.
By the end of this article, you will have a working understanding of Google BigQuery and the salient features that it offers. You will also learn the steps involved in migrating data from JSON to BigQuery in the simplest manner. Read along to learn more about the process of data migration from JSON to BigQuery!
Prerequisites
- Basic hands-on experience with Google Cloud Console.
Introduction to Google BigQuery
Google BigQuery is a robust and fully managed Data Warehousing Service from Google, based on a Massively Parallel Processing Architecture that allows users to query enormous amounts of data in near real time.
In addition, it houses a comprehensive SQL layer that supports fast processing for a diverse range of analytical queries and has strong integration support with numerous Google applications and services such as Google Sheets, Google Drive, etc. Google BigQuery is Serverless and built to be highly scalable.
Google utilizes its existing Cloud architecture to manage this serverless design. It also makes use of different data models that give users the ability to store dynamic data.
Key Features of Google BigQuery
Some of the key features of Google BigQuery are as follows:
- Scalability: To provide consumers with true Scalability and consistent Performance, Google BigQuery leverages Massively Parallel Computing and a Highly Scalable Secure Storage Engine. The entire Infrastructure with over a thousand machines is managed by a complex software stack.
- Serverless: The Google BigQuery Serverless model automatically distributes processing across a large number of machines running in parallel, so any company using Google BigQuery can focus on extracting insights from data rather than configuring and maintaining the Infrastructure/Server.
- Storage: Google BigQuery uses a Columnar architecture to store mammoth scales of data sets. Column-based Storage has several advantages, including better Memory Utilization and the ability to scan data faster than typical Row-based Storage.
- Integrations: Google BigQuery, as part of the Google Cloud Platform (GCP), integrates with other Google products and services. Google also offers a variety of Integrations with numerous third-party services, as well as the functionality to integrate with application APIs that are not directly supported by Google.
Introduction to JSON Files
JSON stands for JavaScript Object Notation. It is a popular Data Serialization format that is easy for humans to read and write, and easy for machines to parse and generate as well. The JSON file format is derived from the JavaScript Programming Language Standard ECMA-262 3rd Edition. It is mainly used to transfer data between a Server and a Web Application and was originally developed as an alternative to XML. The data in JSON format is stored in Key-Value pairs. JSON can store various types of data such as Arrays, Objects, Strings, etc. In a later section of this article, you will learn about the steps involved in migrating data from JSON to BigQuery.
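The loading steps later in this article work with newline-delimited JSON, where each line of the file is one complete JSON object. If your data is a single JSON array instead, it can be converted first. The following is a minimal sketch using only Python's standard library; the sample records and the helper name `to_ndjson` are made up for illustration.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON (one object per line)."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Hypothetical sample data: a JSON array holding objects, strings, and arrays.
array_text = (
    '[{"id": 1, "name": "Ada", "tags": ["a", "b"]},'
    ' {"id": 2, "name": "Lin", "tags": []}]'
)

records = json.loads(array_text)  # parse the array into a list of dicts
ndjson = to_ndjson(records)       # one JSON object per line
print(ndjson, end="")
```

Each output line can be parsed on its own, which is what makes the format convenient for loading large files.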
Required Permissions to Load Data from JSON to BigQuery
If you want to load data from JSON to BigQuery, you will need permissions that let you load data into new or pre-existing BigQuery tables. You also need permission to access the bucket that contains your data if you are loading data from Google Cloud Storage. These permissions are required when loading data into a new table or partition, or when appending to or overwriting a table or partition. At a minimum, the following permissions are required to load data from JSON to BigQuery:
- bigquery.tables.create
- bigquery.tables.updateData
- bigquery.jobs.create
You must have the storage.objects.get permission to load data from a Google Cloud Storage bucket. If you are using a URI Wildcard, you must also have the storage.objects.list permission to load data from JSON to BigQuery.
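The rules above can be turned into a small checklist. The sketch below is only an illustration in plain Python: it does not call any Google API, and the function names and the `granted` set are hypothetical. It simply compares a set of permissions you believe you hold against the ones named in this section.

```python
# Permissions named in this section for loading data into BigQuery tables.
BASE_PERMISSIONS = {
    "bigquery.tables.create",
    "bigquery.tables.updateData",
    "bigquery.jobs.create",
}


def required_permissions(from_cloud_storage=False, uses_wildcard=False):
    """Return the set of permissions this section lists for a given load scenario."""
    required = set(BASE_PERMISSIONS)
    if from_cloud_storage:
        required.add("storage.objects.get")
        if uses_wildcard:
            # A URI wildcard additionally needs permission to list objects.
            required.add("storage.objects.list")
    return required


def missing_permissions(granted, from_cloud_storage=False, uses_wildcard=False):
    """Return a sorted list of required permissions absent from `granted`."""
    needed = required_permissions(from_cloud_storage, uses_wildcard)
    return sorted(needed - set(granted))


# Hypothetical example: an account that holds only the BigQuery permissions.
granted = set(BASE_PERMISSIONS)
print(missing_permissions(granted, from_cloud_storage=True, uses_wildcard=True))
```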
Steps to Load Data from JSON to BigQuery
You can load newline-delimited JSON data from Google Cloud Storage into a new BigQuery table in several ways, but using the Cloud Console is the simplest among them.
Follow the steps given below to load JSON data from Google Cloud Storage into a BigQuery Table:
- Step 1: Open the Google BigQuery Page in the Cloud Console.
- Step 2: Navigate to the Explorer panel, click on Project and select a dataset.
- Step 3: Expand the Actions option and click on Open.
- Step 4: In the Detail Panel of the console, click on the Create Table button to create a new Google BigQuery table.
- Step 5: Once the Create Table Page opens, you will be prompted to fill in three fields.
- Step 6: In the Source field (Create Table From), select Google Cloud Storage.
- Step 7: Select the File that you wish to upload from the Google Cloud Storage Bucket.
- Step 8: In File Format, select the format of the file that you wish to upload, which is JSON in this case.
- Step 9: For Dataset Name, choose the appropriate Dataset and make sure that the table type is set to Native table.
- Step 10: In the Schema section, select Auto-detection. When Auto-detection is enabled, Google BigQuery starts the inference process by selecting a random file in the data source and scanning up to the first 500 rows of data to use as a representative sample. Google BigQuery then examines each field and tries to assign a data type to it based on the sample values. You can also enter the Schema definition manually by enabling the Edit as Text option and entering the table schema as a JSON array.
- Step 11: If you create the Schema manually, click on Add Field to enter each field.
- Step 12: Once you have created the Schema, click on the Create Table button to create the Google BigQuery Table.
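For the manual option in Step 10, the schema is written as a JSON array with one object per column. The sketch below shows what such an array might look like and checks it with Python's standard library before you paste it into the console. The field names are invented for illustration, and `check_schema` is a hypothetical helper that covers only the basic `name`, `type`, and `mode` keys, not every option BigQuery accepts.

```python
import json

# Hypothetical schema: the column names are examples, not from a real table.
SCHEMA_TEXT = """
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "name", "type": "STRING", "mode": "NULLABLE"},
  {"name": "tags", "type": "STRING", "mode": "REPEATED"}
]
"""

ALLOWED_MODES = {"NULLABLE", "REQUIRED", "REPEATED"}


def check_schema(text):
    """Parse a schema JSON array and return its field names.

    Raises ValueError if the text is not an array of objects that each
    carry a 'name' and a 'type', or if a 'mode' value is not recognized.
    """
    fields = json.loads(text)
    if not isinstance(fields, list):
        raise ValueError("schema must be a JSON array")
    names = []
    for field in fields:
        if not isinstance(field, dict) or "name" not in field or "type" not in field:
            raise ValueError(f"each field needs 'name' and 'type': {field!r}")
        # Mode is optional; a missing mode is treated as NULLABLE.
        if field.get("mode", "NULLABLE") not in ALLOWED_MODES:
            raise ValueError(f"unknown mode in field {field['name']!r}")
        names.append(field["name"])
    return names


print(check_schema(SCHEMA_TEXT))
```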
Once you follow all the above steps in the correct sequence, you will be able to migrate data from JSON to BigQuery.
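The Cloud Console is not the only route. As a rough programmatic equivalent of the steps above, the following sketch uses the `google-cloud-bigquery` Python client library to load a newline-delimited JSON file from Cloud Storage with schema auto-detection. It assumes the library is installed and that credentials are already configured in your environment. The function name `load_json_from_gcs`, the bucket path, and the table ID are placeholders, not values from this article.

```python
def load_json_from_gcs(uri, table_id, autodetect=True):
    """Load newline-delimited JSON from Cloud Storage into a BigQuery table.

    `uri` is a gs:// path and `table_id` has the form "project.dataset.table".
    Returns the number of rows in the table once the load job finishes.
    """
    if not uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI starting with gs://")

    # Imported here so the URI check above works even without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=autodetect,  # mirrors the Auto-detection choice in Step 10
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job completes or raises
    return client.get_table(table_id).num_rows


# Example call with placeholder names (requires credentials and a real bucket):
# rows = load_json_from_gcs("gs://my-bucket/data.json", "my-project.my_dataset.my_table")
```

Running the load this way needs the same permissions listed earlier, since the client submits an ordinary load job on your behalf.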
Conclusion
In this article, you learned about the steps involved in loading data from JSON to BigQuery from scratch. You also learned about the key features of Google BigQuery. To carry out an in-depth analysis of your project, you will often need to extract data in JSON format from multiple sources to get a complete picture. Integrating and analyzing data from a diverse set of data sources can be challenging, and this is where Hevo Data comes into the picture.
Share your experience of migrating data from JSON to BigQuery. Tell us in the comments below!
Rakesh is a research analyst at Hevo Data with more than three years of experience in the field. He specializes in technologies, including API integration and machine learning. The combination of technical skills and a flair for writing brought him to the field of writing on highly complex topics. He has written numerous articles on a variety of data engineering topics, such as data integration, data analytics, and data management. He enjoys simplifying difficult subjects to help data practitioners with their doubts related to data engineering.