Unlock the full potential of your data by integrating it seamlessly with BigQuery. With Hevo’s automated pipelines, your data flows effortlessly. Watch our 1-minute demo below to see it in action!

Automated data pipelines that stream data to your BigQuery warehouse provide real-time updates, enabling you to make more accurate data-driven decisions. This is especially valuable for organisations that deal with financial or security-related data, helping them with fraud detection, user-behaviour tracking, and real-time analytics dashboards. To help you achieve this, we have put together 2 step-by-step methods for streaming data to BigQuery.

Ease your BigQuery Integrations with Hevo!

Streamline your BigQuery data pipelines with Hevo’s no-code platform. Whether you’re consolidating data from multiple sources or enabling real-time analytics, Hevo simplifies the entire process from start to finish.

  • End-to-End Automation: From ingestion to transformation, everything runs on autopilot.
  • Plug-and-Play Connectors: Integrate 150+ data sources into BigQuery without writing a single line of code.
  • Real-Time Replication: Keep your BigQuery warehouse always up-to-date with live data sync.
  • Auto Schema Mapping: Let Hevo intelligently map and adjust schema changes as your data evolves.

Try Hevo and discover why 2000+ customers have chosen it, and how 40+ teams across industry verticals use Hevo to power their analytics stack.

Get Started with Hevo for Free

Methods to DataStream to BigQuery

Method 1: DataStream to BigQuery using Hevo’s No-code Data Pipelines

Step 1: Configure the Source

Select PostgreSQL as Source.

      Step 2: Configure BigQuery as your Destination

      Configure BigQuery as the Destination.
• Click “Save and Continue” to run the pipeline. Your ETL pipeline with BigQuery as the Destination is configured in just 2 simple steps.
      Integrate PostgreSQL to BigQuery
      Integrate MySQL to BigQuery
      Integrate MongoDB to BigQuery

      Method 2: DataStream to BigQuery using Python

      Required Permissions

      To stream data into BigQuery, you will need the following IAM permissions:

      • bigquery.tables.updateData (lets you insert data into the table)
      • bigquery.tables.get (lets you obtain table metadata)
      • bigquery.datasets.get (lets you obtain dataset metadata)
      • bigquery.tables.create (required if you use a template table to create the table automatically)
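Once you have the client library installed and credentials configured (covered in the steps below), you can also check the table-level permissions programmatically using the Python client’s test_iam_permissions helper. This is a minimal, illustrative sketch; the table ID is a placeholder, and it only covers the table-level permissions from the list above:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder fully qualified table ID; replace with your own project, dataset, and table.
table_id = "my-project.my_dataset.my_table"

# Ask BigQuery which of these table-level permissions the caller actually holds.
response = client.test_iam_permissions(
    table_id, ["bigquery.tables.updateData", "bigquery.tables.get"]
)
print(response.get("permissions", []))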

      Prerequisites

      Here are certain things that you should have before streaming the data:

      • Make sure you have write access to the dataset that contains your destination table.
• You need a Google Cloud project with billing enabled, as streaming inserts are not available in BigQuery’s free tier.
      • Grant Identity and Access Management (IAM) roles that give users the necessary permissions.

      Step 1: Installing the Python Dependency for Google BigQuery

      • To start Streaming Data to BigQuery using Python, you first need to install the Python dependency for Google BigQuery on your system. To do this, you can make use of the pip install command as follows:
pip install --upgrade google-cloud-bigquery
      • This is how you can install the Google BigQuery dependency for Python on your system.
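Before moving on, you may want to confirm that the library installed correctly. A quick, optional sanity check:

# Optional: verify the BigQuery client library is importable and print its version.
from google.cloud import bigquery

print(bigquery.__version__)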

      Step 2: Creating the Service Account Key

      • Once you’ve installed the Python dependency, you now need to create a service key for your Google BigQuery instance, which will help provide access to your Google BigQuery data.
• To do this, go to the Google Cloud Console and log in with the Google account associated with your BigQuery project.
• Once you’ve logged in, open the Service Accounts page under IAM & Admin, select or create a service account, and then click on create key to start configuring your service key.
      • Here, you will need to download the key in JSON format and save it on your system. To do this, select the JSON option found in the key-type section and then click on create.
• The service key file will now be downloaded to your system. Save the file and note its path, as you will need it in the next step.
• Once you’ve successfully downloaded the file, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the JSON file so that Python’s BigQuery client can authenticate and access your data:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/keys/google-service-key.json"

      This is how you can create and configure the account service key for your Google BigQuery instance.
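If you prefer not to rely on the environment variable, the credentials can also be loaded directly from the key file when constructing the client. A minimal sketch, assuming the same key path used in the export command above:

from google.cloud import bigquery
from google.oauth2 import service_account

# Load credentials straight from the downloaded JSON key file
# instead of relying on GOOGLE_APPLICATION_CREDENTIALS.
credentials = service_account.Credentials.from_service_account_file(
    "/home/user/keys/google-service-key.json"
)

# Build the BigQuery client with the explicit credentials.
client = bigquery.Client(credentials=credentials, project=credentials.project_id)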

      Step 3: Coding the Python Script to Stream Data to Google BigQuery

      • With your service key now available, you can start building the Python script for streaming data to BigQuery tables. To do this, import the necessary libraries as follows:
from google.cloud import bigquery
      • Once you’ve imported the libraries, you need to initialise the client for your Google BigQuery instance to set up the connection. You can use the following line of code to do this:
      client = bigquery.Client()
      • With your connection now up and running, you now need to configure the Google BigQuery table in which you want to insert the data. To do this, you can use the following lines of code:
table_name = "<your fully qualified table name>"
insert_rows = [
    {"firstname": "Arsha", "lastname": "richard", "age": 32},
    {"firstname": "Shneller", "lastname": "james", "age": 39},
]
      • You can now start loading your data into your Google BigQuery table using the Python script as follows:
errors = client.insert_rows_json(table_name, insert_rows)
if errors == []:
    print("Added data")
else:
    print("Something went wrong: {}".format(errors))

      This is how you can use Python’s Google BigQuery dependency to start Streaming Data to BigQuery.
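Note that insert_rows_json reports problems through its return value rather than raising an exception for rejected rows, so it is worth inspecting the error list in detail. A short sketch that builds on the code above; the "index" and "errors" keys come from the streaming insert response:

errors = client.insert_rows_json(table_name, insert_rows)

# Each entry describes one rejected row: its position in insert_rows
# and the reasons BigQuery refused it.
for entry in errors:
    row_index = entry["index"]
    for err in entry["errors"]:
        print("Row {} failed: {} - {}".format(row_index, err.get("reason"), err.get("message")))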

      Troubleshooting stream inserts

      Here are some errors that you may encounter while streaming data into BigQuery:

      #1 Error:

      google.auth.exceptions.AuthenticationError: Unauthorized

      Solution: Ensure that your service account credentials or OAuth tokens are correctly configured and have the necessary permissions (roles/bigquery.dataEditor, roles/bigquery.admin) to write data to the specified dataset and table in BigQuery.
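A quick way to check which credentials your script is actually picking up is to resolve them explicitly with google.auth. This diagnostic snippet is optional and separate from the streaming code:

import google.auth

# Resolves the application default credentials the BigQuery client would use.
# Raises DefaultCredentialsError if GOOGLE_APPLICATION_CREDENTIALS is unset
# or points to an invalid key file.
credentials, project = google.auth.default()
print("Using project:", project)
print("Credentials type:", type(credentials).__name__)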

      #2 Error:

      google.api_core.exceptions.NotFound: 404 Not found.

      Solution: Verify that the dataset and table IDs specified in your Python code match the actual IDs in your BigQuery project.
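You can also confirm that the table exists before inserting by fetching its metadata and catching the NotFound exception. A small sketch; the table ID is a placeholder:

from google.cloud import bigquery
from google.api_core.exceptions import NotFound

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # placeholder, replace with your own

try:
    client.get_table(table_id)  # raises NotFound if the table does not exist
    print("Table {} exists.".format(table_id))
except NotFound:
    print("Table {} was not found; check the project, dataset, and table IDs.".format(table_id))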

      Limitations of Streaming Data to Google BigQuery Manually

      • Streaming data to BigQuery requires you to write multiple custom integration-based code snippets to establish a connection, thereby making it challenging, especially when the data source is not a Google service.
• This method of streaming data to BigQuery does not handle duplicate records for you. Avoiding them requires either supplying insertIds for best-effort de-duplication or writing custom code to identify duplicate rows and remove them from the table after insertion (see the sketch after this list).
• Google BigQuery follows a complex quota-based policy that depends on numerous factors, such as whether you’re using the de-duplication feature. It thus requires you to prepare your code carefully with the quota policy in mind.
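As mentioned above, the streaming API offers only best-effort de-duplication, which you opt into by supplying an insertId per row; in the Python client this maps to the row_ids argument of insert_rows_json. A minimal sketch reusing the table and rows from Method 2 (the random UUIDs here are illustrative; stable IDs derived from your data survive script re-runs better):

import uuid

# Best-effort de-duplication: BigQuery uses each insertId to drop rows that are
# retried with the same ID within a short window.
row_ids = [str(uuid.uuid4()) for _ in insert_rows]

errors = client.insert_rows_json(table_name, insert_rows, row_ids=row_ids)
if not errors:
    print("Added data with insertIds")
else:
    print("Something went wrong: {}".format(errors))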

      Conclusion

In this blog, we have discussed two methods by which you can stream your data to BigQuery. While using Python scripts provides flexibility and control over data pipelines, it can pose challenges such as managing scalability, handling schema changes dynamically, and ensuring robust error handling and monitoring. Hevo, on the other hand, simplifies these complexities with its managed service approach. It automates schema detection and evolution, handles scalability seamlessly with auto-scaling capabilities, and provides built-in error handling and monitoring.

      Sign up for a 14-day free trial with Hevo and streamline your data integration. Also, check out Hevo’s pricing page for a better understanding of the plans.

      Discover how connecting BigQuery to Python can enhance your data operations. Our guide walks you through the process with clear steps for seamless integration.

      FAQs to stream data to BigQuery

      1. Can you stream data to BigQuery?

      Yes, you can stream data into BigQuery using its streaming inserts feature.

      2. Does BigQuery support streaming inserts?

      Yes, BigQuery supports streaming inserts. It allows you to stream individual rows of data into BigQuery tables in real-time using the BigQuery Streaming API. 

      3. What is streaming data in big data?

Streaming data in the context of big data refers to data that is generated continuously, in real time, and needs to be processed and analyzed in near real-time.

      4. What is a datastream in GCP?

In Google Cloud Platform (GCP), Datastream is a managed service that enables real-time data integration and replication from various sources to Google Cloud destinations.

      Talha
      Software Developer, Hevo Data

      Talha is a Software Developer with over eight years of experience in the field. He is currently driving advancements in data integration at Hevo Data, where he has been instrumental in shaping a cutting-edge data integration platform for the past four years. Prior to this, he spent 4 years at Flipkart, where he played a key role in projects related to their data integration capabilities. Talha loves to explain complex information related to data engineering to his peers through writing. He has written many blogs related to data integration, data management aspects, and key challenges data practitioners face.