Data analytics involves storing, managing, and processing data from different sources and analyzing it thoroughly to solve business problems. While JSON helps interchange data between web applications and sources through API connectors, Snowflake helps you analyze that data with its intuitive features. Therefore, JSON Snowflake data migration is crucial for analyzing semi-structured datasets.

This article will elucidate two methods to help you integrate data from a JSON document into Snowflake.

Overview of JSON 

JSON is a text-based format widely used to exchange data between applications. It is built on two data structures: objects, enclosed in curly braces, and arrays, enclosed in square brackets. JSON’s minimal syntax keeps files small and easy to transmit. It is versatile, as you can use it with most programming languages. JSON also supports nested data structures, making it one of the best choices for exchanging data in modern development.
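
For example, here is a small JSON document that nests an array of objects inside an object. It mirrors the event records used later in this article; the specific values are illustrative:

{
  "device_type": "server",
  "version": "2.6",
  "events": [
    { "t": 1437644222811, "v": { "ACHZ": 75846 } }
  ]
}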

Overview of Snowflake


Snowflake is a cloud platform for large-scale data processing. It lets you create databases and query them with virtual warehouses that supply the compute. You can host your Snowflake account on AWS, Microsoft Azure, or Google Cloud Platform (GCP). Since Snowflake separates its storage and compute layers, you only pay for the services you use, making it a cost-effective option in the long run.

Simplify Snowflake ETL using Hevo’s No-code Data Pipelines

A fully managed No-code Data Pipeline platform like Hevo Data helps you integrate data from 150+ Data Sources (including 60+ Free Data Sources) to a destination of your choice, such as Snowflake, in real-time in an effortless manner. Check out why Hevo is the Best:

  1. Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  2. Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  3. Live Support: The Hevo team is available round the clock to extend exceptional customer support through chat, E-Mail, and support calls.
Get Started with Hevo for Free

Let’s see how to load JSON files in Snowflake using different methods.

Method 1: JSON Snowflake Data Loading Using SnowSQL (Recommended)

Prerequisites

  • Download and install SnowSQL from the Snowflake website.
  • After installation, configure SnowSQL with your Snowflake account credentials.
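
For reference, a minimal SnowSQL configuration might look like the sketch below. It lives in ~/.snowsql/config, and every value shown is a placeholder you must replace with your own account details:

[connections.example]
accountname = <your_account_identifier>
username = <your_username>
password = <your_password>
warehousename = <your_warehouse>

You can then connect with snowsql -c example.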

Step 1: Create a Target Table

  • Ensure that you have a target table in Snowflake that matches the structure of your JSON data. If the table doesn’t exist, you can create it using a SQL statement.
  • Example SQL to create a table:
CREATE TABLE your_table (
    id INTEGER,
    name STRING,
    data VARIANT
);
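
To sanity-check the table, you can insert a test row. PARSE_JSON converts a JSON string into a VARIANT value; the id, name, and JSON below are placeholders:

INSERT INTO your_table (id, name, data)
SELECT 1, 'test_record', PARSE_JSON('{"status": "ok"}');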

Step 2: Stage the JSON File

  • Before loading data, you need to stage the JSON file in a Snowflake internal stage or an external stage (like Amazon S3, Azure Blob Storage, or Google Cloud Storage).
  • You can create a stage in Snowflake using:
CREATE STAGE my_stage;
  • Upload your JSON file to the stage. You can use SnowSQL for this:
PUT file:///path/to/your/file.json @my_stage;
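
You can confirm the upload succeeded by listing the stage contents. Note that PUT gzip-compresses files by default, so the staged file may appear with a .gz extension:

LIST @my_stage;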

Step 3: Load JSON Data into Snowflake

  • Use the COPY INTO command to load the JSON data from the stage into your target table. Snowflake’s VARIANT data type is ideal for loading semi-structured data like JSON.
  • Example of loading data:
COPY INTO your_table
FROM @my_stage/file.json
FILE_FORMAT = (TYPE = 'JSON');
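
If your file stores all records inside a single top-level JSON array, adding the STRIP_OUTER_ARRAY option loads each array element as its own row, as in this variant of the command above:

COPY INTO your_table
FROM @my_stage/file.json
FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE);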

Method 2: Copying Data into the Target Table Using S3 Bucket

This method will teach you how to parse JSON in Snowflake using an S3 Bucket. 

Prerequisites

  • You must have a Snowflake account hosted on Amazon AWS, with the privileges needed to create a data warehouse, database, and table.
  • You must have SnowSQL Client installed.
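
The steps below assume a target table named RAW_SOURCE with a single VARIANT column and a stage named my_stage that points at your S3 bucket. If you don’t already have them, here is a sketch of the setup; the bucket URL and credentials are placeholders, and for production Snowflake recommends a storage integration instead of embedded keys:

CREATE OR REPLACE TABLE raw_source (src VARIANT);

CREATE OR REPLACE STAGE my_stage
  URL = 's3://your-bucket/json/'
  CREDENTIALS = (AWS_KEY_ID = '<your_key>' AWS_SECRET_KEY = '<your_secret>')
  FILE_FORMAT = (TYPE = JSON);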

Step 1: Copy Data to the Target Table

Execute the COPY INTO <table> command to copy the staged data into the target table named RAW_SOURCE.

COPY INTO raw_source
FROM @my_stage/server/2.6/2016/07/15/15
FILE_FORMAT = (TYPE = JSON);

Step 2: Query the Copied Data

You can query the data using a SELECT statement.

  1. Retrieve device_type with quotes.

Use src:device_type, i.e., the column name followed by the name of the JSON element.

SELECT src:device_type
FROM raw_source;

Result of the above query:

+-----------------+
| SRC:DEVICE_TYPE |
|-----------------|
| "server"        |
+-----------------+

  2. Retrieve the device_type value without quotes.

Here, you cast the value to the STRING data type, which removes the surrounding quotes.

SELECT src:device_type::string AS device_type
FROM raw_source;

Result of the above query:

+-------------+
| DEVICE_TYPE |
|-------------|
| server      |
+-------------+
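
As an aside, Snowflake also accepts bracket notation for traversing JSON, which is handy when a key contains characters that the colon syntax doesn’t allow. The query below is equivalent to the one above:

SELECT src['device_type']::string AS device_type
FROM raw_source;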

Step 3: Using the Flatten Function

The FLATTEN function produces a lateral view of a VARIANT, OBJECT, or ARRAY column.

  1. Flatten the array objects into a VARIANT column

You can flatten the event objects in the array so that each event becomes a separate row. The output includes a VALUE column that stores the data for each individual event.

  • Query the data for each event.
SELECT src:device_type::string,
    src:version::string,
    VALUE
FROM
    raw_source,
LATERAL FLATTEN( INPUT => SRC:events );
  • To store the result of the above query in a table, use CREATE TABLE.
CREATE OR REPLACE TABLE flattened_source AS
  SELECT
    src:device_type::string AS device_type,
    src:version::string AS version,
    VALUE AS src
  FROM
    raw_source,
  LATERAL FLATTEN( INPUT => SRC:events );
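
Each row of flattened_source now holds one event in its src column, so you can query individual event fields directly; for example:

SELECT device_type, version, src:f::number AS f
FROM flattened_source;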

  2. Flatten object keys into separate columns.

You can also create a table where each object key is stored in its own column. You need to cast each value to a data type using (::); if you don’t, the column defaults to VARIANT, which can hold any value.

  • Use CREATE TABLE to create a table named ‘events.’
create or replace table events as
select
src:device_type::string as device_type
, src:version::string as version
, value:f::number as f
, value:rv::variant as rv
, value:t::number as t
, value:v.ACHZ::number as achz
, value:v.ACV::number as acv
, value:v.DCA::number as dca
, value:v.DCV::number as dcv
, value:v.ENJR::number as enjr
, value:v.ERRS::number as errs
, value:v.MXEC::number as mxec
, value:v.TMPI::number as tmpi
, value:vd::number as vd
, value:z::number as z
from
raw_source
, lateral flatten ( input => SRC:events );
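
You can confirm that each cast produced the intended column type by describing the new table:

DESCRIBE TABLE events;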

Step 4: Update Data in the RAW_SOURCE Table from the S3 Bucket

  • First, identify and add key constraints to your table. Here, we assume src:device_type and value:rv map to the primary key columns DEVICE_TYPE and RV in the events table. Note that Snowflake stores primary key constraints as metadata but does not enforce them.
ALTER TABLE events ADD CONSTRAINT pk_DeviceType PRIMARY KEY (device_type, rv);
  • Insert a new JSON record into the RAW_SOURCE table.
insert into raw_source
select
PARSE_JSON ('{
  "device_type": "cell_phone",
  "events": [
    {
      "f": 79,
      "rv": "786954.67,492.68,3577.48,40.11,343.00,345.8,0.22,8765.22",
      "t": 5769784730576,
      "v": {
        "ACHZ": 75846,
        "ACV": 98355,
        "DCA": 789,
        "DCV": 62287,
        "ENJR": 2234,
        "ERRS": 578,
        "MXEC": 999,
        "TMPI": 9
      },
      "vd": 54,
      "z": 1437644222811
    }
  ],
  "version": 3.2
}');
  • Insert the new record into the events table, using the primary key columns to skip rows that already exist (a MERGE-based alternative is sketched after this list).
insert into events
select
      src:device_type::string
    , src:version::string
    , value:f::number
    , value:rv::variant
    , value:t::number
    , value:v.ACHZ::number
    , value:v.ACV::number
    , value:v.DCA::number
    , value:v.DCV::number
    , value:v.ENJR::number
    , value:v.ERRS::number
    , value:v.MXEC::number
    , value:v.TMPI::number
    , value:vd::number
    , value:z::number
    from
      raw_source
    , lateral flatten( input => src:events )
    where not exists
    (select 'x'
      from events
      where events.device_type = src:device_type
      and events.rv = value:rv);
  • If no primary keys are present, you must compare the JSON keys in all corresponding columns of the RAW_SOURCE table with the columns in the events table. For more information, refer to Snowflake’s documentation on updating data.
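
As referenced above, a MERGE statement can express the same insert-if-absent logic in a single command. The sketch below uses the same source query and key columns as the INSERT above:

merge into events e
using (
    select
          src:device_type::string as device_type
        , src:version::string as version
        , value:f::number as f
        , value:rv::variant as rv
        , value:t::number as t
        , value:v.ACHZ::number as achz
        , value:v.ACV::number as acv
        , value:v.DCA::number as dca
        , value:v.DCV::number as dcv
        , value:v.ENJR::number as enjr
        , value:v.ERRS::number as errs
        , value:v.MXEC::number as mxec
        , value:v.TMPI::number as tmpi
        , value:vd::number as vd
        , value:z::number as z
    from raw_source
        , lateral flatten( input => src:events )
) s
on e.device_type = s.device_type and e.rv = s.rv
when not matched then insert
    (device_type, version, f, rv, t, achz, acv, dca, dcv, enjr, errs, mxec, tmpi, vd, z)
values
    (s.device_type, s.version, s.f, s.rv, s.t, s.achz, s.acv, s.dca, s.dcv, s.enjr, s.errs, s.mxec, s.tmpi, s.vd, s.z);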

Limitations of JSON Snowflake Data Replication Using S3 Bucket

  • You have to manually update the data from the S3 stage to your Snowflake table using lengthy SQL statements, which is time-consuming.
  • After updating the data in your Snowflake database table, you need to clean up all the data files you staged. This can make the process long and tiresome.
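
The cleanup itself is a single command. For example, to delete every file from the stage used above (you can add a PATTERN clause to remove only matching files):

REMOVE @my_stage;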

Use Cases of JSON Snowflake Data Migration

  • Snowflake supports semi-structured data formats such as JSON, making it easy to load and analyze data without complex transformations.
  • With Snowflake’s built-in query extensions and JSON functions, you can easily manipulate JSON data to generate the desired results (a small sketch follows this list).
  • Snowflake’s auto-scaling feature keeps overall performance consistent. It allocates additional compute resources as needed and helps you prioritize your workloads.
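
For instance, assuming the RAW_SOURCE table from Method 2, functions such as OBJECT_CONSTRUCT and ARRAY_SIZE let you reshape and summarize JSON directly in SQL:

SELECT OBJECT_CONSTRUCT(
         'device_type', src:device_type,
         'event_count', ARRAY_SIZE(src:events)
       ) AS summary
FROM raw_source;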

Learn more about How to Parse JSON in PostgreSQL

Conclusion

You can easily upload JSON data to Snowflake using efficient methods such as SnowSQL and an S3 bucket. SnowSQL is the recommended option for automated loading, while the S3 bucket method is less preferred but provides a straightforward way of uploading JSON data manually. Both methods rely on Snowflake’s robust architecture for handling semi-structured data. With these steps laid out, you can ingest JSON data into Snowflake with ease and enhance your data analytics processes.

Although you can migrate your JSON data to Snowflake in many ways, if you don’t want to write code for data transformation, you can turn to Hevo. With its cost-effective zero-code pipeline, Hevo quickly automates the data migration process from your source to the destination.

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite firsthand.

Share your experience of JSON Snowflake integration in the comments section below!

FAQs (Frequently Asked Questions)

1. How can you convert your JSON data into a relational table?

You can convert your JSON data into a relational table using Hevo’s no-code data pipeline, which helps you load large datasets directly into your destination database. Other methods include loading your data into the Snowflake database or using an INSERT statement.

2. How can you upload a JSON file into a Snowflake table?

You can upload your JSON file to a stage using the PUT command and then load it into a Snowflake table with the COPY INTO command. Alternatively, you can load your JSON file into Snowflake directly from an S3 bucket.

3. What data type should be used to store JSON data natively in Snowflake?

In Snowflake, the VARIANT data type is used to store JSON data natively. This data type allows for the storage of semi-structured data, including JSON, XML, and Avro, enabling users to work with diverse data formats without needing to define a fixed schema.

Saloni Agarwal
Technical Content Writer, Hevo Data

With a strong background in market research for data science and cybersecurity products, Saloni is an expert at crafting informative articles on key topics within the data science domain, such as data transformation, processes, and analysis. Saloni's passion for the field drives her to continually learn and stay abreast of emerging technologies and trends, ensuring her contributions are impactful. Her work aims to enrich the discourse in data science, providing valuable insights and fostering a deeper understanding of complex subjects.