Shopify to Redshift: 2 Easy Methods

on Tutorial • February 23rd, 2020 • Write for Hevo

Software as service offerings like Shopify has revolutionized the way businesses step up their sales channels. Shopify provides a complete set of tools to aid in setting up an e-commerce platform in a matter of a few clicks. Shopify comes bundles with all the configurations to support a variety of payment gateways and customizable online shop views. Bundles with this package are also the ability to run analysis and aggregation over the customer data collected through Shopify images.

Even with all these built-in Shopify capabilities, organizations sometimes need to import the data from Shopify to their data warehouse since that allows them to derive meaningful insights by combining the Shopify data with their organization data. Doing this also means they get to use the full power of a data warehouse rather than being limited to the built-in functionalities of the Shopify analytics. This post is about the methods in which data can be loaded from Shopify to Redshift, one of the most popular cloud-based data warehouse.

Table of Contents

Introduction to Shopify

shopify logo

Shopify is a powerful e-commerce platform designed to allow people or businesses to sell their offerings/products online. Shopify helps you set up an online store and also offers a point of sale (POS) to sell the products in person. Shopify provides you with payment gateways, customer engagement techniques, marketing, and even shipping facilities to help you get started.

Various product or services that you can sell on the Shopify:

  • Physical Products: Shopify allows you to perform the door-step delivery of the products you’ve manufactured that can be door-shipped to the customer. These include anything like printed mugs/t-shirt, jewellery, gifts, etc.
  • Digital Products: Digital Products can include e-books, audios, course material, etc.
  • Services and Consultation: If you’re providing services like Life consultation, home-cooked delicacies, event planning, or anything else, Shopify has got you covered.
  • Memberships: Various memberships such as gym memberships, yoga-classes membership, event Membership, etc. can be sold to the customers.
  • Experiences: Event-based experiences like adventurous sports and travel, mountain trekking, wine tasting, events, and hands-on workshops. You can use Shopify to sell tickets for these experiences as well.
  • Rentals: If you’re running rental services like apartment rentals, rental Taxis, or gadgets, you can use Shopify to create ads and engage with the customer.
  • Classes: Online studies, fitness classes can be advertised here.

Shopify allows you to analyze trends and customer interaction on their platform. However, for advanced analytics, you may need to store the data into some database or data warehouse to perform in-depth analytics and then move towards a visualization tool to create appealing reports that can demonstrate these trends and market positioning.

For further information on Shopify, you can check the official site here.

Introduction to Redshift

redshift logo

Redshift is a columnar data warehouse managed by Amazon Web Services (AWS). It is designed to run complex analytical problems in a cost-efficient manner. It can store petabyte-scale data and enable fast analysis. Redshift’s completely managed warehouse set up, combined with its powerful MPP (massively parallel processing) have made it one of the most famous cloud data warehouse options among modern businesses. You can read more about the features of Redshift here.

Segment to Redshift: Approaches to Move Data

This blog covers two methods for migrating data from Shopify to Redshift:

Method 1: Using Shopify APIs to connect Shopify to Redshift

Making use of Shopify APIs to connect with Redshift is one such way. Shopify provides multiple APIs such as billing, customer, inventory, etc. and can be accessed through its RESTful endpoints. This method makes use of custom code to connect with Shopify APIs and uses it to connect Shopify to Redshift.

Method 2: Using Hevo Data, a No-code Data Pipeline

A fully managed, No-code Data Pipeline platform like Hevo Data, helps you load data from Shopify (among 100+ Sources) to Redshift in real-time, in an effortless manner. Hevo with its minimal learning curve can be set up in a matter of minutes making the users ready to load data without compromising performance. Its strong integration with various sources such as databases, files, analytics engine, etc gives users the flexibility to bring in data of all different kinds in a way that’s as smooth as possible, without having to write a single line of code.

Get started with Hevo! Sign up for a 14-day free trial! 

Methods to connect Shopify to Redshift

There are multiple methods that can be used to connect Shopify to Redshift and load data easily:

Method 1: Using Shopify APIs to connect Shopify to Redshift

Since Redshift supports loading data to tables using CSV, the most straightforward way to accomplish this move is to use the CSV export feature of Shopify admin. But this is not always practical since this is a manual process and is not suitable for the kind of frequent sync that typical organizations need. We will focus on the basics of accomplishing this in a programmatic way which is much better suited for typical requirements.

Shopify provides a number of APIs to access the product, customer and sales data. For this exercise, we will use the Shopify private app feature. A private app is an app built to access only the data of a specific Shopify Store. To create a private app script, we first need to create a username and password in the Shopify Admin. Once you have generated the credentials, you can proceed to access the APIs. We will use the product API for reference in this post. 

Use the below snippet of code to retrieve the details of all the products in the specified Shopify store.

curl --user shopify_app_user:shopify_app_password GET /admin/api/2019-10/products.json?limit=100

The important parameter here is the limit parameter. This field is there because the API is paginated and it defaults to 50 results in case the limit parameter is not provided. The maximum pagination limit is 250 results per second.

To access the full data developers need to buffer the id of the last item in the previous request and use that to form the next curl request. The next curl request would look like as below.

curl --user shopify_app_user:shopify_app_password GET /admin/api/2019-10/products.json? limit=100&since_id=632910392 -o products.json

You will need a loop to execute this.From the above steps, you will have a set of JSON files that should be imported to Redshift to complete our objective. Fortunately, Redshift provides a COPY command which works well with JSON data. Let’s create a Redshift table before we export the data.

create table products( product_id varchar(25) NOT NULL, type varchar(25) NOT NULL, vendor varchar(25) NOT NULL, handle varchar(25) NOT NULL, published_scope varchar(25) NOT NULL )

Once the table is created, we can use the COPY command to load the data. Before copying ensure that the JSON files are loaded into an S3 bucket since we will be using S3 as the source for COPY command. Assuming data is already in S3, let’s proceed to the actual COPY command. The challenge here is that the Shopify API result JSON is a very complex nested JSON that has a large number of details. To map the appropriate keys to redshift values, we will need a json_path file that Redshift uses to map fields in JSON to the Redshift table. The command will look as below.

copy products           from ‘s3://products_bucket/products.json’            iam_role ‘arn:aws:iam:0123456789012:role/MyRedshiftRole'           json ‘s3://products_bucket/products_json_path.json’ The json_path file for the above command will be as below. {     "jsonpaths": [         "$['id']",         "$['product_type']",         "$[‘vendor’]",         "$[‘handle’]",         "$[‘published_scope’]"     ] }

This is how you can connect Shopify to Redshift. Please note that this was a simple example and oversimplifies many of the actual pitfalls in the COPY process from Shopify to Redshift.

Limitations of migrating data using ShopifyAPIs

  • The developer needs to implement a logic to accommodate the pagination that is part of the API results. 
  • Shopify APIs are rate limited. The requests are throttled based on a leaky bucket algorithm with a bucket size of 40 and 2 requests per second leak in case of admin APIs. So your custom script will need a logic to handle this limit in case your data volume is high.
  • In case you need to clean, transform, filter data before loading to the warehouse, you will need to build additional code to achieve this.
  • The above approach works for a one-off load but if frequent sync which also handles duplicates is needed, additional logic needs to be developed using a Redshift staging table.
  • In case you want to copy details that are inside the nested JSON structure or arrays in Shopify format, the json_path file development will take some development time.

Method 2: Using Hevo Data, a No-code Data Pipeline

Using hevo to connect Shopify to Redshift

Hevo Data, a No-code Data Pipeline can help you move data from Shopify (among 100+ sources) swiftly to Redshift. Hevo is fully-managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

Steps to use Hevo Data:

Hevo Data focuses on two simple steps to get you started:

Configure Source: Connect Hevo Data with Shopify by simply providing the API key and Pipeline name.

Setting up Shopify

Integrate Data: Load data from Shopify to Redshift by simply providing your Redshift database credentials. Enter a name for your database, the host and port number for your Redshift database and connect in a matter of minutes.

Configuring Redshift to connect with Shopify.

Advantages of using Hevo Data Platform:

  • Real-Time Data Export: Hevo with its strong integration with 100+ sources, allows you to transfer data quickly & efficiently. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Live Monitoring: Hevo allows you to monitor the data flow so you can check where your data is at a particular point in time.

Want to try Hevo? Get started by signing up for a 14-day free trial and experience the feature-rich Hevo suite first hand. Have a look at our unbeatable pricing, that will help you choose the right plan for you.

What are your thoughts on moving data from Shopify to Redshift? Let us know in the comments.

No-code Data Pipeline for Redshift