Magento to Redshift: 2 Easy Methods

• May 29th, 2020

Magento to Redshift - Blog Cover Image

Do you have a professional eCommerce site that has been developed using Magento? That means you have also got a lot of transactional data, inventory data, payment details, customer data, etc. available for analysis. If you want to analyze and get deep and meaningful insights from your Magento data, transferring them to a data warehouse like Redshift would be a wise choice. In this post, you will learn about the different methods of moving data from Magento to Redshift, which will eventually help you choose the method that suits your business needs.

Table of Contents

Introduction to Magento and Redshift

Magento Logo
Image Source

Magento is an open-source e-commerce platform used by more than 100,000 online stores. Built on the Zend framework, the source code for Magento is available under OSL (Open Software License) v3.0. 

For further information on Magneto, check out the official website here.

Magento to Redshift-Amazon Redshift logo
Image Source

Redshift, a petabyte-scale cloud data warehouse service, was developed by Amazon and is a part of their AWS (Amazon Web Services) platform. Built to handle large-scale datasets, Redshift is an SQL-like relational database system that facilitates data analysis and features some of the industry’s best performance with respect to executing complex queries. With up to petabyte scalability and the ability to integrate with third-party visualization tools, Redshift is a powerful tool for generating business intelligence insights.

Why Move Data from Magento to Redshift?

Data migration from Magento to Redshift lets you:

  • Have a single source of truth by moving data from Magento to Redshift.
  • Benefit from Data Consolidation by moving data from Magento to Redshift.
  • Establish a new Data Warehouse.
  • Deploy other systems to work with Redshift
  • Execute complex queries that allow for deeper insight into your data, enhancing your organization’s advantage through better business intelligence.

Methods to Move Data from Magneto to Redshift

Method 1: Moving Data from Magneto to Redshift using Custom Scripts and APIs

You can choose this approach if you have access to or can hire engineering resources with the expertise to work with APIs and Redshift data warehouse. With the Magento API, you can develop scripts that automatically retrieve your Magento records, allowing you to load them from Magento to Redshift. These scripts may be run as a cron job or application in order to automate the retrieval of said records from Magento to Redshift.

Method 2: Moving Data from Magneto to Redshift using Hevo

Hevo can help you export data from Magento to Redshift in real-time free of cost, without having to write a single line of code. With the help of Hevo’s pre-built integrations with 100+ sources(including 40+ free sources) such as Magneto, you can set up the whole data pipeline without any technical intervention and load data from Magento to Redshift or a destination of your choice with ease.

Sign up here for a 14-day Free Trial!

Methods to Move Data from Magneto to Redshift

Method 1: Moving Data from Magneto to Redshift using Custom Scripts and APIs

Loading data manually from Magento to Redshift broadly involves the following steps:

Prerequisites

  1. Working knowledge of REST APIs.
  2. Required permissions to create and access Amazon S3. Learn to set up an Amazon S3 account here.
  3. A successfully set up Amazon Redshift Data Warehouse.

Step 1: Extracting Data from Magento using the Magento API

The first step in transferring data from Magneto to Redshift is to work with Magneto’s API. Magento’s Web API operates under both REST (REpresentational State Transfer) and SOAP (Simple Object Access Protocol) frameworks. We’ll be looking at the REST implementation in this article to move data from Magento to Redshift.

Magento’s API endpoints and the data they return may be tested using tools like Postman or CURL. Once you’ve become familiar with the endpoints and the JSON format that the data is returned in you can start to develop scripts to make requests of the API endpoints and automatically save the JSON objects to a target storage location.

A Note on Authorization:

The Magento API can use either third-party OAuth authentication, mobile token authorization, or user login credentials. When building your script, if accessing resources that require permission levels higher than  ‘anonymous’, you must ensure that you’ve set up your authorization and that it is applied in the script. For more information on authorization see Authentication in the Magento DevDocs.

Use the GET command to pull necessary data from a Magento endpoint. For this article, we’ll use calls to the /V1/orders endpoint as an example.

GET <host>/rest/<store_code>/V1/orders/

If necessary, an API request may specify only top-level objects. This call will return the requested objects and their respective fields. Eg:

GET <host>/rest/<store_code>/V1/orders/2?fields=billing_address,
customer_firstname,customer_lastname

The above call will return (as an example) the following JSON object:

{
"customer_firstname": "Veronica"
"customer_lastname": "Costello"
"billing_address": {
  "address_type": "billing"
  "city": "Calder"
  "country_id": "US"
  "customer_address_id": 1
  "email": "roni_cost@example.com"
  "entity_id": 4
  "firstname": "Veronica"
  "lastname": "Costello"
  "parent_id": 2
  "postcode": "49628-7978"
  "region": "Michigan"
  "region_code": "MI"
  "region_id": 33
  "street": "6146 Honey Bluff Parkway"
  "telephone": "(555) 229-3326"
  }
}

Once this data has been extracted which is to be moved from Magento to Redshift you can move onto the next step.

Step 2: Transforming the Data to Suit Redshift Schema

The JSON format is incompatible with Redshift, as Redshift is an RDMS (Relational Database Management System) that operates like an SQL database. Therefore, the JSON objects will have to be converted to a form that works with SQL databases. For the sake of simplicity, we would recommend converting it to a CSV (Comma Separated Value) file. This may be achieved by applying third-party tools or by integrating conversion logic into your script. 

Once your data has been converted to a CSV file you may commence the final step.

Step 3: Loading the Transformed Data into Redshift

Loading data into Redshift is a two-step process.:

  1. Loading the data into an Amazon S3 bucket
  2. Copying the data into Redshift

Step 4: Loading the data into an Amazon S3 bucket

The steps involved in loading the data into an Amazon S3 bucket are:

  1. Create a bucket in S3
    • Sign in to the Amazon Console
    • Open the console
    • Click Create Bucket
    • Enter a name for the bucket
    • Select your region
    • Click Create
  2. Create a folder in the bucket
    • Click the new bucket
    • Select Actions > Create Folder
    • Name the new folder data. Note that Amazon charges for data stored in buckets. For pricing info, see here.
  3. Upload the files to your new folder
    • Click the new folder
    • Click Upload
    • Click Add Files
    • Select all relevant CSV files
    • Click Start Upload

Step 5: Copying the data into Redshift

Apply the COPY command as shown below. This will transfer the data file from S3 to Redshift.

COPY table_name FROM 's3://<your-bucket-name>/load/file_name.csv' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' CSV;

Potential Issues with Manual Scripts to Move Data from Magneto to Redshift

The issues for Magneto to Redshift data transfer are:

  • Coding scripts will require personnel that is comfortable with:
    • Coding at least one scripting language
    • Using APIs
    • Automating applications or setting up cron jobs
  • Assuming that the script has been successfully written and executed, it is now up to the programmer to maintain the script/application. Magento may modify its API over its lifetime, perhaps even retiring certain endpoints.
  • The programmer needs to constantly observe the performance of his script and stay up to date with the API to ensure its stability and continued service.
  • In case there is a need to write data transformations (clean, enrich, modify data) before or after loading data from Magento to Redshift, engineers would need to build additional lines of code here. This adds more workload on the engineering front.

This, obviously, is quite costly and not a very efficient use of resources to move data from Magento to Redshift.

Method 2: Moving Data from Magneto to Redshift using Hevo

Using Hevo to move data from Magneto to Redshift
Image Source: Self

Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.

It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using various BI tools such as Power BI, Tableau, etc. 

Get Started with Hevo for free

Check Out What Makes Hevo Amazing:

  • Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  • Schema Management: Hevo takes away the tedious task of schema management & automatically detects schema of incoming data and maps it to the destination schema.
  • Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  • Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  • Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  • Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
  • Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.

Yes! It is that simple. Now, sit back and watch your data move from Magento to Redshift.

Conclusion

In this blog, you have learned how to connect the Magento to Redshift manually. You also came across the various limitations of connecting Magento to Redshift manually. So, if you are looking for a fully-automated data pipeline, then try Hevo.

Visit our Website to Explore Hevo

Hevo is a No-code Data Pipeline and has awesome 100+ pre-built integrations that you can choose from. Hevo can help you integrate your data from numerous sources(including 40+ free data sources) and load them into a destination to analyze real-time data with a BI tool such as Tableau. It will make your life easier and data migration hassle-free. It is user-friendly, reliable, and secure. Check out the pricing details here and watch this short overview video to get a sense of how Hevo works:

Want to take Hevo for a spin? Sign Up for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your understanding of Magento to Redshift in the comments section below!

No-code Data Pipeline for Magento