Asana to Redshift – Easy Steps to Load Data

on Data Integration • January 31st, 2020

Asana is a cloud-based work management platform that helps users organize, track, and manage their tasks. It lets teams plan and structure work in a way that suits their organization: everything from project planning and task assignment to risk forecasting, assessing roadblocks, and changing plans can be handled on a single platform. Pricing is flexible, based on the number of users per month, and a free version is available for teams of up to 15 members. Almost 70,000 organizations worldwide use Asana to manage their work.

Since the platform is closely coupled with an organization's day-to-day activities, it is only natural that organizations will want to import this data into their own databases for analysis and building insights. In this blog post, we will learn how to load data from Asana to the Redshift Data Warehouse, one of the most widely used fully managed data warehouse services.

Methods for Asana to Redshift replication

Broadly, there are two approaches to move data from Asana to Redshift:

Method 1: Build a Custom Code

You will invest engineering bandwidth to hand-code scripts that pull data from Asana’s API into S3 and then into Redshift. Additionally, you will need to monitor and maintain this setup on an ongoing basis to ensure a consistent flow of data.

Method 2: Buy a Ready-to-Use Solution

Bringing data in from Asana works pretty much out of the box when using a solution like the Hevo Data Integration Platform (14-Day Free Trial). With minimal tech involvement, the data can be reliably streamed from Asana to Redshift in near real-time.

This post explores the first method in detail. Towards the end, the blog discusses the drawbacks of this approach and highlights simpler alternatives to achieve the same objective.

Building Custom Code to Move Data from Asana to Redshift

Understanding Asana APIs

Asana provides a comprehensive set of APIs to build applications using their platform. Not only do they allow you to access data, but there are also APIs to insert, update, and delete data for any item in the platform. It is important to understand the object hierarchy followed by Asana before going into detail on how to access the APIs.

Asana Object Hierarchy

Objects in Asana are organized into the following basic units:

  1. Tasks – Tasks represent the most basic unit of action in Asana.
  2. Projects – Tasks are organized into projects. A project represents a collection of tasks that can be viewed as a board, list or timeline.
  3. Portfolio – A portfolio is a collection of projects. 
  4. Sections – Tasks within a project can also be grouped into sections, which sit one level below projects in the hierarchy.
  5. Subtasks – Tasks can be represented as a collection of subtasks. Subtasks are similar to the task, except that they have a parent task.

Users in Asana are organized as workspaces, organizations, and teams. Workspaces are the highest-level units. Organizations are special workspaces that represent an actual company. Teams are a group of users who collaborate on projects.

Asana API access

Asana allows API access through two mechanisms:

  1. OAuth – This method requires an application to be registered in the Asana admin panel and the user’s approval to allow data access through their account. It is meant for implementing applications on top of the Asana platform.
  2. Personal Access Token – A personal access token can be created from the control panel and used to execute API calls by passing it in an authorization header. It is meant for simple scripts, and it is the method we will use to access Asana data.

Asana API Rate Limits

The Asana API enforces rate limits to maintain the stability of its systems. Free users can make up to 150 requests per minute, and premium users up to 1,500 requests per minute. There is also a limit on the number of concurrent requests: users can make up to 50 read requests and 15 write requests concurrently.

There is also a limit based on the cost of a request. Some of the API requests may be costly at the back end since Asana will have to traverse a large nested graph to provide the output.  Asana does not explicitly mention the exact cost quota but emphasizes that if you make too many costly requests, you could get an error as a response. 
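When a rate limit is hit, the API responds with an error status, so any script polling the API should back off and retry rather than fail outright. Below is a minimal sketch of a generic retry wrapper; it simply retries any command that exits non-zero with exponential backoff (a production script would also inspect the HTTP status code, and the curl call at the end is illustrative only).

```shell
# Sketch: retry a command with exponential backoff on failure.
# A real script would distinguish rate-limit errors (HTTP 429) from
# other failures; here any non-zero exit triggers a retry.
with_retry() {
  max_attempts=$1; shift
  attempt=1
  delay=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))        # back off: 1s, 2s, 4s, ...
  done
}

# Illustrative usage (not executed here):
# with_retry 5 curl -sf -H "Authorization: Bearer $ASANA_TOKEN" \
#   https://app.asana.com/api/1.0/projects
```

The `-f` flag makes curl exit non-zero on HTTP error responses, which is what lets the wrapper detect a throttled request.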

Understanding Amazon Redshift

Redshift is a fully managed data warehouse offered as a cloud service by Amazon. It has a flexible pricing plan in which users pay only for the resources they use. A detailed article on Redshift’s pricing plan can be found here. AWS takes care of all the activities involved in maintaining a highly reliable and stable data warehouse, so customers can focus on their business logic without worrying about the complexities of managing a large infrastructure.

Redshift is designed to run complex queries over large amounts of data and provide quick results. It accomplishes this through a massively parallel processing architecture. Redshift works as a cluster of nodes: one node is designated as the leader node, and the others are known as compute nodes. The leader node handles client communication, query optimization, and task assignment. A comprehensive write-up on Redshift’s architecture can be found here.

Redshift can be scaled seamlessly by adding more nodes or upgrading existing nodes. Redshift can handle up to a PB of data. Redshift’s concurrency scaling feature can automatically scale the cluster up and down during high load times while staying within the customer’s budget constraints. Users can enjoy a free hour of concurrency scaling for every 24 hours of a Redshift cluster staying operational.

Moving data from Asana to Redshift 

Our objective here is to import a list of projects from Asana into Redshift. We will use Asana’s projects API to accomplish this. To access the API, we first need a personal access token, so let us start by generating one.

  • Go to the Developer App management page in Asana via My Profile Settings. Go to Apps and then to Manage Developer Apps. Click on Create New Personal Access Token, add a description, and click on Create. Copy the token that is displayed. Please note that this token is shown only once; if you do not copy it, you will need to create another one.

With the Personal Access Token, access the API as follows.

curl -X GET https://app.asana.com/api/1.0/projects \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer {access-token}'

Where {access-token} is the personal access token you copied in the previous step.

  • The response will be a JSON in the following format.

{
  "data": [
    {
      "gid": "12345",
      "resource_type": "project",
      "name": "Stuff to buy",
      "created_at": "2012-02-22T02:06:58.147Z",
      "archived": false,
      "color": "light-green",
      …
    },
    {....}
  ]
}

The response will contain a key named data, whose value is a list of project details formatted as nested JSON. Save the response to a file named projects.json.
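Note that Asana paginates large result sets when the limit query parameter is used: each response then carries a next_page object whose offset value must be passed to fetch the subsequent page. The sketch below shows only the control flow for walking pages with jq; fetch_page is a placeholder for the real curl call (for example, `curl -s -H "Authorization: Bearer $ASANA_TOKEN" "https://app.asana.com/api/1.0/projects?limit=100&offset=$1"`).

```shell
# Sketch: accumulate all pages of an Asana collection endpoint.
# fetch_page "$offset" is assumed to print one page of JSON to stdout.
fetch_all() {
  offset=""
  while :; do
    page=$(fetch_page "$offset")
    # Emit this page's records, one JSON object per line.
    printf '%s\n' "$page" | jq -c '.data[]'
    # A missing next_page yields an empty offset, which ends the loop.
    offset=$(printf '%s\n' "$page" | jq -r '.next_page.offset // empty')
    [ -n "$offset" ] || break
  done
}
```

Collecting the output of fetch_all into a file gives you the complete record set to feed into the CSV conversion step below.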

  • Use the command-line JSON processor utility jq to convert the JSON to CSV for loading into Redshift. For simplicity, we will only convert the name, start_on, and due_on attributes of the project details.
jq -r '.data[] | [.name, .start_on, .due_on] | @csv' projects.json > projects.csv

Please note that name, start_on, and due_on are fields contained in the response JSON. If you need different fields, you can modify the above command as per your requirement.
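The same jq approach extends to nested attributes, such as a project's owner, which is itself a JSON object (or null). The sketch below works against a small sample file; the field names follow Asana's project schema, but depending on the endpoint you may need to request such fields explicitly via the opt_fields query parameter.

```shell
# Sketch: flatten a nested attribute (owner.name) into the CSV.
# The sample file stands in for a real API response.
cat > projects_sample.json <<'EOF'
{"data": [
  {"name": "Website refresh", "owner": {"name": "Bob"}, "due_on": "2020-03-01"},
  {"name": "Q2 planning", "owner": null, "due_on": null}
]}
EOF

# .owner.name is null when owner is absent; // "" substitutes an empty string.
jq -r '.data[] | [.name, (.owner.name // ""), .due_on] | @csv' \
  projects_sample.json > projects_flat.csv
cat projects_flat.csv
```

The `//` alternative operator is what keeps missing nested values from breaking the CSV layout.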

  • Copy the CSV file to an AWS S3 location.
aws s3 cp projects.csv  s3://my_bucket/projects/
  • Loading data to Redshift – Log in to the AWS Management Console, type the following command in the Query Editor of the Redshift console, and execute it.
copy target_table_name from 's3://my_bucket/projects/projects.csv' credentials 'aws_access_key_id=<access_key_id>;aws_secret_access_key=<secret_access_key>' csv;

Where <access_key_id> and <secret_access_key> are your IAM credentials.

That concludes the effort. However, we have still only imported one table into Redshift, and only three of its columns. Asana has a large number of objects in its database, each with numerous attributes. To accommodate all of these with the current approach, we would need to implement a complex script using a programming language.

Asana to Redshift – Drawbacks of Custom Code Approach

  1. Some critical information related to objects, such as authors and workspaces, is available only inside nested JSON structures in the response, and can be extracted only by implementing a lot of custom logic.
  2. In most cases, the import function will need to be executed periodically to maintain a recent copy. Such jobs will need mechanisms to handle duplicates and scheduled operations. 
  3. The above method completely overwrites the Redshift table, which may not always be practical. If an incremental load is required, Redshift’s INSERT INTO command is needed, but it does not manage duplicates on its own, so a temporary table and related logic to handle duplicates will be required.
  4. The current approach does not have any mechanism to handle the rate limits imposed by Asana.
  5. Any further improvement to the current approach will require the developers to have considerable domain knowledge of Asana and its object hierarchy.
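Point 3 above is typically addressed with a staging-table pattern: copy the fresh extract into a staging table, delete the matching rows from the target, insert the staged rows, and drop the stage, all inside one transaction. The sketch below generates that SQL; the table name, the name join key, and the IAM role ARN are placeholders for illustration (a stable key such as the Asana gid would be a better join key in practice, and the access-key form of COPY credentials shown earlier works equally well).

```shell
# Sketch: emit the staging-table "upsert" SQL for a given target table.
# All identifiers and the IAM role ARN are illustrative placeholders.
generate_upsert_sql() {
  target=$1
  cat <<EOF
BEGIN;
CREATE TEMP TABLE ${target}_stage (LIKE ${target});
COPY ${target}_stage FROM 's3://my_bucket/projects/projects.csv'
  IAM_ROLE 'arn:aws:iam::<account-id>:role/<redshift-role>' CSV;
DELETE FROM ${target} USING ${target}_stage
  WHERE ${target}.name = ${target}_stage.name;
INSERT INTO ${target} SELECT * FROM ${target}_stage;
DROP TABLE ${target}_stage;
COMMIT;
EOF
}

generate_upsert_sql projects
```

Running the whole sequence in a single transaction means readers of the target table never see a half-updated state.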

A wise choice would be to use a simple ETL tool like Hevo and not worry about all these complexities of implementing a custom ETL from Asana to Redshift.

Hevo – Easy solution to move data from Asana to Redshift

The best way to avoid the above limitations is to use a Data Integration Platform like Hevo that works out of the box. Hevo can help you load data from Asana to Redshift in two simple steps:

  1. Authenticate and Connect Asana Source
  2. Configure the Redshift Database where you want to load data

Hevo’s fault-tolerant, dependable Data Integration Platform will ensure that your data is securely moved from Asana to Redshift in real-time.

The Hevo Advantage: 

  1. Code-free Data Loading: You need not invest any engineering resources to write and maintain ETL scripts or cron jobs.
  2. Set up in Minutes: Since Hevo is fully managed, the setup time is minimal. Data is moved from Asana to Redshift in minutes.
  3. 100% Data Accuracy: Hevo reliably delivers your data from Asana to Redshift. Hevo is built on an AI-powered, fault-tolerant architecture that ensures that you always have accurate and current data available for analysis.
  4. Automatic Schema Handling: Hevo does automatic schema detection, evolution and mapping. The platform will detect any change in incoming Asana schema and make necessary changes in Redshift, without any human intervention.
  5. Detailed Activity Logs and Monitoring: Hevo provides a detailed activity log for all the data moving through the pipeline, giving you a complete view of everything from user activities to data transfer failures.
  6. 100s of additional data sources: In addition to Asana, Hevo can load data from 100s of additional data sources, ensuring that you easily scale as your business requirements grow.

To experience a seamless data load from Asana to Redshift, sign up for a 14-day free trial with Hevo.
