MongoDB to PostgreSQL: Steps to Load Data Fast

on Tutorial • February 22nd, 2020 • Write for Hevo

MongoDB is a NoSQL database that provides a document-oriented storage mechanism. It is the preferred choice for most use cases involving structured and semi-structured data. MongoDB’s comprehensive querying layer combined with the ability to dynamically add keys makes it a suitable choice for requirements that keep evolving. In most cases, databases like MongoDB provide a great level of flexibility during the initial stages of product development.

In some cases, as time passes, the use cases mature leading to table structures getting frozen and the flexibility of a NoSQL database starts losing importance. At this stage, most companies will try to move data from the NoSQL database to a conventional relational database which provides even better-querying abilities. In this post, we will cover the methods to move data from MongoDB to PostgreSQL, one of the most popular SQL databases. 

In this article, you will learn the following:

Understanding MongoDB and PostgreSQL

As mentioned above, MongoDB’s uniqueness comes from the fact that it does not force users to follow a concrete structure for the tables. MongoDB can also be operated as a completely managed service using MongoDB Atlas. It is a true cloud agnostic database that can be deployed in any of the public cloud providers like AWS, GCP, etc. It can work on top of a cluster-based master-slave architecture providing great horizontal scaling capability. 

MongoDB uses a base object called document to store a set of keys and values. A logical group of such documents forms a collection. A group of such collections is called a database. If there is a need for processing or an aggregating a large amount of data, MongoDB’s native map-reduce programming support can be used. MongoDB also has provisions for indexes over collections.

PostgreSQL is very popular for use cases involving traditional strict table structure requirements because of its reliability and the feature-rich querying layer. The fact that it has been in development for over 30 years and most bigs have been ironed out provides great confidence in using it in the enterprise context. Postgres works based on a single master model and hence is not great at scaling seamlessly.

Two approaches to move data from MongoDB to PostgreSQL

There are many ways to migrate data from MongoDB to PostgreSQL. This blog will highlight the following methods.

Method 1: Writing custom code to move data from MongoDB to PostgreSQL

This would need you to deploy engineering resources to extract data from MongoDB, convert the JSON output to CSV and load data to PostgreSQL.

Method 2: Implementing a Fully-Managed No-code Data Pipeline, Hevo Data
Hevo provides an easy alternative to loading data from MongoDB to PostgreSQL in real-time without having to write any code. Hevo handles all the MongoDB data migration hassles for you by automatically managing the schema, mapping it to the relevant PostgreSQL tables, and reliably loading data.

This post talks about the first method in depth. Additionally, the blog also talks about the limitations of MongoDB ETL. Towards the end, the blog discusses a simpler alternative to move data from MongoDB to PostgreSQL.

Steps to Move Data from MongoDB to PostgreSQL

Copying data from MongoDB to PostgreSQL is usually accomplished using the mongoexport command and COPY command in PostgreSQL. Here are the broad steps:

  1. Extract data from MongoDB using mongoexport command
  2. Create a table in PostgreSQL to add the incoming data
  3. Load the exported CSV from MongoDB to PostgreSQL

Let us try to understand this process with the help of an example. 

Step 1: Extract data from MongoDB using mongoexport command

We will first use the mongoexport command to create a CSV file from an already existing Mongo collection.

mongoexport --host localhost --db productdb --collection products --type=csv 
--out products.csv --fields name,category,color, brand,description

Here the database name is productdb, the collection name is products and products.csv is the output file we expect. The last attribute fields will contain the names of the keys that will be exported to CSV. This is very important because MongoDB does not maintain a hard structure of keys and it is very much possible that not all keys are present in all the documents. The developer will have to ensure the keys that need to be present in CSV is specified. In case there is no such key present in a document, mongoexport will not throw any error. It will just silently populate that column as a blank value. If not careful enough, this can lead to unexpected results.

Step 2: Create a product table in PostgreSQL to add the incoming data

CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL,
category VARCHAR NOT NULL,
color VARCHAR  NOT NULL,
brand VARCHAR  NOT NULL,
description VARCHAR NOT NULL
)

Step 3: Load the exported CSV to PostgreSQL

COPY products(name,category,color, brand,description)
FROM 'C:tmppersons.csv' DELIMITER ',' CSV HEADER;

And that completes the process. Even though these steps may seem simple, it is to be noted that this is an oversimplified version of the real data migration problem. MongoDB and PostgreSQL are very different databases and there are a number of factors that can lead to unforeseen errors in this migration. Let’s look at some of the cases which can get us into trouble while using the above approach.

A Simpler Alternative to Sync Data from MongoDB to PostgreSQL

A way to mitigate all these complexities and have this executed through a set of simple clicks will be using a No-code Data Pipeline like Hevo. Hevo can abstract away all the painful details and let the developers focus on their core problems by executing such migrations smoothly.

You can sign up for a 14-day free trial to experience the simplicity yourself.

Hevo can migrate data from MongoDB to PostgreSQL in realtime in 3 simple steps:

  1. Authenticate and connect to your MongoDB database.
  2. Select the replication mode: (a) Full Dump (b) Incremental load using OpLog
  3. Configure the PostgreSQL destination and start moving data

MongoDB to PostgreSQL – Limitations of Writing Custom Code

  1. In typical use cases, there will be a large number of keys in a document and thereby in a MongoDB collection. Since there is nothing that restricts developers from adding more keys on the fly, most MongoDB collections will have hundreds of keys. Finalizing and specifying the keys to export in itself will be a big challenge.  
  2. Since an intermediate file is involved and that file contains the whole of a MongoDB collection, this approach requires the developer to have access to an instance with enough storage space to temporarily store the file.
  3. Normally, not all keys in the MongoDB collection will be present in all the documents in that collection. This method does not explicitly do anything to handle the case of non-existent keys. This will have to be specifically handled in usual scenarios.
  4. Many a time, the copying data is not a one time process, but a periodic process which keeps syncing the two databases. In that case, this will need to be modified using custom logic to accommodate the scheduling and handling of duplicate data.
  5. For large MongoDB collections that serve a production application, there is a risk of this process affecting the response time of the incoming traffic. In such cases, it is imperative that the developer implements some throttling logic.
  6. Another source of trouble could be the presence of arrays as field values in your MongoDB collection. A simple CSV export could lead to unexpected results here.

The Hevo Advantage

  1. Simplicity: ETL for a NoSQL database like MongoDB can get tricky. Hevo abstracts the complexity of making data migration a cakewalk for you. 
  2. Auto-schema Management: Hevo automatically detects the schema of the incoming object and flattens the JSON-like structure into a relational format. This takes away the tedious task of schema management from you. 
  3. Ability to Transform Data before and after loading to PostgreSQL: Whether you want to transform currencies, standardize time zones or simply aggregate and join MongoDB data for faster queries. You can achieve all this on Hevo.
  4. Ability to Bring Data from Other Data Sources: In addition to MongoDB, Hevo can help you bring data from 100’s of different data sources (Databases, Sales and Marketing Applications, Streaming Services, and more) making it an ideal partner for your business growing data needs. This will allow you to scale your infrastructure at will.
  5. Live Monitoring and Tracking: Hevo provides granular tracking and monitoring of data flow so that at any point you know exactly where your data is. 
  6. 24*7 Support: Hevo team provides exceptional support over chat, email and support call to ensure that your data projects run smoothly.

Sign up for a 14-day free trial to see how Hevo can add value. 

No-code Data Pipeline for your Data Warehouse