MongoDB is a leading NoSQL database and provides a wide range of functionality along with versatility for allowing various data types and structures for storage. PostgreSQL is also a robust platform for RDBMS which is traditional.

This is good for storage but restricts the ability to gain information from data. Moving data from PostgreSQL to MongoDB helps resolve this issue to some extent.

This article provides a step-by-step guide to transferring data from PostgreSQL to MongoDB.

Steps to Connect PostgreSQL to MongoDB

Transferring data from PostgreSQL to MongoDB is a simple process. The complexity depends on the structure of the document and the data present in the PostgreSQL database and the structure required in the MongoDB database.

The process to move data is a two-step one where the data from PostgreSQL and then import the data into MongoDB using the mongoimport command.

Two methods to move data from PostgreSQL to MongoDB include:

Method 1: Using TSV to transfer data from PostgreSQL to MongoDB

The TSV layout is a fast and clean option for PostgreSQL to MongoDB. However, it best works if the unique PostgreSQL schema is surprisingly easy and also you don’t want to embed files in different files with a one-to-many relationship for moving data from PostgreSQL to MongoDB.

For instance, in case you want to create a group of files that constitute clients and plan on embedding every customer’s order in those files, TSV won’t paintings as it doesnt aid hierarchical information structures. Each row inside the TSV turns into a document. You can nevertheless map values in every row to fields deeper for your files; you simply can`t embed files.

You could create an address field and create nested state and city fields as in the example below. However, you couldn’t store multiple address entities

Consider the PostgreSQL created table “users.”

postgres=# select * from users;
 userid | first_name | last_name | logins | upgraded | city | state 
--------+------------+-----------+--------+----------+------+-------
      1 | Bob        | Smith     |     10 | t        | NYC  | NY
COPY (SELECT
        userid AS "userId.int32()",
        logins AS "loginCount.int32()",
      first_name AS "firstName.auto()",
      last_name AS "lastName.auto()",
        CASE WHEN upgraded=TRUE THEN 'TRUE' ELSE 'FALSE' END AS "upgradedAccount.boolean()",
        city AS "address.city.auto()",
        state AS "address.state.auto()"
    FROM
        users
) TO '/tmp/users.tsv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER E't');

As seen in the example for PostgreSQL to MongoDB, we are trying to rename each column that needs to be exported using the AS command. The alias that is created is of the type mongoFieldName.type(). for instance userId.int32() is a alias. When we use the mongoimport for executing the import of PostgreSQL to MongoDB, the command will parse the header and create fields based on types. In the general use case, we use the auto() command of mongoimport to self-determine the type based on context.

For the column that is UPGRADED, which is of type BOOLEAN, the PostgreSQL returns t for true and f for false, and the default value won’t’ be recognized, hence the case statement issued to determine the values based on condition.

Limited data nesting can also be done using the TSV migration process in PostgreSQL to MongoDB. The dataset’s city and state fields can be considered an example for moving data from PostgreSQL to MongoDB.

The end line of the query is the one that exports as CSV using a tab delimiter. this converts it into TSV format. This command is used to export files into MongoDB:

mongoimport --uri mongodb+srv://<mongodb_user>:<mongodb_password>@<altas-cluster>.mongodb.net/<DATABASE> 
 --collection users --type tsv --file users.tsv --headerline --columnsHaveTypes

This will result in the following document in the “users” collection after PostgreSQL to MongoDB:

{
    userId: 1,
    first_name: "Bob",
    last_name: "Smith",
      LoginCount : 10,
      upgraded : true,
    "Address" : {
            "city" : "NYC",
            "state" : "NY" }
}

Method 2: Using JSON to transfer data from PostgreSQL to MongoDB

This method for moving data from PostgreSQL to MongoDB is effective when the database schema is complex and in the end, you require nested arrays of records documents using JSON for data migration.

To utilize the JSON to generate the results of the PostgreSQL query, you require three functions:

  • row_to_json: It returns the column name as keys and the value as rows and then row as a JSON object.
  • array_to_json: returns an array of data as a JSON array.
  • array_agg: Accepts a set of values and returns an array where each value in the set becomes an element in the array

Let’s look at an orders table which in our relational schema keeps a record for every product ordered by the user:

 id | userid | product | quantity | price 
----+--------+---------+----------+-------
  1 |      1 | shoes   |        4 | 50.75
  2 |      1 | razer   |       20 |  1.75

Here is an example of PostgreSQL to MongoDB query using all three functions:

COPY (SELECT row_to_json(results)
FROM (
  SELECT userid,first_name, last_name,
    (
      SELECT array_to_json(array_agg(o))
      FROM (
        SELECT id, product, quantity, price
        FROM orders
        WHERE products.userid = users.userid
      ) o
    ) AS orders
  FROM users
) results) TO '/tmp/orders.json' WITH (FORMAT text, HEADER FALSE);

The query above will create a file orders.json with JSON documents for each user from the user’s table:

{
    id: 1,
    first_name: "Bob",
    last_name: "Smith",
    "orders" : [
        {
            "id" : 1,
            "product" : "shoes",
            "quantity" : 4,
            "price" : 50.75
        },
        {
            "id" : 2,
            "product" : "razer",
            "quantity" : 20,
            "price" : 1.75
        }
    ]
}

Once you have written the query and saved it, you can use mongoimport to import the file:

mongoimport --uri mongodb+srv://<mongodb_user>:<mongodb_password>@<atlas-cluster>.mognodb.net/<DATABASE> 
--collection orders --jsonArray orders.json

Methods for Postgres to MongoDB migration: TSV Vs JSON

MethodData with simple schemaData with complex schemaExport ManuallyImport to Mongo ManuallyFully AutomatedPulls data AutomaticallyFree to useNeed CLI login for every use
TSVYes NoYes Yes NoNoYesYes 
JSONYes YesYesYes NoNoYesYes

To migrate from Postgres to MongoDB, you might want to think about the following things:

  • Does your data have a rapidly changing structure? If yes, MongoDB is a great choice to deal with complex data structures. 
  • Does your application require a high velocity, high volume workload?
  • What are your organization’s preferences? 

Additional Resources for PostgreSQL Integrations and Migrations

Conclusion

MongoDB and PostgreSQL are trusted sources that a lot of companies use as they provide many benefits, but transferring data from them into a data warehouse is a hectic task.

The automated data pipeline helps solve this issue, and this is where Hevo comes into the picture.

References:

Arsalan Mohammed
Research Analyst, Hevo Data

Arsalan is a research analyst at Hevo and a data science enthusiast with over two years of experience in the field. He completed his B.tech in computer science with a specialization in Artificial Intelligence and finds joy in sharing the knowledge acquired with data practitioners. His interest in data analysis and architecture drives him to write nearly a hundred articles on various topics related to the data industry.

No-code Data Pipeline For Your Data Warehouse