MongoDB Substring (Substr) Aggregation: Simplified 101

• February 28th, 2022

MongoDB Substring

Handling data in recent times needs immense flexibility, and many will not require a rigid traditional RDMS. Here’s where NoSQL Databases come in handy and are popular today. With more flexible requirements and development of modern applications need the support of NoSQL Databases such as MongoDB. Organizations store and retrieve their business data from MongoDB using applications they use to run business activities. 

MongoDB comes with many functions and features that help users handle data in the Database with ease. MongoDB aggregation functions are widely used to manipulate data in the Database. It allows users to perform different operations on data. MongoDB Substring is a function that retrieves substring from the data string.

In this article, you will learn about the MongoDB aggregation functions, key features, and architecture of MongoDB. You will also read about the MongoDB Substring function that is widely used in Data Analysis. Moreover, understand how to use the MongoDB Substring function using syntax and examples.

Table of Contents

What is MongoDB?

MongoDB Logo
Image Source

MongoDB is a Document-Oriented NoSQL Database ideal for storing large amounts of data. MongoDB does not utilize tables and rows like typical relational Databases do. Instead, it uses collections and documents. Collections are analogous to relational Database tables in that they include groups of documents and functions. 

As part of a platform-as-a-service offering, 10gen Software created MongoDB in 2007. In 2009, the company shifted to an open-source development method with commercial backing and extra services. 10gen had given the name MongoDB Inc. in 2012.

Documents comprise key-value pairs, which constitute MongoDB’s fundamental data type. MongoDB’s appeal stems from its efficient querying and rapid data retrieval capabilities, which provide more profound insights through Big Data Analytics.

Features of MongoDB

Some of the main features of MongoDB are listed below:

  • Document-Oriented: There are collections in every Database, and those collections include documents. The number of fields in each document might change, making each one unique. Each document might be distinct in terms of its size and content. 
  • Flexibility: the document’s structure is more in accordance with how programmers design their classes and objects in their different programming languages. Programmers claim that their classes are rows and columns and contain an organized layout of keys and values.
  • Shema Less: MongoDB’s data schema makes it possible to easily create hierarchical connections, store arrays, and other more complicated structures. In MongoDB, rows (or documents, as they’re known) don’t need to be pre-configured with a schema. 
  • Scalability: The fields in a MongoDB row can be constructed as per requirement on the fly. When it comes to scalability, the MongoDB Database is unmatched. Numerous businesses worldwide have set up clusters, some operating 100s of nodes and storing millions of documents.

MongoDB Architecture

Image Source

MongoDB’s architecture is composed of the following components:

Database

Additionally, it refers to the physical container for information stored. Different Databases exist on a single MongoDB system. Each Database has its collection of data on the system files.

Collection

The collection contains a variety of materials from a variety of areas. There are no schemas in collections. A single Database is going to manage all of the collections.

Document

The document has been allocated to key values connected with dynamic schemas. The advantage of adopting various schemas is that documents do not have to share the same fields and can have a variety of data kinds.

To learn more about MongoDB, click here.

Simplify Data Analysis with Hevo’s No-code Data Pipeline

Hevo Data, a No-code Data Pipeline helps to load data from any data source such as MongoDB, SaaS applications, Cloud Storage, SDK,s, and Streaming Services and simplifies the ETL process. It supports 100+ data sources (including 30+ free data sources) and is a 3-step process by just selecting the data source, providing valid credentials, and choosing the destination. Hevo not only loads the data onto the desired Data Warehouse/destination but also enriches the data and transforms it into an analysis-ready form without having to write a single line of code.

Get Started with Hevo for Free

Its completely automated pipeline offers data to be delivered in real-time without any loss from source to destination. Its fault-tolerant and scalable architecture ensures that the data is handled in a secure, consistent manner with zero data loss and supports different forms of data. The solutions provided are consistent and work with different BI tools as well.

Check out why Hevo is the Best:

  1. Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
  2. Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
  3. Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
  4. Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
  5. Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
  6. Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, E-Mail, and support calls.
  7. Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Sign up here for a 14-Day Free Trial!

Aggregation in MongoDB

Aggregation operations in MongoDB are used to handle data records/documents and produce calculated results. It is analogous to SQL’s aggregate function. It aggregates information from several documents and then executes various operations on the gathered data, such as total, mean, minimum, and maximum, to deliver a calculated result.

MongoDB aggregates data in three distinct ways:

  • Aggregation pipeline
  • Map-reduce function
  • Single-purpose aggregation

Aggregation Pipelines

A document aggregation pipeline is composed of one or more processing stages:

  • Each step conducts an operation on the document processed. A stage, for example, can filter documents, group them, and calculate values.
  • The documents generated by a step are forward to the following stage.
  • A pipeline for document aggregation can produce results for groups of documents. For example, return the MongoDB substring, sum, mean, highest, and lowest values.

The following example of an aggregate pipeline has two steps and returns the total order number of burgers organized by Burger name:

db.orders.aggregate( [
// Stage 1: Filter Burger order documents by Burger size
{
$match: { size: "medium" }
},

// Stage 2: Group remaining documents by Burger name and calculate the total quantity
{
$group: { _id: "$name", totalQuantity: { $sum: "$quantity" } }
}
] )

Single Purpose Aggregation

Single Purpose Aggregation is utilized when we want quick access to documents, such as when we need to count the number of documents or when we need to locate all different values in a document.

It just gives access to the pipeline’s common aggregation mechanism via the count(), distinct(), and estimatedDocumentCount() methods, and hence lacks the pipeline’s flexibility and possibilities.

Let’s use a collection named “sales” that stores purchases:

{
    _id: 5bd761dcdo323e45a93ccxsb,
    saleDate: 2020-08-25T12:54:14.170+00:00,
    items: [
      {
        "name": " paper",
        "price": 19,
        // ...
      },
    ],
    storeLocation: "St’ Louis",
    customer: {
        age: 35,
        satisfaction: 4,
       // ...
    },
    couponUsed: false,
    purchaseMethod: "Delivey"
}

If we wanted to get the overall number of sales, we could run:

const totalNumberOfSales = await collection.countDocuments();

countDocuments() will aggregate the collection’s total number of documents and return it to us.

Map Reduce

Map reduction is a technique used to aggregate results for vast amounts of data. Map Reduce serves two primary functions: first, it acts as a map, grouping all documents, and second, it acts as a reducer, performing operations on the grouped data.

Syntax:

db.collectionName.mapReduce(mappingFunction, reduceFunction, {out:'Result'});

MongoDB Substring

Users can use the MongoDB Substring function to extract a substring from a string. MongoDB Substring is an aggregation pipeline operator that helps users manipulate documents in the Database.

$substr

The MongoDB Substring ($substr) returns a substring of a string beginning at the supplied index point and terminating with the specified number of characters. The index is numeric.

Syntax:

{ $substr: [ <string>, <start>, <length> ] }

Here, 

  • If the start value is negative then the MongoDB Substring function will return an empty string “ ”.
  • If the value of length is negative then the MongoDB Substring function will return the substring starting from the specified starting value or index till the full remaining length of the string. 

The arguments may be any acceptable expression as long as the first is a string and the second and third are integers.

Consider the following MongoDB Substring example for stock levels collection of documents:

{ "_id" : 1, "item" : "PQR1", quarter: "12Q1", "description" : "product 1" }
{ "_id" : 2, "item" : "PQR2", quarter: "12Q4", "description" : "product 2" }
{ "_id" : 3, "item" : "ABC1", quarter: "15Q2", "description" : null }

The following procedure, which makes use of the MongoDB Substring $substr operator, converts the quarter value to a yearSubstring and a quarterSubstring. The following code is given below.

db.inventory.aggregate(
   [
     {
       $project:
          {
            item: 1,
            yearSubstring: { $substr: [ "$quarter", 0, 2 ] },
            quarterSubtring: { $substr: [ "$quarter", 2, -1 ] }
          }
      }
   ]
)

The following output for MongoDB Substring extracted is returned by the operation, shown below.

{ "_id" : 1, "item" : "PQR1", "yearSubstring" : "12", "quarterSubstring" : "Q1" }
{ "_id" : 2, "item" : "PQR2", "yearSubstring" : "12", "quarterSubstring" : "Q4" }
{ "_id" : 3, "item" : "ABC1", "yearSubstring" : "15", "quarterSubstring" : "Q2" }

Conclusion

In this article, you will learn about MongoDB and what are the aggregation functions in MongoDB. You also read about one of the widely used aggregation functions, MongoDB Substring that is used to extract a piece of string or a substring from the given string based on the conditions provided by the user. In which you went through examples of how to use MongoDB Substring $substr function with their outputs. MongoDB Substring helps in cleaning data values, splitting the strings, etc. that can be useful for companies in Data Analytics. 

MongoDB substring is a valuable aggregation technique. MongoDB ranks 5th overall in ranking the most popular Database management systems worldwide as of January 2022. Performing all reporting, transformation, and sophisticated querying operations is critical to a Database developer’s job, and MongoDB is an efficient partner.

Visit our Website to Explore Hevo

MongoDB Database stores valuable business data that can be used to generate insights. Companies need to analyze their business data stored in multiple data sources. The data needs to be loaded to the Data Warehouse to get a holistic view of the data. Hevo Data is a No-code Data Pipeline solution that helps to transfer data from 100+ data sources to desired Data Warehouse. It fully automates the process of transforming and transferring data to a destination without writing a single line of code.

Want to take Hevo for a spin? Sign Up here for a 14-day free trial and experience the feature-rich Hevo suite first hand.

Share your experience of learning about the MongoDB Substring Aggregation function in the comments section below!

No-code Data Pipeline For your Data Warehouse