MongoDB Server-Side JavaScript and Custom ID

Everybody loves ObjectId. It’s convenient, you never run out of numbers, and working with hex makes you feel like a hardcore programmer. You can even extract a timestamp from it, which comes in handy quite often. On top of that, you can freely move data between collections and databases, confident that every ID is unique.
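As a quick illustration of the timestamp trick, here is a minimal sketch in plain JavaScript. It relies on the fact that the first 4 bytes (8 hex characters) of an ObjectId hold the creation time in seconds since the Unix epoch. The helper name objectIdToDate is made up for this example; in the Mongo Shell the built-in ObjectId.getTimestamp() does the same job. The sample ID is the one that appears in the shell output later in this article.

```javascript
// Hypothetical helper: read the creation time embedded in an ObjectId,
// given its 24-character hex representation.
function objectIdToDate(hex) {
    if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
        throw new Error('Expected a 24-character hex string');
    }
    // the first 8 hex characters are seconds since the Unix epoch
    var seconds = parseInt(hex.substring(0, 8), 16);
    return new Date(seconds * 1000);
}

var created = objectIdToDate('5516b7e6dee4810687faee77');
console.log(created.toISOString()); // 2015-03-28T14:17:10.000Z
```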

However, some timid souls do not like it. I can’t get out of my head a scene I recently witnessed: a customer trying to read out his 12-symbol-long receipt number over the phone. The poor thing barely managed to get it across to customer service. So the task was clear – we need a simple transaction number that meets the following requirements:

  1. Consists of a predefined prefix followed by a number: AZ1854
  2. Numbers are consecutive as long as the prefix is the same
  3. We want it to be created as easily as possible; lazy programmers are good programmers (are they?)
  4. We continue using ObjectId for internal purposes as primary key (_id), which means the custom ID will be stored in another field.
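To make requirements 1 and 2 concrete, here is a tiny sketch in plain JavaScript of how the next number could be derived from the previous one. The function name nextCustomId is hypothetical and the sketch ignores the database entirely; it only shows the intended format.

```javascript
// Hypothetical helper: build the next custom ID for a prefix,
// given the last ID issued with that prefix (or null if there is none).
function nextCustomId(prefix, lastCustomId) {
    var lastNumber = 0;
    if (lastCustomId) {
        // everything after the prefix is the numeric part
        lastNumber = parseInt(lastCustomId.slice(prefix.length), 10);
    }
    return prefix + (lastNumber + 1);
}

console.log(nextCustomId('AZ', 'AZ1854')); // AZ1855
console.log(nextCustomId('AZ', null));     // AZ1
```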

To deal with all that we will create a MongoDB JavaScript function, save it into the database and call it whenever we need to create a new transaction. Sounds complicated? Of course it doesn’t!

Unique Index

We’ll start with an index. Why do we need an index? First of all, since it’s a transaction number, we will use it for looking up transactions, and an index will make those queries much faster. Also, do not forget that it has to be a unique number, and the index’s unique property will enforce that.

db.getCollection('transactions').ensureIndex(
    { custom_id: 1 },
    { unique: true, sparse: true }
);

transactions is the name of the collection we need to create the index in.
custom_id is what we call the field.
unique tells MongoDB to make sure we do not insert a duplicate; if we try, it will throw an error for us to catch.
sparse is another property; it makes MongoDB skip indexing documents that do not have the field present. In our case it serves another purpose as well. Let’s see what the MongoDB tutorial tells us about it:

In many situations you will want to combine the unique constraint with the sparse option. When MongoDB indexes a field, if a document does not have a value for a field, the index entry for that item will be null. Since unique indexes cannot have duplicate values for a field, without the sparse option, MongoDB will reject the second document and all subsequent documents without the indexed field.

In other words, without sparse we wouldn’t be able to create two or more documents with no custom_id field. If a non-existing field is indexed, it’s considered to be null, and MongoDB won’t let us create another document where custom_id has the same value. That’s why we need sparse – MongoDB won’t index a document if custom_id is missing. So now we can create as many such documents as we want to.
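If the difference is still fuzzy, the following toy model in plain JavaScript may help. It is only an in-memory imitation of the rule described above, not how MongoDB implements indexes, and makeCollection is a name invented for this sketch.

```javascript
// Toy model of a unique index on custom_id (NOT MongoDB's implementation).
function makeCollection(sparse) {
    var indexed = {}; // index keys that are already taken
    var docs = [];
    return {
        // returns true if the document is accepted, false on a duplicate key
        insert: function (doc) {
            var hasField = doc.custom_id !== undefined;
            if (sparse && !hasField) {
                docs.push(doc); // sparse: the document is left out of the index
                return true;
            }
            // a missing field is indexed as null
            var key = hasField ? 'v:' + doc.custom_id : 'null';
            if (indexed[key]) {
                return false; // duplicate key: rejected
            }
            indexed[key] = true;
            docs.push(doc);
            return true;
        },
        count: function () { return docs.length; }
    };
}

var plain = makeCollection(false);
console.log(plain.insert({ note: 'first' }));  // true, indexed as null
console.log(plain.insert({ note: 'second' })); // false, null is already taken

var sparse = makeCollection(true);
console.log(sparse.insert({ note: 'first' }));    // true
console.log(sparse.insert({ note: 'second' }));   // true
console.log(sparse.insert({ custom_id: 'SM1' })); // true
console.log(sparse.insert({ custom_id: 'SM1' })); // false, real duplicate
```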

To Store or Not To Store

MongoDB generously allows us to store our JavaScript functions right in the database. There are many discussions about whether we should actually do it, but most of the arguments against it come from misuse of the feature. The basic rule here is the same as in real life – try not to abuse the hospitality by walking all over the carpet in your muddy shoes or putting your feet on the dinner table. So first think about whether you are better off storing the function in the database or making it a part of the application.

For example, it’s usually not a good idea to store retrieval queries there, no matter how often you call them. They are not going to run faster (rather slower), but they will definitely complicate your application, so a new developer (or even yourself in a few months) will have a hard time understanding the query. Even keeping the function in your Git repository is a problem – every member of your team will need to re-run the script locally with every change in the function, so you’d better be ready for retaliation. It may take the shape of a few gigabytes of useless data “accidentally” pushed into the repository; don’t be surprised if it happens when you are on a road trip with your family and have to work from a coffee shop over a weak Wi-Fi signal, that’s just a “coincidence”.

Another “no” applies to old MongoDB versions, because they couldn’t perform multiple JavaScript operations simultaneously. This has not been a problem since version 2.4, when the V8 JavaScript engine was integrated into MongoDB, so all the good people can safely ignore this argument. Those fearless people who run pre-2.4 versions of MongoDB don’t need no advice.

So, think twice before storing a script on the server and then think once again. However, if the script serves a narrow utilitarian purpose and won’t be updated too often, server-side storage often comes in handy. How to do it? Pretty easy: you need to save the function into the system.js collection of your database. We’ll get back to it a bit later; meanwhile you can read the docs for more details on how it works.

One Script to Rule Them All

Now we need to create a JavaScript function that serves a single purpose – creating new transactions. We aren’t going to use the good old insert directly; whenever we need to create a transaction we’ll simply call the function and let it do all the work. What kind of work? It’s pretty simple:

  1. The function loads the latest transaction whose custom_id starts with the prefix provided
  2. It increases the number in custom_id by 1 and creates a new ObjectId
  3. It inserts a new document into MongoDB that contains only two fields: custom_id (the increased one) and _id (the ObjectId we just created)
  4. As we already know, modern MongoDB versions (thankfully) do not lock write operations while performing a JavaScript function, so what if the function runs in two concurrent instances and the new custom_id is already taken when we are trying to insert it? That’s what we need the unique index for: we just increase the custom_id once again and try to insert it one more time
  5. It returns the ObjectId we just created
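Step 4 is the trickiest one, so here is a rough simulation of it in plain JavaScript. FakeCollection is an in-memory stand-in invented for this sketch (it only mimics a unique index on custom_id), and insertWithRetry is a hypothetical name; the real stored function talks to MongoDB instead.

```javascript
// In-memory stand-in for a collection with a unique custom_id index.
function FakeCollection() {
    this.taken = {};
}
// Returns true when the document is stored, false on a duplicate custom_id.
FakeCollection.prototype.insert = function (doc) {
    var key = 'id:' + doc.custom_id;
    if (this.taken[key]) {
        return false;
    }
    this.taken[key] = true;
    return true;
};

// Steps 2-4: bump the number and retry while the insert collides.
function insertWithRetry(collection, prefix, lastNumber, maxAttempts) {
    var number = lastNumber;
    for (var attempt = 0; attempt < maxAttempts; attempt++) {
        number += 1;
        var customID = prefix + number;
        if (collection.insert({ custom_id: customID })) {
            return customID;
        }
    }
    return false; // every attempt hit a duplicate
}

var collection = new FakeCollection();
// Two "concurrent" callers both read 1854 as the latest number.
var first = insertWithRetry(collection, 'AZ', 1854, 5);
var second = insertWithRetry(collection, 'AZ', 1854, 5);
console.log(first, second); // AZ1855 AZ1856
```

The second caller collides on AZ1855, bumps the number and succeeds with AZ1856, which is exactly the behaviour the unique index buys us.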

You must be wondering why we need to create the ObjectId ourselves before inserting the document instead of letting MongoDB generate one automatically. The thing is that MongoDB’s insert() function does not return the _id field to us, so we would need to perform another query to look it up. That’s a limitation we have to live with.

Version 3.2 introduced the new function insertOne(), which returns the _id we need, but it is fairly new so we are going to stick to the conventional way. However, feel free to simplify the code by using it in your projects.
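As a side note, if you do go the insertOne() route, the insert step might look roughly like the sketch below. It assumes a MongoDB 3.2+ shell collection is passed in; the wrapper name insertAndGetId is hypothetical, and duplicate handling is deliberately left out.

```javascript
// Hypothetical wrapper around insertOne() (MongoDB 3.2+ shell assumed).
// insertOne() returns a result document whose insertedId field holds the new _id,
// so there is no need to generate the ObjectId ourselves.
function insertAndGetId(collection, customID) {
    var result = collection.insertOne({ custom_id: customID });
    return result.insertedId;
}

// Usage in the Mongo Shell (not runnable outside of it):
// var id = insertAndGetId(db.getCollection('test'), 'SM1');
```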

So, here’s how we do it:

db.system.js.save({
    _id: 'custom_id',
    value: function(collectionName, prefix) {
        var entity, entities, curNum = 0, id, customID, limit, inserted = false;
        var collection = db.getCollection(collectionName);

        // find the most recently created entity that uses the specified prefix
        // (the prefix is assumed to contain no regex special characters)
        entities = collection.find({ custom_id: { $regex: '^' + prefix + '[0-9]+$', $options: 'i' } })
            .sort({ _id: -1 }).limit(1);
        if (entities.hasNext()) {
            entity = entities.next();
        }
        if (entity && entity.custom_id) {
            // strip the prefix to determine the last number used
            curNum = parseInt(entity.custom_id.slice(prefix.length), 10);
        }

        // insert the document and confirm it went well;
        // try five times at most so the loop can never run forever
        limit = 5;
        do {
            customID = prefix + (++curNum);
            id = ObjectId();
            collection.insert({ _id: id, custom_id: customID });
            inserted = collection.count({ _id: id }) > 0;
        } while (!inserted && --limit);

        // return false if every attempt hit a duplicate custom_id
        return inserted ? id : false;
    }
});

As you see, we insert a document that contains two fields: _id and value. The former is the name we are going to use for calling the function, and the latter is the function itself. By calling db.system.js.save() we insert it into the system.js collection; save() performs an upsert operation – it updates the document if the _id is already in use, otherwise it inserts a new one. That’s exactly what we need: if you change something in the function, you can run the script again and the old function code will be replaced by the newer version.

You can run the script right in your MongoDB client or save it into a file and run it from there:

mongo localhost:27017/database_name customid.js

Once it has run, the function is saved, so we can proceed to actually using it.

Usage

It’s time to try out our new script in action! Let’s open the Mongo Shell:

mongo localhost:27017/database_name

First we need to load it, as stored functions aren’t loaded automatically. We can do it by calling db.loadServerScripts(), which loads all our stored functions into memory, so there is no need to call it separately for each particular function. Only then can we call the function itself:

db.loadServerScripts();
print(custom_id('test', 'SM'));
// output: ObjectId("5516b7e6dee4810687faee77")

The function has run, created a new MongoDB document and returned its ID, which we immediately display via the print() function. To save any actual data you simply need to update the document by the _id field.

Do not forget to call db.loadServerScripts() every time you modify the function, otherwise you will keep using the original version again and again, wondering why your changes have no effect.

Now let’s look into the collection test using db.test.find():

{
    "_id" : ObjectId("5516b7e6dee4810687faee77"),
    "custom_id" : "SM1"
}

If you call the function a few more times, you’ll get documents with values SM2, SM3, SM4, etc. That’s good, but you won’t open the Mongo Shell every time you need to create a document, so how do we call it from PHP?

$mongo = new MongoClient();
$db = $mongo->database_name;

$transaction_id = $db->execute("custom_id('test', 'SM')");
$transaction_id = empty($transaction_id['retval']['str']) ? null : $transaction_id['retval']['str'];
var_dump($db->test->findOne(array('_id' => new MongoId($transaction_id))));

As you see, in PHP you don’t even need to call db.loadServerScripts(); the stored functions are loaded automatically. In this example we run the function using the execute() method (behind the curtain that’s good old eval), which provides us with the return value. If everything goes well, you are going to see the document we just created:


array(2) {
    ["_id"] => object(MongoId) (1) {
        ["$id"] => string(24) "568838ee9e6831bff73dc4a1"
    }
    ["custom_id"] => string(3) "SM7"
}

What’s Next?

Progress never stops, especially in the trendy NoSQL technologies. Not long ago the PHP Mongo extension was rebuilt and separated into two parts: the driver and the library. The improvements in performing custom MongoDB commands make storing the script in the application much handier than it used to be, so the decision to store it in MongoDB itself needs to be reconsidered, which I intend to do in one of the next articles.

However, you’ve learned the basic principles of MongoDB custom scripts and gathered enough knowledge to modify the script for your needs. Good luck in your own exploration of the NoSQL world!
