How to Design a Document Database - MongoDB Schema Design

In this post I want to talk about designing your schema in MongoDB. It is extremely important to plan the schema of your database before you even start implementing your application. This is why I want to show you some rules that you can follow to create a sound MongoDB schema.

So, why is it important to plan the schema for your database? When you start a project, you simply want to start writing code as soon as possible. And this is a bad approach.

You must plan your application at least from the database perspective, and you can easily do it with pen and paper or in some editor, where you write down all your columns, all your tables and everything else that you need. While planning you will see what you need to implement and which problems or requests you will have. If you simply start implementing without planning your database, it won't be scalable and it will be more difficult to maintain.

It's not a relational database

My rule number one is that you should not blindly convert a relational schema into MongoDB. Why is that? A lot of people know how relational databases work: there are tables, columns inside tables, and relations between different tables.

Relational DB

First of all we have users, where we store the ID of the user and the full name, but also references to roles and departments. As you can see, we have two additional tables, one with departments and one with roles. Each user references the role table and the department table by ID. This is how relational databases work.

The main point is that when we talk about relational databases, we think only about our tables, our columns and our data. We don't think about our queries at all.

When people start working with MongoDB, they simply try to apply all the same rules to it. This means they use MongoDB as a relational database. So they have three collections, users, roles and departments, and they create documents inside them: a user document holds the ID of its role and the ID of its department. Technically this works fine. But you must understand that MongoDB is a document-oriented database, not a relational one. It doesn't make much sense to take MongoDB and use it as a relational database.
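To make the relational-style layout concrete, here is a minimal sketch in Python that models the three collections as plain lists of dictionaries. All names and values (`find_by_id`, `role_id`, "Jane Doe" and so on) are made up for illustration, and no real MongoDB driver is involved. The point is only that reading one user takes three separate lookups.

```python
# Three "collections" modelled as plain Python lists of dicts.
# Every name and value here is a made-up illustration.
roles = [{"_id": 1, "title": "Developer"}]
departments = [{"_id": 10, "name": "Engineering"}]
users = [
    {"_id": 100, "full_name": "Jane Doe", "role_id": 1, "department_id": 10}
]


def find_by_id(collection, doc_id):
    """Return the first document with the given _id, or None."""
    return next((doc for doc in collection if doc["_id"] == doc_id), None)


def load_user_profile(user_id):
    """Resolve a user plus its role and department: three lookups."""
    user = find_by_id(users, user_id)
    role = find_by_id(roles, user["role_id"])
    department = find_by_id(departments, user["department_id"])
    return {
        "full_name": user["full_name"],
        "role": role["title"],
        "department": department["name"],
    }


profile = load_user_profile(100)
```

With a real database each of those lookups would be its own query (or a join-like step), which is exactly the cost this layout carries over from the relational world.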

It's a wild west

So, rule number two about MongoDB is that you don't have any rules. And this is true: by default you don't have columns, you don't have relations, and you don't have strict fields like in relational databases. You can put whatever you want inside documents and collections.

This is a super important rule, because it means you must plan and design your MongoDB schema depending on your project.

You don't do it the same way in every single project, as you do with a relational database. So, this is how we could convert our relational example to MongoDB.

Embed example

In the relational-style version we have three collections and references between them. But we can do it in a different way: we can create a single collection, users, and store our users inside it. Inside every user we can store the department as a plain string without creating a reference. And if a user has several departments, you can store an array of these departments. The same goes for roles. As you can see, a role doesn't have just a title, it also has a salary. MongoDB lets us store an embedded array of objects, so we can keep all of this embedded inside our users collection without creating additional collections. And this is awesome, because we query our users and get all the data with a single query, without any additional requests or joins.
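As a rough sketch of the embedded layout described above, the user document could look like this in Python. The field names, titles and salary figures are assumptions chosen for illustration, and the "collection" is again just a list of dictionaries rather than a real MongoDB collection.

```python
# A single "users" collection with departments and roles embedded.
# Field names and values are illustrative assumptions.
users = [
    {
        "_id": 100,
        "full_name": "Jane Doe",
        # Plain strings instead of references to a departments collection.
        "departments": ["Engineering", "Research"],
        # An embedded array of objects: each role carries its own salary.
        "roles": [
            {"title": "Developer", "salary": 5000},
            {"title": "Team Lead", "salary": 6500},
        ],
    }
]


def find_user(user_id):
    """Single lookup: the returned document already contains everything."""
    return next((user for user in users if user["_id"] == user_id), None)


user = find_user(100)
role_titles = [role["title"] for role in user["roles"]]
total_salary = sum(role["salary"] for role in user["roles"])
```

One lookup returns the user together with its departments and roles, so nothing else has to be fetched to render a profile.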

Embed unless you have a valid reason not to

So, my rule number three is: embed unless you have a valid reason not to. Embedding is really powerful, and you should prefer it over referencing by default. First, it is faster, because you don't need any joins or lookups. Second, you can read or update all this data with a single query. You don't need multiple updates to change the user, then the role, then the department.

Reference if you really need to

But rule number four is: use referencing if you really need to. Embedding is not always a suitable option. First of all, our documents can become really huge.

The maximum size of a single document in MongoDB is 16 MB.

It may sound like a huge amount of data for strings, but you can easily fill it if you embed lots of data.

Book store

Let's look at this example: here we have our book store with all its books embedded inside it. And yes, this is super convenient, because we have just a single document. But the main problem is that this document grows every time we add a new book. At some point it will be too large to work with, and it might also hit the 16 MB limit.
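To get a feeling for how quickly an embedded array can approach that limit, here is a back-of-the-envelope sketch. It uses the JSON encoding as a stand-in for the real BSON size, which is only an approximation, and the sample book with its 2000-character description is a made-up assumption.

```python
import json

# MongoDB's maximum BSON document size (16 MiB).
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024


def approx_size_bytes(document):
    """Rough size estimate via JSON. BSON differs, so treat it as a ballpark."""
    return len(json.dumps(document).encode("utf-8"))


# A hypothetical embedded book with a moderately long description.
book = {
    "title": "Example Book",
    "author": "Some Author",
    "description": "x" * 2000,
}

per_book = approx_size_bytes(book)
# Roughly how many such books fit into one book store document.
books_until_limit = MAX_DOCUMENT_BYTES // per_book
```

With books of this size the estimate lands in the low thousands, which a real store can easily exceed.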

The second reason to reference data is that you need it without its parent. For example, if we have a request to get a single book without its book store, embedding doesn't really work. You must create an additional collection, store the books there, and have each book reference its book store. In this case you solve several problems.

  • First of all you won't get such a huge document.
  • Secondly you can use this entity separately from the parent and you don't have any data duplication.

If we look again at our example with users, roles and departments, you can see that we have data duplication there. This is a completely normal situation in MongoDB. If we use only references, we don't have any duplication, because the shared data is stored in additional collections.
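A minimal sketch of the referenced layout, again with plain Python lists standing in for collections: books live in their own collection and each book points at its store. The names `store_id`, `find_book` and `books_in_store` are hypothetical helpers invented for this example.

```python
# Books are stored in their own collection and reference the book store.
# All names and values are illustrative assumptions.
book_stores = [{"_id": 1, "name": "Corner Books"}]
books = [
    {"_id": 11, "title": "First Book", "store_id": 1},
    {"_id": 12, "title": "Second Book", "store_id": 1},
]


def find_book(book_id):
    """Fetch a single book directly, without loading its book store."""
    return next((book for book in books if book["_id"] == book_id), None)


def books_in_store(store_id):
    """Fetch all books that reference the given book store."""
    return [book for book in books if book["store_id"] == store_id]


single_book = find_book(12)
store_titles = [book["title"] for book in books_in_store(1)]
```

The book store document stays small no matter how many books are added, and a single book can be read on its own.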

Don't grow indefinitely

And rule number five is that your arrays should not grow indefinitely. Our example with the book store and its books has exactly this problem. But you might also run into it in a less extreme use case. For example, you have a relation between a user and that user's messages.

Messages

Every single message has a reference to the user. This is completely normal, but you might also want to keep an array of the IDs of all messages inside the user. After some time this array will be really huge, because you have created lots of messages and now have hundreds or thousands of IDs. This is why in this case it is much better to keep the ID of the parent only inside the child and then filter the collection by parent ID.

Also, if you are not sure that you know all the differences between NoSQL databases and relational databases, make sure to check this post as well.
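The idea of keeping the parent ID only in the child can be sketched like this. The helpers `add_message` and `messages_for_user` and the `user_id` field are assumptions made for this example, and the filtering is done in plain Python rather than by a database.

```python
# The user document holds no array of message IDs, so it never grows.
users = [{"_id": 1, "name": "Jane"}]
messages = []


def add_message(user_id, text):
    """Create a message that carries the ID of its parent user."""
    message = {"_id": len(messages) + 1, "user_id": user_id, "text": text}
    messages.append(message)
    return message


def messages_for_user(user_id):
    """Filter the messages collection by parent ID."""
    return [message for message in messages if message["user_id"] == user_id]


# Creating many messages leaves the user document untouched.
for i in range(1000):
    add_message(1, f"message {i}")

jane_messages = messages_for_user(1)
```

In a real MongoDB setup, an index on the parent ID field is usually worth adding so that this filter stays fast as the collection grows.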