A few days ago, I published a blog post about a new eXo Add-On: eXo Community Chat.

You can read more about it here: eXo Platform Add-ons: Chat in Space

In that post, I promised some technical insights about the technology I used. So, here they are.

eXo Openness

In many discussions with our partners and customers, some questions pop up regularly. One of these is: “Can you use eXo Platform with a different database than the one you use?” My answer is “Yes”: you can use our JCR, but if you prefer a relational database or even a NoSQL database for your own developments, you are free to do so. This post, I hope, will show you how.

Perfect match for my Chat application

Given the application I was designing and the features I wanted in it, a document-oriented store looked like a perfect match.

To summarize, a Chat application is about:

  • Writing/reading messages;
  • A lot of messages;
  • Notifying users of new messages;
  • Formatting messages, searching, etc…

So, it’s clearly about content and a lot of it.

You can of course create an “XMPP proxy app”, but I wanted a pure web-based solution using modern technologies, and one that is easy to install as well. In this case, you can either set up a relational database and create your model, or you can leverage the flexibility of MongoDB. I’m going to explain what this flexibility means for me.

There’s a second important point for me as a designer/developer: with MongoDB, you’re not tied to the model and you can change it whenever you want. Nothing is set in stone.

Thus, you can start developing an app as a POC (Proof Of Concept), then iterate. And while you develop and add new features, you can change the model: add new fields, change existing ones, create new collections, etc.

That’s actually what I do when I write an app for myself. My design process is very simple:

  • I take some time to look at what’s out there, what I like AND what I dislike;
  • I write notes, lots of them, and list the top five features I want in my app;
  • I set up my priorities;
  • I design some mock-ups on paper;
  • I start to develop.

Inside eXo (my day-to-day job), it’s a different game: you must write specifications, both functional and technical, because a lot of people are involved. But when you’re on your own, you already have everything in mind, features and technical aspects alike, so you don’t have to write that much down. Then, when you finally start to develop, having to reset and drop the database each time you change your mind slows you down. MongoDB is great for this, as you can create a collection whenever you want. Flexibility also has its downside, of course. If you don’t take care of your model, you will end up with inconsistencies and big problems in your data: with great power comes great responsibility.

Flexibility

So, how flexible is MongoDB?

Let’s take a simple example using MongoDB Java API:

You can see the code is really straightforward:

  • Get the Collection
  • Create a query object
  • Search
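The three steps above can be modelled without the MongoDB driver at all. The following is only a rough, in-memory stand-in written with plain Java collections, not the code from the Chat application: `InMemoryCollection`, `UserLookup` and the `user` field name are hypothetical names chosen for illustration. It also shows that documents in the same collection do not need to share the same fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a document collection (hypothetical, not the MongoDB driver).
final class InMemoryCollection {
    private final List<Map<String, Object>> documents = new ArrayList<>();

    // Documents are free-form maps: no schema is declared up front.
    void insert(Map<String, Object> document) {
        documents.add(new HashMap<>(document));
    }

    // Returns every document that contains all field/value pairs of the query.
    List<Map<String, Object>> find(Map<String, Object> query) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> document : documents) {
            if (document.entrySet().containsAll(query.entrySet())) {
                matches.add(document);
            }
        }
        return matches;
    }
}

final class UserLookup {
    // Assumed layout: a "users" collection whose documents carry a "user" field.
    static boolean hasUser(Map<String, InMemoryCollection> db, String user) {
        InMemoryCollection users = db.get("users");   // 1. get the collection
        if (users == null) {
            return false;
        }
        Map<String, Object> query = new HashMap<>();
        query.put("user", user);                      // 2. create a query object
        return !users.find(query).isEmpty();          // 3. search
    }
}
```

With the real driver, the collection handle, the query object and the search call come from the MongoDB API instead of these hand-written classes, but the shape of the code stays the same.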

Adding a user is not that much harder as you can see:

Now, this is about flexibility. You end up here with a uniqueness problem: since this code is not thread safe, you can have multiple entries for the same session, and that’s something you don’t want in your collection.

So, just bind the MongoDB Object ID to the session and you’re done (at least on the DB side):
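One way to read this: the session identifier becomes the document’s `_id`, which MongoDB keeps unique within a collection, so a second insert for the same session cannot create a duplicate. The sketch below only models that guarantee in memory with a `ConcurrentHashMap`; `SessionRegistry` and its methods are hypothetical names, and the actual application code may bind the two differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory model of a collection whose key (like MongoDB's _id) is unique.
final class SessionRegistry {
    // Key: session id playing the role of _id. Value: the user name.
    private final Map<String, String> userBySessionId = new ConcurrentHashMap<>();

    // Atomically registers the user for this session.
    // Returns false when the session is already bound, instead of adding a duplicate.
    boolean register(String sessionId, String user) {
        return userBySessionId.putIfAbsent(sessionId, user) == null;
    }

    int size() {
        return userBySessionId.size();
    }
}
```

A separate "check, then insert" pair of calls can interleave between two threads; a single atomic operation on a unique key cannot, which is why the uniqueness issue goes away on the database side.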

This is exactly what I love about MongoDB: you can iterate on your code very quickly.

WriteConcern.SAFE : for the sake of simplicity

There are different strategies for writing data with a NoSQL DB, which can have many server instances. There are lots of explanations about this, but if you want a shortcut to one, read: How MongoDB Different Write Concern Values Affect Performance On A Single Node?

In my case, I chose the SAFE mode: it’s not the safest option, but I think it’s “safe” enough for a Chat, and it’s also much faster than Journal Safe. NORMAL mode might be fine for this Chat app as well (Chat data is generally not critical, so we could afford to risk losing some of it for the sake of performance). I could make this configurable if MongoDB’s speed becomes a latency problem. But at least for now, I will stick with SAFE, which I think is the best option.

Performance and Scalability

One Collection versus Large Number of Collections

When I started to play with MongoDB, I stored messages this way:

As you can see, I stored all messages in a single Collection. In this case, you create an index for each field, right?

Then, I read more about MongoDB and how it actually works, and discovered this page: Using a Large Number of Collections.

One sentence on that page is worth noting: “Of course, this only makes sense if we do not need to query for items from multiple logs – aka collections – at the same time”.

Let’s come back to my use case. Actually, I store all messages in one place but I will always query them filtered by Room. Thus, I just need to search on one room at a time and I want all messages from this room.

As you can see, I now create one collection per Room. As I create one room per discussion (one-to-one discussion or Space discussion), I can end up with a large number of collections, but that is what the page recommends and I want to give it a try. If there are too many collections, I could switch to a user-oriented model with one collection per user, which would result in fewer collections.
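To make the one-collection-per-Room layout concrete, here is a small in-memory sketch, assuming each collection is named after its room. `RoomStore` and the `room_` prefix are hypothetical; the real application may name its collections differently, and it talks to MongoDB rather than to a `Map`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model: one "collection" (a list of messages) per room.
final class RoomStore {
    // Hypothetical naming convention, chosen for illustration only.
    static final String PREFIX = "room_";

    private final Map<String, List<String>> collections = new LinkedHashMap<>();

    static String collectionNameFor(String roomId) {
        return PREFIX + roomId;
    }

    // Writing creates the room's collection on first use, as MongoDB does implicitly.
    void write(String roomId, String message) {
        collections.computeIfAbsent(collectionNameFor(roomId), name -> new ArrayList<>())
                   .add(message);
    }

    // Reading touches exactly one collection: no filtering by room is needed.
    List<String> read(String roomId) {
        List<String> messages = collections.get(collectionNameFor(roomId));
        if (messages == null) {
            return Collections.<String>emptyList();
        }
        return Collections.unmodifiableList(messages);
    }

    int collectionCount() {
        return collections.size();
    }
}
```

The trade-off is visible in `collectionCount()`: reads never cross rooms, but the number of collections grows with the number of discussions.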

Indexes

At first, I didn’t create any indexes in MongoDB, even though you should never work without the right ones. So, why? Because I always run some quick performance tests first and add indexes only where necessary. When I ran my first JMeter performance test, what a surprise! Throughput started at 100 messages per second, fell to 50 messages per second with 500 messages or so in the Collection, then dropped to 30 messages per second beyond 1000 messages. Without indexes, this is bound to happen. So, I added indexes in the right places, on the fields I filter my queries by, and the result is clearly better. No matter how many messages, notifications, users or spaces I have now, the number of actions per second stays very stable.
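The slowdown is easy to reproduce in miniature. The sketch below is not MongoDB code: it is an in-memory model in which the “index” is a plain `HashMap` (real MongoDB indexes are B-tree based), and `RoomIndexDemo` with its `room` field is a hypothetical example. It counts how many documents each lookup has to examine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Compares a full scan with an indexed lookup on a hypothetical "room" field.
final class RoomIndexDemo {
    // Each message is stored as {room, text}.
    private final List<String[]> messages = new ArrayList<>();
    // Stand-in for an index on "room": room name -> messages of that room.
    private final Map<String, List<String[]>> roomIndex = new HashMap<>();
    // Number of documents examined by the most recent query.
    private int examined;

    void insert(String room, String text) {
        String[] message = {room, text};
        messages.add(message);
        roomIndex.computeIfAbsent(room, key -> new ArrayList<>()).add(message);
    }

    // Without an index, every stored message is examined.
    List<String[]> findByRoomScan(String room) {
        examined = 0;
        List<String[]> matches = new ArrayList<>();
        for (String[] message : messages) {
            examined++;
            if (message[0].equals(room)) {
                matches.add(message);
            }
        }
        return matches;
    }

    // With the index, only the matching messages are touched.
    List<String[]> findByRoomIndexed(String room) {
        List<String[]> matches = roomIndex.getOrDefault(room, new ArrayList<String[]>());
        examined = matches.size();
        return matches;
    }

    int examined() {
        return examined;
    }
}
```

The scan cost grows with the total number of messages, while the indexed cost depends only on the size of the result, which matches the shape of the measurements above.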

On my laptop (CPU: 2.8GHz Intel Core 2 Duo), results are encouraging. I ran that JMeter test locally, so, JMeter took half my CPU just to “follow the music”.

Another great thing is that I totally decoupled the Chat client and the Chat server, which run on two different Java application servers. The Chat clients (Chat portlet and Notification portlet) run on the eXo Platform server, while the Chat server runs on a standard Tomcat 7 bundle.

I won’t go further here about multiservers as it will be the topic of my next blog post.

I don’t have the response time graph on this screenshot but I assure you it’s stable ;-)

I ran a “long” test with more than 100,000 messages and the performance doesn’t decrease (well, just a little). Of course, I don’t retrieve the entire Room when I show it; I apply limits like this:

So, by default, I get the 200 most recent messages in the last 7 days, never more. Two reasons for that:

  • Overall performance: the limits are a must-have for that;
  • Web orientation: I need to send the data back to the client and I can’t afford to send megabytes of data.
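The rule “at most the 200 most recent messages from the last 7 days” can be sketched as follows. This is an in-memory illustration using Java streams, not the query from the application: `RecentMessages`, `Message` and the `timestamp` field are assumptions, and only the two limits (200 and 7 days) come from the text above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class RecentMessages {
    static final int MAX_MESSAGES = 200;                 // limit stated in the post
    static final Duration MAX_AGE = Duration.ofDays(7);  // window stated in the post

    // Hypothetical message shape, for illustration only.
    static final class Message {
        final String text;
        final Instant timestamp;

        Message(String text, Instant timestamp) {
            this.text = text;
            this.timestamp = timestamp;
        }
    }

    // Returns at most MAX_MESSAGES messages newer than (now - MAX_AGE), most recent first.
    static List<Message> latest(List<Message> room, Instant now) {
        Instant oldestAllowed = now.minus(MAX_AGE);
        return room.stream()
                .filter(message -> message.timestamp.isAfter(oldestAllowed))
                .sorted(Comparator.comparing((Message message) -> message.timestamp).reversed())
                .limit(MAX_MESSAGES)
                .collect(Collectors.toList());
    }
}
```

Against MongoDB, the same filter, sort and limit would be expressed on the query itself so that the server does the work and only the capped result travels to the client.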

What’s Next?

As I’m very new to MongoDB, I’m sure the choices I made are not the best or most efficient ones (there’s always room for optimization). It’s the first project I have done with MongoDB (and not the last), and even if I know quite a bit now about front-end optimization, I’m just a beginner with MongoDB. If you have feedback, don’t hesitate to contact me so we can build a better app with your experience.

And wait, there’s more coming. In a few days, I will publish a second part about MongoDB, where we will see how easy it is to use the Chat App in the Cloud using the excellent MongoHQ service.

In the meantime, if you want to take a deeper look at the code, it’s Open Source and available in the eXo Addons repositories: https://github.com/exo-addons/chat-application

« Again, thanks for reading and stay tuned – more things are coming, Benjamin. »