Sunday, April 15, 2012

Seamless database content localization with NHibernate

Localization is a well known problem we as developers usually face when building multilingual applications. It is well studied, and for most development frameworks, there are ways in which you can localize your UI text or application resources. But when talking about localizing content that is going to be stored in a database, that's a whole different story.

There is little or no support in most persistence platforms to enable this scenery out of the box whether it is an ORM, or the "default" persistence method for the technology stack of your choice. Also, when trying to solve this problem we usually try to reinvent the wheel. By that, I mean there are literally hundreds of man hours lost in redoing the same thing, again and again, for each application.

The worst part is most solutions we come up with or at least that I have seen/experienced are really intrusive in the way we "ideally" try to approach persistence nowadays. This ideal being that the way we write our models shouldn't be contaminated by the implementation detail of how they are going to be persisted.  But sadly, that is not the case most of the time and our models end up getting splattered by the constraints imposed by the persistence mechanism.

Specially with ORMs this is very noticeable because we end up creating entities just to enable the localization process and making these entities part of our domain model almost by force. Turning these "localization enablers" into a sort of dependency magnets, spreading through our models. What happens then is that if you want to take your domain objects somewhere else, you need to take the localization mapping objects with you and anything else they need.

If you are interested on knowing about other approaches to localization here are a couple of links on the subject. You can start by reading an overview of all of them in this NHForge post. Here are the links to each method explained: MichalSiim Viikman, Ayende, Alkampfer [1, 2, 3], Fabio.

Again, disclaimers are in order. The code provided here is just a proof of concept and not production ready. Keep that in mind at all times. If you want to make it production ready, fork the github repository at the end of this post and send me a pull request. I will be more than glad to add your name to the contributors list and this post.

Yet another solution
As I mentioned before, there are several solutions out there already. The reason why I don't like them is that almost all of them except for Ayende's, force you to create your domain model with a particular persistence trick in mind. Whether it is adding the property as a dictionary or having a special type for it, they are kind of intrusive and I didn't like that.

So what this solutions proposes is to implement a not so commonly used feature of NHibernate called interceptors. This interceptors pattern is called upon whenever NHibernate performs an operation, and they allow you plug into the framework's pipeline so that you can transform it, update it, analyze it, enhance it, or whatever you want to do.

In this case our interceptor is going to look for the localization message entries in the database of each of the properties in our entity according to the current culture it is working on, and based on the results it will update the entity's values with the localized values. Just so we don't query the database every time, which would be a big performance hit, we are going to be using NHibernate's second level cache.

That's enough talk! Let's get down to it!

The code
I have hosted the code at this github repository. You can download it, fork it, use it, modify it, sell it (not advised), or wear it like a hat. I am going to be using NHibernate + FluentNHibernate + Moq + MSTests + SQLite but aside from NHibernate, the rest is not really that important.

So first, lets see how our localization persistence entities are going to look like.

Here we have two classes: LocalizationEntry and LocalizationEntryId. One serves as a composite Id of the other for catching purposes, but in general, they are not complex classes. The Id class consists of the entity's type, the entity's id, the property to which the localization message belongs to and the culture in which this message should be displayed.

You can see I have overridden the Equals and GetHashCode methods for the composite Id. This is required by the framework in order to use the composite object as an id.

Here is the fluent mapping for these same entities.

I activated the cache on this entity so that we can take advantage of the second level cache feature that NHibernate provides us with. This way we won't query the database every time.

Last but not least our proof of concept interceptor implementation.

How to use it
To get a better understanding on how to use the interceptor we just have to take a look at the tests, since they pretty much explain the use a consumer of this method would give to the API.

In fact, it is so transparent that you just use it the same what you would use NHibernate's persistence. We just open a session and pass in the interceptor we are going to use. I normally do this at the request level, since I usually do session per request handling, but it will work any other way. You just need to make sure that the interceptor you are using has the right culture set or you may end up getting the wrong results.

The only difference shows up when you are going to store the localized values. In the examples I insert the values into the database before I run the integration test as their own entries. This may be just is fine for you, but if you go to the ugly section, you can read on a way in which to make that process just as transparent as the load.

By the way, if you are interested in how the testing is implemented and you come from the Java side of things, you may learn more on the subject by taking a look at this post on integration tests for your database using Java.

The Good
We totally decouple the localization logic from our domain models so that we can use them as we please without carrying any baggage. This leaves the door open to lots of possibilities.

This mechanism can be replicated with Hibernate for Java and possibly other ORMs.

The Bad
I didn't implement the localization insertion part. I hope I will do it sometime, but like I said, if you want to contribute, go ahead.

As said in the disclaimer at the beginning, this is not production ready code. I didn't do any error handling or checking for the entity's types. You won't be able to localize any non string properties. 

There is no way to easily query the localized data by the localized fields. If you don't need sorting for this fields or something similar, the solution is OK. However, if you do, consider other options. I would look into indexing the localized content using Lucene or something like that, and working your search related cases from there.

The Ugly
Here are a couple of improvements that could be done and that could be easily added.

Catching and Pre-Catching
As described earlier I tried to use a composite so that I could take advantage of the second level cache. It would be more efficient to mix that with some pre-catching. For instance, executing a query to load all the localization messages for the entity on the first go and making sure the results get stored using the second level cache for queries.

Transparent persistence of the localization values
Although for the purpose of this post I didn't need to do it. The persistence for the localization values could also be done the same way as the load by implementing another method from the base interceptor class. Just setting the value for the property and saving the entity, could save the culture dependent message to the database transparently.

I leave that as an exercise to whomever wants to dig a little deeper. ;)

Cherry-picking fields to be localized
In this example I assumed I wanted to localize every property. I didn't check for entity property types. But in case we didn't want to run into performance problems we could cherry-pick which fields we would like to localize by adding a special attribute that the interceptor would look for to decide which properties to localize or not. I don't like this approach very much because it will force the attribute on the model, but there are other ways of doing the same thing without having to decorate our domain class directly. (Tip: Fluent NHibernate uses a similar approach)

Selecting values using business logic
Another interesting problem would be the fact that sometimes we want to localize but based on a specific application or culture related logic. For instance, with numerals we may like to show the pluralized version of a word. It is not the same to have 1 "message" in your inbox, that to have 2 "messages". This concept is tricky and may need some thought to get it right, but it doesn't really worries me since the localization code is totally decoupled from our models, which gives us a lot of freedom to work.

Again, that's all for now! Let me know id this post was helpful and see you soon with some more smelly code... with potatoes!

1 comment:

  1. Beautiful, this comes at just the right time. Thank you, José!