Thursday, April 12, 2012

Integration tests for your database code

I hear a lot of people talking about tests and I have been to a couple of events where speakers have given presentations on the subject. Everyone talks about unit tests, and TDD and BDD, Continuous Integration. However, I don't know if you noticed but database-related integration testing is often overlooked, omitted, or briefly mentioned when talking about tests. It makes you wonder why. Doesn't it?

Why is that fear on the subject. Are database-related tests not needed?

Yes, they are. You need them because there are things you can't simply mock (stress and load tests to mention some). More than that, you may not have other options because you are dealing with legacy code and no time ($) to do a proper re-factor and unit testing. We need them because we should make sure our "whole" system works as expected.

But to be truthful, the main reason would be that database related testing sucks. Is difficult to get right. It is slow compared to other kinds of tests and if not implemented properly it could become a waste of time and a source of headaches.

Yet a lot of the logic we write in our software relies on certain preconditions and behaviors of the underlying persistence mechanism to be correct. Most of the times we think there is no way of testing those assumptions, than to actually use them. Using them ether by running the solution on the developers local system and database or an integration server or something similar.

Before you start screaming and writing me off your list I must say.

I don't couple my business logic with my persistence. What I meant is that parts of our logic rely on behaviors that we sometimes take for granted they will happen as we think. These are things like transactions management, when using Spring's Transactional attribute, or cascaded persistence, etc. all of which could fail at run-time.

Sadly, unit tests can't help us in this regard. It would seem that the only way of actually testing that this behaviors are what we expected, is to execute or deploy the application on a controlled environment. Or is it?

You already know what the answer is.... don't you?

the answer is...

42...  :]

To explain it better, I am going to split the whole process into two scenarios or contexts and attack both in different ways.

Case one: Building Application from Scratch
Like Uncle Bob likes to say, there is nothing like the green field. That bast meadow where you first start to build your "architecturally sound" software. There is nothing there, no mess left behind by others, no constraints. It opens up the door to a lot of different opportunities (including creating a big mess). This is our first scenery. But before we start digging in on the hows and whys, some disclaimers are in order.

I am going to assume you know what and ORM is and that you are using one, and if you are not you have a pretty good reason not to. Either way I will explain what I usually do or would do when I find myself in both situations. That doesn't mean that this is the "best" or recommended way. It just means it is my preference. If you have your own ideas on how to improve the process or maybe a more efficient one: don't be shy and share!

Also, I am going to use Hibernate + JPA for the examples because a friend asked me to, but this would be easily extended to NHibernate, or other ORMs like the Entity Framework code first approach. If you ask for it on the comments I can extend or add new posts to include those too.

What do we want to achieve?
We want to test our persistence and query logic, usually located in the "DAO" layer of the application. Since we are good developers ;) we want these tests to be deterministic, self-verifiable, and order independent.

What do we need?
We need to setup a complete database environment, equal or similar to the one we are going to be using, and then populate it with test data, to be able to assert the behaviors in our test code. We will use Hibernate + JPA + HSQLDB + Spring 3.0.

First thing would be to configure Hibernate to recreate the database schema based on the mappings we have, every time it initializes for the integration tests. I will do this by initializing a new spring context and JPA persistence configuration, just for the tests.

Notice that I am also using an in memory database provider HSQLDB to make things a little less complex. However, this could be any other provider. You would just have to provide any connection details needed and to make sure you have the right permissions for the schema.

Since we are using Hibernate in Java we are going to take advantage of the functionality Hibernate gives us of executing an initialization script called import.sql after the schema update process during its own initialization. You can read about it here and here.

Another way of doing it would be to use DbUnit, of which I will talk more in the next example case.

So now we have our import.sql file ready to be executed at the context's initialization. Here you would place your test's data as a set of insert statements. This will allow us to initialize the schema created by Hibernate from your model with the tests data just before the context is accessible to the tests

As you need more tests and the schema evolves, you will extend and update this scripts.

And that's all! Now you can start writing your database tests.

The good
One of the good parts and the one I particularly like the most, is that the database schema doesn't need to be stored as a script in the source control. Instead it will be stored with the code, since the actual model that you are using is the blueprint used by Hibernate to create the database schema. So any changes in the mapping reflect on your integration tests as soon as the context is initialized and developers don't have to write down migration scripts for the changes which is error prone and boring.

This also means if you change database providers for instance, from MSSQL to Oracle, Hibernate will be the one creating the script for the creation of the schema, and it will be using the dialect of your choice. This is particularly useful when you still haven't made any decisions on what the underlying persistence support will be.

The bad
The main problem I see with this approach is that it only works if you don't have previous data that you need to maintain. But if you do, there is no way (that I know of) to use this behavior on the "update" mode and also keep the "migration" done by Hibernate.

Also, if the test data gets too big there is no way to split the script so that it would be more manageable using some kind of "import" directives. However, I think in version 3.6.10 > of Hibernate you can set the scripts to load for each persistence unit using a configuration property, or through code. You can find more info about it in this Stack Overflow thread and in this Spring Forum thread.

By the way, there seems to be a "bug" with the JpaVendorAdapter that I use in this type of set up, which sets the provider to "update". To resolve this issue just set the generateDdl flag to false. You can find more info on the reasons here.

The Ugly
Integration tests are slow. This solution needs you to instantiate the context and database for each set of tests. It only initializes the context once on each test suite. However, although this is faster than doing it for each test, you may run into order dependency problems among tests because of the data, if you don't write your tests properly. Use wisely.

You can still set it up so that each tests executes with a clean db setup. A solution would be to regenerate the schema an repopulate the tables with data for each test using the SchemaExporter class. I leave that as an exercise to the dear reader ;)

Case two: Building from a Legacy Database System
Now, take into consideration that once you release a version of this application, you are actually moving into  building on top of a legacy system.

In this case you already have a database, it may be from a previous version or from a totally unrelated product. Whatever the case is, I am assuming you can't loose data or in some cases modify the database schema. The create approach in Hibernate won't be of much help here. What do we do then?

Well, in this case we are going to need something else. Here is where DbUnit comes to the rescue.

There are other solutions like the SQL Ant task that executes SQL scripts against the db before you run your tests, but I like dbunit better because I can put the database initialization and finalization into my tests, per test, instead of at build time. Also in most cases I can escape having to write the SQL insert statements letting DbUnit in charge of the dirty bits of generating them.

Whats different from this approach?
In this case, we are going to get a little more control over what is happening on the db. We will still insert the sample data, just that this time, it will be DbUnit the one doing it. To do that, we first need to put this information in and xml file format that DbUnit will understand, so that it may be able to dump it into the db when we tell it to.

The format is pretty simple. All records are hanging from a dataset root object and the name of the node will be the name of the table where you want to insert this information. For specific column values you use the attributes of the node. The actual content to put in the column is the value of the attribute.

The Good
We have per test database setup. This is faster than the method we already talked about but, you still need to do a little cleanup afterwards. A way to avoid this would be to execute each test under a transaction and at the end of the test just roll the transaction back. However, this may not always be possible.

We also get a "semi-raw" code based access to the database for verification. In this sense you get more control over the verification of what is really happening. In case you didn't notice when I was testing the insertion in the first method, I used the findUserById method thus relaying on my own code. While for unit tests this may not be a problem as long as the code you are using has its own unit tests, when it comes to db integration tests I wouldn't recommended it. The reason being that you could fall into the trap of, for instance, thinking that your insertions are working when in fact they could just be cached by the underlying persistence mechanism. Be aware of it.

Finally, you can have both methods working side by side and use them as needed.

The Bad
While DbUnit will insert the data for you, it won't create the schema from the information we give it (not that it could). That's why we are still using the "update" mode with our persistence.

There are also some funny things about the way data gets cleared because it relies on the order the records where inserted. I remember this particular dog biting me some time ago.

The Ugly
DbUnit seems like it has not noticed the rest of the world has changed. By this I mean that you end up writing lots of boilerplate code for your tests that could have been avoided with some properly coded annotations. This is typical of the JUnit 3.X style of tests where you would need to inherit from a TestCase class or something similar. Not cool. However, if you refactor your tests, (like you do, right?) you will end up with little or no repetition.

This is all for now. You can see the full code at this repository on github.

Keep tuned for some more smelly code.. with potatoes on the side soon and leave some feedback if you want to help me improve the quality of these posts ;)

Update 14/04/2012: I renamed the github repository.

No comments:

Post a Comment