Sunday, April 15, 2012
Seamless database content localization with NHibernate
Localization is a well-known problem we as developers usually face when building multilingual applications. It is well studied, and for most development frameworks there are ways to localize your UI text or application resources. But when it comes to localizing content that is stored in a database, that's a whole different story.
There is little or no support in most persistence platforms to enable this scenario out of the box, whether it is an ORM or the "default" persistence method for the technology stack of your choice. Also, when trying to solve this problem, we usually reinvent the wheel. By that I mean there are literally hundreds of man-hours lost in redoing the same thing, again and again, for each application.
The worst part is that most solutions we come up with, or at least the ones I have seen and experienced, are really intrusive given the way we "ideally" try to approach persistence nowadays. The ideal being that the way we write our models shouldn't be contaminated by the implementation detail of how they are going to be persisted. Sadly, that is not the case most of the time, and our models end up getting splattered with the constraints imposed by the persistence mechanism.
Especially with ORMs this is very noticeable, because we end up creating entities just to enable the localization process, and making those entities part of our domain model almost by force. This turns these "localization enablers" into a sort of dependency magnet, spreading through our models. What happens then is that if you want to take your domain objects somewhere else, you need to take the localization mapping objects with you, along with anything else they need.
If you are interested in knowing about other approaches to localization, here are a couple of links on the subject. You can start by reading an overview of all of them in this NHForge post. Here are the links to each method explained: Michal, Siim Viikman, Ayende, Alkampfer [1, 2, 3], Fabio.
Again, disclaimers are in order. The code provided here is just a proof of concept and is not production ready. Keep that in mind at all times. If you want to make it production ready, fork the github repository at the end of this post and send me a pull request. I will be more than glad to add your name to the contributors list and to this post.
Yet another solution
As I mentioned before, there are several solutions out there already. The reason I don't like them is that almost all of them, except for Ayende's, force you to create your domain model with a particular persistence trick in mind. Whether it is adding the property as a dictionary or having a special type for it, they are kind of intrusive, and I didn't like that.
So what this solution proposes is to use a not-so-commonly-used feature of NHibernate called interceptors. Interceptors are called upon whenever NHibernate performs an operation, and they let you plug into the framework's pipeline so that you can transform, update, analyze, or enhance that operation, or whatever else you want to do.
In this case, our interceptor is going to look up the localization message entries in the database for each of the properties of our entity, according to the current culture it is working with, and based on the results it will update the entity's values with the localized ones. So that we don't query the database every time, which would be a big performance hit, we are going to use NHibernate's second-level cache.
That's enough talk! Let's get down to it!
The code
I have hosted the code at this github repository. You can download it, fork it, use it, modify it, sell it (not advised), or wear it like a hat. I am going to be using NHibernate + FluentNHibernate + Moq + MSTest + SQLite, but aside from NHibernate, the rest is not really that important.
So first, let's see what our localization persistence entities are going to look like.
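Since the mechanism is replicable with Hibernate for Java (more on that in "The Good"), here is a minimal sketch of the shape of these classes in Java; the real C# versions live in the linked repository, and the names here are approximations.

```java
import java.io.Serializable;
import java.util.Objects;

// Composite id: which entity, which instance, which property, which culture.
public class LocalizationEntryId implements Serializable {
    private String entityType;   // type of the localized entity
    private String entityId;     // id of the localized entity instance
    private String propertyName; // property the message belongs to
    private String culture;      // culture the message is displayed in

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LocalizationEntryId)) return false;
        LocalizationEntryId other = (LocalizationEntryId) o;
        return Objects.equals(entityType, other.entityType)
            && Objects.equals(entityId, other.entityId)
            && Objects.equals(propertyName, other.propertyName)
            && Objects.equals(culture, other.culture);
    }

    @Override
    public int hashCode() {
        return Objects.hash(entityType, entityId, propertyName, culture);
    }
}

// The entry itself: the id above plus the localized text.
class LocalizationEntry {
    private LocalizationEntryId id;
    private String message;

    public LocalizationEntryId getId() { return id; }
    public String getMessage() { return message; }
}
```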
Here we have two classes: LocalizationEntry and LocalizationEntryId. One serves as a composite id of the other for caching purposes, but in general they are not complex classes. The id class consists of the entity's type, the entity's id, the property to which the localization message belongs, and the culture in which this message should be displayed.
You can see I have overridden the Equals and GetHashCode methods for the composite id. This is required by the framework in order to use the composite object as an id.
Here is the fluent mapping for these same entities.
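The actual mapping is a FluentNHibernate ClassMap in the repository; as a hedged stand-in, this is roughly what it expresses, written as the equivalent XML mapping (table and column names are assumptions):

```xml
<class name="LocalizationEntry" table="LocalizationEntry">
  <cache usage="read-write" /> <!-- second level cache, discussed below -->
  <composite-id name="Id" class="LocalizationEntryId">
    <key-property name="EntityType" />
    <key-property name="EntityId" />
    <key-property name="PropertyName" />
    <key-property name="Culture" />
  </composite-id>
  <property name="Message" />
</class>
```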
I activated the cache on this entity so that we can take advantage of the second level cache feature that NHibernate provides us with. This way we won't query the database every time.
Last but not least, here is our proof-of-concept interceptor implementation.
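The NHibernate original is in the repository; below is a hedged sketch of the same idea written against Hibernate for Java's EmptyInterceptor, since the approach ports there as well. The LocalizationRepository lookup helper is an assumption, not part of either framework.

```java
import java.io.Serializable;
import java.util.Locale;

import org.hibernate.EmptyInterceptor;
import org.hibernate.type.StringType;
import org.hibernate.type.Type;

// Hypothetical helper that queries the LocalizationEntry table.
interface LocalizationRepository {
    LocalizationEntry find(String entityType, String entityId,
                           String propertyName, String culture);
}

public class LocalizationInterceptor extends EmptyInterceptor {

    private final LocalizationRepository repository;
    private final Locale locale;

    public LocalizationInterceptor(LocalizationRepository repository, Locale locale) {
        this.repository = repository;
        this.locale = locale;
    }

    // Called whenever an entity is loaded; 'state' holds the property values
    // about to be injected into the entity.
    @Override
    public boolean onLoad(Object entity, Serializable id, Object[] state,
                          String[] propertyNames, Type[] types) {
        boolean modified = false;
        for (int i = 0; i < propertyNames.length; i++) {
            if (!(types[i] instanceof StringType)) {
                continue; // only string properties are localizable here
            }
            LocalizationEntry entry = repository.find(
                    entity.getClass().getName(), String.valueOf(id),
                    propertyNames[i], locale.toString());
            if (entry != null) {
                state[i] = entry.getMessage(); // swap in the localized value
                modified = true;
            }
        }
        return modified; // true tells the framework the state array changed
    }
}
```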
How to use it
To get a better understanding of how to use the interceptor, we just have to take a look at the tests, since they pretty much explain the use a consumer of this method would give to the API.
In fact, it is so transparent that you just use it the same way you would use NHibernate's persistence. We just open a session and pass in the interceptor we are going to use. I normally do this at the request level, since I usually do session-per-request handling, but it will work any other way. You just need to make sure that the interceptor you are using has the right culture set, or you may end up getting the wrong results.
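In Hibernate-for-Java terms, that wiring might look like the hedged fragment below (sessionFactory, repository, Product, and productId are assumed to exist in the surrounding code):

```java
// Open the session with a culture-aware interceptor (Hibernate 3-style API);
// everything after that is plain persistence code.
Session session = sessionFactory.openSession(
        new LocalizationInterceptor(repository, new Locale("es", "ES")));
try {
    Product product = (Product) session.get(Product.class, productId);
    // product's string properties now hold the es-ES values where available
} finally {
    session.close();
}
```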
The only difference shows up when you are going to store the localized values. In the examples, I insert the values into the database as their own entries before I run the integration test. This may be just fine for you, but if you go to the ugly section, you can read about a way to make that process just as transparent as the load.
By the way, if you are interested in how the testing is implemented and you come from the Java side of things, you may learn more on the subject by taking a look at this post on integration tests for your database using Java.
The Good
We totally decouple the localization logic from our domain models so that we can use them as we please without carrying any baggage. This leaves the door open to lots of possibilities.
This mechanism can be replicated with Hibernate for Java and possibly other ORMs.
The Bad
I didn't implement the localization insertion part. I hope I will do it sometime, but like I said, if you want to contribute, go ahead.
As said in the disclaimer at the beginning, this is not production-ready code. I didn't do any error handling or check the entity's property types, so you won't be able to localize any non-string properties.
There is no way to easily query the localized data by the localized fields. If you don't need sorting on these fields or something similar, the solution is OK. However, if you do, consider other options. I would look into indexing the localized content using Lucene or something like that, and working your search-related cases from there.
The Ugly
Here are a couple of improvements that could easily be added.
Caching and Pre-Caching
As described earlier, I used a composite id so that I could take advantage of the second-level cache. It would be more efficient to mix that with some pre-caching: for instance, executing a query that loads all the localization messages for the entity on the first go, and making sure the results get stored using the second-level cache for queries.
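A hedged sketch of that pre-caching idea, expressed as a cacheable Hibernate query (the HQL and property paths are assumptions based on the entities above):

```java
// One round trip loads every entry for this entity type and culture;
// setCacheable(true) stores the result in the query cache (which must be
// enabled in the configuration for this to have any effect).
List<?> entries = session
        .createQuery("from LocalizationEntry e"
                   + " where e.id.entityType = :type and e.id.culture = :culture")
        .setParameter("type", Product.class.getName())
        .setParameter("culture", "es-ES")
        .setCacheable(true)
        .list();
```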
Transparent persistence of the localization values
Although I didn't need it for the purpose of this post, the persistence of the localization values could be done the same way as the load, by implementing another method from the base interceptor class. Just setting the value for the property and saving the entity could then save the culture-dependent message to the database transparently.
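A skeleton of the hook in question, in the same Hibernate-for-Java terms as above; the body is deliberately left out:

```java
// Overriding this hook would let a plain property assignment persist the
// culture-specific message transparently on flush.
@Override
public boolean onFlushDirty(Object entity, Serializable id, Object[] currentState,
                            Object[] previousState, String[] propertyNames, Type[] types) {
    // for each dirty string property: save or update a LocalizationEntry
    // for the interceptor's culture instead of overwriting the base value
    return false; // return true only if the state array itself was changed
}
```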
I leave that as an exercise for whoever wants to dig a little deeper. ;)
Cherry-picking fields to be localized
In this example I assumed I wanted to localize every property, and I didn't check the entity's property types. But to avoid running into performance problems, we could cherry-pick which fields to localize by adding a special attribute that the interceptor would look for when deciding which properties to localize. I don't like this approach very much, because it forces the attribute onto the model, but there are other ways of doing the same thing without having to decorate our domain class directly. (Tip: Fluent NHibernate uses a similar approach.)
Selecting values using business logic
Another interesting problem is that sometimes we want to localize based on specific application or culture-related logic. For instance, with numerals we may want to show the pluralized version of a word: it is not the same to have 1 "message" in your inbox as to have 2 "messages". This concept is tricky and may need some thought to get right, but it doesn't really worry me, since the localization code is totally decoupled from our models, which gives us a lot of freedom to work.
Again, that's all for now! Let me know if this post was helpful, and see you soon with some more smelly code... with potatoes!
Labels:
architecture,
db,
dependency management,
fun,
hibernate,
integration tests,
localization,
nhibernate,
oop,
open source,
proof of concept,
rantings,
software development,
tdd
Thursday, April 12, 2012
Integration tests for your database code
I hear a lot of people talking about tests, and I have been to a couple of events where speakers have given presentations on the subject. Everyone talks about unit tests, TDD, BDD, and Continuous Integration. However, I don't know if you have noticed, but database-related integration testing is often overlooked, omitted, or only briefly mentioned when talking about tests. It makes you wonder why. Doesn't it?
Why is there such fear of the subject? Are database-related tests not needed?
Yes, they are. You need them because there are things you simply can't mock (stress and load tests, to mention some). More than that, you may not have other options, because you are dealing with legacy code and have no time ($) to do a proper refactor and unit testing. We need them because we should make sure our "whole" system works as expected.
But to be truthful, the main reason is that database-related testing sucks. It is difficult to get right. It is slow compared to other kinds of tests, and if not implemented properly it can become a waste of time and a source of headaches.
Yet a lot of the logic we write in our software relies on certain preconditions and behaviors of the underlying persistence mechanism being correct. Most of the time we think there is no way of testing those assumptions other than actually exercising them, either by running the solution on the developer's local system and database, on an integration server, or something similar.
Before you start screaming and writing me off your list, I must say:
I don't couple my business logic with my persistence. What I mean is that parts of our logic rely on behaviors that we take for granted will happen the way we think. These are things like transaction management when using Spring's @Transactional annotation, or cascaded persistence, etc., all of which could fail at run-time.
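To make that concrete, here is a hedged, invented example of the kind of assumption I mean; the rollback only happens if the transaction wiring is right at run-time, which is exactly what a mock cannot prove:

```java
import org.springframework.transaction.annotation.Transactional;

// Minimal stubs so the sketch stands alone; both are invented names.
class User {}

interface UserDao {
    void save(User user);
    void audit(User user);
}

public class UserService {

    private final UserDao userDao;

    public UserService(UserDao userDao) {
        this.userDao = userDao;
    }

    // We take for granted that if audit() throws, save() is rolled back.
    // Whether that actually happens depends on the runtime transaction
    // configuration, not on anything a unit test with mocks can observe.
    @Transactional
    public void register(User user) {
        userDao.save(user);
        userDao.audit(user);
    }
}
```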
Sadly, unit tests can't help us in this regard. It would seem that the only way of actually testing that these behaviors are what we expect is to execute or deploy the application in a controlled environment. Or is it?
You already know what the answer is.... don't you?
the answer is...
42... :]
To explain it better, I am going to split the whole process into two scenarios or contexts and attack both in different ways.
Case one: Building an Application from Scratch
Like Uncle Bob likes to say, there is nothing like the green field. That vast meadow where you first start to build your "architecturally sound" software. There is nothing there: no mess left behind by others, no constraints. It opens up the door to a lot of different opportunities (including creating a big mess). This is our first scenario. But before we start digging in on the hows and whys, some disclaimers are in order.
I am going to assume you know what an ORM is and that you are using one, and if you are not, that you have a pretty good reason not to. Either way, I will explain what I usually do, or would do, when I find myself in each situation. That doesn't mean this is the "best" or recommended way; it just means it is my preference. If you have your own ideas on how to improve the process, or maybe a more efficient one: don't be shy and share!
Also, I am going to use Hibernate + JPA for the examples because a friend asked me to, but this would easily extend to NHibernate or other ORMs like Entity Framework's code-first approach. If you ask for it in the comments, I can extend this post or add new ones to include those too.
What do we want to achieve?
We want to test our persistence and query logic, usually located in the "DAO" layer of the application. Since we are good developers ;) we want these tests to be deterministic, self-verifiable, and order independent.
What do we need?
We need to set up a complete database environment, equal or similar to the one we are going to be using, and then populate it with test data so that we can assert the behaviors in our test code. We will use Hibernate + JPA + HSQLDB + Spring 3.0.
The first thing is to configure Hibernate to recreate the database schema, based on the mappings we have, every time it initializes for the integration tests. I will do this by initializing a new Spring context and JPA persistence configuration just for the tests.
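The original wiring is in the linked repository; as a hedged sketch, an equivalent test-only configuration (all bean names and property values here are assumptions) could look like this:

```java
import java.util.Properties;

import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;
import org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter;

@Configuration
public class TestPersistenceConfig {

    @Bean
    public DataSource dataSource() {
        // in-memory HSQLDB: nothing to install, nothing to clean up
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setDriverClassName("org.hsqldb.jdbcDriver");
        ds.setUrl("jdbc:hsqldb:mem:testdb");
        ds.setUsername("sa");
        ds.setPassword("");
        return ds;
    }

    @Bean
    public LocalContainerEntityManagerFactoryBean entityManagerFactory() {
        // entity classes still come from META-INF/persistence.xml here
        LocalContainerEntityManagerFactoryBean emf =
                new LocalContainerEntityManagerFactoryBean();
        emf.setDataSource(dataSource());
        emf.setJpaVendorAdapter(new HibernateJpaVendorAdapter());

        Properties props = new Properties();
        // drop and recreate the schema from the mappings on every start
        props.setProperty("hibernate.hbm2ddl.auto", "create-drop");
        props.setProperty("hibernate.dialect", "org.hibernate.dialect.HSQLDialect");
        emf.setJpaProperties(props);
        return emf;
    }
}
```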
Notice that I am also using an in-memory database provider, HSQLDB, to make things a little less complex. However, this could be any other provider. You would just have to provide any connection details needed and make sure you have the right permissions for the schema.
Since we are using Hibernate in Java, we are going to take advantage of a piece of functionality Hibernate gives us: executing an initialization script called import.sql after the schema creation process during its own initialization. You can read about it here and here.
Another way of doing it would be to use DbUnit, of which I will talk more in the next example case.
So now we have our import.sql file ready to be executed at the context's initialization. Here you would place your test data as a set of insert statements. This will allow us to populate the schema Hibernate created from your model with the test data, just before the context becomes accessible to the tests.
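As a hedged illustration (the table and its columns are invented), such a file might hold nothing more than:

```sql
-- import.sql: run by Hibernate right after it creates the schema
INSERT INTO users (id, name, email) VALUES (1, 'John', 'john@example.com');
INSERT INTO users (id, name, email) VALUES (2, 'Jane', 'jane@example.com');
```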
As you need more tests and the schema evolves, you will extend and update these scripts.
And that's all! Now you can start writing your database tests.
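A hedged sketch of what a first test could look like; the DAO, the context file name, and the ids are assumptions (findUserById is the same method mentioned again in the second case below):

```java
import static org.junit.Assert.assertNotNull;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("classpath:test-context.xml") // the test-only context
public class UserDaoIntegrationTest {

    @Autowired
    private UserDao userDao; // hypothetical DAO under test

    @Test
    public void findsTheUserInsertedByImportSql() {
        // id 1 comes straight from the import.sql test data
        assertNotNull(userDao.findUserById(1L));
    }
}
```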
The Good
One of the good parts, and the one I particularly like the most, is that the database schema doesn't need to be stored as a script in source control. Instead, it travels with the code, since the actual model you are using is the blueprint Hibernate uses to create the database schema. Any changes in the mapping are reflected in your integration tests as soon as the context is initialized, and developers don't have to write migration scripts for the changes, which is error-prone and boring.
This also means that if you change database providers, for instance from MSSQL to Oracle, Hibernate will be the one creating the schema generation script, using the dialect of your choice. This is particularly useful when you still haven't made a decision on what the underlying persistence store will be.
The Bad
The main problem I see with this approach is that it only works if you don't have pre-existing data that you need to maintain. If you do, there is no way (that I know of) to use this behavior in "update" mode and also keep the "migration" done by Hibernate.
Also, if the test data gets too big, there is no way to split the script to make it more manageable with some kind of "import" directive. However, from Hibernate 3.6.10 onwards, I think, you can set the scripts to load for each persistence unit using a configuration property, or through code. You can find more info about it in this Stack Overflow thread and in this Spring Forum thread.
By the way, there seems to be a "bug" with the JpaVendorAdapter I use in this type of setup, which forces the schema generation to "update". To resolve this issue, just set the generateDdl flag to false. You can find more info on the reasons here.
The Ugly
Integration tests are slow, and this solution requires you to instantiate the context and database for each set of tests. The context is only initialized once per test suite, and although that is faster than doing it for each test, you may run into order-dependency problems among tests because of the shared data if you don't write your tests properly. Use wisely.
You can still set it up so that each test executes against a clean db. One solution is to regenerate the schema and repopulate the tables with data for each test using the SchemaExport class. I leave that as an exercise for the dear reader ;)
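To hint at the shape of that exercise, a minimal sketch (configuration wiring assumed):

```java
import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaExport;

public class CleanSchema {

    // e.g. called from a @Before method so every test starts from scratch
    public static void recreate() {
        Configuration cfg = new Configuration().configure(); // hibernate.cfg.xml
        // arguments: print the ddl to stdout? / export it to the database?
        new SchemaExport(cfg).create(false, true);
    }
}
```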
Case two: Building from a Legacy Database System
Now, take into consideration that once you release a version of this application, you are actually moving into building on top of a legacy system.
In this case you already have a database; it may be from a previous version or from a totally unrelated product. Whatever the case, I am assuming you can't lose data and, in some cases, can't modify the database schema. The create approach in Hibernate won't be of much help here. What do we do then?
Well, in this case we are going to need something else. Here is where DbUnit comes to the rescue.
There are other solutions, like the SQL Ant task that executes SQL scripts against the db before you run your tests, but I like DbUnit better because I can put the database initialization and finalization into my tests, per test, instead of at build time. Also, in most cases I can escape having to write the SQL insert statements, leaving DbUnit in charge of the dirty bits of generating them.
What's different about this approach?
In this case, we are going to get a little more control over what is happening in the db. We will still insert the sample data, just that this time it will be DbUnit doing it. To do that, we first need to put this information into an XML file format that DbUnit understands, so that it can dump it into the db when we tell it to.
The format is pretty simple. All records hang from a dataset root element, and the name of each node is the name of the table where you want to insert that record. For specific column values you use the attributes of the node: the actual content to put in the column is the value of the attribute.
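A hedged example of such a file, reusing the invented users table from before:

```xml
<!-- dataset.xml: one <users> element per row to insert -->
<dataset>
  <users id="1" name="John" email="john@example.com" />
  <users id="2" name="Jane" email="jane@example.com" />
</dataset>
```

And a sketch of the per-test setup that feeds it to DbUnit (connection details are assumptions):

```java
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSet;
import org.dbunit.operation.DatabaseOperation;

public class UserDaoDbSetup {

    // called from each test's setup method
    public void populate() throws Exception {
        Connection jdbc =
                DriverManager.getConnection("jdbc:hsqldb:mem:testdb", "sa", "");
        IDatabaseConnection connection = new DatabaseConnection(jdbc);
        IDataSet dataSet = new FlatXmlDataSet(new FileInputStream("dataset.xml"));
        // delete the tables referenced by the dataset, then insert its rows
        DatabaseOperation.CLEAN_INSERT.execute(connection, dataSet);
    }
}
```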
The Good
We have per-test database setup. This is faster than the method we already talked about, but you still need to do a little cleanup afterwards. A way to avoid this is to execute each test inside a transaction and, at the end of the test, just roll the transaction back. However, this may not always be possible.
We also get "semi-raw", code-based access to the database for verification. In this sense, you get more control over verifying what is really happening. In case you didn't notice, when I was testing the insertion in the first method I used the findUserById method, thus relying on my own code. While for unit tests this may not be a problem, as long as the code you are using has its own unit tests, when it comes to db integration tests I wouldn't recommend it. The reason is that you could fall into the trap of, for instance, thinking that your insertions are working when in fact they are just being cached by the underlying persistence mechanism. Be aware of it.
Finally, you can have both methods working side by side and use them as needed.
The Bad
While DbUnit will insert the data for you, it won't create the schema from the information we give it (not that it could). That's why we are still using the "update" mode with our persistence.
There are also some funny things about the way data gets cleared, because it relies on the order in which the records were inserted. I remember this particular dog biting me some time ago.
The Ugly
DbUnit seems not to have noticed that the rest of the world has changed. By this I mean that you end up writing lots of boilerplate code for your tests that could have been avoided with some properly coded annotations. This is typical of the JUnit 3.x style of tests, where you would need to inherit from a TestCase class or something similar. Not cool. However, if you refactor your tests (like you do, right?) you will end up with little or no repetition.
This is all for now. You can see the full code at this repository on github.
Stay tuned for some more smelly code... with potatoes on the side, and leave some feedback if you want to help me improve the quality of these posts ;)
Update 14/04/2012: I renamed the github repository.
Labels:
agile,
architecture,
code smell,
continuous integration,
db,
fun,
integration tests,
open source,
tdd,
unit tests
Sunday, March 18, 2012
Not all TDDs are created equal.
There is little or nothing that has not been said about TDD. As you probably know if you are a practitioner, there are lots of articles describing, step by step, the whole process behind it, and arguably also the state of mind that TDD aims to promote. However, there is something I run into again and again when I go to Coding Dojos, to events like the Global Day of Code Retreat, or even when I am pair programming or mentoring someone.
There seem to be, at least from what I have seen, two main ways to approach TDD. I'm going to call them the Holistic and the Reductionist approaches. Both have their advantages and disadvantages, and since this is something that comes up a lot with beginners and seasoned professionals alike, it seemed like a good idea to pay some attention to it and write this post.
So, in order to understand it better, and just like when you practice yoga or music or anything else you want to get really good at, I started paying attention to the way I was doing things: particularly, the behavior and thought processes I followed when practicing TDD from each of these perspectives.
Note: The rest of the post addresses a lot of abstract concepts, and I'm terrible at explaining myself. If you feel like this post would benefit from more examples, or needs some editing, let me know and I will add them or change it.
The Holistic approach.
What do I mean by "Holistic approach"? Holism is described as the idea that natural systems should be seen and studied as a whole and not as a collection of their parts. The underlying principle is that the whole is not just equal to the sum of its parts.
This is mainly a top-to-bottom approach to solving the issue.
When we take this approach, what usually happens is that you tend to focus on solving the problem at hand directly, instead of the sub-problems.
With this approach you think of things like outputting all the prime numbers among the first n natural numbers starting at 2, instead of first thinking about how to find out whether a given number is prime, or how to output all the numbers that satisfy a condition to the screen.
What I have seen is that since you tend to focus on the functionality and results of the principal issue instead of the sub-problems, you get results you can use faster. Most of the time they are partial or incomplete results, but the bottom line is you get more "deliverable" value from the start. Things like: I can create a user, but the process doesn't check all the fields that need to be validated, etc.
However, as a side effect, what usually happens is that one tends to add "extras" to the code to help with testing. I am talking about things like object properties whose only purpose is to inspect inner state in your unit tests, having methods virtual that could be marked as final because you need to override them to test, etc.
Another common side effect I have detected is that the frequent mantra of "the emerging architecture", although still true, usually materializes at the end, when you think you have a solid grip on your main problem and start refactoring your code.
The inconvenience I see with this is that if you don't refactor as aggressively and often as you should, this "architecture" may not emerge at all, or it may be deficient. I am not sure if this is good or not, but I personally prefer to have at least "some" architecture rather than no architecture or an "emerging" architecture, mainly because either could turn out to be a recipe for disaster if everyone on your team is not on the same page: like when you have a junior developer on the team, or when not everyone has the same skill level regarding TDD or refactoring.
One last thing: as you are developing, you tend to cover your target problem's corner cases, but not the sub-problems' corner cases. This may lead to bugs later, when the functionality gets refactored out or used by others, since they will tend to assume that it works as expected not just in the context of the main issue, but in the context of the sub-problem.
To give an idea, following the prime numbers example: if another coder wanted to use your is_prime() function, it would be reasonable for him to expect false when it gets passed 0 or 1; but if that wasn't part of the main problem's constraints, the developer may not have considered those cases, leading in turn to new bugs.
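A hedged sketch of the version that does consider them (the Java naming is mine; the post's is_prime() maps to isPrime here):

```java
// The sub-problem treated as a problem in its own right: 0 and 1 are not
// prime, so clients of the function get the answer they reasonably expect.
public final class Primes {

    private Primes() {}

    public static boolean isPrime(int n) {
        if (n < 2) return false;          // covers 0, 1, and negatives
        for (int i = 2; i * i <= n; i++) {
            if (n % i == 0) return false; // found a divisor
        }
        return true;
    }
}
```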
The Reductionist approach.
Reductionism, on the other hand, is sometimes considered the opposite of Holism. In this case you try to understand the whole by looking at its parts and their interactions. Inherently, this is a bottom-up approach to TDD.
You focus on the sub-problems of the problem, and when you have solved them, you focus on the interactions. Finally, you refine the interactions so that the combined results provide the solution you are looking for.
Contrary to holism, in this case architecture emerges from the start, driven by the necessity to decouple particular sub-problems. For me, this approach makes it easy to lose focus on what your general aim is. It requires a great deal of self-control and experience to know when enough is enough and you should move on and continue with another area.
However, the code gets tested more thoroughly. Corner cases for each sub-problem are more evident, and they tend to be addressed from the start. So the problems found when the clients of your API make assumptions about how the code works diminish a great deal.
Nevertheless, focusing on the sub-problems usually narrows your thinking. As a consequence, you need more integration tests in order to validate the sub-parts and the way they are supposed to be used. On the other hand, you tend to write a more developer-friendly API, because you are always thinking as the client of your API when you are writing the unit tests.
This way of doing things seems to work better when you happen to have a roadmap or a well-defined plan of how the pieces fit together; for instance, when you are implementing algorithms, which usually decompose into very specific sub-problems.
Conclusions?
Well, this is most of what I have observed myself, along with a lot of input from the conversations I have had with friends. In the end, to me at least, no one method is applicable to every problem. In my day-to-day job I usually jump from one mode to the other depending on how it "feels". I'm very interested in hearing what others have to say about this subject, so please leave a comment below.
Until next time and happy coding!!
Labels:
agile,
architecture,
code smell,
coding dojo,
kata,
rantings,
tdd
Thursday, February 9, 2012
Video Compilations - Agile Practices
Ever since I heard of agile I have been interested in it, not just because of the school of thought behind it, but also because, being the lazy programmer that I am, anything that helps me and the team be more productive is always welcome.
Last night I was going through some posts on G+ and I came across a video of a presentation by Jon Skeet on Skills Matter, and alongside it there were a couple of other presentations on the subject of agile. So, as often happens when I am in the learning mood, I could not just stop there and had to watch them all, which gave me the idea of starting a series of posts where I link to conferences, presentations, etc. on different subjects.
So here is the first of these posts. This time the subject is Agile Development Practices, and it touches on the theoretical side as much as the practical one. Take a look and let me know if you find them useful.
Without further delay, here is the list and until next time!
The Agile Buffet
Scrum, Feature Driven Development, extreme programming, DSDM, Test Driven Development, Business Driven Development, Kanban - wow, lots to choose from. Why choose one when you can take the best from all of them? Let's talk about how to identify the best aspects of different methodologies and how you can work with them.
Kanban and Scrum - making the most of both
There's a lot of buzz on Kanban right now in the agile software development community. Since Scrum has become quite mainstream now, a common question is "so what is Kanban, and how does it compare to Scrum?" Let's clear up the fog. What are these things? Where do they complement each other? Are there any potential conflicts? The purpose of this session is to clarify Kanban and Scrum by comparing them, so you can figure out how these may come to use in your environment.
Visual Management for Agile teams
Join Visual Management blog author Xavier Quesada Allue as he explains basic patterns and introduces dozens of original ideas for building great task boards and visually managing your work and that of your teams.
Self-Organization: The Secret Sauce for Improving your Scrum team
High performance depends on the self-organizing capability of teams. Understanding how this works and how to avoid destroying self-organization is a challenge. Until you understand complex adaptive systems and how Toyota works, it is difficult to improve team velocity. Jeff will discuss three core topics:
1. Shock therapy as a strategy for booting up teams.
2. The Cosmic Stopping Problem, otherwise known as the choice uncertainty principle.
3. Punctuated equilibrium - how software systems evolve.
Take advantage of these concepts and you may find a way to achieve the ultimate potential of a team. This session will be a "Deep Agile" presentation keying off topics presented to engineers at MIT.
Distributed Agile Development
Most agile methodologies tend to assume that the team is co-located in a single team room. They give little guidance as to how to address team distribution, although proven practices are starting to emerge within the community. The Microsoft patterns & practices team has been experimenting with distributed teams for several years, mining proven practices from the community and trying them out on numerous agile projects. This talk summarizes those learnings and proven practices and gives examples of their application - both good and bad - within their teams.
Scrum Tuning: Lessons learned from Scrum implementation...
Adwords introduced a Scrum implementation at Google in small steps with remarkable success. As presented at the Agile 2006 conference, this exemplifies a great way to start up Scrum teams. The inventor and Co-Creator of Scrum will use this approach in building the Google Scrum implementation to describe some of the subtle aspects of Scrum, along with suggested next steps that can help in distributing and scaling Scrum in a "Googly way".
Continuous Integration and Continuous Deployment
As software developers, we face a risky, time-consuming and painful process in delivering software. The solution is the delivery of software continuously through build, test and deployment automation. This session will talk about how we can move from CI to continuous delivery. It will also help to distinguish between CI and continuous deployment.
Emergent Architecture
Agile software development emphasizes that some increment of business value be delivered every iteration. How can this happen when your iterations are two weeks in length and you estimate it will take you two months just to design the database and the access layers? The answer is to think differently.
Agile Anti-Patterns!
The popularity of agile software development processes and methodologies is evident and fast growing. Many organizations and projects turn towards agile to help solve the problems of traditional software development. Scrum, extreme programming, test driven development, and lean are no longer the new kids on the block.
However, with the rising popularity of agile, mainly due to lack of experience or management over-expecting results, in coming years many agile projects will fail miserably. Agile is not the silver bullet. In his enthusiastic style, speaker Sander Hoogendoorn, global agile thought leader at Capgemini and involved in agile projects since the mid-nineties, demonstrates the differences in traditional and agile projects, and shows why agile projects will fail - independent of the process used.
Sander elaborates on a series of agile anti-patterns that people will recognize immediately. Think of the Scrumdamentalist, Agile-In-Name-Only, the Pseudo-Iteration, Guesstimation, the Bob-the-Builder Syndrome, Parkinson's Law, the Agile Project Manager and Student Syndrome - of course, with many embarrassing examples and anecdotes from real-life projects.
Google Tech Talks by Misko Hevery about Testing and Refactoring
This is a set of talks given by Misko Hevery at Google Tech Talks. It is definitely worth watching, since it touches on the key principles of why we write code that we can't test, and how to avoid or refactor these mistakes out.
Labels:
agile,
architecture,
distributed teams,
oop,
resources,
scrum,
tdd,
videos
Location:
Barcelona, Spain
Wednesday, February 8, 2012
Where does bad code come from?
Let me tell you a story. There is this developer, who could be you or me, or any other of the thousands out there. He goes to work every day, not because he has to, but because he wants to, because he loves what he does. But no matter what, there is always something that prevents him from enjoying his day fully: bad code.
Yes, that putrid, stinking, amorphous "thing" that, after a while, if you are anything like me, makes you want to cry. The subject of probably more than one song and, definitely, the cause of a great deal of the headaches developers suffer.
But... you have to wonder, how is it that bad code comes to be? Where does it come from? How is it that even though code is not organic, even though it's not a piece of meat, it starts degrading and rotting like one?
I know, I know... someone else wrote it. The bad code, I mean. ;) But... what is it that the other guy does, or doesn't do, to make it rot?
I think the answer is simple. Writing software, and particularly software design, is about how you structure your software. What goes where, who talks to whom, which parts of your code you allow to do what, etc. That is pretty hard to do properly most of the time. There is, of course, a lot more to it, and for the brevity of this post I am oversimplifying. But I personally think that is one of the main concepts behind it. It is also the main reason for bad code to exist: dependency management. Or, to say it properly: bad dependency management. You don't see it? Keep reading.
So... What goes wrong with software development?
Let's go back to our developer friend. Let's say he starts a new project, and that he has all his requirements ready! What does he do?
He goes and draws these big diagrams. Because who doesn't like drawing, right? He creates this huge representation of what he is going to do: maybe a database schema, a Gantt chart, etc. When he is finished, and after a few minutes of contemplating his magnificent work of art, showing it to everyone, etc., he finally sets off to write the code.
The code then flows like a river. It just does! Lines and lines of code come out until it's finished! Then he ships it and that's it! He lives happily ever after!
Ok. In the words of Orson Welles: If you want a happy ending, that depends, of course, on where you stop your story.
Yes, sadly, we know that's not the whole story. Because after shipping, what actually happens is that people start using this code. Not only that: they start requesting changes. What? Then some more changes. Then even more changes, and after that... you guessed it right!... more changes! Finally, the poor guy who is developing it just decides to end it all and kills himself.
Why did this happen?
Well, this developer didn't manage his dependencies, so the code started to putrefy. It became rigid. It became ugly, hard to maintain, hard to modify, hard to fix, and easy to break. It probably was not even easy to test. It started rotting, started to have a mind of its own, and at one point the stench was so bad he had no other choice but to run away from it. A creator running away from his creation, like Frankenstein.
It's not the changes that created the problem, but the way he dealt with them. I mean, who has not written one of those famous patches, only to annotate it with a big TODO or FIXME? Always with the hope that some day, when you have time, you will come back and do some housekeeping. We know that day is yet to come.
This concept is not new. Not at all! In fact, I am positive most developers and projects that I know of suffer from some kind of technical debt. You don't (or at least I don't) write perfect code from the start most of the time. It's a process! An evolution. It is part of the software development process to create that technical debt, and to repay it. When, and how, you decide to do it is what makes the difference.
Will you pay it at the end, with all its interest, and let it either drive the project into the ground or, in many cases, constrain development and probably dictate the architecture? Or will you choose to repay it as you go, and keep your architecture evolving and open to as many possibilities as you can? In any case, if you are not good at regularly doing your housekeeping, you can at least train your nose to detect early that something needs two minutes of your time.
Which brings me to my next point.
How do you know your code is starting to decay?
Answer: when you catch the smell.
Doing some research, I found that the main code smells have already been classified by Robert C. Martin. If you don't know who "Uncle Bob" is and what he has done, go Google it! Now! Well... not now, first finish reading my post... but I mean it. ;)
The groups into which he classified code smells are:
Rigidity:
If your code starts getting stiff, like a dead body, it may as well be one. That means that if you can't take care of that little thread that is sticking out of place without unweaving the whole t-shirt, you have a problem. By that I mean: if you have a bug, and in order to fix it you have to go around the whole code base touching stuff, your code is definitely rigid.
Fragility:
Have you ever had a system that got broken by the most unexpected and unrelated thing ever? Kind of like the butterfly effect: a butterfly flaps its wings half a world away and a tsunami ends up destroying your house.
For example, misspelled magic strings. The typical case of the new guy who sees one of those magic strings and can't resist the temptation to correct the spelling mistake. But it turns out that the mistake is hard-coded in views and probably even third-party systems, and after a while you find out half the application stopped working. Yeah... I've been there...
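A hedged, invented illustration of that trap: the misspelled literal is duplicated across collaborators, so "fixing" it in one place silently breaks the others.

```java
// Minimal stub so the sketch stands alone.
class Order {
    private String status;
    void setStatus(String status) { this.status = status; }
}

// Scattered literal: views, jobs, and third-party systems all compare against
// the misspelled value, so correcting one occurrence breaks the rest.
public class OrderService {
    public void markReceived(Order order) {
        order.setStatus("recieved"); // [sic] duplicated all over the code base
    }
}

// Less fragile: one named constant every collaborator references, so a fix
// becomes a single, visible change instead of a scavenger hunt.
final class OrderStatus {
    static final String RECEIVED = "recieved"; // one place to fix, some day
    private OrderStatus() {}
}
```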
Immobility:
How much of your code can you reuse? If you wanted to rebuild your code using a different MVC framework, or maybe move it into a desktop application, how much of it would you have to write again? If your answer is "almost all of it", you have an immobility (re-usability) problem. In fact, you probably have an architectural problem, but that is another story.
Viscosity:
We developers are lazy. That's a fact. When something is tedious and boring to do, but we "have" to do it, chances are we will try to find ways not to. This particular code smell comes in two flavors.
What I mean is, when faced with two options, one that solves the problem and maintains the architecture and one that does not, if the proper solution is much harder to implement, developers will tend not to use it. There you go, stickiness everywhere: the view accessing the database, or the model that knows about the services. Uncle Bob calls this kind of viscosity "viscosity of design".
Along the same line, if there are several ways of making a change, but the one that is not a hack involves a 2-hour build, people will use the hacks. They will try to find ways to avoid that 2-hour build, and you would be surprised how inventive some people can be. In this particular case, where the development environment is the one conspiring against you, you have an environmental viscosity problem.
I think we can agree that a 2-hour build, and people committing changes that compromise the architecture, are problems on their own. The important part here is to notice how subtle things directly affect the way you manage your dependencies, which in turn, if not addressed properly, can and will deteriorate the quality of the code base.
Anyway... in subsequent posts I am going to talk a lot more about this subject and about how you can actually hone your skills at managing your dependencies so that these code smells eventually get reduced to a minimum. I will also be talking about the SOLID software design principles and lots of other stuff, like refactoring legacy code and unit testing.
Stay tuned!
Labels:
architecture,
code smell,
dependency management,
oop,
solid,
uncle bob
Location:
Barcelona, Spain