You duplicating? You wrong…

This is not the best example for the issue, but here we go: imagine you are going to sweep your house, and that you have one of those modern brooms that can be disassembled to fit into small broom cupboards. When you start, let’s say in the kitchen, you take the broom from the cupboard, assemble it, and start sweeping. When you are done with the kitchen, you don’t disassemble it and put it back in the cupboard, because you have other rooms left, do you? You would feel stupid assembling and disassembling the broom for each room, wouldn’t you?
Then why do you do that when you write software?

First, some misprints of the writer

Thanks to M. Barnett for pointing out the “mounting and dismounting broom” issue. It seems that I wrote this post too fast (or too late), and I didn’t notice that “mounting a broom” means something quite different in English.

So I changed it to “assemble” and “disassemble” which is what I meant to write the first time, I promise… ;)

Intro

This post is about some very basic thoughts on engineering habits that students tend to forget… It might sound obvious to experienced people, so if you are one of them, better go get a beer with friends instead of reading it. So, back to what students forget. One of the most important things to keep in mind, one that will always guide you down the right path, is:
DUPLICATION IS THE SOURCE OF ALL EVIL IN THE WORLD
And this applies not only to software development, but also to content management, organization and engineering in general. Damn! It even applies when you tidy up a wardrobe or clean the kitchen, as we saw in the preface.

Why is this misbehavior so common in software?

Easy. Assembling a broom and taking it back to the cupboard takes time, so any dumb-ass can see that it’s a waste of time. In software, duplicating things often takes less time than thinking about how to avoid the duplication. It’s just a matter of Copy-Paste, whereas avoiding duplication can take weeks of work. That’s why duplication shows up all over bad software.

The student counterpart

This is the point when students tend to say:
- <<Weeks! Then it is indeed better to duplicate code than to lose weeks>>
And then you play the role of the old, wise grandpa, telling them:
- <<Believe me, it’s better this way. You will thank me when you grow older…>>

Example 1: Duplication of work - Duplication of decisions

Chronicles of a bad developer

  • Company “A”
  • Project “B”
  • You are the “Bad developer #1”
  • A simple operation: obtain the concatenated name of a user, composed as: “Name” + “Second Name” + “Last Name”
It’s just one line of code, so you come to the conclusion that it’s not worth thinking much about. So you leave it like that and copy-paste the line whenever you need it.
Some months later, project “B” has grown, becoming “big”, with let’s say 150 sub-projects. Your famous name composition operation is scattered all around, in more than 1500 places. Now imagine that the worst happens (the worst always happens), and that your boss demands a change in the way you are concatenating names. Let’s say he wants it in the form “Last Name, First Name” instead of the previous one. You are screwed, easy as that.
As you are a bad bad developer, the first thing you say is:
- <<Boss, are you sure we need that change? I like names in the way they are right now>>
- <<Our clients don’t>> He replies…
No prob, you are a bad, but clever developer, so you think:
- <<Visual Studio will fix this, I heard somewhere that there’s a thing for these cases called Refactoring>>
Then you try to Refactor a whole statement, and of course, it doesn’t work.
- <<Damn Visual Studio! Damn Microsoft! Things like this are why I prefer Linux! I’m sure Linux knows how to Refactor a whole statement>>
And then you end up making massive text replacements, spending a week finding all the instances of the operation and fixing the resulting compilation errors. Of course, you forget to change 3 or 4 instances of it. That’s when you go to your boss again and say:
- <<Hey, not that bad. 3 out of 1500, that’s 99% success!>>
- <<You are 99% of an idiot>>… He replies…

Diary of the good developer

<<Monday, 36 of October of year 2124
Today, I slept well. I had breakfast and went to the office. My job for today was changing the way names are concatenated in our software. As I was clever enough a couple of months ago to isolate this task inside a method, my whole job for today was done in 10 seconds. I spent the rest of the day at the beach, getting some sun… No phone calls from my boss. No phone calls from clients… No news is good news. Life is wonderful…>>
Did you learn the lesson?

What lesson?

Pay attention, Padawan, to the following line of code:
FullName = Name + SecondName + LastName;
It doesn’t only compose a name. It also decides how this concatenation will be done, and this is the key point. That’s what you need to understand and learn. Because duplicating work is bad, but duplicating decisions is one of the worst things you can ever do in software. I’ll put that in big letters so you remember it:
Remember: Duplicating work is bad, but duplicating decisions is one of the worst things you can ever do in software
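What the good developer did can be sketched roughly like this. It is only a minimal illustration, not code from project “B”: the User class is hypothetical, the property names are taken from the line of code above, and the spaces between the parts are my own assumption.

```csharp
using System;

// Hypothetical User class; property names follow the line of code above.
public class User
{
    public string Name { get; set; } = "";
    public string SecondName { get; set; } = "";
    public string LastName { get; set; } = "";

    // The one and only place where the name format is decided.
    // Assumption: parts are separated by spaces.
    // To switch to "Last Name, Name", change only this getter, e.g.:
    //     return LastName + ", " + Name;
    public string FullName
    {
        get { return Name + " " + SecondName + " " + LastName; }
    }
}
```

Every caller just reads user.FullName, so when the boss changes his mind the decision is changed in one place instead of 1500.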

Example 2: Duplication of Contents

As we mentioned, duplication is not only a software development issue.
Copy-pasting files is slower than copy-pasting code, but for sure it’s still faster than assembling and disassembling the broom ;)
That’s why many dumb-asses still don’t get it, which makes duplication of content pretty frequent too. It’s the same stuff. You make a texture, you use it in hundreds of 3D models, and then one day you need to change it. You are screwed again, man!
How to avoid duplication in these cases?
You should think about having global repositories for contents that are shared between many entities: texts, textures, sounds, xml files, whatever. Remember, duplication is the source of all evil in the world. If you find yourself doing a Copy-Paste, ask yourself if it is really necessary or if there’s a way to avoid it.
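As a rough sketch of the idea, each model can keep only an identifier and ask a shared repository for the actual content. All the names here (TextureRepository, TexturedModel, the "brick-wall" id in the usage note) are hypothetical, and a real engine would hold loaded textures rather than file paths.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shared repository: one entry per texture, looked up by id.
public static class TextureRepository
{
    private static readonly Dictionary<string, string> pathsById =
        new Dictionary<string, string>();

    // Registers a texture, or replaces the one already stored under this id.
    public static void Register(string id, string path) { pathsById[id] = path; }

    // Throws KeyNotFoundException if the id was never registered.
    public static string Resolve(string id) { return pathsById[id]; }
}

// Hypothetical 3D model that references a texture instead of copying it.
public class TexturedModel
{
    // The model stores only the texture id, never its own copy of the file.
    public string TextureId { get; }

    public TexturedModel(string textureId) { TextureId = textureId; }

    // Resolved on every access, so a change in the repository is seen at once.
    public string TexturePath
    {
        get { return TextureRepository.Resolve(TextureId); }
    }
}
```

If hundreds of models are created with the id "brick-wall", registering a new file under that id updates all of them at once, and nobody has to hunt for copies.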

Example 3: Other, not so obvious, forms of Duplications

Imagine we have a game with the class Player, and with a property which holds his status. You know, things like “Dead”, “Alive”, “Running”, etc. At design time, we need to choose how to store that status:
  • The hardcore C++ kings who want their player to be dead and alive at the same time (maybe a zombie game?) will want the status to be a hexadecimal flag
  • Those who want their game to be fast and “cross-platform with their toaster” will want the status to be an Int, you know: 0, 1, 2, etc
  • A Visual Basic developer will probably want it to be a String, actually writing the value “Dead”, “Alive”, etc.
Is any of them right? Nope (barring special circumstances). Let’s debate.

Speed vs Comprehensibility

- <<Who said to use strings? Oh yes, the Visual Basic people… Come on man, do you know how slow it is to compare two strings? Player is a core class that is used all over the game, even in performance-critical parts. You cannot be serious… strings… The solution is to use an Int, or a hex flag.>>
- <<Who said to use an Int or a hex flag? Of course! The native people. Why don’t we all just go back to binary, you prehistoric monkeys? Will we need to check what “State = 0” means every time we need to use it? Man, you cannot be serious… ints… The solution is to use strings…>>
As you have probably noticed, one can go on like that forever, when there’s a much better solution… Yes, you clever boy, we will use Enumerations. But do you know why? And, more importantly, are you aware of all the benefits (and cons) of an Enumeration? We will talk more about this later; for now, let’s dig deeper into the cons of the other alternatives.

Did anyone say duplication again?

We have seen duplication of work (bad), and duplication of decisions (very bad). What none of the contenders in the previous discussion have seen is that both solutions imply a new, hidden form of duplication. It’s what I call: duplication of abilities.

Duplication of Abilities (also very bad)

When you choose to store the status of your player as an Int, or as a String with values like “Dead” or “Alive”, you are delegating to all your developers the responsibility of writing those values correctly. In other words, you are duplicating the need for the ability to write them correctly in many, many different places, and by many different people, instead of centralizing it in a single point.
It’s quite likely that, from time to time, someone makes a mistake. So if any of them misspells a string value (“Deaz” instead of “Dead”), or simply misplaces an int, writing “1” instead of “0”, the compiler won’t say a thing, and the application will seem to run just fine. But... what are the effects of setting the status of a player incorrectly in a single, deep point of your code? Absolutely unpredictable. It can manifest as any kind of error in the application, which makes it very difficult to debug, trace and fix.
You must try to avoid this kind of duplication too, centralizing decisions, and reducing the number of places where a mistake can be made. 
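To make the “Deaz” problem concrete, here is a minimal sketch with a string-based status. StringPlayer and TypoDemo are hypothetical names invented for the illustration; they only show that the typo gets past the compiler.

```csharp
using System;

// Hypothetical player that stores its status as a free-form string.
public class StringPlayer
{
    public string Status { get; set; } = "Alive";

    // Every caller has to spell "Dead" correctly for this check to work.
    public bool IsDead
    {
        get { return Status == "Dead"; }
    }
}

public static class TypoDemo
{
    // Returns what IsDead reports after a misspelled assignment.
    public static bool KillWithTypo()
    {
        var player = new StringPlayer();
        player.Status = "Deaz";   // typo: compiles without a single complaint
        return player.IsDead;     // false: the player is never treated as dead
    }
}
```

Nothing fails at the line with the typo. The bug only shows up later, wherever some other piece of code relies on IsDead.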

Strong-type, baby

We said it before, but let’s just analyze another alternative for storing the “State” of the player in our game. What about Enums?
  • Enums are as fast at runtime as an Int or a hex flag
  • Enums are as comprehensible as Strings at debug time
  • Enums are convertible to and from Int, which makes cross-platform easy
  • Enums are convertible to and from String, which makes printing information easy
  • Enums are much more comfortable to use: thanks to IntelliSense, you don’t need to remember the values, nor type them, just select them from the list
  • Enums are strongly typed, which will:
    • Minimize the so-called Duplication of Abilities. Since you select values from a list (thanks to IntelliSense) instead of typing them, you centralize the responsibility of making no mistakes in the enumeration
    • Even if anyone ignores IntelliSense and types the values manually, the presence of “hard to find” errors is also minimized, as a miswritten Enum value will be detected at compile time
So, YES! Enums ROCK! And thanks to the static helper methods found in the .NET Framework (Enum.Parse, Enum.GetNames, Enum.GetUnderlyingType, Enum.GetValues, Enum.IsDefined, etc), they are very versatile too.
So… Use them!!!
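A small sketch of the points above, using only Enum.Parse and Enum.IsDefined from the helpers just mentioned. The PlayerState values, the EnumPlayer class and the EnumDemo helper are hypothetical names chosen for the illustration.

```csharp
using System;

// Hypothetical states; the underlying values are 0, 1, 2 by default.
public enum PlayerState { Alive, Dead, Running }

public class EnumPlayer
{
    public PlayerState State { get; set; } = PlayerState.Alive;

    // player.State = PlayerState.Deaz;  // would not compile: no such member
}

public static class EnumDemo
{
    // Enum -> string, handy for logs and debugging.
    public static string Describe(PlayerState state)
    {
        return state.ToString();
    }

    // String -> enum; Enum.Parse throws ArgumentException for an unknown name.
    public static PlayerState FromName(string name)
    {
        return (PlayerState)Enum.Parse(typeof(PlayerState), name);
    }

    // Int -> enum; a plain cast accepts any int, so check with Enum.IsDefined.
    public static PlayerState FromInt(int value)
    {
        if (!Enum.IsDefined(typeof(PlayerState), value))
            throw new ArgumentOutOfRangeException("value");
        return (PlayerState)value;
    }
}
```

One caveat worth keeping in mind: the compiler protects you when you write the enum member by name, but values that arrive from outside (a file, the network) as ints or strings still need a check like the ones in FromName and FromInt.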

An additional security shield

Even if you are already strong-typing, there’s another thing you should always do, especially in big projects that cannot be fully tested very often… exactly, that’s correct: UNIT TESTING.
Again, designing good unit tests can take weeks, and again, students will say: <<Weeks! Then it’s better not to use unit tests>>, and again you will have to tell them no, no, no…
Unit tests are the definitive way of giving big projects the robustness they need, going into places where the compiler can’t go, and testing deeper than just syntax. A unit test can detect a wrong player State, but of course, it has to be well designed. If you want to know more about unit testing, please refer to the following article by Roy Osherove:

Conclusion

Every time you find yourself doing a Copy-Paste, ask yourself:
  • Why are you doing it?
  • Is it really necessary to duplicate what you are duplicating?
  • Can it cause problems if something changes in the future?
  • Is there any way of avoiding that duplication?
Try to find implicit or hidden duplications that can be problematic in the future. Try to reduce the number of places where mistakes can be made. Try to reduce the number of people that can make mistakes (and I mean something deeper than firing dumb-asses!). Strong-type and Unit-Test when possible!
Finally, apply some common sense to all of this. Sometimes, you should do just the opposite of what I’m telling you… ha ha… ;)
Take Care!