Thursday, 11 November 2010

A way to handle big merges

I had the task of merging two branches with lots of changes. The problem was that I wasn't very familiar with the code, but I took the task because I have lots of experience with this sort of thing, although this merge was the biggest I have done. The branches were over a year old and unfortunately it looked like they had been made to diverge as much as possible, for instance by refactoring in one branch but not bothering to do the same in the other. An extra challenge was that the code quality wasn't very good and documentation was somewhat lacking. A typical situation, in other words.

So I thought about this for a while before plunging in head first. I quickly figured out that the best way to see what was needed would be a three-way compare showing the two branch heads and their common ancestor. This way I would immediately see where each difference was made (and in many cases the same blocks were changed in both branches, but in different ways). I tried a few merge tools but only one really did what I needed: KDiff3. It's by no means the perfect tool (it has some usability issues), but with it I could easily have the ancestor in the left pane, one branch in the middle and the other, where I wanted to do the merge, in the right pane, and I could edit the final product easily in the merge window at the bottom.

The branches were originally in Subversion, but I moved them to Git because it's far quicker and has much better support for branching (and I knew it better).

In theory I could have configured KDiff3 to work from a single clone, but I figured it was just as well to have each commit in its own clone, just to be on the safe side.

So I had three clones: initial, branchA and branchB. And off I went.

At first I tried to do this the thinking man's way, i.e. try to actually merge the code, but it quickly became obvious that this would take me months since most changes were non-trivial. "If I change this, what else does it affect?"

So I turned this into an idiot-proof process: I simply flagged each change with a preprocessor conditional, something like this:
#if defined(BRANCHA)

I had the following possibilities:
1. Something was changed in A, but not B (B is the same as ancestor),
2. Something was changed in B, but not A,
3. Something was changed in both A and B and in the same way,
4. Something was changed in both A and B but differently.
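
The four possibilities above boil down to comparing each branch's version of a block against the ancestor and then against each other. A minimal sketch of that decision, assuming each block is available as a string (the function and enum names are made up for illustration; in practice this comparison was done by eye in the merge tool):

```c
#include <string.h>

/* Hypothetical names; the numeric values match the list above. */
enum change_kind {
    UNCHANGED         = 0,  /* neither branch touched the block */
    ONLY_A            = 1,  /* changed in A, B equals the ancestor */
    ONLY_B            = 2,  /* changed in B, A equals the ancestor */
    SAME_IN_BOTH      = 3,  /* both changed it, identically */
    DIFFERENT_IN_BOTH = 4   /* both changed it, differently */
};

enum change_kind classify_change(const char *ancestor,
                                 const char *a, const char *b)
{
    int a_changed = strcmp(ancestor, a) != 0;
    int b_changed = strcmp(ancestor, b) != 0;

    if (!a_changed && !b_changed)
        return UNCHANGED;
    if (a_changed && !b_changed)
        return ONLY_A;
    if (!a_changed && b_changed)
        return ONLY_B;
    /* Both branches differ from the ancestor: compare them to each other. */
    return strcmp(a, b) == 0 ? SAME_IN_BOTH : DIFFERENT_IN_BOTH;
}
```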

Cases 1 and 2 would be handled like the example above (unless the change was very trivial, like an added logging call). Case 3 needed no flagging, and case 4 would require an #if defined ... #elif defined ... #else to keep the original code for later reference.
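
To make the flagging concrete, here is a small sketch of what the cases might look like in code. The function names and values are invented for illustration and are not from the actual code base; BRANCHB is assumed to be the counterpart of the BRANCHA flag, and the #error guard is an extra safety net rather than something the process required.

```c
#include <assert.h>

/* Assumed safety net: a build should select at most one branch. */
#if defined(BRANCHA) && defined(BRANCHB)
#error "Define only one of BRANCHA and BRANCHB"
#endif

/* Case 1: changed in A only, so B keeps the ancestor code. */
int retry_limit(void)
{
#if defined(BRANCHA)
    return 5;       /* branch A's change */
#else
    return 3;       /* ancestor code, unchanged in B */
#endif
}

/* Case 3: both branches made the same change, so no flag is needed. */
int checked_length(int len)
{
    assert(len >= 0);   /* added identically in A and B */
    return len;
}

/* Case 4: changed differently in A and B; the ancestor code is kept
 * in the #else part for later reference. */
int timeout_ms(void)
{
#if defined(BRANCHA)
    return 250;     /* branch A's version */
#elif defined(BRANCHB)
    return 1000;    /* branch B's version */
#else
    return 500;     /* original (ancestor) code */
#endif
}
```

Building with -DBRANCHA or -DBRANCHB then selects that branch's code, and building with neither gives the ancestor's behaviour wherever an #else was kept.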

There were some 650 places I needed to flag, but after that the code was in one branch and cleanup was at last possible. Now it is just a case of looking at each change and deciding whether it can be used on both branches or whether it really is a differentiation. The latter can then possibly be turned into run-time flags, which is preferable because the binary would then be the same for both branches. This in turn would mean automated testing would be a lot faster, since most of the shared code could be tested just once rather than twice. And considering the changes, I think the number of genuinely different code segments is well below 100 (the rest is mostly refactoring that can be used in both branches).
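
As a rough sketch of what turning a compile-time flag into a run-time flag could look like (the enum, the function and the values are hypothetical, not taken from the real code):

```c
#include <assert.h>

/* Hypothetical run-time replacement for the BRANCHA/BRANCHB build flags. */
enum variant { VARIANT_A, VARIANT_B };

/* A genuine differentiation: one binary contains both behaviours and
 * picks one at run time. */
int timeout_ms(enum variant v)
{
    switch (v) {
    case VARIANT_A:
        return 250;
    case VARIANT_B:
        return 1000;
    }
    assert(0 && "unknown variant");
    return 0;   /* only reached if assertions are disabled */
}
```

How the variant gets chosen (a configuration file, a command-line option, something else) is left open here; the point is only that both paths live in the same binary and can be exercised by the same test run.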

This was definitely the best way because mistakes basically meant the code wouldn't compile rather than anything more complex. The mistakes I made usually involved putting the #endif directive on the wrong side of a brace or some similar thing and with the right technique even those are relatively easy to find.
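
The post does not spell out which technique was used to find a misplaced #endif. One possible approach, sketched here purely as an assumption, is to check that the brace depth at every #elif, #else and #endif matches the depth at the opening #if. It is only a heuristic: it counts braces inside strings and comments too, and it will also report code that deliberately opens a brace inside a conditional and closes it after the #endif.

```c
#include <assert.h>
#include <string.h>

#define MAX_NESTING 64

/* Returns the 1-based line number of the first #elif/#else/#endif at which
 * the brace depth differs from the depth at the matching #if, 0 if every
 * conditional region is balanced, or -1 if conditionals are nested deeper
 * than MAX_NESTING. */
int first_unbalanced_region(const char *src)
{
    int depth_at_if[MAX_NESTING];   /* brace depth at each open #if */
    int nesting = 0, depth = 0, line = 1;
    const char *p = src;

    assert(src != NULL);
    while (*p != '\0') {
        const char *q = p;
        while (*q == ' ' || *q == '\t')     /* skip indentation */
            q++;
        if (*q == '#') {
            q++;
            while (*q == ' ' || *q == '\t') /* allow "#  if" */
                q++;
            if (strncmp(q, "if", 2) == 0) { /* #if, #ifdef, #ifndef */
                if (nesting == MAX_NESTING)
                    return -1;
                depth_at_if[nesting++] = depth;
            } else if (nesting > 0 && (strncmp(q, "elif", 4) == 0 ||
                                       strncmp(q, "else", 4) == 0)) {
                if (depth != depth_at_if[nesting - 1])
                    return line;
            } else if (nesting > 0 && strncmp(q, "endif", 5) == 0) {
                if (depth != depth_at_if[nesting - 1])
                    return line;
                nesting--;
            }
        } else {
            /* Ordinary line: count its braces. */
            for (; *q != '\0' && *q != '\n'; q++) {
                if (*q == '{')
                    depth++;
                else if (*q == '}')
                    depth--;
            }
        }
        /* Advance to the start of the next line. */
        while (*p != '\0' && *p != '\n')
            p++;
        if (*p == '\n') {
            p++;
            line++;
        }
    }
    return 0;
}
```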

After everything was done and the software was built, the tests passed. Just like that, and for both branches. So the time I spent figuring out the best approach was saved many times over during testing and debugging.
