Wednesday 11 October 2017

Every Commit Needs the Rationale to Support It

Each and every change to a codebase should be performed for a very specific reason – we shouldn’t just change some code because we feel like it. If you follow a checklist (mental or otherwise), such as the one I described in “Commit Checklist”, then each commit should be as cohesive as possible with any unintentional edits reverted to spare our blushes.

However, whilst the code can say what behaviour has changed, we also need to say why it was changed. The old adage “use the source Luke” is great for reminding us that the only source of truth is the code itself, but changes made without any supporting documentation makes software archaeology [1] incredibly difficult in the future.

The Commit Log

Take the following one line change to the JSON serialization settings used when persisting to a database:

DateTimeZoneHandling = DateTimeZoneHandling.Utc;

This single-line edit appeared in a commit all by itself. Now, any change which has the potential to affect the storage or retrieval of the system’s data is something which should not be entered into lightly. Even if the change was done to make what is currently a default setting explicit, this fact still needs to be recorded – the rationale is important.

The first port of call for any documentation around a change is probably the commit message. Given that it lives with the code and is (usually) immutable it stands the best chance of remaining intact over time. In the example above the commit message was simply:

“Bug Fix: added date time zone handling to UTC for database json serialization”

In the same way that poor code comments have a habit of simply stating what the code does, the same malaise can affect commit messages by merely restating what was changed. Our example largely suffers from this, but it also teases us by additionally mentioning that it was done to fix a bug. Suddenly we have so many more unanswered questions about the change.

Code Change Comments

In the dim and distant past it was not unusual to use code comments to annotate changes as well as to describe the behaviour of the code. Before the advent of version control features like “blame” (aka annotate) it was non-trivial to track down the commit where any particular line of code changed. As such it seemed easier to embed the change details in the code itself rather than the VCS tool, especially if the supporting documentation lived in another system; you could just use the Change Request ID as the comment.

As you can imagine this sorta worked okay at first but as the code continued to change and refactoring became more popular these comments became as distracting and pointless as the more traditional kind. It also did nothing to help reduce the overheard of tracking the how-and-why in different places.

Feature Trackers

The situation originally used to be worse than this as new features might be tracked in one place by the business whilst bugs were tracked elsewhere by the development team. This meant that the “why” could be distributed right across time and space without the necessary links to tie them all together.

The desire to track all work in one place in an Enterprise tool like JIRA has at least reduced the number of places you need to look for “the bigger picture”, assuming you use the tool for more than just recording estimates and time spent, but of course there are lightweight alternatives [2]. Hence recording the JIRA number or Trello card number in the commit message is probably the most common approach to linking these two sides of the change.

As an aside, one of the reasons many teams haven’t historically put all their documentation in their source code repo is because it’s often been inaccessible to non-developer colleagues, either due to lack of permissions or technical ability. Fortunately tools like GitHub have started to bridge this divide.

Executable Specifications

One of the oldest problems in software development has been keeping the supporting documentation and code in sync. As features evolve it becomes harder and harder to know what the canonical reason for any change is because the current behaviour may be the sum of all previous related requirements.

An ever-growing technique for combating this has been to express the documentation, i.e. the requirements, in code too, in the form of tests. At a high level these are acceptance tests, with more technical behaviours expressed as unit or integration tests.

This brings me back to my earlier example. It’s incredibly rare that any code change would be committed without some kind of corresponding change to the automated tests. In this instance the bug must have manifested itself in the persistence layer and I’d expect at least one new test to be added (or an existing one fixed) to illustrate what the bug is. Hence the rationale for the change is to fix a bug, and the rationale can largely be described through the use of one or more well written tests rather than in prose.

Exceptions

There are of course no absolutes in life and fixing a spelling mistake should not require pages of notes, although spelling incorrectly on purpose probably does [3].

The point is that there is a balance to be struck if we are to trade-off the short and long term maintenance of the system. It might be tempting to rely on tribal knowledge or the product owner’s notes to avoid thinking about how the rationale is best expressed, but finding a way to encode that information in executable form, such as through tests, provides both the present reviewer and the future software archaeologist with the most usable representation.

 

[1] See my “Software Archaeology” article for more about spelunking a codebase’s history.

[2] I’ve written about the various tools I’ve used in the past in  “Feature Tracking”.

[3] The HTTP “referer” header being a notable exception, See Wikipedia.

No comments:

Post a Comment