Monday 22 August 2016

Manually Forking Chunks of Open Source Code

Consuming open source projects is generally easy when you are just taking a package that pulls in source or binaries into your code “as is”. However on occasion we might find ourselves needing to customise part of it, or even borrow and adapt some of its code to either workaround a bug or implement our own feature.

If you’re forking the entire repo and building it yourself then you are generally going to play by their rules as you’re aware that you’re playing in somebody else’s house. But when you clone just a small part of their code to create your own version then it might not seem like you have to continue honouring their style and choices, but you probably should. At least, if you want to take advantage of upstream fixes and improvements you should. If you’re just going to rip out the underlying logic it doesn’t really matter, but if what you’re doing is more like tweaking then a more surgical approach should be considered instead.

Log4Net Rolling File Appender

The driver for this post was having to take over maintenance of a codebase that used the Log4Net logging framework. The service’s shared libraries included a customised Log4Net appender that took the basic rolling file appender and then tweaked some of the date/time handling code so that it could support the finer-grained rolling log file behaviour they needed. This included keeping the original file extension and rolling more frequently than a day. They had also added some extra logic to support compressing the log files in the background too.

When I joined the team the Log4Net project had moved on quite a bit and when I discovered the customised appender I thought I’d better check that it was still going to work when we upgraded to a more recent version. Naturally this involved diffing our customised version against the current Log4Net file appender.

However to easily merge in any changes from the Log4Net codebase I would need to do a three-way diff. I needed the common ancestor version, my version and their version. Whilst I could fall back to a two-way diff (latest of theirs and mine) there were lots of overlapping changes around the date/time arithmetic which I suspected were noise as the Log4Net version appeared to now have what we needed.

The first problem was working out what the common ancestor was. Going back through the history of our version I could see that the first version checked-in was already a highly modded version. They had also appeared to apply some of the ReSharper style refactorings which added a bit of extra noise into the mix.

What I had hoped they would have done is started by checking in the exact version of the code they got from Log4Net and put in the check-in commit the Subversion revision number of the code so that I could see at what version they were going to fork it. After a few careful manual comparisons and some application of logic around commit timestamps I pinned down what I thought was the original version.

From here I could then trace both sets of commit logs and work out what features had been added in the Log4Net side and what I then needed to pull over from our side which turned out to be very little in the end. The hardest part was working out if the two changes around the date rolling arithmetic were logically the same as I had no tests to back up the changes on our side.

In the end I took the latest version of the code from the Log4Net codebase and manually folded in the compression changes to restore parity. Personally I didn’t like the way the compression behaviour was hacked in [1] but I wanted to get back to working code first and then refactor later. I tried to add some integration tests too at the same time but they have to be run separately as the granularity of the rollover was per-minute as a best case [2].

Although the baseline Log4Net code didn’t match our coding style I felt it was more important to be able to rebase our changes over any new Log4Net version than to meet our coding guidelines. Naturally I made sure to include the relevant Log4Net Subversion revision numbers in my commits to make it clear what version provided the new baseline so that a future maintainer has a clear reference point to work from.

In short if you are going to base some of your own code very closely on some open source stuff (or even internal shared code) make sure you’ve got the relevant commit details for the baseline version in your commit history. Also try and avoid changing too much unnecessarily in your forked version to make it easier to pull and rebase underlying changes in the future.

 

[1] What worried me was the potential “hidden” performance spikes that the compression could put on the owning process. I would prefer the log file compression to be a background activity that happens in slow time and is attributable to an entirely separate process that didn’t have tight per-request SLAs to meet.

[2] I doubt there is much call for log files that roll every millisecond :o).

No comments:

Post a Comment