Tuesday 30 June 2009

Where Are the "Lite" Editions of Static Code Analysis Tools?

When I started out, the compiler I was using (MS C600) was set to build at warning level 2, which pretty much just told you whether your code was well formed or not, and that was all I cared about then. Fortunately, whilst I was working there the company discovered Writing Solid Code by Steve Maguire along with Code Complete by Steve McConnell. One of the practices Steve Maguire suggests is cranking the diagnostic level on your tools up to maximum and leaving it there - pretty much a best practice these days. The net effect of this is to enable the static code analysis within the compiler to highlight your 'valid', but potentially dubious code, giving you a chance to fix it before it becomes a bona fide bug. Yes, sometimes it gets it wrong, but there is nearly always a way to rewrite the code to silence the compiler (preferable) or a sledgehammer in the form of a pragma to disable the warning for a small section of code.
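As a small illustration of those two options, here is a sketch built around Visual C++ warning C4100 (unreferenced formal parameter), which is reported at /W4. The function names are invented for the example, and the pragma form is specific to Visual C++, hence the _MSC_VER guard.

```cpp
// Both functions accept a second argument they do not use, as might be
// required to match a callback signature.

// Preferred: rewrite the code. Leaving the parameter unnamed tells the
// compiler (and the reader) that ignoring it is deliberate.
int ScaleValue(int value, int /*context*/)
{
    return value * 2;
}

// The sledgehammer: disable the warning around just this function and
// restore the previous warning state straight afterwards.
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 4100) // unreferenced formal parameter
#endif
int ScaleValueLegacy(int value, int context)
{
    return value * 2;
}
#ifdef _MSC_VER
#pragma warning(pop)
#endif
```

The push/pop pair keeps the suppression local, so the warning stays enabled for the rest of the translation unit.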


But compilers are first-and-foremost seen as a build tool - the static code analysis abilities are a nice-to-have feature that has probably dropped out as a by-product. For full-on code analysis you apparently need an industrial strength tool, which quite probably comes at an industrial strength price. Now I'm sure these tools have taken a considerable investment and therefore demand a high price, which is fine for a big corporate client who can also afford the training required, but for smaller outfits and freelance developers the cost is prohibitive. Nowadays I can get started in C++ development using good quality free tools like Code::Blocks for the IDE and GCC for the compiler suite or even the all-in-one: Visual Studio Express. But what about when I need to take my skill further so that I can improve the quality aspects of my coding?


At various clients in the past I have used tools like BoundsChecker (now DevPartner Studio) and Rational PurifyPlus to check for memory leaks, buffer overruns, uninitialised variables etc. and they have been useful, but I don't feel it was the most efficient use of my time. They are a great way of tracking down a specific issue, but not at all the right tool for continuously sanity checking the codebase. On the other hand, one of the oldest and potentially most useful tools for doing this with C++ is probably PC-Lint (something I am keen to get my hands on - more so since going to Anna-Jayne Metcalfe's ACCU talk Taming the Lint Monster). And even Microsoft has got in on the act and added an /Analyze switch to its Visual C++ compiler - but only in the Enterprise edition.

Code Analysis (Static or Dynamic) is a noisy business, meaning that you need to be proficient in the tool to make the best use of it. But that requires some serious time with it, and many businesses just won't be able to justify the cost of both the tool and the time required for training. And this is normally where those "Lite" editions come into play. Freelancers (and hobbyist coders) like myself often have the ability to affect buying decisions, and our recommendations would carry considerably more weight if we could demonstrate the real benefits because the tools were already a natural part of our toolset.

One client I worked at lost many man days due to uninitialised member variables, something I know at least one static code analysis tool points out. Personally I get annoyed when I see a bug that I know a tool could have prevented. What I really wanted to introduce into the build process was a step to run a short static code analysis job before the build and then later, as we got more proficient with the tool, add a weekly step to run a much deeper analysis. The stumbling block was that I could not even get a trial version of the product I wanted to use so that I could play in my own time to build a list of current defects to use to justify the cost.
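To make the kind of defect concrete, here is a minimal sketch. The class is hypothetical and not taken from that client's code; the buggy constructor is shown only in a comment so that the block compiles cleanly.

```cpp
// Hypothetical class, used only to illustrate the defect.
class Connection
{
public:
    // Buggy version (for contrast, deliberately not compiled):
    //   Connection() {}   // m_port and m_open are left holding garbage
    //
    // Fixed version: every member is set in the initialiser list.
    Connection()
        : m_port(0)
        , m_open(false)
    {
    }

    int  Port() const   { return m_port; }
    bool IsOpen() const { return m_open; }

private:
    int  m_port;
    bool m_open;
};
```

With the buggy constructor the object often appears to work in one build and misbehaves in another, which is what makes this class of bug so expensive to chase by hand.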

Having the knowledge of certain products on my CV naturally helps make me more marketable, but more importantly, as a professional I feel I have another weapon in my arsenal to help ensure the quality of my code.

Monday 22 June 2009

The Default 'Size' & 'Index' Type: size_t vs int

I started out writing applications in C on 16-bit Windows. The company I worked for favoured using the Windows SDK functions instead of the C-Runtime, so for example I would use lstrlen() instead of strlen(). Unfortunately, the Windows SDK is biased towards using int instead of size_t for parameters and return values that represent counts and indices. As I moved into working on MFC based applications the habit continued, as MFC has its own container types, such as CArray, which also favour int, as do the popular common controls, like the CListCtrl. The final bias towards int was the use of -1 as an error code when using controls such as the combo and listboxes through constants such as CB_ERR and LB_ERR.

Unfortunately this habit got carried over into my own framework because I also had my own string and container classes and my wrapper facade is very thin in many places. So it wasn't much of an issue until I finally junked my clunky containers and started using the STL in preference...

All of a sudden the compiler was constantly complaining about signed/unsigned comparisons, which would ripple through my code as loops were changed from,

for (int i = 0; i < v.Size(); ++i)

initially to,

for (int i = 0; i < v.size(); ++i)

and finally to,

for (size_t i = 0; i != v.size(); ++i)

(Changing the code structure to use iterators instead - the correct solution - was going to be an exercise for another day.)
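For comparison, here is a sketch of the index-based and iterator-based forms side by side. The function names and the use of std::vector<int> are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Index-based: the index type has to match what size() returns.
int SumByIndex(const std::vector<int>& v)
{
    int total = 0;
    for (std::size_t i = 0; i != v.size(); ++i)
        total += v[i];
    return total;
}

// Iterator-based: there is no index type to choose, so the
// signed/unsigned comparison never arises.
int SumByIterator(const std::vector<int>& v)
{
    int total = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        total += *it;
    return total;
}
```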

I realised that a large part of my codebase was int-based instead of size_t-based, and the number of uses of static_cast was growing and making the code even uglier, so I decided to bite the bullet and go size_t across the board. The entire codebase isn't massive (160,000 LOC or 45,000 SLOC according to the excellent Source Monitor) and it took a few train journeys, but it felt good. However, one subsequent annoyance was with the comparisons to CB_ERR etc., as they are just #defines for -1, so I followed the STL string type (whose find() methods return npos, i.e. -1 converted to an unsigned type, when nothing is found) and declared a global constant,

namespace Core
{
static const size_t npos = static_cast<size_t>(-1);
}
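A sketch of how such a constant might be used at the boundary of a thin wrapper follows. ToIndex and kComboError are hypothetical names; kComboError stands in for CB_ERR (defined as -1 in the Windows headers) so that the example builds without windows.h.

```cpp
#include <cstddef>

namespace Core
{
static const std::size_t npos = static_cast<std::size_t>(-1);
}

// Stand-in for CB_ERR so that the sketch builds on any platform.
const int kComboError = -1;

// Hypothetical helper: converts an int result that uses -1 for "not found"
// into a size_t index that uses Core::npos instead.
std::size_t ToIndex(int rawResult)
{
    if (rawResult == kComboError)
        return Core::npos;

    return static_cast<std::size_t>(rawResult);
}
```

Callers then compare against Core::npos rather than -1, which keeps the signed/unsigned mix inside the wrapper.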

In retrospect I realise I should have declared an 'npos' in each class, such as CComboBox and CListBox, instead of globally, but there was quite a bit of code that just had "= -1" as an argument default and I got lazy trying to resolve all the issues quickly.

Now my codebase felt more wholesome because of the large scale refactoring of int to size_t and I felt more aligned to the C++ world. So I decided to see what happened when I ran the 64-bit cross compiler from the Platform SDK over it....

Yup, lots of errors about truncation from a 64-bit value to a 32-bit value (I always compile with /W4 /WX), i.e. conversions from a size_t to an int or DWORD! It appears that functions like GetWindowText() still traffic in ints and many of the I/O functions that specify sizes for buffers still use DWORDs. So, more static_casts went in.
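One way to stop those casts hiding a genuine truncation is to funnel them through a checked helper. The following is only a sketch of that idea, not what my framework does; the name ToInt is made up for the example.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: narrows a size_t to the int that an API such as
// GetWindowText() expects, and fails loudly instead of silently truncating.
int ToInt(std::size_t value)
{
    // The extra parentheses around max stop the max() macro from
    // windows.h interfering if that header has been included.
    const std::size_t maxInt =
        static_cast<std::size_t>((std::numeric_limits<int>::max)());

    if (value > maxInt)
        throw std::out_of_range("size_t value does not fit in an int");

    return static_cast<int>(value);
}
```

A similar helper could be written for DWORD; whether an exception or an assertion is the right failure mode depends on the codebase.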

And now the feeling of wholesomeness is gone again as I contemplate the impedance mismatch between the Windows API and the C/C++ standard libraries. My framework is largely a Wrapper Facade and is therefore thin, but I don't believe it should expose the use of int or DWORD to its clients. I also have use of my own 'uint' type in various places where I knew an int was undesirable, but that now also requires a cast when the source is a size_t. I even started a thread on one of the ACCU channels to see if other people felt that size_t was overused and if it would be a good idea to at least use an index_t typedef for [] operators and index style parameters.

For now the possibly excessive use of size_t stays whilst I chew the fat. I also want to deal with the nastiness that WPARAM and LPARAM have created in the framework, because the documentation uses these types without describing what a suitable type would be (e.g. the Character Code for WM_CHAR), and I want to use richer typedefs instead of the limited set the Windows API uses (e.g. WCL::TimerID instead of UINT_PTR).

Friday 19 June 2009

Visual C++, the INCLUDE variable and the /USEENV switch

Back on my last contract I ported the codebase from VS2003 (VC++ 7.1) to VS2005 (VC++ 8.0). The port was pretty easy, but it wasn't until we started developing with VS2005 that we ran into a really nasty issue.

The system used STLport instead of the bundled STL for a number of reasons, and if you've used STLport with Visual C++ you'll possibly use a .cmd script, along with the devenv /useenv command line switch as a way of injecting the STLport paths into the #include chain before the standard MS ones. Here is an example of how that script built the include path and launched VC++,

. . .
set VCInstallDir=%VS80COMNTOOLS%..\..
set STLPORT=%DEV%\3rdParty\STLport-4.6.2
set BOOST=%DEV%\3rdParty\boost.1.33.0
. . .
set INCLUDE=^
%STLPORT%\stlport;^
%VCInstallDir%\VC7\include;^
%VCInstallDir%\VC7\atlmfc\include;^
%VCInstallDir%\VC7\PlatformSDK\include;^
%DEV%\OurCode\Lib;^
%BOOST%;^
. . .
start devenv.exe /useenv OurSolution.sln

Notice that the codebase also used the INCLUDE variable as the mechanism for finding its own library code - the line "%DEV%\OurCode\Lib". These libraries were not 'static' in nature; they were developed at the same time as the main code and so any library change was expected to cause the relevant dependencies to build immediately.

When I started using VS2005 for actual development I began to run into strange problems, which would manifest as bad builds - virtual functions calling the wrong code, memory corruption etc. Previous experience taught me to disable the "Minimal Rebuild" option and delete the .idb file, but they didn't seem to have any effect. Once my teammates joined me it became a real issue as we were wasting a considerable amount of time. We even talked about dropping VS2005 and going back to VS2003 until we could get our hands on VS2008.

Fortunately one of my colleagues, Sergey Buslov, was not going to be beaten so easily and after an exhaustive search came across a new setting in VS2005, called "External Dependencies". This resides in the same location as the other VC++ paths - "Tools | Options" under the "Projects & Solutions | VC++ Directories" section. This is where you normally configure the default Executable, Include, Lib etc paths. The "External Dependencies" list was exactly the same as the INCLUDE environment variable, and lo and behold if you cleared the list, all the dependency issues went away. The documentation only has this to say on the setting,

Exclude Directories
Directory settings displayed in the window are the directories that Visual Studio will skip when searching for scan dependencies.

I haven't seen any clarification yet, but I suspect that the Visual Studio team assume that you'll only put paths to 3rd party libraries in your INCLUDE variable. In my own work I have always had a single IncludeDependency that points to the root of my libraries' source tree, which is specified in every project, so I didn't see this at home.

The real kicker though is that there is no obvious way of stopping Visual C++ from configuring the "External Dependencies" from the INCLUDE variable when using the /useenv command line switch! There is no EXTERNAL_DEPENDENCIES variable that I know of that would allow you to configure it appropriately (it does seem a very useful optimisation). In the end the hack Sergey came up with is really ugly :-) He found that if you modified the default paths, VS wouldn't interfere and inject the INCLUDE paths into it. So here are the instructions,
  1. Open Visual C++ from the standard icon, NOT by invoking it with devenv /useenv or your script.
  2. Open the "Tools | Options" dialog and find the "External Dependencies" list.
  3. Modify them by adding another dummy entry at the end. We just stuck the string "(null)" in there for want of something better.
  4. Close Visual C++ and re-open it with your script and check that the "External Dependencies" list has not been hijacked.

At the time we hit this (06/2008) there was only one post I could find that seemed similar to our problem,

http://social.msdn.microsoft.com/forums/en-US/vcgeneral/thread/c447c0bd-bcdb-4e52-a8af-b1341ce3ad9f

I notice that it's also referenced in the following Microsoft Connect query, but closed as 'unreproducible',

http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=274454

I'll see if I can open a new issue and maybe more information about this will come to light.

Thursday 18 June 2009

Building Visual C++ Projects From the Command Line

As a developer biased towards Windows applications I find that, both as a professional and a hobbyist, Visual C++ is my day-to-day IDE. I still use VC++ 7.1 (aka VS2003) for my personal work as I ship source code as well as binaries and it's easier to upgrade projects and solutions to later editions of Visual C++ for compatibility testing than to go backwards. With 6 class libraries (+ unit tests) and 12 applications to build, manually launching the solutions and interacting with the GUI is just tedious and obviously error prone. But a common question on the forums is whether it is possible to build projects using Visual C++ from the command line - sometimes I wonder if people take the "Visual" in Visual C++ too literally :-)

The starting point is firing up a command prompt and setting up the environment variables so that you can invoke Visual C++ to build a solution. Some editions of Visual C++ have a "Command Prompt" item listed in the External Tools - VS2008 does. If you look at the command for this you'll see that it runs a batch file called vcvars32.bat. But opening the IDE just to fire up a command prompt is madness. Fortunately MS added an environment variable from VC++ 7.0 to make life a little easier - VSxyCOMNTOOLS. This means you can open an arbitrary command prompt and run, say,

C:\> "%VS71COMNTOOLS%..\..\vc7\bin\vcvars32.bat"

Unfortunately the VC folder changed name with VC++ 8.0 to drop the version suffix, so it's,

C:\> "%VS80COMNTOOLS%..\..\vc\bin\vcvars32.bat"

And should you want to cross-compile 64-bit code on 32-bit windows, with VS2008 it's as easy as,

C:\> "%VS90COMNTOOLS%..\..\vc\bin\x86_amd64\vcvarsx86_amd64.bat"

That's still just a little too awkward for me to remember, so I have a .cmd file called SetVars.cmd that I use to set up the variables for whatever version of VC++ I'm testing with,

C:\> SetVars vc71

That covers setting up the environment variables. The next job is to invoke Visual C++ to build a solution. This can be done in one way under VC6.0-7.1 and two ways under VC8.0-9.0. The old way was to invoke DEVENV.com like this,

C:\> devenv /nologo /useenv "D:\Dev\my.sln" /build Debug

As of VS2005, you can now use the MSBUILD style tool called VCBUILD. Unfortunately this has a different command line format and cannot be used to build installer (.vdproj) projects. The equivalent command would be this,

C:\> vcbuild /nologo /useenv "D:\Dev\my.sln" "Debug|Win32"

Once again though I find this all a bit tedious and have wrapped this logic into another .cmd script called Build.cmd that just takes a solution filename. The choice of which tool to use, devenv or vcbuild, is based on the compiler that was configured by running the SetVars.cmd script.

C:\> Build "D:\Dev\my.sln"

The natural extension to this is building all my libraries, unit tests and applications from one command so I can see what I have broken. A straight FOR loop wrapped into another .cmd script called BuildAll.cmd helps reduce the clutter from this raw example,

C:\> FOR /R %I IN (*.sln) DO devenv /nologo /useenv %I /build Debug

The final script I use is for upgrading the projects and solutions from VS2003 to VS2005, VS2008 etc before building. By default I keep VS2003 version files in my source repository and then run the upgrade script followed by the build script to check compilation on later versions of Visual C++,

C:\> FOR /R %I IN (*.vcproj) DO vcbuild /upgrade %I
C:\> FOR /R %I IN (*.sln) DO devenv /upgrade %I

The one fly-in-the-ointment is that the Express editions of Visual C++ don't allow you to upgrade solutions - only projects. At least not from the command line. However you can do it through the GUI, which means invoking the IDE for each solution and manually clicking the wizard buttons!

C:\> FOR /R %I IN (*.sln) DO devenv /upgrade %I

The scripts are available on my website here.

Wednesday 10 June 2009

Class Generator v2.0 Released

The other tool which has been in Beta for an eternity but is now finally released is v2.0 of my Class Generator - a simple tool for generating the skeletons of classes, interfaces etc. This was always just an internal tool, but I have added a manual so that the substitution parameters are now officially documented along with the configuration file format. Yes, I know that using C++ for something like this is overkill and you could probably knock up some VBScript to do the job in half the time, but that misses the point that my tools are not about the destination but the journey.

The Release and Debug binaries are on my website here. The source code will be added just as soon as I've labelled and packaged it.

Monday 8 June 2009

Visual C++ Project Compare v1.0 Released

I've finally got around to releasing v1.0 of my tool for comparing Visual C++ project files to highlight inconsistencies in the settings between builds or other related projects. It's been in Beta since Xmas last year and the only change was to add the missing <Configuration> attributes to the comparison as it didn't show up mismatched "CharacterSet" settings for example.

The Release and Debug versions are already available and the source code will be released in a few days. The packages along with a screenshot and the manual are on my website here in the Win32 section.