Saturday, October 28, 2006

C Add-Ins in Excel: Performance Comparison with VBA


As Excel is also applied in order to solve industry level business problems, and as therein the speed will be of the essence, the choice of programming in VBA could become unfeasible. In such cases critical parts can be developed in C, which is the topic of this post.

It is curious to note that while other language choices in Microsoft Visual Studio are well documented, writing an Xll in C – that is, registering it, exchanging data via XLOPER structure and running C API – is much more of an uncharted water.

There still exists some vital material: for instance a thorough book “Excel add-in development in C/C++, Applications in Finance” by Steve Dalton. On the net an example of a guide would be interpolation add-in by JChampion, which also goes over the basic concepts, and might be a better read, than Microsoft’s own material on the net.

The example is as follows. The idea was to test C at its best: to have it make 137 000 000 comparisons of text based on Excel data. (Text comparisons should have the biggest superiority of performance in C, because of how it handles the data type).
The data (which is interesting on its own!) comes from perhaps the biggest publicly available database of owners in Estonia –
the daily list of the owners of the stock of company Tallink, consisting of records on about 16000 entities. The add-in works as follows – it is an array formula that compares two snapshots of data and writes out stock owners, who are present in both periods. No optimization of the algorithm is done – just a plain one by one comparison.

The Microsoft Visual Studio 2005 project with C files and the compiled Xll as well as the ready made files on Tallink are downloadable. I would like to express my sincere gratitude to Jüri Vilipõld, who benchmarked the programs. The results showed, that depending on the computer, the program took about an hour to run if VBA operations accessed Excel cells throughout the operation, programs using VBA arrays trimmed that time down to about 40 seconds, depending on the implementation – one using the direct copying of the entire array and the other not. And the C add-in in turn was about 5.5 times faster, completing in less than 10 seconds. The results, thus, give us a picture of the speed of accessing Excel objects excessively, of using VBA or optimized array copying, and of using C.

1 comment: