Thoughts on C++ vs. Java and .NET Performance
I recently spent some time improving the performance of my company’s Fast Infoset library. The library was written back in 2006 and is based on the XML library from the POCO C++ Libraries.
After spending some quality time in the debugger trying to find out what actually goes on in the parser (I did not write the original code myself), the first thing I noticed was an excessive amount of memory allocation and string copying. So now I had something to focus on.
For various reasons, I prefer to use std::string to handle strings in all the C++ code I write. First of all, std::string is the standard string class in C++, so it’s natural to use it in all interfaces where a string has to be passed around. Otherwise you’ll end up with what I call C++ string hell, where half of your code ends up doing conversions between different string implementations. Windows C++ and COM developers know what I’m talking about.
Now, std::string is a fine string class, with only two drawbacks. First, once you go down the std::string route, there’s no way out. Mixing it with another string implementation requires string conversions, resulting in a nightmare of endless copying and memory allocation. This is not easily fixed, and we’ll have to live with it. The second, bigger issue is that copying std::string objects is very expensive: a copy typically involves a heap allocation plus copying all the characters. Sure, some time ago we had reference-counted std::string implementations that tried to avoid memory allocation and copying through copy-on-write. But these were ill-fated as well, mostly because copy-on-write had to be implemented very conservatively, so that in many cases copies were created even when not necessary. Also, implementing these strings in a thread-safe way required expensive locking. So, no reference-counted std::strings for us (except for those stuck with GCC 3.x or Visual C++ 6).
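To make the cost of copying concrete, here is a small sketch (the function names are made up for illustration) contrasting a std::string parameter taken by value with one taken by const reference. It assumes a C++11-conforming, non-reference-counted std::string, where a by-value parameter always gets its own character buffer.

```cpp
#include <cassert>
#include <string>

// By value: the callee receives its own std::string object. On a C++11
// conforming implementation this copies every character and, for strings
// too long for the small-string buffer (if the implementation has one),
// allocates heap memory as well.
bool sameBufferByValue(const std::string& original, std::string param)
{
    return param.data() == original.data();
}

// By const reference: only an address is passed; no allocation, no copy.
bool sameBufferByRef(const std::string& original, const std::string& param)
{
    return param.data() == original.data();
}
```

For a 100-character string, the by-value call typically costs an allocation and a 100-byte copy, while the by-reference call costs little more than passing a pointer. On a reference-counted implementation like the ones mentioned above, the by-value copy might still share the buffer until one side modifies it.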
So, with all that in mind, I tried to reduce std::string copying and memory allocation as much as possible. What I did was reuse std::string instances wherever I could. For example, for certain temporary strings needed for various purposes, I no longer create a std::string instance on the stack, but rather use a std::string member variable of my class. Memory for that string is allocated once (I use reserve() to preallocate sufficient memory for typical strings), which saves many memory allocations and deallocations. Previously, one std::string instance was created (and destroyed) for every element found in a Fast Infoset document. Now, there’s just one std::string instance for the whole document. Consider a large Fast Infoset document with hundreds of thousands of elements and you can imagine what this means. I made a few other changes to the code (reducing heap allocations in other places, improving the implementations of various tables, etc.), but nothing brought performance improvements as significant as reducing std::string memory allocations and copying.
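The buffer-reuse technique described above might look roughly like this. NameDecoder and its decode() method are hypothetical, not code from the actual Fast Infoset library, and the reserved size of 256 characters is an assumed "typical" length. The sketch relies on clear() keeping the string's capacity, which the standard does not spell out but which the major standard library implementations do in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical parser fragment: instead of constructing a temporary
// std::string for every element, it reuses one member string whose
// capacity is reserved up front.
class NameDecoder
{
public:
    NameDecoder()
    {
        _name.reserve(256); // assumed typical maximum name length
    }

    // Copies the bytes [begin, end) into the reused buffer. clear() sets the
    // size to zero but keeps the allocated capacity, so no new allocation
    // happens as long as the name fits into the reserved space.
    const std::string& decode(const char* begin, const char* end)
    {
        assert(begin <= end);
        _name.clear();
        _name.append(begin, end);
        return _name;
    }

    std::size_t capacity() const
    {
        return _name.capacity();
    }

private:
    std::string _name;
};
```

Note that the reference returned by decode() refers to the shared buffer, so its contents change with the next call; a caller that needs to keep a name must copy it.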
And this is where Java and .NET have a significant performance advantage over C++ (when using std::string). Strings in Java and .NET are immutable, which means that once a string has been created, it can no longer be modified. The most important implication for performance is that it is never necessary to copy a string. Strings are reference types in Java and .NET anyway, so when passing around a string, only a pointer needs to be passed, and since nobody can modify the string, any number of references can safely share it. Compare this with C++, where, unless a pointer or reference is used, the string object is actually copied, resulting in a memory allocation and memory copy operation. There are many cases where all one has to do is store and/or pass around immutable strings. XML and Fast Infoset parsers are a prime example: element and attribute names and character data strings are created once by the parser, and then never modified again. The lack of a standard immutable string class in C++ is a real drawback here, performance-wise.
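C++ has no standard immutable string class, but a rough approximation can be built from std::shared_ptr<const std::string>. The ImmutableString class below is a hypothetical sketch, not part of the C++ Standard Library or POCO. Copying it only copies a pointer and increments a reference count; std::shared_ptr updates that count atomically, so copies are cheap but not entirely free.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable, shared string. The characters are stored once;
// copies of an ImmutableString share them, much like string references
// in Java or .NET.
class ImmutableString
{
public:
    explicit ImmutableString(std::string value):
        _ptr(std::make_shared<std::string>(std::move(value)))
    {
    }

    const std::string& str() const
    {
        return *_ptr;
    }

    std::size_t size() const
    {
        return _ptr->size();
    }

    // True if both objects share the same underlying character storage.
    bool sharesStorageWith(const ImmutableString& other) const
    {
        return _ptr == other._ptr;
    }

private:
    std::shared_ptr<const std::string> _ptr; // const: contents never change
};
```

Because the pointed-to string is const, no holder can modify it, which is what makes sharing safe. A parser could, for example, create one such object per element name and hand out copies freely.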
Another area where Java and .NET have performance advantages over C++ and the C++ Standard Library is streams. Streams in Java and .NET are plain and simple: all they do is transport raw bytes around. There’s no encoding, no localization and no formatting; this is handled by separate classes. Compare this with C++ iostreams, where one class, the stream buffer, handles both reading and writing and character encoding conversion (via locales); formatting and localization are handled by the stream classes on top of stream buffers. If one just wants to read or write raw bytes from/to a file, there’s some overhead involved due to locale support when using std::fstream (or other stream (buffer) classes that use locales). It’s possible to implement stream buffers that do not use locales, but this requires extra work (see, e.g., the stream and stream buffer classes in POCO).
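As an illustration of a stream buffer that just moves raw bytes, here is a minimal, unbuffered output stream buffer that appends everything written to it to a std::string. RawStringBuf is a hypothetical name; POCO's real stream buffer classes are buffered and considerably more elaborate. Every std::streambuf still carries a locale object, but unlike std::filebuf, this class never consults the locale's codecvt facet, so bytes pass through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical raw-byte output stream buffer: no put area, no character
// conversion, every byte written is appended to a std::string as-is.
class RawStringBuf: public std::streambuf
{
public:
    const std::string& data() const
    {
        return _data;
    }

protected:
    // Called for single characters, since no put area is set up.
    int_type overflow(int_type ch) override
    {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            _data.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }

    // Called for blocks of characters (e.g. by std::ostream::write()).
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        _data.append(s, static_cast<std::size_t>(n));
        return n;
    }

private:
    std::string _data;
};
```

Usage is the same as with any other stream buffer: construct a std::ostream on top of it (std::ostream out(&buf);) and write to it with write() or put().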
While well-written C++ code is usually faster than equivalent Java or .NET code, some extra work (and good knowledge of the standard library internals) is required to write fast C++ applications that deal with lots of strings or stream-based I/O.
Samsung Bada
Samsung has released a new OS/platform for smartphones, based on C++. My initial excitement about the platform vanished quickly, though, after looking at their introductory presentation for developers. The first thing that caught my eye was “two-phase construction”. That immediately set off my alarm bells. This was followed by their explanation that one cannot use C++ exceptions due to “resource constraints” on embedded devices. Instead, one has to use a home-grown, macro-based exception handling mechanism, as well as return value error codes. Now, that explains the need for two-phase construction: without exceptions, a constructor has no way to report failure. Note to Samsung: Symbian called – it wants its design mistakes from the 90s back. Other things that I noticed were a lack of smart pointer usage (apparently, smart pointers are too resource-hungry, or something) and a few other things that should send shivers down the spine of any C++ developer. And they have Java-like container classes as well. So, unfortunately, nothing to get excited about. It looks like the iPhone, Symbian 9.x and Linux-based platforms like Maemo remain the only choices for C++ developers.
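To see why giving up exceptions forces two-phase construction, compare the two hypothetical classes below (neither is taken from the Bada API). TwoPhaseBuffer has a constructor that cannot fail and a separate construct() call that reports errors through a return code, leaving a window in which the object exists but is unusable. Buffer follows conventional C++ practice: its constructor either yields a fully initialized object or throws.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Two-phase construction, as needed when exceptions are not available.
class TwoPhaseBuffer
{
public:
    enum Result { Ok, InvalidArgument };

    // Phase 1: the constructor cannot fail, but leaves the object unusable.
    TwoPhaseBuffer() = default;

    // Phase 2: performs the actual initialization and reports failure
    // through a return code that every caller must remember to check.
    Result construct(std::size_t size)
    {
        if (size == 0)
            return InvalidArgument;
        _data.resize(size); // std::vector used only for brevity
        return Ok;
    }

    bool isConstructed() const { return !_data.empty(); }
    std::size_t size() const { return _data.size(); }

private:
    std::vector<char> _data;
};

// Conventional C++: construction either succeeds completely or throws,
// so there is no half-constructed state to check for.
class Buffer
{
public:
    explicit Buffer(std::size_t size): _data(checkedSize(size))
    {
    }

    std::size_t size() const { return _data.size(); }

private:
    static std::size_t checkedSize(std::size_t size)
    {
        if (size == 0)
            throw std::invalid_argument("Buffer size must be greater than zero");
        return size;
    }

    std::vector<char> _data;
};
```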
New Case Study on my Company Website
Read my company’s new case study to learn how the C++ libraries and tools from Applied Informatics helped build an innovative ticketing/admission control unit.
ESE Kongress

I’ll be giving a talk at this year’s Embedded Software Engineering Kongress in Sindelfingen, Germany, which runs from December 8 to 10. The title of the talk (in German) is C++ für sicherheitskritische Systeme (C++ for safety-critical systems), and it will be given on December 9 from 9:45 to 10:30. Hope to see you there!