<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>[Smalltalk] &#187; .net</title>
	<atom:link href="http://obiltschnig.com/tag/net/feed/" rel="self" type="application/rss+xml" />
	<link>http://obiltschnig.com</link>
	<description>Notes from the virtual desk of Günter Obiltschnig</description>
	<lastBuildDate>Thu, 26 Aug 2010 14:42:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Thoughts on C++ vs. Java and .NET Performance</title>
		<link>http://obiltschnig.com/2010/03/01/thoughts-on-c-vs-java-and-net-performance/</link>
		<comments>http://obiltschnig.com/2010/03/01/thoughts-on-c-vs-java-and-net-performance/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 10:32:17 +0000</pubDate>
		<dc:creator>Günter Obiltschnig</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://obiltschnig.com/?p=37</guid>
		<description><![CDATA[I recently spent some time improving the performance of my company&#8217;s Fast Infoset library. The library was written back in 2006 and is based on the XML library from the POCO C++ Libraries.
After spending some quality time in the debugger trying to find out what actually goes on in the parser (I did not [...]]]></description>
			<content:encoded><![CDATA[<p>I recently spent some time improving the performance of <a href="http://www.appinf.com">my company&#8217;s</a> <a href="http://en.wikipedia.org/wiki/Fast_Infoset">Fast Infoset</a> library. The library was written back in 2006 and is based on the XML library from the <a href="http://pocoproject.org">POCO C++ Libraries</a>.</p>
<p>After spending some quality time in the debugger trying to find out what actually goes on in the parser (I did not write the original code myself), the first thing I noticed was an excessive amount of memory allocation and string copying. So now I had something to focus on. For various reasons, I prefer to use std::string to handle strings in all of the C++ code I write. First of all, std::string is the standard string class in C++, so it&#8217;s natural to use it in all interfaces where a string has to be passed around. Otherwise you&#8217;ll end up with what I call C++ string hell, where half of your code ends up doing conversions between different string implementations. Windows C++ and COM developers know what I&#8217;m talking about.</p>
<p>Now, std::string is a fine string class, with only two drawbacks. First, once you go down the std::string route, there&#8217;s no way out. Mixing it with another string implementation requires string conversions, resulting in an endless copying and memory allocation nightmare. This is not easily fixed, and we&#8217;ll have to live with it. The second, bigger issue is that copying std::string objects is very expensive: each copy generally means a heap allocation plus a memory copy. Sure, some time ago we had reference-counted std::string implementations that tried to avoid memory allocation and copying through copy-on-write. But these were ill-fated as well, mostly because copy-on-write had to be implemented very conservatively, so copies were in many cases created even when not necessary. Also, implementing these strings in a thread-safe way required expensive locking. So, no reference-counted std::strings for us (except for those stuck with GCC 3.x or Visual C++ 6).</p>
<p>So, with all that in mind, I tried to reduce std::string copying and memory allocation as much as possible. What I did was reuse std::string instances as often as possible. For example, for certain temporary strings needed for various purposes, I no longer create a std::string instance on the stack, but rather use a std::string instance variable in my class. Memory for that string is allocated once (I use reserve() to preallocate sufficient memory for typical strings), thus saving many memory allocations and deallocations. Previously, one std::string instance was created (and destroyed) for every element found in a Fast Infoset document. Now, there&#8217;s just one std::string instance created for the whole document. Consider a large Fast Infoset document with hundreds of thousands of elements and you can imagine what this means. There were a few other changes I made to the code (reducing heap allocations in other places, improving the implementations of various tables, etc.), but nothing brought as significant a performance improvement as reducing std::string memory allocations and copying.</p>
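<p>To make the reuse idea concrete, here is a minimal sketch. The class name NameDecoder, its member _scratch and the default size of 64 are made up for illustration and are not taken from the actual Fast Infoset code; the only things carried over from above are the std::string instance variable and the call to reserve().</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not the actual Fast Infoset parser code.
class NameDecoder {
public:
    // Preallocate room for a typical name once, up front.
    explicit NameDecoder(std::size_t typicalLength = 64) {
        _scratch.reserve(typicalLength);
    }

    // Copies the raw bytes into the reusable member string. As long as the
    // name fits into the reserved capacity, implementations typically do not
    // allocate again. The returned reference is only valid until the next
    // call to decode().
    const std::string& decode(const char* data, std::size_t length) {
        assert(data != nullptr);
        _scratch.assign(data, length);
        return _scratch;
    }

    std::size_t capacity() const { return _scratch.capacity(); }

private:
    std::string _scratch; // reused for every element in the document
};
```

<p>The price for this is a lifetime rule: a caller that wants to keep a name beyond the next decode() call has to copy it explicitly.</p>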
<p>And this is where Java and .NET have a significant performance advantage over C++ (when using std::string). Strings in Java and .NET are immutable, which means that once a string has been created, it can no longer be modified. This has a few implications for performance. The most important one: it is never necessary to copy strings. Strings are reference types in .NET and Java anyway, so when passing around a string, only a pointer needs to be passed. Compare this with C++, where, unless a pointer or reference is used, the string object is actually copied, resulting in a memory allocation and a memory copy operation. There are many cases where all one has to do is store and/or pass around immutable strings. XML and Fast Infoset parsers are a prime example: element and attribute names and character data strings are created once by the parser, and then never modified again. The lack of a standard immutable string class in C++ is a real drawback here, performance-wise.</p>
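<p>One way to approximate immutable, shareable strings in C++ is a shared pointer to a const std::string. This is only a sketch of the idea, not something the library described here does; SharedName, makeName, Element and repeatElement are hypothetical names, and std::shared_ptr assumes a C++11 compiler.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// The characters are allocated once; copying a SharedName copies a pointer
// and bumps a reference count, much like passing a Java or .NET string.
using SharedName = std::shared_ptr<const std::string>;

SharedName makeName(std::string text) {
    // shared_ptr<std::string> converts implicitly to shared_ptr<const std::string>.
    return std::make_shared<std::string>(std::move(text));
}

// Hypothetical parse result that stores a name without copying its characters.
struct Element {
    SharedName name;
};

// Creates `count` elements that all share the same character data.
std::vector<Element> repeatElement(const SharedName& name, std::size_t count) {
    assert(name != nullptr);
    return std::vector<Element>(count, Element{name});
}
```

<p>The trade-off is an atomic reference count update per copy, which is cheaper than an allocation plus a memory copy but not free.</p>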
<p>Another area where Java and .NET have performance advantages over C++ and the C++ Standard Library is streams. Streams in Java and .NET are plain and simple: all they do is transport raw bytes around. There&#8217;s no encoding, no localization and no formatting; these are handled by separate classes. Compare this with C++ iostreams, which combine reading and writing with character encoding conversion (via locales) in a single class, the stream buffer (formatting and localization are handled by the stream classes, on top of the stream buffers). If one just wants to read or write raw bytes from/to a file, there&#8217;s some overhead involved, due to locale support, when using std::fstream (or other stream and stream buffer classes that use locales). It&#8217;s possible to implement stream buffers that do not use locales, but this requires extra work (see, e.g., the stream and stream buffer classes in POCO).</p>
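<p>As a rough illustration of a stream buffer that never consults a locale, the following sketch reads raw bytes from a C FILE handle. RawFileBuf and readAll are hypothetical names, the buffer sizes are arbitrary, and this is not how the POCO classes are implemented; it only shows the general shape of overriding underflow().</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <streambuf>
#include <string>

// Hypothetical read-only stream buffer over an already opened FILE*.
// It hands out raw bytes and performs no character encoding conversion.
class RawFileBuf : public std::streambuf {
public:
    // The caller keeps ownership of `file` and must close it.
    explicit RawFileBuf(std::FILE* file) : _file(file) {
        assert(file != nullptr);
    }

    RawFileBuf(const RawFileBuf&) = delete;
    RawFileBuf& operator=(const RawFileBuf&) = delete;

protected:
    // Called by the base class whenever the get area is exhausted.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        const std::size_t n = std::fread(_buffer, 1, sizeof _buffer, _file);
        if (n == 0) return traits_type::eof();
        setg(_buffer, _buffer, _buffer + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::FILE* _file;
    char _buffer[4096];
};

// Drains a stream buffer into a string, byte for byte.
std::string readAll(std::streambuf& buf) {
    std::string out;
    char chunk[1024];
    std::streamsize n;
    while ((n = buf.sgetn(chunk, static_cast<std::streamsize>(sizeof chunk))) > 0) {
        out.append(chunk, static_cast<std::size_t>(n));
    }
    return out;
}
```

<p>A RawFileBuf can also be handed to a std::istream constructor if the usual stream interface is wanted on top of it.</p>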
<p>While well-written C++ code is usually faster than equivalent Java or .NET code, some extra work (and good knowledge of the standard library internals) is required to write fast C++ applications dealing with lots of strings or stream-based I/O.</p>
]]></content:encoded>
			<wfw:commentRss>http://obiltschnig.com/2010/03/01/thoughts-on-c-vs-java-and-net-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
