<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>What your mother never told you about graphics development</title>
	<atom:link href="http://zeuxcg.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://zeuxcg.org</link>
	<description>Thoughts about current state of realtime computer graphics.</description>
	<lastBuildDate>Mon, 07 Nov 2011 16:21:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='zeuxcg.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>What your mother never told you about graphics development</title>
		<link>http://zeuxcg.org</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://zeuxcg.org/osd.xml" title="What your mother never told you about graphics development" />
	<atom:link rel='hub' href='http://zeuxcg.org/?pushpress=hub'/>
		<item>
		<title>Mesh optimization &#8211; Quantizing floats</title>
		<link>http://zeuxcg.org/2010/12/14/mesh-optimization-quantizing-floats/</link>
		<comments>http://zeuxcg.org/2010/12/14/mesh-optimization-quantizing-floats/#comments</comments>
		<pubDate>Tue, 14 Dec 2010 08:41:48 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[Asset pipeline]]></category>
		<category><![CDATA[Memory]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=356</guid>
		<description><![CDATA[Over the next few posts I&#8217;d like to write about optimizing mesh data for run-time performance (i.e. producing vertex/index buffers that accurately represent the source model and are as fast to render for GPU as possible). There are several important &#8230; <a href="http://zeuxcg.org/2010/12/14/mesh-optimization-quantizing-floats/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=356&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Over the next few posts I&#8217;d like to write about optimizing mesh data for run-time performance (i.e. producing vertex/index buffers that accurately represent the source model and are as fast to render for GPU as possible).</p>
<p>There are several important things you have to do in order to optimize your meshes, and one of them is packing your vertex/index data. Packing index data is trivial &#8211; for any sane mesh there are no more than 65536 unique vertices, so a 16-bit index buffer is enough; this is a small win, but it costs almost nothing. Reducing the vertex size is more complex.</p>
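<p>A minimal sketch of the index packing step, assuming the source indices are stored as 32-bit integers; the helper name <code>packIndices16</code> and the use of <code>std::vector</code> are choices made for this illustration only:</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: narrows 32-bit indices to 16 bits.
// Assumes the mesh references at most 65536 unique vertices,
// so every index fits in [0..65535].
std::vector<std::uint16_t> packIndices16(const std::vector<std::uint32_t>& indices)
{
    std::vector<std::uint16_t> result;
    result.reserve(indices.size());

    for (std::size_t i = 0; i < indices.size(); ++i)
    {
        // A larger mesh would have to be split into several chunks first.
        assert(indices[i] <= 0xffffu);
        result.push_back(static_cast<std::uint16_t>(indices[i]));
    }

    return result;
}
```

<p>The result takes half the memory of the 32-bit source; the assertion documents the assumption instead of silently wrapping indices around.</p>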
<p>In order to compress your vertex data you have to know the nature of your data (sign, range, special properties (like, is it a normalized vector), value distribution) and the available compression options. This is the topic for the next article; today I want to talk about quantization. <span id="more-356"></span></p>
<p>All methods of vertex compression that are trivially implementable on GPU involve taking the floating-point source data and storing it in a value with fewer bits of precision; usually the value is either an integer or a fixed-point number with a limited range (typically [-1; 1] or [0; 1]). This process is known as quantization.</p>
<p>The goal of quantization is to preserve the original value with as much accuracy as possible &#8211; i.e., given a decode(x) function, which converts from fixed-point to floating-point, produce an encode(x) function such that the error, i.e. <code>abs(decode(encode(x)) - x)</code>, is minimized. Additionally it may be necessary to encode a finite set of numbers exactly (i.e. so that the error is zero). For example, it is usually useful to preserve endpoints: if you&#8217;re quantizing pixel component values, you&#8217;re encouraged to encode 0 and 1 exactly; otherwise pixels that were previously fully transparent will start to leak some color onto the background, and pixels that were previously completely white will become darker if you exponentiate their intensity.</p>
<p>Note that the error function is defined in terms of both encode and decode functions &#8211; the search for quantization function should start with the decode function. For GPU, decode functions are usually fixed &#8211; there are special &#8216;normalized&#8217; formats, that, when used in a vertex declaration, automatically decode the value from small precision integer to a limited-range floating point value. While it is certainly possible to use integer formats and do the decoding yourself, the default decode functions are usually sane.</p>
<p>So, what are the functions? For DirectX 10, there are *_UNORM and *_SNORM formats. Their decoding is described in the documentation: for *_UNORM formats of n-bit length, the decode function is decode(x) = x / (2^n &#8211; 1), for *_SNORM formats of n-bit length the decode function is decode(x) = clamp(x / (2^(n-1) &#8211; 1), -1, 1). In the first case x is assumed to be an unsigned integer in [0..2^n-1] interval, in the second case it&#8217;s a signed integer in [-2^(n-1)..2^(n-1)-1] interval. </p>
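<p>The two decode rules above can be written down as a short sketch. The function names <code>decodeUnorm</code> and <code>decodeSnorm</code> are made up for this example; on real hardware the conversion is done by the vertex fetch unit, not by user code:</p>

```cpp
#include <algorithm>
#include <cassert>

// decode(x) = x / (2^n - 1), x is an n-bit unsigned integer in [0..2^n-1]
float decodeUnorm(unsigned int x, int bits)
{
    assert(bits > 0 && bits < 32);
    unsigned int maxValue = (1u << bits) - 1;
    assert(x <= maxValue);

    return float(x) / float(maxValue);
}

// decode(x) = clamp(x / (2^(n-1) - 1), -1, 1),
// x is an n-bit signed integer in [-2^(n-1)..2^(n-1)-1]
float decodeSnorm(int x, int bits)
{
    assert(bits > 1 && bits < 32);
    int maxValue = (1 << (bits - 1)) - 1;
    assert(x >= -maxValue - 1 && x <= maxValue);

    // The clamp only matters for the most negative integer
    return std::max(-1.0f, std::min(1.0f, float(x) / float(maxValue)));
}
```

<p>With 8 bits, <code>decodeUnorm</code> maps 0 and 255 to 0.0 and 1.0, and <code>decodeSnorm</code> maps both -128 and -127 to -1.0.</p>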
<p>In the UNORM case the [0..1] interval is divided into 2^n &#8211; 1 equal parts. You can see that 0.0 and 1.0 are represented exactly; 0.5, on the other hand, is not. The SNORM case is slightly more complex &#8211; the integer range is not symmetric, so two values map to -1.0 (-2^(n-1) and -2^(n-1) + 1).</p>
<p>This is only one example; other APIs may specify different behaviors. For example, the OpenGL 2.0 specification has the same decoding function for unsigned numbers, but a different one for signed: decode(x) = (2x + 1) / (2^n &#8211; 1). This has slightly better precision (all integers decode to distinct values), but can&#8217;t represent 0 exactly. <a href="http://www.x.org/docs/AMD/R5xx_Acceleration_v1.3.pdf">AMD GPU documentation</a> describes a VAP_PSC_SGN_NORM_CNTL register, which may be used to set the normalization behavior to that of either OpenGL, Direct3D 10 or a similar method to Direct3D 10, but without [-1..1] range clamping (i.e. the actual range is not symmetrical).</p>
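<p>For comparison, here is the OpenGL 2.0 style signed decode as a sketch (the name <code>decodeSnormGL</code> is an assumption of this example). Note how the two integers around zero decode to small values of opposite sign, and neither decodes to zero:</p>

```cpp
#include <cassert>

// OpenGL 2.0 style signed decode: decode(x) = (2x + 1) / (2^n - 1),
// x is an n-bit signed integer in [-2^(n-1)..2^(n-1)-1]
float decodeSnormGL(int x, int bits)
{
    assert(bits > 1 && bits < 31);
    int minValue = -(1 << (bits - 1));
    int maxValue = (1 << (bits - 1)) - 1;
    assert(x >= minValue && x <= maxValue);

    return float(2 * x + 1) / float((1 << bits) - 1);
}
```

<p>With 8 bits, -128 decodes to exactly -1.0 and 127 to exactly 1.0, while 0 decodes to 1/255 and -1 decodes to -1/255.</p>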
<p>Once we know the decoding formula, it&#8217;s easy to infer the encoding formula which gives the minimum error on average. Let&#8217;s start with unsigned numbers first. We have a [0..1] floating point number, and a 3-bit unsigned integer ([0..7] integer range).</p>
<p><a href="http://zeuxcg.files.wordpress.com/2010/12/unorm.png"><img src="http://zeuxcg.files.wordpress.com/2010/12/unorm.png?w=300&#038;h=105" alt="" title="Compressing a [0..1] float to 3-bit unorm" width="300" height="105" class="alignleft size-medium wp-image-363" /></a> First let&#8217;s mark all values that are exactly representable using the decode function on the 0..1 range (the top row of numbers, and black lines denote these) &#8211; just decode all integers from the range and draw a line. Now, in order to minimize the error, for every number we have to encode we have to pick the closest line, and select the corresponding number. I&#8217;ve drawn red lines that are exactly in the middle of corresponding black lines; all numbers between two red lines (which correspond to values in the row labeled &#8216;original&#8217;) will be encoded to the same number. The number each subrange should encode to is specified in the bottommost row.</p>
<p>Now we can visualize the encoding; all that&#8217;s left is to provide a function. Note that the encoding is not exactly uniform &#8211; the size of leftmost and rightmost subranges is half that of all other subranges. This is not a problem, since we&#8217;re optimizing for the minimal error, not for the equal range length.</p>
<p>The function is easy &#8211; if you multiply all numbers from the row &#8216;original&#8217; by 7 (2^n &#8211; 1), you&#8217;ll see that all that&#8217;s left is to apply the round-to-nearest function; since we&#8217;re limited to unsigned numbers, the encode function is encode(x) = int (x * 7.0 + 0.5). Adding 0.5 before the cast is the standard way to turn round-to-zero, which is the C float-to-int cast behavior, into round-to-nearest for non-negative numbers.</p>
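<p>As a sketch, the same reasoning generalizes from 3 bits to n bits: scale by 2^n &#8211; 1, then round to nearest. The name <code>encodeUnorm</code> and the limit on the bit count are assumptions of this example:</p>

```cpp
#include <cassert>

// Quantizes x in [0..1] to an n-bit unsigned integer in [0..2^n-1].
// The float-to-int cast truncates, so adding 0.5 first gives
// round-to-nearest for non-negative values.
unsigned int encodeUnorm(float x, int bits)
{
    assert(bits > 0 && bits < 24); // keeps the scale exactly representable as a float
    assert(x >= 0.0f && x <= 1.0f);

    float scale = float((1u << bits) - 1);

    return static_cast<unsigned int>(x * scale + 0.5f);
}
```

<p>For the 3-bit case from the picture, inputs below 1/14 encode to 0, inputs above 13/14 encode to 7, and the endpoints 0.0 and 1.0 round-trip exactly.</p>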
<p><a href="http://zeuxcg.files.wordpress.com/2010/12/snorm.png"><img src="http://zeuxcg.files.wordpress.com/2010/12/snorm.png?w=300&#038;h=110" alt="" title="Compressing a [-1..1] float to 3-bit snorm" width="300" height="110" class="alignleft size-medium wp-image-364" /></a> Here is another image for the signed numbers, using Direct3D 10 rules. The range is [-1..1], we still have a 3-bit integer with [-4..3] range &#8211; we&#8217;re going to provide an encoding function that gives us the number in [-3..3] range. Using exactly the same reasoning as above, to encode the number we have to multiply it by 3, and then round to the nearest integer. Be careful &#8211; since float-to-int cast does a round-to-zero, or a truncate, the round function is slightly more complex. The encode function is as follows: encode(x) = int (x * 3.0 + (x &gt; 0 ? 0.5 : -0.5)).</p>
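<p>The signed case can be sketched the same way for n bits, assuming Direct3D 10 decoding rules; <code>encodeSnorm</code> is a name picked for this example. The sign-dependent offset is what turns the truncating cast into round-to-nearest on both sides of zero:</p>

```cpp
#include <cassert>

// Quantizes x in [-1..1] to an n-bit signed integer in [-(2^(n-1)-1)..2^(n-1)-1].
// The most negative integer, -2^(n-1), is never produced.
int encodeSnorm(float x, int bits)
{
    assert(bits > 1 && bits < 24); // keeps the scale exactly representable as a float
    assert(x >= -1.0f && x <= 1.0f);

    float scale = float((1 << (bits - 1)) - 1);

    // The cast truncates towards zero, so move away from zero by 0.5 first
    return static_cast<int>(x * scale + (x > 0 ? 0.5f : -0.5f));
}
```

<p>For 3 bits this produces values in [-3..3], and -1.0, 0.0 and 1.0 all round-trip exactly through the Direct3D 10 decode function.</p>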
<p>Just for reference, three functions for quantizing values to 8 bits are:</p>
<pre class="brush: cpp;">
// Unsigned quantization: input: [0..1] float; output: [0..255] integer
encode(x) = int (x * 255.0 + 0.5)

// Signed quantization for D3D10 rules: input: [-1..1] float; output: [-127..127] integer
encode(x) = int (x * 127.0 + (x &gt; 0 ? 0.5 : -0.5))

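// Signed quantization for OpenGL rules: input: [-1..1] float; output: [-128..127] integer
// (the cast truncates towards zero, so negative inputs need an extra -1 to round down)
encode(x) = int (x * 127.5 + (x &gt;= 0 ? 0.0 : -1.0))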
</pre>
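<p>One way to sanity-check an encode function is to measure <code>abs(decode(encode(x)) - x)</code> over the whole input range. The sketch below does that for the 8-bit OpenGL-style signed format; all function names are assumptions of this example, and the encoder uses <code>floor</code> rather than a truncating cast so that negative inputs are rounded down:</p>

```cpp
#include <cassert>
#include <cmath>

// Input: [-1..1] float; output: [-128..127] integer
int encodeSnorm8GL(float x)
{
    assert(x >= -1.0f && x <= 1.0f);

    return static_cast<int>(std::floor(x * 127.5f));
}

// decode(x) = (2x + 1) / (2^8 - 1)
float decodeSnorm8GL(int v)
{
    assert(v >= -128 && v <= 127);

    return float(2 * v + 1) / 255.0f;
}

// Sweeps [-1..1] in equal steps and returns the largest round-trip error
float maxRoundTripErrorGL(int steps)
{
    assert(steps > 0);

    float worst = 0.0f;

    for (int i = 0; i <= steps; ++i)
    {
        float x = -1.0f + 2.0f * float(i) / float(steps);
        float error = std::fabs(decodeSnorm8GL(encodeSnorm8GL(x)) - x);

        if (error > worst) worst = error;
    }

    return worst;
}
```

<p>Representable values are 2/255 apart, so the worst error should stay within half of that, i.e. 1/255.</p>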
<p>These functions are the perfect foundation for the next step: reducing the size of vertex buffer by reducing the vertex size. Until next time!</p>
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/12/14/mesh-optimization-quantizing-floats/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>

		<media:content url="http://zeuxcg.files.wordpress.com/2010/12/unorm.png?w=300" medium="image">
			<media:title type="html">Compressing a [0..1] float to 3-bit unorm</media:title>
		</media:content>

		<media:content url="http://zeuxcg.files.wordpress.com/2010/12/snorm.png?w=300" medium="image">
			<media:title type="html">Compressing a [-1..1] float to 3-bit snorm</media:title>
		</media:content>
	</item>
		<item>
		<title>Exit code trivia</title>
		<link>http://zeuxcg.org/2010/12/06/exit-code-trivia/</link>
		<comments>http://zeuxcg.org/2010/12/06/exit-code-trivia/#comments</comments>
		<pubDate>Mon, 06 Dec 2010 17:47:33 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=350</guid>
		<description><![CDATA[Whenever there is an automated process involved, such as asset/code building, unit testing, automatic version packaging, bulk log processing, etc., there often is a set of command-line tools which do their thing and return the result. Then there is a &#8230; <a href="http://zeuxcg.org/2010/12/06/exit-code-trivia/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=350&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Whenever there is an automated process involved, such as asset/code building, unit testing, automatic version packaging, bulk log processing, etc., there often is a set of command-line tools which do their thing and return the result. Then there is a calling process (which may be as simple as a batch file, or as complex as IncrediBuild), which launches the tool and acts upon success/failure.</p>
<p>In the world of command-line tools, success/failure is represented with exit code. However, it is important to understand that exit codes are to be treated carefully. <span id="more-350"></span></p>
<p>Here is a rough set of guidelines to handling exit codes:</p>
<ul>
<li>The canonical success code is 0, not 1. The same convention is common for function return codes, where 0 usually means success. Never return 1 from your command-line tool to communicate success &#8211; no caller will expect this.</li>
<li>Related to the above &#8211; there should be only one success code, i.e. everything else should be treated as error. There is no unambiguous encoding for several success values; the user probably does not care about details, the success is enough; for some system calls, like <code>system()</code>, cross-platform handling of different success values results in extra work (Windows returns the exit code as is, Linux returns a value that contains the exit code and additional information).</li>
<li>In the vast majority of cases you don&#8217;t need more than one error code either. The reasons are the same.</li>
<li>Even if you decide to use several error codes, do not use negative numbers. Some negative numbers may be used as special values by functions that normally return exit codes &#8211; in fact, one such number is -1: the family of <code>spawn</code> functions returns -1 on error, so if you return -1 from your tool, the caller can&#8217;t tell a tool failure from a failure to launch the tool (we had one such case with SCons, where matters were additionally complicated by the fact that -1 raised an OSError exception, which was swallowed by the SCons internals for some weird reason).</li>
<li>If the tool fails, returning an error code is not enough &#8211; you should output additional error information, which should be as detailed as needed to investigate the issue further (e.g. don&#8217;t just return a &#8216;file load failed&#8217; flag; print the name of the file that the program failed to open, and the error code).</li>
<li>As a somewhat related thing, if the tool succeeds, prefer less verbose output. An ideal tool is the tool that outputs zero lines of information if it succeeded (which reduces the clutter, enables easier detection of warnings, and generally makes people pay attention to the problems in the automated process because they are the only thing that&#8217;s printed!). If you need debugging/statistics information, consider adding a separate command-line flag. If you need version information for diagnostics, output it when a special command-line flag is used, not for every build.</li>
<li>Be careful with batch files. It is very easy to accidentally lose an exit code in the batch file. In fact, if you can avoid batch files completely or make them one-liners that call your script interpreter of choice, do it; if you can&#8217;t, still try to go that way as far as possible.</li>
</ul>
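<p>A minimal sketch of these guidelines, with hypothetical function names: the tool side returns 0 or 1 and reports details to stderr, and the caller side reduces the result of <code>system()</code> to a plain success flag on both Windows and POSIX systems:</p>

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

#ifndef _WIN32
#include <sys/wait.h>
#endif

// Tool side (hypothetical): 0 on success, 1 on any failure.
// Prints nothing on success; on failure names the file and the error code.
int checkFileReadable(const char* path)
{
    std::FILE* file = std::fopen(path, "rb");

    if (!file)
    {
        int error = errno; // save it before any other call can change it
        std::fprintf(stderr, "error: failed to open '%s': %s (errno %d)\n",
            path, std::strerror(error), error);
        return 1;
    }

    std::fclose(file);
    return 0;
}

// Caller side: converts the value returned by system() to a success flag.
bool commandSucceeded(int status)
{
#ifdef _WIN32
    // Windows returns the exit code as is
    return status == 0;
#else
    // POSIX packs the exit code together with additional information
    return status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
#endif
}
```

<p>A tool&#8217;s <code>main</code> could then simply return the result of <code>checkFileReadable(argv[1])</code>, and a build script written in C++ could test <code>commandSucceeded(std::system(command))</code> without caring about the platform.</p>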
<p>So basically, if you only use 0 (success) and 1 (failure) exit codes, return additional failure information via stdout/stderr, and don&#8217;t pollute stdout with things that are not indications of some problem, the users of your command line tool will love you.</p>
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/12/06/exit-code-trivia/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>
	</item>
		<item>
		<title>Optimizations that aren&#8217;t</title>
		<link>http://zeuxcg.org/2010/11/29/optimizations-that-arent/</link>
		<comments>http://zeuxcg.org/2010/11/29/optimizations-that-arent/#comments</comments>
		<pubDate>Sun, 28 Nov 2010 21:15:10 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[COLLADA]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=340</guid>
		<description><![CDATA[We all like it when our code is fast. Some of us like the result, but dislike the process of optimization; others enjoy the process. However, optimization for the sake of optimization is wrong, unless you&#8217;re doing it in your &#8230; <a href="http://zeuxcg.org/2010/11/29/optimizations-that-arent/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=340&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>We all like it when our code is fast. Some of us like the result, but dislike the process of optimization; others enjoy the process. However, optimization for the sake of optimization is wrong, unless you&#8217;re doing it in your pet project. Optimized code is sometimes less readable and, consequently, harder to understand and modify; because of that, optimization often introduces subtle bugs.</p>
<p>Since optimization is not a process with only positive effects, in production it&#8217;s important that optimization process follows certain guidelines that make sure the optimization does more good than bad. An example set of optimization steps would be: <span id="more-340"></span></p>
<ol>
<li>Make sure that the code you&#8217;re optimizing works. If possible, it should be covered by tests; otherwise one can resort to saving the results that the code produces, i.e. a data array for a particular input or a screenshot.</li>
<li>Measure the performance of the target code in a specific situation, for example on a fixed set of input data, or, in case of games, at the very beginning of the level, or measure the average/maximum timings across the whole level.</li>
<li>Verify that the measurements are precise enough, i.e. don&#8217;t have a very large variation between runs.</li>
<li>Verify that the performance is inadequate for your target requirements (you can&#8217;t start optimizing if you don&#8217;t know your target requirements). It&#8217;s important that the measured situation is common enough &#8211; ideally you should measure in the worst circumstances for the code that are still possible in the target product (i.e. if the unit number cap is 1000, profile with 1000 units). If necessary, take several measurements in different situations.</li>
<li>Record the timings/memory statistics/other performance-related information.</li>
<li>Optimize the code using any available means, starting with the ones that are easier to code and minimally affect maintainability. In game development, if there is a substantial gain that is necessary, maintainability reasons should probably be cast aside.</li>
<li>Check that the code still works (run unit tests, compare the results with those saved in step 1).</li>
<li>Measure again using the same data as in step 2, compare the results, and repeat the process if necessary.</li>
</ol>
<p></p>
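<p>The measurement steps can be sketched with a small timing helper. This is only an illustration, assuming a C++11 compiler; <code>Timings</code> and <code>measure</code> are names invented here, and a real project would likely rely on its profiler instead:</p>

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

struct Timings
{
    double minMs;
    double medianMs;
    double maxMs;
};

// Runs the work several times on the same input and reports the spread,
// so that you can see whether the measurements are stable between runs.
template <typename F>
Timings measure(F&& work, int runs)
{
    assert(runs > 0);

    std::vector<double> samples;

    for (int i = 0; i < runs; ++i)
    {
        auto start = std::chrono::steady_clock::now();
        work();
        auto end = std::chrono::steady_clock::now();

        samples.push_back(std::chrono::duration<double, std::milli>(end - start).count());
    }

    std::sort(samples.begin(), samples.end());

    Timings result = { samples.front(), samples[samples.size() / 2], samples.back() };
    return result;
}
```

<p>Recording the result before and after the change, for the same input, gives the two numbers to compare; a large gap between the minimum and the maximum suggests the measurement itself needs fixing first.</p>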
<p>There are two absolutely crucial things here &#8211; make sure that the code still works, and have proper profiling results from before and after the change. Often it&#8217;s useful to make a note of the results after each significant chunk of optimization, and save the results somewhere &#8211; some optimizations might get in the way later, and with the records you&#8217;ll probably be able to separate critical optimizations from less critical ones.</p>
<p>If you did not verify the code, it&#8217;s possible that the code now does something different &#8211; such optimization is usually bad (one exception is rendering algorithms, where usually you can replace &#8216;is exactly the same&#8217; with &#8216;looks something like&#8217; or even &#8216;is noticeably different, but the artists like it better/can live with it&#8217;).</p>
<p>If you did not profile the code, you don&#8217;t know if it works faster, and if it does, if it is considerably faster. Such optimization is worthless.</p>
<p>I have an actual story about that. Unfortunately, the information I have is incomplete &#8211; I have the code with an &#8220;optimization&#8221; that considerably decreases the actual performance, but I don&#8217;t have the change history. Still.</p>
<p>There is (was?) a COLLADA Exporter by Feeling Software, which, given an input Maya scene, produces a COLLADA XML document. This process is done at export time, which is either triggered by the artist manually, or is done automatically during the build process. The performance requirements for such tools are obviously different from the ones of a game &#8211; but optimizing the content pipeline response time is arguably equally important to optimizing game framerate, because faster iteration times and a good team mean more iterations, and more iterations mean more polished product.</p>
<p>Back at CREAT Studios, we used COLLADA pipeline for Maya/Max export; we tried to avoid touching the code, but sometimes we could not avoid it. An awesome export response time for a mesh is one second; a good one is ten seconds. We had some models that exported for several minutes. After some profiling several issues showed up &#8211; and here is one of them.</p>
<p>During the export, there are several parts of a document that can reference the same nodes from Maya DAG (Directed Acyclic Graph, pretty much the entire scene in Maya is a DAG); it is necessary to &#8216;sample&#8217; the said nodes (i.e. to get the values of some attributes for these nodes for different time values). Sampling can be slow in Maya, because it can involve complex updates of the DAG &#8211; to accelerate that, there is a special class, CAnimCache, that caches the sampling requests. The key for the sampling request is a pair (object, attribute), the value is the list of attribute values and several flags. The object is represented as MObject, the attribute is represented as MPlug.</p>
<p>The cache is organized as follows: there is an associative container with the key being the object, and the value being a list of parts. Each part holds the attribute and the cached value:</p>
<pre class="brush: cpp;">
struct Part { MPlug plug; FloatList values; };
struct Node { MObject node; vector&lt;Part&gt; parts; };

struct Cache
{
    map&lt;MObject, Node*&gt; cache;
};
</pre>
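<p>To make the lookup cost concrete, here is a self-contained sketch of such a two-level cache. It does not use the Maya SDK: <code>ObjectId</code> and <code>AttributeId</code> are plain integer stand-ins for MObject and MPlug, and all other names are hypothetical as well:</p>

```cpp
#include <cstddef>
#include <map>
#include <vector>

typedef int ObjectId;    // stand-in for MObject
typedef int AttributeId; // stand-in for MPlug

struct CachedPart { AttributeId plug; std::vector<float> values; };
struct CachedNode { ObjectId node; std::vector<CachedPart> parts; };

struct SampleCache
{
    std::map<ObjectId, CachedNode> nodes;

    // Logarithmic in the object count, then linear in the attribute count
    const std::vector<float>* find(ObjectId object, AttributeId plug) const
    {
        std::map<ObjectId, CachedNode>::const_iterator it = nodes.find(object);
        if (it == nodes.end()) return NULL;

        const std::vector<CachedPart>& parts = it->second.parts;

        for (std::size_t i = 0; i < parts.size(); ++i)
            if (parts[i].plug == plug)
                return &parts[i].values;

        return NULL;
    }

    // Assumes the (object, plug) pair has not been stored before
    void store(ObjectId object, AttributeId plug, const std::vector<float>& values)
    {
        CachedNode& node = nodes[object];
        node.node = object;

        CachedPart part = { plug, values };
        node.parts.push_back(part);
    }
};
```

<p>As long as each object only holds its own attributes, the linear scan stays short no matter how many objects the scene has.</p>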
<p>The code looks reasonable &#8211; the cache lookup is logarithmic in terms of object count and then linear in attribute count &#8211; objects usually have a modest amount of attributes, it should be fast enough. The cache key could probably be a pair of pointers, but oh well.</p>
<p>Still, somebody thought that this code was not fast enough, specifically that the map lookup was slow. I do not know if the necessary performance tests were made &#8211; I guess they were not, or maybe the map was not a map but a vector when the change was made.</p>
<p>It&#8217;s easy to optimize the map lookup if we assume that the consecutive cache lookups happen with the same object, but with a different attribute &#8211; this is a reasonable assumption and it holds in practice. So, the code was modified and looked like this:</p>
<pre class="brush: cpp;">
struct Cache
{
    map&lt;MObject, Node*&gt; cache;
    Node* search;

    Cache(): search(NULL) {}

    bool FindCacheNode(const MObject&amp; node)
    {
        iterator it = cache.find(node);
        if (it != cache.end())
        {
            search = it-&gt;second;
            return true;
        }
        return false;
    }

    void CachePlug(const MPlug&amp; plug)
    {
        if (search == NULL || search-&gt;node != plug.node()) FindCacheNode(plug.node());
        if (search == NULL)
        {
            search = new Node(plug.node());
            cache.insert(plug.node(), search);
        }

        /* additional processing of the search node */
    }
};
</pre>
<p>Can you spot the problem?</p>
<p>At the first call to CachePlug, search is NULL, so the function FindCacheNode is called, which does not find the node. search is still NULL, so a new node is inserted; now search points to this node.</p>
<p>At the next call to CachePlug with a different MObject, search is non-NULL, but the node is different, so FindCacheNode is called again. It can&#8217;t find the desired node &#8211; after all, nobody inserted it! &#8211; so it returns false&#8230; <b>without resetting search to NULL!</b> In fact, nobody ever resets search to NULL &#8211; so nobody adds new Nodes &#8211; so the map always has one element, and the parts vector contains all attributes of all nodes in the scene! As you can imagine, this makes all functions from the cache linear in terms of scene object count, and thus the whole export process quadratic. All functions still worked, but the export was slow for large scenes.</p>
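<p>One possible fix, shown as a sketch and not as the change that was actually made to the exporter, is to reset <code>search</code> whenever the lookup misses. The code below uses integer stand-ins for MObject and MPlug so that it compiles without the Maya SDK; the type names are hypothetical:</p>

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef int NodeId; // stand-in for MObject
typedef int PlugId; // stand-in for MPlug (a real plug also knows its node)

struct FixedNode
{
    NodeId node;
    std::vector<PlugId> parts;

    explicit FixedNode(NodeId n): node(n) {}
};

struct FixedCache
{
    std::map<NodeId, FixedNode*> cache;
    FixedNode* search;

    FixedCache(): search(NULL) {}
    FixedCache(const FixedCache&) = delete;            // the cache owns its nodes
    FixedCache& operator=(const FixedCache&) = delete;

    ~FixedCache()
    {
        for (std::map<NodeId, FixedNode*>::iterator it = cache.begin(); it != cache.end(); ++it)
            delete it->second;
    }

    bool FindCacheNode(NodeId node)
    {
        std::map<NodeId, FixedNode*>::iterator it = cache.find(node);

        // The fix: a miss resets search, so the caller creates a new node
        search = (it != cache.end()) ? it->second : NULL;

        return search != NULL;
    }

    void CachePlug(NodeId node, PlugId plug)
    {
        if (search == NULL || search->node != node) FindCacheNode(node);
        if (search == NULL)
        {
            search = new FixedNode(node);
            cache.insert(std::make_pair(node, search));
        }

        search->parts.push_back(plug);
    }
};
```

<p>With the reset in place, caching plugs of two different nodes produces two map entries, and each node only holds its own attributes.</p>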
<p>It is hard to reconstruct the sequence of events without a change history &#8211; however, one thing is certain. At some point somebody did an optimization without any prior profiling (map lookup could not be a serious factor &#8211; after I fixed the bug, the functions from this class were nowhere near the profile top), and without any profiling after the change &#8211; otherwise they would have spotted the bug.</p>
<p>Code sometimes travels in unexpected ways. A year ago I found the same issue in OpenCOLLADA, which inherited some code from the Feeling Software exporter (it was fixed after my report).</p>
<p>Optimization without profiling is wrong. Profiling without measuring and comparing the results is wrong. Please do not do either of those. And please, look at your code in the profiler once in a while, even if the performance is tolerable &#8211; you&#8217;ll find things you didn&#8217;t expect.</p>
<p>P.S. The credit for discovering the optimization bug actually goes to Peter Popov (of the Linux RSX fame).</p>
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/11/29/optimizations-that-arent/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>
	</item>
		<item>
		<title>Z7: Everything old is new again</title>
		<link>http://zeuxcg.org/2010/11/22/z7-everything-old-is-new-again/</link>
		<comments>http://zeuxcg.org/2010/11/22/z7-everything-old-is-new-again/#comments</comments>
		<pubDate>Sun, 21 Nov 2010 21:45:27 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Compilation speed]]></category>
		<category><![CDATA[Debugging]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=333</guid>
		<description><![CDATA[Debug information is the data that allows the debugger to, uhm, debug your program. It consists of the information about all types used in the program, of source line information (what instruction originated from what source line), of variable binding &#8230; <a href="http://zeuxcg.org/2010/11/22/z7-everything-old-is-new-again/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=333&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Debug information is the data that allows the debugger to, uhm, debug your program. It consists of the information about all types used in the program, of source line information (what instruction originated from what source line), of variable binding information (to know where on the stack frame/in register pool each local variable is stored) and other things that help you debug your program.</p>
<p>There are two different ways to store the debug information for C/C++ code<span id="more-333"></span>: one follows the &#8216;separate compilation&#8217; model of C++ and stores debug information in the object file for each translation unit, the other adopts the &#8216;everything is a huge database&#8217; model and stores debug information for the whole project in a single database. The first approach is the one taken by GCC; MSVC, on the other hand, uses the second approach by default.</p>
<p>Here&#8217;s how it works in practice: suppose you have an application project, <code>game</code>, that references two static library projects, <code>render</code> and <code>sound</code>. There is a single database file (which has .pdb extension) for each project &#8211; they usually are located in the same intermediate folder as object files &#8211; so in this example we have three PDB files, which by default are all called something like vc80.pdb, depending on the MSVS version &#8211; but, since you can change that, we&#8217;ll assume they&#8217;re called <code>game.pdb</code>, <code>render.pdb</code> and <code>sound.pdb</code>. While the files in all projects are compiling, the compiler computes the debugging information for the current translation unit and updates the corresponding .pdb file.</p>
<p>However, the debugger can&#8217;t work with multiple PDB files &#8211; it wants a single PDB file. So the linker, in the process of linking the final application, in our case the <code>game</code> project, merges all PDB files into a single file &#8211; let&#8217;s call it <code>gamefinal.pdb</code>. The linker gets paths to all PDB files from object files (or from object files inside static libraries), reads debug information from them, generates a single PDB file, writes it to disk and stores the path to this file in the executable (exe or dll). The debugger reads the PDB path from the executable module and uses the debugging information from that file.</p>
<p>There are some nice properties of this system:</p>
<ul>
<li>The resulting debugging information is separate from the executable &#8211; you can generate it for all builds, including retail, but don&#8217;t redistribute the pdb. In fact, <b>please always generate the debugging information for all builds!</b> Prior to Visual Studio 2010 the default settings for Release configuration excluded any debug information, which is unfortunate.</li>
<li>The mechanism for discovering the &#8220;source&#8221; PDB files at link stage is flexible &#8211; I&#8217;ve described the default setup for freshly created projects, however you can modify it &#8211; you can have all projects update a single PDB file, or you can have 1 PDB per object file. Linker will work regardless of the setup.</li>
</ul>
<p>However, there is a problem &#8211; what if several files are compiled in parallel? In case they refer to the same PDB file, we have to use some synchronization mechanism. This concern (perhaps there were other reasons that I&#8217;m not aware of) led to the following design &#8211; there is a server process, called <code>mspdbsrv.exe</code>, which handles PDB file operations and ensures safe concurrent access. Compiler uses the server to update PDB files, linker uses the server to read source PDB files and update the final PDB file. Some operations are apparently asynchronous &#8211; you can sometimes observe that even though the linker process has exited, the final PDB file processing is not finished, which can lead to file access errors.</p>
<p>So, now everything works fine, right? Almost.</p>
<p>When you&#8217;re using distributed compilation, e.g. via IncrediBuild, the compiler processes are run on different machines. They update some PDB file locally, which is then transferred to your machine. However, this effectively disables the PDB server operations &#8211; instead of a single server process that updates all PDB files, there are now multiple server processes, one for each worker machine! This leads to disaster, which manifests as corrupted PDB files and can be easily observed if you try to use make/scons/jam/any other build system with MSVC + IncrediBuild + compiler-generated PDB files.</p>
<p>IncrediBuild has a special hack in order to make this work &#8211; when you compile the solution via Microsoft Visual Studio, IncrediBuild modifies the build command line by splitting the PDB file for each project into several files, making sure that all files with the same PDB name go to the same agent. You should be able to use the same hack for make/scons/jam, since you can declare that your tool behaves like cl.exe in the IncrediBuild profile, but I don&#8217;t know the details and couldn&#8217;t get it to work.</p>
<p>It turns out that MSVC initially used the first debug information storage approach &#8211; i.e. it stored the debug information in object files. Moreover, this mode is still available via the /Z7 switch (this is the so-called &#8216;old style debug information&#8217;, or &#8216;C7 Compatible&#8217; in the MSVC GUI &#8211; you can find the setting in Project Properties -&gt; C++ -&gt; General -&gt; Debug Information Format). This has the following implications:</p>
<ul>
<li>Debug information is now local to translation unit &#8211; there are no races in case of concurrent compilation by design.</li>
<li>The PDB server is no longer used during the compilation, because it is not needed.</li>
<li>The linker reads debug information from object files directly, instead of looking for PDB path and opening the PDB (in fact, there is no PDB path in object files).</li>
<li>Static libraries contain embedded object files, so a static library file is now self-contained &#8211; it contains all information that&#8217;s necessary for linking.</li>
</ul>
<p>Obviously, the compile and link file access patterns change greatly. The change in compilation/linking times is hard to estimate &#8211; on one hand, with /Zi all debug information was consolidated in a single PDB file (per project), and now it&#8217;s scattered throughout object files (which, by the way, increases the size of intermediate files because now there is duplicate debug information); on the other hand, the linker has to read object files anyway, so locality should not be worse. Also, we eliminate a theoretical synchronization bottleneck (the PDB server), so multiprocess builds can get faster.</p>
<p>Here are my completely unscientific benchmark results on OGRE builds with cold cache in four build variants: /Zi (PDB files, single core build), /Zi /MP (PDB files, multicore build), /Z7 (no PDB files, single core build), /Z7 /MP (no PDB files, multicore build). For each configuration, I did a clean build of the OgreMain.dll using a new source folder every time, then I rebooted to force file cache cleanup, changed a single source file and did a build once again. Both compilation and linking times are included. The tests were done on a Core i7 920.</p>
<table>
<tr>
<th></th>
<th>/Zi</th>
<th>/Zi /MP</th>
<th>/Z7</th>
<th>/Z7 /MP</th>
</tr>
<tr>
<td>clean cl</td>
<td>6:45</td>
<td>1:51</td>
<td>6:32</td>
<td>1:32</td>
</tr>
<tr>
<td>clean link</td>
<td>0:20</td>
<td>0:20</td>
<td>0:17</td>
<td>0:17</td>
</tr>
<tr>
<td>incremental cl</td>
<td>0:15</td>
<td>0:15</td>
<td>0:08</td>
<td>0:08</td>
</tr>
<tr>
<td>incremental link</td>
<td>0:17</td>
<td>0:17</td>
<td>0:24</td>
<td>0:24</td>
</tr>
</table>
<p>While there are some savings for the clean build, the total incremental build time is the same (which can be explained if this is the cost of reading old debug information &#8211; since it is moved from link time to compilation of the single changed source file). With that in mind, Z7 and Zi are probably more or less interchangeable &#8211; unless you need Edit &amp; Continue support, which is not supported with old-style debug information. Still, I like the /Z7 approach better.</p>
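<p>To make the comparison above explicit, here is a small sketch that adds up the compile and link times from the table. The helper name <code>seconds</code> and the constant names are mine; the numbers are copied from the table, and the checks only restate what the measurements already say.</p>

```cpp
// Converts a "minutes:seconds" measurement into seconds.
constexpr int seconds(int minutes, int secs)
{
    return minutes * 60 + secs;
}

// Incremental build totals (compile + link), values copied from the table above.
constexpr int zi_incremental = seconds(0, 15) + seconds(0, 17);
constexpr int z7_incremental = seconds(0, 8) + seconds(0, 24);

// Clean multicore build totals (compile + link).
constexpr int zi_mp_clean = seconds(1, 51) + seconds(0, 20);
constexpr int z7_mp_clean = seconds(1, 32) + seconds(0, 17);

// The incremental totals are identical; the clean /MP build differs by 22 seconds.
static_assert(zi_incremental == z7_incremental, "incremental totals match");
static_assert(zi_mp_clean - z7_mp_clean == 22, "clean /Z7 /MP build is 22 seconds faster");
```

<p>These are single measurements from one machine, so treat the differences as rough indications rather than guarantees.</p>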
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/11/22/z7-everything-old-is-new-again/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>
	</item>
		<item>
		<title>#include &lt;rules&gt;</title>
		<link>http://zeuxcg.org/2010/11/15/include-rules/</link>
		<comments>http://zeuxcg.org/2010/11/15/include-rules/#comments</comments>
		<pubDate>Mon, 15 Nov 2010 19:45:25 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Compilation speed]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=323</guid>
		<description><![CDATA[We&#8217;re stuck with C++, at least for another console generation. C++ has many quirks that I wish were not there, but there is no real alternative as of today. While modern languages tend to adopt the bulk compilation and/or smart &#8230; <a href="http://zeuxcg.org/2010/11/15/include-rules/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=323&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re stuck with C++, at least for another console generation. C++ has many quirks that I wish were not there, but there is no real alternative as of today. While modern languages tend to adopt the bulk compilation and/or smart linkers and so can have a proper module system and eat the cake too, C++ is stuck with header files (on the other hand, C++ builds are incremental and almost embarrassingly parallel). While the strategy of dealing with header files and staying sane seems more or less obvious, I&#8217;m amazed as to how many people still get this wrong. I hope that this post helps to clear the mud somewhat. The post applies to C as well, but is useless for people who are blessed to work with other languages. <span id="more-323"></span></p>
<p>The problem with include files is that the preprocessor is usually quite dumb &#8211; you tell it to include the file, it includes the entire contents of the file, recursively. If you don&#8217;t tell it to include the file but try to use the symbol from that file &#8211; you get a compilation error. If you tell it to include too many files, it includes all of them, and the compilation time suffers.</p>
<p>In general, the more a header is included in other files (including transitive inclusion, i.e. A includes B includes C means that A indirectly includes C), the more files you&#8217;ll need to recompile once the header changes. Iteration time is very important &#8211; which is a topic for another time &#8211; so we&#8217;d like to minimize the amount of header inclusion. This brings us to the first important rule: <b>Each file should include the minimum amount of files</b>. The rule helps ensure that your code builds fast.</p>
<p>Now, let&#8217;s suppose that the header file contains a class declaration. By the nature of C++, a class declaration won&#8217;t compile without some other declarations &#8211; for example if a class A inherits from a class B and contains a field of type C, then you have to give the compiler declarations of both B and C in the same translation unit (i.e. in the cpp file that you&#8217;re compiling &#8211; after the preprocessor has done its work) &#8211; before A&#8217;s declaration. Now, there are two options here &#8211; you can either include the relevant header files in the header with A&#8217;s declaration, or force the user to always include B and C headers manually before A. The problem is that sometimes the user does not know about these dependencies (e.g. the field of type C can be private), sometimes the dependencies change, so every time you&#8217;re adding some declaration dependencies to your types you&#8217;re breaking the user&#8217;s code, and, since declaration dependencies are transitive, often to include a single header you&#8217;ll need a dozen or more seemingly unrelated ones. For these reasons, it&#8217;s important for all headers to be self-contained &#8211; anybody should be able to include any header in any cpp file without compilation errors. Which brings us to the second important rule &#8211; <b>each file should include all dependent headers</b>, i.e. for each declaration that&#8217;s required by the compiler there should be a corresponding include. This rule helps ensure that the programmers stay sane.</p>
<p>These two rules together define the algorithm for proper header file authoring: for each required declaration, include a corresponding header in your header file; don&#8217;t include more headers than that. In order to guarantee that you did not forget the necessary headers, <b>make sure that your header file is the first #include in the corresponding source file</b>, except the common header, if your codebase has one.</p>
<p>Do not include a header for a dependency declaration where a forward declaration will suffice; <b>use forward declarations when possible</b> (if you&#8217;re not familiar with forward declarations, google it). Sometimes it pays off to go to extra lengths to remove header dependencies, using techniques like pimpl &#8211; this depends on the exact situation, but <b>avoid including heavy platform files, like windows.h or d3d9.h, in popular headers</b> (I&#8217;ve written about a way to make a slim version of d3d9.h in a <a href="http://zeuxcg.org/2009/03/22/miscellanea/">blog post</a>, scroll down to the last section).</p>
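<p>As a minimal sketch of the forward declaration rule, assume two hypothetical classes, <code>Material</code> and <code>Texture</code> (neither comes from a real codebase). Everything is shown in one file for brevity; the comments mark which part would live in which file.</p>

```cpp
// material.h (hypothetical): a forward declaration is enough for pointers,
// references and function signatures, so texture.h is not included here.
class Texture;

class Material
{
public:
    explicit Material(Texture* diffuse): diffuse_(diffuse) {}

    Texture* diffuse() const { return diffuse_; }

private:
    Texture* diffuse_; // a pointer to an incomplete type is fine
};

// texture.h (hypothetical): the full definition, needed only by code that
// actually touches Texture members.
class Texture
{
public:
    int width = 256;
};

// material.cpp (hypothetical): this is where texture.h would be included.
int diffuse_width(const Material& material)
{
    return material.diffuse()->width;
}
```

<p>With this layout, changing <code>Texture</code> only forces a rebuild of the files that include its full definition, not of everything that includes the <code>Material</code> header.</p>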
<p>With the rules above, there is only one thing left &#8211; since we can include a header twice accidentally (i.e. A depends on B and C, and B depends on C, so C is included twice into A), we&#8217;ll need some protection against that. So each file should include the guards against multiple inclusion. There are two methods for this &#8211; either use #pragma once or use header guards. #pragma once is a non-standard technique, that tells the preprocessor explicitly &#8220;don&#8217;t include this file more than once in a single translation unit&#8221;. Header guards can emulate the behavior using preprocessor defines:</p>
<pre class="brush: cpp;">
#ifndef FILE_NAME_H
#define FILE_NAME_H
...
#endif
</pre>
<p>Many people don&#8217;t know this, but #pragma once is widely supported in modern compilers. It&#8217;s superior to header guards in two ways: it can be faster than header guards (e.g. MSVC does not read the file with #pragma once more than once, but does read the file with header guards several times), and it&#8217;s foolproof &#8211; you don&#8217;t have to invent the identifier for a header so you can&#8217;t screw it up. So <b>use #pragma once if you can, use header guards if you must</b>. If some compilers that you use don&#8217;t support #pragma once and you can&#8217;t convince the vendors to add the feature, <b>make sure that the header guards are unique using a deterministic generation algorithm</b>. For example, you can use something like &#8220;take the list consisting of the name of the project, and all components of the relative file path; convert all elements to upper case and join with underscore&#8221;, resulting in identifiers like THEGAME_RENDER_LIGHTING_POINTLIGHT_H. Do <b>not</b> use short file names alone, they are <b>not</b> unique! (unless your coding standard requires that). Oh, and if you don&#8217;t use an autogenerating macro, don&#8217;t put a comment after the #endif (e.g. #endif // THEGAME_RENDER_LIGHTING_POINTLIGHT_H) &#8211; such comments are only useful as a copy-paste history.</p>
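<p>The guard naming scheme described above can be sketched as a small function. The name <code>make_header_guard</code> is hypothetical, and replacing every non-alphanumeric character (including the dot before the extension) with an underscore is my assumption about how the path is split; adapt it to your own coding standard.</p>

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: builds a header guard identifier from a project name
// and a header path relative to the project root.
std::string make_header_guard(const std::string& project, const std::string& relative_path)
{
    std::string guard;

    for (char c : project + "/" + relative_path)
    {
        unsigned char uc = static_cast<unsigned char>(c);

        // Letters and digits are upper-cased; anything else (path separators,
        // the dot before the extension) becomes an underscore.
        guard += std::isalnum(uc) ? static_cast<char>(std::toupper(uc)) : '_';
    }

    return guard;
}
```

<p>For example, <code>make_header_guard("thegame", "render/lighting/pointlight.h")</code> yields the identifier used in the text, THEGAME_RENDER_LIGHTING_POINTLIGHT_H.</p>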
<p>While using header guards allows you to have the same file included several times in a single translation unit, it also allows you to test whether the file was already included, i.e. #ifdef THEGAME_RENDER_LIGHTING_POINTLIGHT_H. <b>You should never conditionally exclude a section of a header file based on whether some file was included!</b> Doing this introduces the inclusion order dependency which is unnatural, and hard to debug without a preprocessor output. If you&#8217;re thinking about something like &#8220;oh, if the renderer interface was included, I should probably provide a light renderer class, but otherwise it would just add unnecessary clutter&#8221;, you should split your header file in two parts, and the second part should explicitly include the renderer interface, since it depends on it.</p>
<p>At least in game development, the language is frequently extended with some generally useful primitives that are used throughout the whole codebase. The most used one is probably an assertion macro (since the standard one sucks, you should have your own), but there are other examples &#8211; logging facilities, fixed-size types, min/max functions, various platform/configuration defines (&#8220;are we on a big-endian platform?&#8221;), memory management-related macros. It&#8217;s common practice to put all of those in a single common header file; you should control the size of this file (where by &#8216;size&#8217; I mean the cumulative size of all headers it includes, of course), and you should <b>make sure that each source file includes the common header before everything else</b> &#8211; otherwise you&#8217;ll get into trouble (sometimes you&#8217;ll spend several hours looking for the reasons &#8211; e.g. if you include a header that checks the platform&#8217;s endianness before the common file, you&#8217;re in a world of hurt).</p>
<p>Well, I think that&#8217;s all about header files; there are also the include paths though. In order to include the file, you have to specify the path to it &#8211; either a &#8220;relative to the current file&#8221; path, or &#8220;relative to one of the include directories&#8221; path. There are two important goals here:</p>
<ul>
<li><b>If you&#8217;re writing a library</b> &#8211; a relatively small one, i.e. not a platform like Unreal Engine &#8211; the header files should require minimal configuration, so ideally the user does not have to add include directories to compile or use your library. For such projects, <b>consider making all include paths current file-relative</b>.</li>
<li>Otherwise, include paths should be easily greppable &#8211; the path to the same file should ideally be the same in all other files. So <b>make all include paths include directory-relative</b>; moreover, try to make sure that <b>include paths are unambiguous</b> &#8211; i.e. that you don&#8217;t have two different representations for the same file path, like  and  inside render project.</li>
<li>Whatever rule you use, try to <b>make sure it&#8217;s consistent between different projects</b>, as much as necessary. Ideally even the include directories should be the same, i.e. include directories for the engine project should be a strict subset of include directories for the game project.</li>
</ul>
<p>As a final piece of advice &#8211; learn to use the preprocessor output (cl /E, gcc -E), learn to use the include output (cl /showIncludes, gcc -M), gather the codebase statistics (average size after preprocessing, most included header files, header files with largest payload, etc.) and optimize your codebase by eliminating dependencies and spreading the word. Nothing beats a sub-second iteration time.</p>
<p>Oh, did I mention that good header dependencies decrease the linking time?</p>
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/11/15/include-rules/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>
	</item>
		<item>
		<title>Lua callstack with C++ debugger</title>
		<link>http://zeuxcg.org/2010/11/07/lua-callstack-with-c-debugger/</link>
		<comments>http://zeuxcg.org/2010/11/07/lua-callstack-with-c-debugger/#comments</comments>
		<pubDate>Sun, 07 Nov 2010 18:40:03 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[Debugging]]></category>
		<category><![CDATA[Lua]]></category>
		<category><![CDATA[Scripting]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=304</guid>
		<description><![CDATA[Lua is a very popular scripting language in game development industry. Many games use Lua for various scripting needs (data representation, UI scripting, AI scripting), and some go as far as write the majority of the game in Lua. At &#8230; <a href="http://zeuxcg.org/2010/11/07/lua-callstack-with-c-debugger/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=304&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Lua is a very popular scripting language in the game development industry. Many games use Lua for various scripting needs (data representation, UI scripting, AI scripting), and some go as far as writing the majority of the game in Lua. At CREAT, we used Lua for all of UI scripting, and for AI and other game logic on some projects. And, well, there were times when the game crashed &#8211; and the callstack consisted mainly of Lua functions.</p>
<p>While there are probably very few bugs in Lua library code, and the language is safe so you can&#8217;t get buffer overruns or other madness only via script code, script code itself is useless, because it can&#8217;t do any interaction with the outside world &#8211; user, world state, scoreboard servers, etc. So naturally there is a Lua binding for some C/C++ functions, so that scripts can call them. Now, if one of these functions crashes &#8211; for example, because they got invalid input data &#8211; how do we trace the problem back to the script code? <span id="more-304"></span></p>
<p>Assuming we don&#8217;t want to modify C++/Lua code in any way, nor do we want to restart the game with tracing hook enabled &#8211; the easily reproducible bugs are often a luxury &#8211; we&#8217;re left with the following methods:</p>
<ol>
<li>If the external Lua debugger was attached, it&#8217;s likely that we&#8217;ll be able to get the callstack and the related information from it.</li>
<li>We can trick the game into calling a call stack dumping function (using lua_getstack and lua_getinfo).</li>
<li>We can get the call stack manually, by inspection of Lua data structures.</li>
</ol>
<p>It is possible that you don&#8217;t have a working Lua debugger, do not have it attached or that it does not work at the moment (oh, and the deadline was yesterday). I&#8217;m going to describe the last two approaches here.</p>
<p><big>Use a stack dumping function</big></p>
<p>This approach is superior to the third one because you can have arbitrarily complex logic in the stack dumping function &#8211; e.g. you can print local variables along with the call stack &#8211; and it&#8217;s less tedious. Just make sure your stack dumping function does not crash :) However, unless you have good debugger support for this, calling the function in a way that lets the program keep running afterwards can be problematic.</p>
<p>Anyway, at first you&#8217;ll need the function itself. The trivial implementation looks like this:</p>
<pre class="brush: cpp;">
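// Prints the Lua call stack of L, innermost frame first.
void lua_stacktrace(lua_State* L)
{
    lua_Debug entry;
    int depth = 0;

    // lua_getstack returns 0 once depth exceeds the number of active frames
    while (lua_getstack(L, depth, &amp;entry))
    {
        // S fills short_src, l fills currentline, n fills name
        int status = lua_getinfo(L, &quot;Sln&quot;, &amp;entry);
        assert(status);

        dprintf(&quot;%s(%d): %s\n&quot;, entry.short_src, entry.currentline, entry.name ? entry.name : &quot;?&quot;);
        depth++;
    }
}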
</pre>
<p>In order to get local variable information, you&#8217;ll have to use lua_getlocal and ordinary functions for getting values from Lua stack; this is left as an exercise to the reader.</p>
<p>Now we have the function; you&#8217;ll have to make sure that the function is linked in your executable; just reference it from some other function like this:</p>
<pre class="brush: cpp;">
volatile bool x = false;
if (x) lua_stacktrace(NULL);
</pre>
<p>Now you have to call the function. If you&#8217;re lucky to have a debugger that can do this &#8211; for example, Microsoft Visual Studio can often do this from the Watch or Immediate windows &#8211; then just add the expression <code>lua_stacktrace(L)</code>, where <code>L</code> is the pointer to the Lua state (games often have a single Lua state, in which case I recommend you to save it to the global variable to make debugging easier).</p>
<p>Otherwise, you&#8217;ll have to save all registers and other relevant CPU state, setup the registers/stack so that you can call the function, set the instruction pointer to the first instruction of the function, add a breakpoint to the returning instruction of the function and hit F5. The function code will execute and stop on the breakpoint; here you have to restore all registers and CPU state, restore the instruction pointer and hit F5 again.</p>
<p>You don&#8217;t want to do that.</p>
<p>Seriously, it&#8217;s way too complex and chances are, you&#8217;ll screw something up so that the game will crash anyway. So I recommend picking a thread you don&#8217;t care about anymore, setting up the necessary state to call the function and calling it &#8211; the thread will not work anymore, but you&#8217;ll have your callstack. I often used this approach for post-mortem crash debugging, where the program is dead anyway.</p>
<p>Depending on the platform ABI, the relevant setup is different; for example:</p>
<ul>
<li>On x86, the argument is read from stack, using the esp register (esp + 4 should contain the pointer); for MSVC, add a watch <code>*(void**)(esp + 4)</code>, change the value to the lua_State pointer, get the address of the target function by adding a watch <code>lua_stacktrace</code>, go to the function in disassembly window, use &#8220;Set Next Statement&#8221; command on the first instruction, hit F5.</li>
<li>On PowerPC, the argument is read from register r3; add a watch <code>r3</code>, change the value to the lua_State pointer, go to the function in the disassembly window, use &#8220;Set Next Statement&#8221; or the equivalent command of the debugger on the first instruction, hit F5.</li>
</ul>
<p>You&#8217;ll see the call stack and the game will crash, but now you have additional context for the problem and can debug the crash further. If you&#8217;re using this method a lot, I suggest making a less trivial function, which is able to dump locals. Just in case, <code>dprintf</code> in the code above dumps the string to debug window (using <code>OutputDebugStringA</code>); use whatever debugging output available on your platform.</p>
<p><big>Inspect Lua data structures</big></p>
<p>The approach with calling the function is dangerous, since it can stop or corrupt the execution flow; also it requires code execution, which may be unavailable &#8211; for example, you can&#8217;t use it if you&#8217;re debugging via crash dumps on some platforms. Therefore it&#8217;s useful to know how Lua represents the call stack, so that you&#8217;re able to get the call stack information using the safe debugger features, i.e. object state inspection.</p>
<p>As before, I&#8217;ll assume you know the lua_State pointer; it&#8217;ll be referred to as <code>L</code>.</p>
<p>First, we&#8217;ll need to get low-level call stack information. It&#8217;s stored in an array of CallInfo structures, and <code>L</code> has three pointers to it: <code>base_ci</code>, <code>ci</code>, <code>end_ci</code>. Get the stack frame count with <code>L-&gt;ci - L-&gt;base_ci + 1</code> (let&#8217;s assume it&#8217;s 6), then display all of them with <code>L-&gt;base_ci,6</code> (this is a special watch expression, it&#8217;s supported by Microsoft debugger and PS3 debugger &#8211; debuggers for other platforms might have an equivalent feature).</p>
<p>Each callstack entry has two important fields: <code>func</code>, which points to a function object representing the call frame (we&#8217;ll get the function and source file from it), and <code>savedpc</code>, which points to a saved program counter (we&#8217;ll get the line from it).</p>
<p>Function object is a Lua object, which can represent either a Lua function or a C function. We can verify that the interesting entry is a function by checking that <code>L-&gt;base_ci[5].func-&gt;tt</code> equals 6 (LUA_TFUNCTION); after that we&#8217;ll check the type of function with <code>L-&gt;base_ci[5].func-&gt;value.gc-&gt;cl.c.isC</code>.</p>
<p>If it&#8217;s 1, then it is a C function; we can get the function pointer with <code>L-&gt;base_ci[5].func-&gt;value.gc-&gt;cl.c.f</code>, and that&#8217;s it. This function will be in the ordinary call stack of the relevant thread; also, the top stack entry should be the C function, unless you&#8217;re inspecting the state while Lua code is running inside the VM.</p>
<p>The previous frame in our case contains a Lua function (<code>L-&gt;base_ci[4].func-&gt;value.gc-&gt;cl.c.isC</code> is 0), so we&#8217;ll get the additional information for it. The Lua function contains a pointer to the prototype, which is stored in <code>L-&gt;base_ci[4].func-&gt;value.gc-&gt;cl.l.p</code> (it contains a pointer to the <code>Proto</code> object, which is <code>0x00330d80</code> in my case &#8211; I&#8217;ll use this pointer to reduce the watch expression complexity).</p>
<p>Now, we&#8217;re close. The prototype contains the source file path, you can get it with <code>(char*)(&amp;((Proto*)0x00330d80)-&gt;source-&gt;tsv + 1)</code>. It&#8217;s a string, and in Lua string data is situated right after the string header (you can also skip the char* cast and use the <code>,s</code> watch modifier). Now all we need is line information.</p>
<p>Remember <code>savedpc</code> from earlier? It points to an instruction in the <code>((Proto*)0x00330d80)-&gt;code</code> array &#8211; you can get the instruction index like this: <code>L-&gt;base_ci[4].savedpc - ((Proto*)0x00330d80)-&gt;code</code>, which is 5 in our case (if you&#8217;re doing the address arithmetic by hand, don&#8217;t forget to divide the byte difference by 4 &#8211; thankfully, all Lua instructions are 4 bytes in size). However, this is the instruction that follows the call; we need the previous instruction to get the call site, so the instruction index is 4.</p>
<p>Now all we have to do is to get the line number from <code>lineinfo</code> array: <code>((Proto*)0x00330d80)-&gt;lineinfo[4]</code> (which is 41 in our case).</p>
<p>That&#8217;s all &#8211; we know the source file, we know the source line &#8211; now we can repeat the process above for each call stack entry.</p>
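<p>The index arithmetic from the last two steps is easy to get wrong by one, so here is a minimal C++ sketch of it. <code>ProtoSketch</code>, <code>CallInfoSketch</code> and both function names are hypothetical, heavily simplified stand-ins made up for illustration; the real structures live in Lua&#8217;s internal headers and have many more fields.</p>
<pre class="brush: cpp;">
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;

// Simplified stand-in: a Lua VM instruction is a 32-bit word.
typedef std::uint32_t Instruction;

// Hypothetical, stripped-down version of Proto: only the fields used here.
struct ProtoSketch
{
	const Instruction* code; // bytecode of the function
	const int* lineinfo;     // lineinfo[i] is the source line of code[i]
};

// Hypothetical, stripped-down version of CallInfo. In the real structure the
// prototype is reached through func-&gt;value.gc-&gt;cl.l.p rather than stored directly.
struct CallInfoSketch
{
	const ProtoSketch* proto;
	const Instruction* savedpc; // points at the instruction that follows the call
};

// Same result as "savedpc - code" in the watch window, but computed from raw
// addresses: the byte difference has to be divided by the instruction size.
std::ptrdiff_t instruction_index(std::uintptr_t savedpc_address, std::uintptr_t code_address)
{
	return static_cast&lt;std::ptrdiff_t&gt;((savedpc_address - code_address) / sizeof(Instruction));
}

// Source line of the call site: step back one instruction from savedpc.
int call_line(const CallInfoSketch&amp; ci)
{
	const std::ptrdiff_t next = ci.savedpc - ci.proto-&gt;code; // pointer subtraction already counts instructions
	return ci.proto-&gt;lineinfo[next - 1];
}
</pre>
<p>With the numbers from the walkthrough (<code>savedpc</code> five instructions past <code>code</code>, and <code>lineinfo[4]</code> equal to 41), <code>call_line</code> returns 41.</p>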
<p>Some final remarks:</p>
<ul>
<li>Since Lua implements tail call optimization, the callstack will sometimes be unexpected &#8211; some entries will be skipped. You can check if that&#8217;s the case by looking at <code>tailcalls</code> field inside CallInfo: <code>L-&gt;base_ci[2].tailcalls</code>.</li>
<li>The first call stack entry (with the index 0) contains nil value; just ignore it.</li>
<li>In complex cases you&#8217;ll have several Lua states (multithreading, coroutines) &#8211; the process of stack unwinding is the same.</li>
<li>You can get local variable values too by using CallInfo <code>top</code> field and looking at function debug metadata; this is more complicated but doable.</li>
<li>If you&#8217;re writing an embeddable language, please make sure that in your product, getting a call stack is at least as easy.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/11/07/lua-callstack-with-c-debugger/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>
	</item>
		<item>
		<title>Moving on</title>
		<link>http://zeuxcg.org/2010/11/03/moving-on/</link>
		<comments>http://zeuxcg.org/2010/11/03/moving-on/#comments</comments>
		<pubDate>Tue, 02 Nov 2010 20:18:30 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=298</guid>
		<description><![CDATA[The day has come &#8211; I&#8217;ve left CREAT Studios and started working at Saber Interactive as a PS3 (well, that was obvious) programmer (well, that was obvious too). I worked at CREAT for three years and a half; I&#8217;ve enjoyed &#8230; <a href="http://zeuxcg.org/2010/11/03/moving-on/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=298&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The day has come &#8211; I&#8217;ve left <a href="http://www.creatstudios.com/">CREAT Studios</a> and started working at <a href="http://saber3d.com/">Saber Interactive</a> as a PS3 (well, that was obvious) programmer  (well, that was obvious too).</p>
<p>I worked at CREAT for three and a half years and enjoyed it immensely &#8211; I had the privilege of working with some smart people, together we built an engine for the (then) next-generation consoles, and I&#8217;m quite proud of the results. During these years I helped ship <a href="http://zeuxcg.org/projects/">a lot of PS3 projects</a> &#8211; though none of them were AAA (what does AAA mean anyway?), all of them are good games and some have interesting tech inside. On my last day I got into a TerRover match with my colleagues and only came to at 10 PM &#8211; it was that much fun.</p>
<blockquote><p>You should not have a favourite weapon. To become over-familiar with one weapon is as much a fault as not knowing it sufficiently well.</p></blockquote>
<p>Still there was a brave new world out there &#8211; I wanted to work on projects of larger scale, I wanted to see what other companies look like and to delve into unknown technology to further enhance my understanding of game development &#8211; and here I am. Count me excited!</p>
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/11/03/moving-on/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>
	</item>
		<item>
		<title>Source code: Implementing Direct3D for fun and profit</title>
		<link>http://zeuxcg.org/2010/10/25/source-code-implementing-direct3d-for-fun-and-profit/</link>
		<comments>http://zeuxcg.org/2010/10/25/source-code-implementing-direct3d-for-fun-and-profit/#comments</comments>
		<pubDate>Mon, 25 Oct 2010 19:21:26 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[Direct3D]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=288</guid>
		<description><![CDATA[Almost a year and a half ago I blogged about several useful things that you can do with custom IDirect3DDevice9 implementations. I don&#8217;t know why I did not post the code back then, but anyway &#8211; here it is: dummydevice.h &#8230; <a href="http://zeuxcg.org/2010/10/25/source-code-implementing-direct3d-for-fun-and-profit/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=288&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Almost a year and a half ago I blogged <a href="http://zeuxcg.org/2009/06/08/implementing-direct3d-for-fun-and-profit/">about several useful things that you can do with custom IDirect3DDevice9 implementations</a>. I don&#8217;t know why I did not post the code back then, but anyway &#8211; here it is:</p>
<p><a href="http://www.everfall.com/paste/id.php?rcctrznweaqt">dummydevice.h</a> &#8211; this is just an example of a dummy device implementation; it implements all device methods with stubs that can&#8217;t be called without a debugging break. This is useful for other partial implementations.</p>
<p><a href="http://www.everfall.com/paste/id.php?mtbg7y0yams3">deferreddevice.h</a> &#8211; this is an implementation of a device that buffers various rendering calls and then lets you execute them on some other device. Note that it lives in a fixed-size memory buffer (this can be easily changed), and that it implements only a subset of rendering-related functions (i.e. no FFP, the fixed-function pipeline).</p>
<p><a href="http://www.everfall.com/paste/id.php?aprffo3gzi7o">texturedevice.h</a> &#8211; this is the implementation of the device that works with D3DXCreateTextureFromFile for 2D textures and cubemaps (3D texture support is missing but can be added in the same way).</p>
<p>DL_BREAK is a replacement for __debugbreak, and DL_ASSERT is a custom assertion macro (with the neat (void)sizeof(!(expr)) trick that I hope everybody knows about by now); everything else should be obvious.</p>
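<p>In case the sizeof trick is new to you, here is a minimal sketch of the idea. <code>SKETCH_ASSERT</code>, <code>SKETCH_ENABLE_ASSERTS</code> and the helper functions are hypothetical names made up for this example; this is not the actual DL_ASSERT definition, which is not shown in this post.</p>
<pre class="brush: cpp;">
#include &lt;cstdio&gt;
#include &lt;cstdlib&gt;

// Hypothetical failure handler: report the failed expression and abort.
inline void sketch_assert_failed(const char* expr, const char* file, int line)
{
	std::fprintf(stderr, "%s(%d): assertion failed: %s\n", file, line, expr);
	std::abort();
}

#ifdef SKETCH_ENABLE_ASSERTS
// Checked build: evaluate the expression and report a failure.
#define SKETCH_ASSERT(expr) ((expr) ? (void)0 : sketch_assert_failed(#expr, __FILE__, __LINE__))
#else
// Unchecked build: sizeof does not evaluate its operand, so nothing runs,
// but the expression is still parsed and type-checked, and the variables it
// mentions count as used (no "unused variable" warnings).
#define SKETCH_ASSERT(expr) ((void)sizeof(!(expr)))
#endif

static int evaluations = 0;

inline bool counted_check(int value)
{
	++evaluations; // side effect that lets us observe whether the check ran
	return value &gt; 0;
}

// Returns how many times an asserted expression was actually evaluated.
inline int run_asserts()
{
	evaluations = 0;
	int health = 100;
	SKETCH_ASSERT(counted_check(health));
	SKETCH_ASSERT(health &lt;= 100);
	return evaluations;
}
</pre>
<p>Without <code>SKETCH_ENABLE_ASSERTS</code> defined, <code>run_asserts</code> returns 0 because the asserted expressions are never evaluated, yet a typo inside an assertion still breaks the build. Some compilers may warn about side effects in an unevaluated operand, which is harmless here.</p>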
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/10/25/source-code-implementing-direct3d-for-fun-and-profit/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>
	</item>
		<item>
		<title>Quicksort killer sequence</title>
		<link>http://zeuxcg.org/2010/10/25/quicksort-killer-sequence/</link>
		<comments>http://zeuxcg.org/2010/10/25/quicksort-killer-sequence/#comments</comments>
		<pubDate>Mon, 25 Oct 2010 19:07:35 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Sorting]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=281</guid>
		<description><![CDATA[Today I&#8217;m going to describe a not very practical but neat experiment, the result of which is a sequence that&#8217;s awfully slow to sort using Microsoft STL implementation; additionally, the method of generating such sequence naturally extends to any other &#8230; <a href="http://zeuxcg.org/2010/10/25/quicksort-killer-sequence/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=281&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Today I&#8217;m going to describe a not very practical but neat experiment, the result of which is a sequence that&#8217;s awfully slow to sort using Microsoft STL implementation; additionally, the method of generating such sequence naturally extends to any other quicksort-like approach.</p>
<p>First, a quick refresher on how std::sort [in Microsoft STL] works. It is a variant of introsort with insertion sort for small chunks. It proceeds as follows: <span id="more-281"></span></p>
<ul>
<li>For small sequences (32 elements or fewer), it uses insertion sort, which has O(n^2) average complexity but a better constant factor than quicksort;</li>
<li>For other sequences, a median of either three or nine elements, depending on the sequence size, is selected as a pivot;</li>
<li>The array is partitioned in place, resulting in three chunks: the leftmost chunk has all elements that are less than the pivot, the middle chunk has all elements that are equal to the pivot, and the right chunk has all elements that are greater than the pivot;</li>
<li>Left and right chunks are sorted recursively (actually, only the smaller one is sorted via a recursive call, but that&#8217;s not significant);</li>
<li>Finally, if the recursion depth is too big (more than 1.5*log2(N)), the algorithm switches to heap sort, which has a worst-case complexity of O(n*log(n)).</li>
</ul>
<p>This, given a careful implementation, results in a good general sorting function &#8211; it uses quicksort (which has a lower constant than heapsort), but falls back to heap sort on inputs that sort slowly with quicksort. However, due to unfortunate debug checks inside pop_heap function in MSVC2005 and 2008, the heap sort is quadratic in debug builds (this has been fixed in MSVC2010), so if we can make a sequence that&#8217;ll make quicksort quadratic, this introsort implementation will also go quadratic in debug builds.</p>
<p>Since all quicksort-like sorts only depend on the order between elements (they&#8217;re comparison-based), we can build the sequence of any type (e.g. a list of strings), and then make a sequence of some other type (e.g. a list of integers) with the same relative order; the number of comparisons will be the same.</p>
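<p>To make this concrete, here is a small sketch that replaces strings with their ranks and checks that a deterministic comparison sort does the same amount of work on both sequences. The quicksort below is a deliberately simple stand-in written for this example (it is not the MSVC implementation), and all names in it are hypothetical.</p>
<pre class="brush: cpp;">
#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;string&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// A simple quicksort (middle element as pivot) that counts comparisons.
// It stands in for any deterministic comparison-based sort.
template &lt;typename T&gt;
void counting_quicksort(std::vector&lt;T&gt;&amp; v, std::ptrdiff_t lo, std::ptrdiff_t hi, std::size_t&amp; comparisons)
{
	if (lo &gt;= hi) return;

	std::swap(v[lo + (hi - lo) / 2], v[hi]); // move the pivot to the end of the range
	std::ptrdiff_t store = lo;

	for (std::ptrdiff_t i = lo; i &lt; hi; ++i)
	{
		++comparisons;
		if (v[i] &lt; v[hi]) std::swap(v[i], v[store++]);
	}

	std::swap(v[store], v[hi]); // put the pivot between the two chunks

	counting_quicksort(v, lo, store - 1, comparisons);
	counting_quicksort(v, store + 1, hi, comparisons);
}

// Sorts a copy of the input and returns the number of comparisons performed.
template &lt;typename T&gt;
std::size_t count_comparisons(std::vector&lt;T&gt; values)
{
	std::size_t comparisons = 0;
	counting_quicksort(values, 0, static_cast&lt;std::ptrdiff_t&gt;(values.size()) - 1, comparisons);
	return comparisons;
}

// Replaces each string with its rank among the distinct sorted strings,
// which preserves the relative order of any two elements.
std::vector&lt;std::size_t&gt; to_ranks(const std::vector&lt;std::string&gt;&amp; values)
{
	std::vector&lt;std::string&gt; sorted(values);
	std::sort(sorted.begin(), sorted.end());
	sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());

	std::vector&lt;std::size_t&gt; ranks;
	for (std::size_t i = 0; i &lt; values.size(); ++i)
		ranks.push_back(std::lower_bound(sorted.begin(), sorted.end(), values[i]) - sorted.begin());

	return ranks;
}
</pre>
<p>For any input, <code>count_comparisons(strings)</code> and <code>count_comparisons(to_ranks(strings))</code> return the same number, because every comparison gives the same answer in both sequences and the sort makes all of its decisions based on those answers.</p>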
<p>Each quicksort-like sort has the following algorithm:</p>
<ol>
<li>Select the median(s) either using pseudo-random numbers or some fixed set of elements inside the given range;</li>
<li>Partition the range in several chunks, with rightmost chunk consisting of all elements larger than the largest median (my method can be naturally extended to multi-pivot sorts);</li>
<li>Recursively sort the chunks.</li>
</ol>
<p>Our goal, in order to make the worst possible sequence, is to maximize the size of the rightmost part; then the recursive call depth will be linear in terms of original element count, and the whole routine will be quadratic. To achieve that, we&#8217;re going to incrementally build the strings in the list with the following algorithm:</p>
<ol>
<li>Get the locations of median candidates for the first sorting pass (i.e. not including recursive calls);</li>
<li>One of them (the middle one, assuming that it&#8217;s moved appropriately) is the median (pivot); we append the following letters to all strings:
<ul>
<li>&#8216;a&#8217; to all median candidates to the left of the pivot;</li>
<li>&#8216;b&#8217; to the pivot itself;</li>
<li>&#8216;c&#8217; to all other elements.</li>
</ul>
</li>
<li>This step maximizes the number of elements that are larger than the pivot; after this, we proceed recursively on the right chunk.</li>
</ol>
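<p>To see why a huge right chunk is so damaging, consider the simplest possible case: a quicksort that always takes the first element as the pivot, fed with already sorted input. This is only an illustration with hypothetical names, not the instrumented MSVC sort; for this trivial pivot strategy the sorted sequence is the killer sequence, and no clever construction is needed.</p>
<pre class="brush: cpp;">
#include &lt;algorithm&gt;
#include &lt;cstddef&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

struct sort_stats
{
	std::size_t comparisons;
	std::size_t max_depth;
};

// Quicksort over [begin, end) with the first element as the pivot.
void naive_quicksort(std::vector&lt;int&gt;&amp; v, std::size_t begin, std::size_t end, std::size_t depth, sort_stats&amp; stats)
{
	if (end - begin &lt; 2) return;

	stats.max_depth = std::max(stats.max_depth, depth);

	const int pivot = v[begin];
	std::size_t store = begin + 1;

	for (std::size_t i = begin + 1; i &lt; end; ++i)
	{
		++stats.comparisons;
		if (v[i] &lt; pivot) std::swap(v[i], v[store++]);
	}

	std::swap(v[begin], v[store - 1]); // the pivot goes between the two chunks

	naive_quicksort(v, begin, store - 1, depth + 1, stats); // elements smaller than the pivot
	naive_quicksort(v, store, end, depth + 1, stats);       // all other elements
}

// Sorts 0, 1, ..., count - 1 and reports how much work that took.
sort_stats measure_sorted_input(std::size_t count)
{
	std::vector&lt;int&gt; data(count);
	for (std::size_t i = 0; i &lt; count; ++i) data[i] = static_cast&lt;int&gt;(i);

	sort_stats stats = {0, 0};
	naive_quicksort(data, 0, count, 1, stats);
	return stats;
}
</pre>
<p>Every pass leaves the left chunk empty and puts all remaining elements into the right chunk, so for 100 elements this performs 99 + 98 + &#8230; + 1 = 4950 comparisons at a recursion depth of 99. The construction above forces the same behaviour out of a sort with a much smarter pivot selection.</p>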
<p>In order to get the information about the median candidates, the median and the partition results, we need to slightly instrument the sorting function; I made the following interface:</p>
<pre class="brush: cpp;">
struct sort_context
{
	virtual bool less(const element&amp; lhs, const element&amp; rhs) { return lhs.last &lt; rhs.last; }
	virtual void partition_begin() {}
	virtual void partition_median(const element* med) {}
	virtual void partition_end(const element* right_begin, const element* right_end) {}
};

struct predicate
{
	sort_context* context;

	bool operator()(const element&amp; lhs, const element&amp; rhs) const
	{
		return context-&gt;less(lhs, rhs);
	}
};
</pre>
<p>The sorting function should call partition_begin before each sorting pass, partition_median after the median is selected, and partition_end after the array is partitioned, passing the range of the rightmost chunk.</p>
<p>Then we can implement the function that retrieves indices of median candidates:</p>
<pre class="brush: cpp; collapse: true; light: false; toolbar: true;">
std::pair&lt;std::vector&lt;size_t&gt;, size_t&gt; get_first_median_positions(element* data, size_t count)
{
	struct median_context: sort_context
	{
		bool inside;
		unsigned int counter;

		const element* median;
		std::vector&lt;const element*&gt; positions;

		median_context(): inside(false), counter(0), median(0)
		{
		}

		virtual bool less(const element&amp; lhs, const element&amp; rhs)
		{
			if (inside &amp;&amp; counter == 0)
			{
				positions.push_back(&amp;lhs);
				positions.push_back(&amp;rhs);
			}

			return sort_context::less(lhs, rhs);
		}

		virtual void partition_begin()
		{
			assert(!inside);
			inside = true;
		}

		virtual void partition_median(const element* med)
		{
			assert(inside);
			inside = false;
			if (counter++ == 0) median = med;
		}
	};

	// collect median data
	median_context c;
	sort(data, count, &amp;c);

	if (!c.median)
	{
		assert(c.positions.size() == 0);
		return std::make_pair(std::vector&lt;size_t&gt;(), 0);
	}

	// sort &amp; remove duplicates
	std::sort(c.positions.begin(), c.positions.end());
	c.positions.erase(std::unique(c.positions.begin(), c.positions.end()), c.positions.end());

	// convert from pointers to offsets
	std::vector&lt;size_t&gt; result(c.positions.size());

	for (size_t i = 0; i &lt; result.size(); ++i) result[i] = c.positions[i] - data;

	// get median position
	std::vector&lt;const element*&gt;::iterator median = std::find(c.positions.begin(), c.positions.end(), c.median);
	assert(median != c.positions.end());

	return std::make_pair(result, median - c.positions.begin());
}
</pre>
<p>a function that sorts the array and returns the partition information for the first pass:</p>
<pre class="brush: cpp; collapse: true; light: false; toolbar: true;">
std::pair&lt;size_t, size_t&gt; get_first_partition_right_modify(element* data, size_t count)
{
	struct partition_context: sort_context
	{
		unsigned int counter;
		const element* begin;
		const element* end;

		partition_context(): counter(0), begin(0), end(0)
		{
		}

		void partition_end(const element* right_begin, const element* right_end)
		{
			if (counter++ != 0) return;

			begin = right_begin;
			end = right_end;
		}
	};

	// get partitioning data
	partition_context c;
	predicate pred = {&amp;c};
	std::sort_instrumented(data, data + count, pred);

	// get indices
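	// build both branches with the same pair type so that the conditional also compiles when ptrdiff_t is not int
	return (c.begin == 0 &amp;&amp; c.end == 0) ? std::pair&lt;size_t, size_t&gt;(0, 0) : std::pair&lt;size_t, size_t&gt;(c.begin - data, c.end - data);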
}
</pre>
<p>and finally the main function, that uses the above helpers:</p>
<pre class="brush: cpp;">
void update_array(element* data, size_t count)
{
	// get positions of the first median candidates (along with the median itself)
	std::pair&lt;std::vector&lt;size_t&gt;, size_t&gt; p = get_first_median_positions(data, count);

	if (p.first.empty()) return;

	// fill elements as follows:
	// - elements from median candidates before median get an 'a' appended
	// - median element gets a 'b' appended
	// - all other elements get a 'c' appended (so that they go into the right half after partition)
	std::map&lt;size_t, char&gt; actions;

	for (size_t i = 0; i &lt; p.second; ++i) actions[p.first[i]] = 'a';
	actions[p.first[p.second]] = 'b';
	char action_otherwise = 'c';

	for (size_t i = 0; i &lt; count; ++i)
	{
		std::map&lt;size_t, char&gt;::iterator ait = actions.find(i);

		data[i].last = (ait == actions.end()) ? action_otherwise : ait-&gt;second;
		*data[i].data += data[i].last;
	}

	// copy the elements to preserve the original data
	std::vector&lt;element&gt; copy(data, data + count);

	// get the right partition (left should be very small so we don't care)
	std::pair&lt;size_t, size_t&gt; partition = get_first_partition_right_modify(&amp;copy[0], count);

	// process the right half
	update_array(&amp;copy[0] + partition.first, partition.second - partition.first);
}
</pre>
<p>Note that as an optimization, the predicate only compares the last characters of the strings; since after each partition the right chunk consists of equal strings, the only difference is in the appended character (which is one of &#8216;a&#8217;, &#8216;b&#8217;, &#8216;c&#8217;).</p>
<p>The only task that remains is to convert the string array to the integer array with the same order; this is straightforward, except that we have to use std::multiset for sorting since std::sort is slow on this set of data (which was the goal, after all :):</p>
<pre class="brush: cpp; collapse: true; light: false; toolbar: true;">
std::vector&lt;size_t&gt; generate_array(size_t count)
{
	// create element array with empty strings
	element* data = new element[count];

	for (size_t i = 0; i &lt; count; ++i)
	{
		data[i].data = new std::string;
		data[i].last = 0;
	}

	// update it to make worst possible order
	update_array(data, count);

	// make a sorted copy using std::multiset because std::sort is slow on this data (we prepared the data this way!)
	std::multiset&lt;element&gt; copy_set(data, data + count);
	std::vector&lt;element&gt; copy(copy_set.begin(), copy_set.end());

	// create an order remap
	std::map&lt;std::string*, size_t&gt; order;

	for (size_t i = 0; i &lt; copy.size(); ++i) order[copy[i].data] = i;

	// create an integer array with the same order
	std::vector&lt;size_t&gt; result;

	for (size_t i = 0; i &lt; count; ++i) result.push_back(order[data[i].data]);

	// cleanup
	for (size_t i = 0; i &lt; count; ++i) delete data[i].data;
	delete[] data;

	return result;
}
</pre>
<p>Here is <a href="http://www.everfall.com/paste/id.php?r4mk6lhujy0g">the full source code</a> for this post. It contains the above code for generating the killer sequence for a quick sort implementation, and additionally the instrumented sorting function from MSVC2008 STL. This code may not compile on other compilers because of the MS-specific parts of the sorting function itself, but otherwise should work fine.</p>
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/10/25/quicksort-killer-sequence/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>
	</item>
		<item>
		<title>AABB from OBB with component-wise abs</title>
		<link>http://zeuxcg.org/2010/10/17/aabb-from-obb-with-component-wise-abs/</link>
		<comments>http://zeuxcg.org/2010/10/17/aabb-from-obb-with-component-wise-abs/#comments</comments>
		<pubDate>Sun, 17 Oct 2010 18:23:22 +0000</pubDate>
		<dc:creator>zeuxcg</dc:creator>
				<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://zeuxcg.org/?p=268</guid>
		<description><![CDATA[This post is about a neat trick that is certainly not of my invention, but that should really be more well-known; at least, I haven&#8217;t heard of it till I stumbled across it while reading Box2D sources. There are a &#8230; <a href="http://zeuxcg.org/2010/10/17/aabb-from-obb-with-component-wise-abs/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=zeuxcg.org&amp;blog=15741095&amp;post=268&amp;subd=zeuxcg&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This post is about a neat trick that is certainly not of my invention, but that should really be more well-known; at least, I haven&#8217;t heard of it till I stumbled across it while reading Box2D sources.</p>
<p>There are a lot of bounding volumes out there; the most widespread are certainly spheres and boxes. Boxes come in two flavors &#8211; axis-aligned bounding boxes (AABB) with faces parallel to the coordinate planes, and oriented bounding boxes (OBB), each of which is essentially an AABB plus an orientation matrix.</p>
<p>It&#8217;s common to use AABB in spatial subdivision structures, like octrees, kD-trees, ABT and so on &#8211; the intersection test between two AABB is pretty straightforward. However, when dealing with dynamic meshes, the AABB of the mesh has to be recalculated whenever the mesh transformation changes. <span id="more-268"></span></p>
<p>Assuming that the mesh has a local bounding box (which is an AABB), the usual way to get world-space AABB for the mesh is as follows:</p>
<ol>
<li>Get 8 corners of the mesh AABB</li>
<li>Transform all corners to the world space with the mesh transformation matrix</li>
<li>Find the component-wise minimum and maximum of the resulting 8 vectors</li>
</ol>
<p>However, there is a better way, which reduces the amount of floating-point operations to one quarter of the above. It&#8217;s easily derived once we slightly change the AABB representation &#8211; while AABB are commonly represented with two vectors, min and max, let&#8217;s assume that our box is represented with the center and extent vector:</p>
<p>min = center &#8211; extent<br />
max = center + extent</p>
<p>or</p>
<p>center = (min + max) / 2<br />
extent = (max &#8211; min) / 2</p>
<p>Now, the 8 corners of the original AABB are in the form of center + (&#xb1;extent.x, &#xb1;extent.y, &#xb1;extent.z). Transforming those by the matrix M is thus</p>
<p>M * (center + (&#xb1;extent.x, &#xb1;extent.y, &#xb1;extent.z))</p>
<p>Let&#8217;s expand the matrix-vector multiplication; the result looks like this:</p>
<p>M00 * (center.x &#xb1; extent.x) + M01 * (center.y &#xb1; extent.y) + M02 * (center.z &#xb1; extent.z) + M03<br />
(and likewise for other two components)</p>
<p>We can slightly rearrange the equation to get this:<br />
(M00 * center.x + M01 * center.y + M02 * center.z + M03) + (&#xb1;M00 * extent.x + &#xb1;M01 * extent.y + &#xb1;M02 * extent.z)<br />
(and likewise for other two components)</p>
<p>Now, the left part is shared by all 8 points, and is equal to M * center (i.e. to the AABB center, transformed to the world space); this is the center of the new AABB.</p>
<p>The right part is different for each point; however, since the extent vector has non-negative components, the minimum of the right part is reached when the signs are chosen so that all three terms &#xb1;M00 * extent.x, &#xb1;M01 * extent.y, &#xb1;M02 * extent.z are non-positive, and the maximum is reached when all of them are non-negative. Thus, the maximum of the right part is:</p>
<p>abs(M00) * extent.x + abs(M01) * extent.y + abs(M02) * extent.z<br />
(likewise for other two components).</p>
<p>Note that this is the matrix-vector multiplication, with the matrix being the component-wise absolute value of the original transformation matrix, and the vector being the extent vector (which has to be transformed as if it is a direction, i.e. without taking matrix translation into account).</p>
<p>The resulting code looks like this (this is F# with SlimDX math classes):</p>
<pre class="brush: fsharp;">
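// Component-wise absolute value of a matrix
let matrix_abs (matrix: Matrix) =
    let mutable m = Matrix()
    for i in 0..3 do
        for j in 0..3 do
            m.[i, j] &lt;- abs matrix.[i, j]
    m

let transform_aabb_fast (aabb: BoundingBox) matrix =
    // Convert min/max to center/extent
    let center = (aabb.Minimum + aabb.Maximum) / 2.f
    let extent = (aabb.Maximum - aabb.Minimum) / 2.f

    // The center is transformed as a point; the extent is transformed
    // as a direction (no translation) by the absolute-value matrix
    let new_center = Vector3.TransformCoordinate(center, matrix)
    let new_extent = Vector3.TransformNormal(extent, matrix_abs matrix)

    // Convert back to min/max
    BoundingBox(new_center - new_extent, new_center + new_extent)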
</pre>
<p>Instead of 8 shuffles, 8 matrix-point multiplications and 8 vector min+max operations, we need to convert the AABB to and from the center+extent representation (the extent can alternatively be computed as aabb.Maximum - center) and do one matrix-point and one matrix-direction multiplication, which is usually faster.</p>
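<p>If you want to convince yourself that the two approaches agree, here is a minimal self-contained sketch that compares them on one sample box. It uses plain float arrays instead of SlimDX types and the column-vector convention from the derivation; the names transformPoint, transformAabbCorners and transformAabbFast, the 3x4 matrix layout and the sample numbers are all made up for this illustration.</p>
<pre class="brush: fsharp;">
// Assumed layout (not SlimDX): a transform is 3 rows of 4 numbers;
// row.[0..2] is the linear part, row.[3] is the translation.

// Transform a point: linear part plus translation.
let transformPoint (m: float[][]) (p: float[]) =
    m |> Array.map (fun row ->
        row.[0] * p.[0] + row.[1] * p.[1] + row.[2] * p.[2] + row.[3])

// Reference method: transform all 8 corners, then take per-axis min/max.
let transformAabbCorners (m: float[][]) (bmin: float[]) (bmax: float[]) =
    let corners =
        [| for x in [ bmin.[0]; bmax.[0] ] do
             for y in [ bmin.[1]; bmax.[1] ] do
               for z in [ bmin.[2]; bmax.[2] ] do
                 yield transformPoint m [| x; y; z |] |]
    let axis i = corners |> Array.map (fun c -> c.[i])
    Array.init 3 (fun i -> Array.min (axis i)), Array.init 3 (fun i -> Array.max (axis i))

// Fast method: transform the center as a point and the extent
// by the component-wise absolute value of the linear part.
let transformAabbFast (m: float[][]) (bmin: float[]) (bmax: float[]) =
    let center = Array.map2 (fun a b -> (a + b) / 2.0) bmin bmax
    let extent = Array.map2 (fun a b -> (b - a) / 2.0) bmin bmax
    let newCenter = transformPoint m center
    let newExtent =
        m |> Array.map (fun row ->
            abs row.[0] * extent.[0] +
            abs row.[1] * extent.[1] +
            abs row.[2] * extent.[2])
    Array.map2 (-) newCenter newExtent, Array.map2 (+) newCenter newExtent

// Sample data (hypothetical): a skewed transform with translation.
let sampleMatrix =
    [| [| 1.0; -2.0; 0.5;  5.0 |]
       [| 0.5;  1.0; 0.0; -1.0 |]
       [| 0.0; -1.0; 3.0;  2.0 |] |]
let sampleMin = [| -1.0; 0.0; 2.0 |]
let sampleMax = [|  3.0; 4.0; 4.0 |]

printfn "corners: %A" (transformAabbCorners sampleMatrix sampleMin sampleMax)
printfn "fast:    %A" (transformAabbFast sampleMatrix sampleMin sampleMax)
</pre>
<p>For this input both functions return the same box: minimum (-3, -1.5, 4) and maximum (10, 4.5, 14).</p>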
<p>If the original mesh bounding volume is an OBB (in my experience this is usually not necessary, as local-space AABBs give a good enough approximation for common cases), the same method applies: you just have to build the full transformation matrix by multiplying the OBB and mesh transformation matrices.</p>
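<p>A rough sketch of the OBB case, under some assumptions of mine: the OBB is stored as half-extents in its own box space plus a box-to-mesh matrix, transforms are plain 3x4 float arrays in column-vector convention, and the names compose, aabbFromObb and the sample values are hypothetical.</p>
<pre class="brush: fsharp;">
// Assumed layout (not SlimDX): a transform is 3 rows of 4 numbers;
// row.[0..2] is the linear part, row.[3] is the translation.

// Compose two affine transforms: the result applies b first, then a.
let compose (a: float[][]) (b: float[][]) =
    Array.init 3 (fun r ->
        Array.init 4 (fun c ->
            let linear =
                a.[r].[0] * b.[0].[c] +
                a.[r].[1] * b.[1].[c] +
                a.[r].[2] * b.[2].[c]
            // The implicit fourth row of b is (0, 0, 0, 1), so only the
            // translation column picks up a's translation.
            if c = 3 then linear + a.[r].[3] else linear))

// World-space AABB of an OBB given as half-extents around the box-space origin.
let aabbFromObb (meshToWorld: float[][]) (obbToMesh: float[][]) (halfExtent: float[]) =
    let full = compose meshToWorld obbToMesh
    // The box center is the box-space origin, so it maps to the translation.
    let center = full |> Array.map (fun row -> row.[3])
    // The extent goes through the component-wise absolute value of the linear part.
    let extent =
        full |> Array.map (fun row ->
            abs row.[0] * halfExtent.[0] +
            abs row.[1] * halfExtent.[1] +
            abs row.[2] * halfExtent.[2])
    Array.map2 (-) center extent, Array.map2 (+) center extent

// Sample data (hypothetical): box rotated 90 degrees about Z and moved to
// (1, 2, 3) in mesh space; mesh scaled by 2 and moved by (10, 0, 0).
let sampleObbToMesh =
    [| [| 0.0; -1.0; 0.0; 1.0 |]
       [| 1.0;  0.0; 0.0; 2.0 |]
       [| 0.0;  0.0; 1.0; 3.0 |] |]
let sampleMeshToWorld =
    [| [| 2.0; 0.0; 0.0; 10.0 |]
       [| 0.0; 2.0; 0.0;  0.0 |]
       [| 0.0; 0.0; 2.0;  0.0 |] |]
let sampleHalfExtent = [| 2.0; 1.0; 0.5 |]

printfn "%A" (aabbFromObb sampleMeshToWorld sampleObbToMesh sampleHalfExtent)
</pre>
<p>For this input the resulting box has minimum (10, 0, 5) and maximum (14, 8, 7). With a row-vector math library the multiplication order is reversed, i.e. the OBB matrix comes first and the mesh matrix second.</p>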
<p>When I first saw this in Box2D, I did not understand why the code works at all; the meaning of the component-wise absolute value is not immediately obvious. Now I know, and I hope this was of some interest to you.</p>
]]></content:encoded>
			<wfw:commentRss>http://zeuxcg.org/2010/10/17/aabb-from-obb-with-component-wise-abs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/b6b1c2c000b5e36a035cc78ff8f071d3?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">zeuxcg</media:title>
		</media:content>
	</item>
	</channel>
</rss>
