In the last article (Approximating slerp) we discussed a need for a fast and reasonably precise quaternion interpolation method. By looking at the data we arrived at two improvements to
nlerp, a less precise one and a more precise one. Let’s look at their implementations and performance!
Quaternions should probably be your first choice as far as representing rotations goes. They take less space than matrices (this is important since programs are increasingly more memory bound); they’re similar in terms of performance of basic operations (slower for some, faster for others); they are much faster to normalize which is frequently necessary to combat accumulating error; and finally they’re way easier to interpolate. In this post we’ll focus on interpolation.
Execution time in many programs is dominated by memory access time, not compute time. This is becoming increasingly true with higher instruction-level parallelism, wider SIMD, larger core counts and a lack of breakthroughs in memory access latency. A lot of performance talks now start by explaining that in a cache hierarchy the last-level cache miss is 100x or more expensive than a first-level cache hit, TLB misses are scary and contiguous data is great. But there is another beast that lurks in the depths of virtual memory subsystem, and its name is Page Fault.
Over the next few posts I’d like to write about optimizing mesh data for run-time performance (i.e. producing vertex/index buffers that accurately represent the source model and are as fast to render for GPU as possible).
Whenever there is an automated process involved, such as asset/code building, unit testing, automatic version packaging, bulk log processing, etc., there often is a set of command-line tools which do their thing and return the result. Then there is a calling process (which may be as simple as a batch file, or as complex as IncrediBuild), which launches the tool and acts upon success/failure.