We have successfully shipped the Metal rendering backend to millions of users, and I want to write a bit about that. There are varying opinions on Metal in the industry - some claim Metal would not have been needed if only Apple dedicated more attention to OpenGL and Vulkan, some say it’s the easiest graphics API that ever existed. Why even bother with Metal, some ask, if you can just write OpenGL or Vulkan code, and use MoltenGL or MoltenVK to the same effect? Here are my thoughts on the API.
Exactly ten years ago, the first version of my XML parser, pugixml, got released to the public.
In the last article (Approximating slerp) we discussed a need for a fast and reasonably precise quaternion interpolation method. By looking at the data we arrived at two improvements to
nlerp, a less precise one and a more precise one. Let’s look at their implementations and performance!
Quaternions should probably be your first choice as far as representing rotations goes. They take less space than matrices (this is important since programs are increasingly more memory bound); they’re similar in terms of performance of basic operations (slower for some, faster for others); they are much faster to normalize which is frequently necessary to combat accumulating error; and finally they’re way easier to interpolate. In this post we’ll focus on interpolation.
Execution time in many programs is dominated by memory access time, not compute time. This is becoming increasingly true with higher instruction-level parallelism, wider SIMD, larger core counts and a lack of breakthroughs in memory access latency. A lot of performance talks now start by explaining that in a cache hierarchy the last-level cache miss is 100x or more expensive than a first-level cache hit, TLB misses are scary and contiguous data is great. But there is another beast that lurks in the depths of virtual memory subsystem, and its name is Page Fault.