My name is Arseny Kapoulkine and this is my blog where I write about computer graphics, optimization, programming languages and related topics. I’m the author of pugixml, meshoptimizer, volk and other projects.

23 March 2024 Condvars and atomics do not mix

When using std::condition_variable, there’s an easy to remember rule: all variables accessed in wait predicate must be changed under a mutex. However, this is easy to accidentally violate by throwing atomics in the mix.

15 March 2024 LLM inference speed of light

In the process of working on calm, a minimal from-scratch fast CUDA implementation of transformer-based language model inference, a critical consideration was establishing the speed of light for the inference process, and measuring the progress relative to that speed of light. In this post we’ll cover this theoretical limit and its implications.

28 November 2023 It is time

I joined Roblox in August 2012; eleven years and 4000 commits later, it’s time to say goodbye. Today was my last day.

30 June 2023 Efficient jagged arrays

A data structure that comes up fairly often when working with graphs or graph-like structure is a jagged array, or array-of-arrays. It’s very simple to build it out of standard containers but that’s often a poor choice for performance; in this post we’ll talk about a simple representation/construction code that I found useful across multiple different projects and domains.

Crucially, we will focus on immutable structures - ones that you can build in one go from source data and then continuously query without having to change it. This seems like a major constraint but for many problems it is sufficient to build the structure once, and it makes significantly simpler and more efficient implementations possible.

28 April 2023 Fine-grained backface culling

Backface culling is something we take for granted when rendering triangle meshes on the GPU. In general, an average mesh is expected to have about 50% of its triangles facing away from the camera. Unless you forget to set appropriate render states in your favorite graphics API, the hardware will reject these triangles as early in the rasterization pipeline as possible. Thus, it would seem that backface culling is a solved problem. In this post, however, we’ll explore a few alternative strategies that may or may not improve rendering performance.

More…