Process 1 billion rows and save the world

Helge Holm

Workshop 3h

Green computing is a growing trend, driven both by environmental concerns and by the need to cut cloud costs. For developers, this means understanding how to write efficient software.

The popular “1 Billion Row Challenge” (1brc.dev) is a fun way to get hands-on experience with how much difference various coding techniques can make. Cutting your processing time by 90% is easier than you might think!

We will start by writing code that can read a 14-gigabyte input file at all, and that performs the processing needed for a basic solution to the “1 Billion Row Challenge”.
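To give an idea of what such a basic solution might look like, here is a minimal Python sketch. It assumes the challenge's usual input format of one `station;temperature` pair per line, and it streams the file line by line so the whole input never has to fit in memory. The names `aggregate` and `write_sample` are made up for this illustration and are not part of any workshop material.

```python
import os
import tempfile


def aggregate(path):
    """Stream the file line by line and keep running statistics per station.

    Returns a dict mapping station name -> [min, max, total, count].
    Assumes each line looks like "Oslo;-1.5".
    """
    stats = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # reads one line at a time, not the whole file
            line = line.rstrip("\n")
            if not line:
                continue
            name, _, value = line.rpartition(";")
            temp = float(value)
            entry = stats.get(name)
            if entry is None:
                stats[name] = [temp, temp, temp, 1]
            else:
                if temp < entry[0]:
                    entry[0] = temp
                if temp > entry[1]:
                    entry[1] = temp
                entry[2] += temp
                entry[3] += 1
    return stats


def write_sample(text):
    """Hypothetical helper: write a tiny stand-in for the real input file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    return path


if __name__ == "__main__":
    sample = write_sample("Oslo;-1.5\nBergen;7.0\nOslo;3.5\n")
    try:
        for name, (lo, hi, total, count) in sorted(aggregate(sample).items()):
            print(f"{name}: min={lo:.1f} mean={total / count:.1f} max={hi:.1f}")
    finally:
        os.remove(sample)
```

A version like this is simple and correct, but it decodes text, allocates strings and parses floats for every single row, which is exactly the kind of overhead a profiler will point at.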

Then we will move on to learning how to use a profiler, along with basic techniques like memory-mapped file access, zero-copy and allocation-free processing, and parallelization, to bring the processing time below 60 seconds.
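As a rough illustration of some of these techniques, the Python sketch below maps the input file into memory, parses temperatures as integer tenths instead of floats, and splits the file into newline-aligned ranges. It assumes well-formed `station;temperature` lines with exactly one fractional digit. All function names are hypothetical. Python still copies each station name out of the mapping to use it as a dictionary key, so this is not truly zero-copy; in Zig or C++ the same loop can work directly on pointers into the mapped memory. The ranges are independent of each other and could be handed to separate workers, but this sketch simply processes them one after another.

```python
import mmap
import os
import tempfile


def parse_tenths(buf, start, end):
    """Parse text like b"-12.3" in buf[start:end] as integer tenths (-123)."""
    negative = buf[start] == 0x2D  # '-'
    if negative:
        start += 1
    value = 0
    for i in range(start, end):
        byte = buf[i]
        if byte != 0x2E:  # skip the '.'
            value = value * 10 + (byte - 0x30)
    return -value if negative else value


def chunk_ranges(buf, parts):
    """Split buf into up to `parts` (start, end) ranges ending on line breaks."""
    size = len(buf)
    ranges = []
    start = 0
    for i in range(1, parts + 1):
        if start >= size:
            break
        target = size if i == parts else max(start, size * i // parts)
        newline = buf.find(b"\n", target) if target < size else -1
        end = size if newline == -1 else newline + 1
        ranges.append((start, end))
        start = end
    return ranges


def process_range(buf, start, end, stats):
    """Update stats (name -> [min, max, total, count], in tenths) for one range."""
    pos = start
    while pos < end:
        newline = buf.find(b"\n", pos, end)
        if newline == -1:
            newline = end  # last line without a trailing newline
        if newline > pos:
            sep = buf.rfind(b";", pos, newline)
            name = buf[pos:sep]  # small copy, used as the dictionary key
            temp = parse_tenths(buf, sep + 1, newline)
            entry = stats.get(name)
            if entry is None:
                stats[name] = [temp, temp, temp, 1]
            else:
                entry[0] = min(entry[0], temp)
                entry[1] = max(entry[1], temp)
                entry[2] += temp
                entry[3] += 1
        pos = newline + 1


def aggregate_mmap(path, parts=4):
    """Aggregate a non-empty file through a read-only memory mapping."""
    stats = {}
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            for start, end in chunk_ranges(buf, parts):
                process_range(buf, start, end, stats)
    return stats


if __name__ == "__main__":
    fd, sample = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"Oslo;-1.5\nBergen;7.0\nOslo;3.5\nTromso;-12.3\n")
    try:
        for name, (lo, hi, total, count) in sorted(aggregate_mmap(sample).items()):
            print(name.decode(), lo / 10, round(total / count) / 10, hi / 10)
    finally:
        os.remove(sample)
```

Keeping the statistics as integer tenths avoids floating-point parsing in the hot loop; the conversion back to degrees happens only once per station when printing.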

The main workshop guidance will be for compiled languages (Zig, C++) on Linux. However, the principles used are applicable to any language on any OS, so it is perfectly fine to show up with your favorite language and environment. We will provide some basic resources and code examples for at least Java, C#, Python and Go, and may well add more if given advance notice.

Basic programming knowledge is required. If you can write a “Hello world!” program without assistance, you’re good to go. You will need to bring a laptop with a working development environment.