CPU Cache and Why It Matters for Performance

Each CPU core has its own L1 and L2 cache. The L3 cache is shared across cores.

The cache stores data in fixed-size blocks called cache lines. On most modern x86 and ARM processors, a line holds 64 bytes.

This is why sequential array access beats linked list traversal. Array elements sit contiguously in memory, so one cache line fetch brings in several neighboring elements at once, and the hardware prefetcher can predict the next access. Linked list nodes are scattered across the heap, so each hop risks a fresh cache miss.

False Sharing

Two cores may each hold a copy of the same cache line in their local cache while working on different variables within that line. When one core writes its variable, the cache coherence protocol (MESI) must invalidate the other core's copy of the entire line. The cores exchange coherence messages to keep their copies in sync.

The variables seem independent, each sitting in a core's L1 cache. But they occupy the same cache line, so every write by one core invalidates the other core's copy, and the line ping-pongs between caches. This creates communication overhead even though the cores never touch each other's data.

When Communication Happens

Cache communication kicks in when:

  • A thread moves from one core to another (this connects to Go’s GPM model where it’s better to reuse the same M that ran a G before—it keeps the cache warm)
  • Two cores need the same cache line

Fixing False Sharing

What happens when multiple threads modify adjacent data? Like concurrent array updates, or when struct fields sit next to each other in memory and get modified at once?

The fix: pad your data structures. In Go, insert anonymous array fields such as _ [56]byte to fill space (a slice field like _ []byte would only add a 24-byte slice header, not inline storage). Make sure the affected items fall into different cache lines. This breaks the sharing and eliminates the sync overhead.

Just add enough padding bytes so each item gets its own cache line. The performance gain often pays back the extra memory cost many times over.