The optimization is the linear memory layout of the nodes -- value speculation i...

Remnant44 · on July 12, 2024

The linear node layout is not the point at all.

It's serving two purposes here:

1) Providing us with a correct "guess" of what the next node is. 2) Ensuring that in all cases we're running from the L1 cache.

In real world code, you'd be correct -- getting things to run out of L1/L2 is the most important attribute. This is specifically about a micro-optimization that allows you to beat the obvious code even when running completely from cache!

pfedak · on July 12, 2024

The example is poorly chosen in terms of practicality for this reason, but otherwise, no, this is a poor summary that misses something interesting.

The memory layout isn't changing in the faster versions, and there are no additional cache misses. It's easy to convince yourself that the only difference between the naive linked list and assuming linear layout is the extra pointer load - but TFA shows this is false! The execution pipeline incurs extra costs, and you can influence it.

mistercow · on July 12, 2024

I think a more practical example might have been to have a mostly contiguous list with a few discontinuous nodes inserted randomly in the middle. That’s more like a real case, and exercises the advantages of linked lists over simple arrays, but should still perform well, since there would only be a few value speculation misses.