Intel contributes to Linux, how is this a problem?

fc417fc802 · 2026-03-03T23:17:51 1772579871

Wrong level of abstraction. NUMA is an additional layer. If the program (script, whatever) was written with a monolithic CPU in mind then the big picture logic won't account for the new details. The kernel can't magically add information it doesn't have (although it does try its best).

Given current trends I think we're eventually going to be forced to adopt new programming paradigms. At some point it will probably make sense to treat on-die HBM distinctly from local RAM and that's in addition to the increasing number of NUMA nodes.

Agingcoder · 2026-03-04T01:30:17 1772587817

Yes exactly.

The kernel tries to guess as well as it can though - many years ago I hit a fun bug in the kernel scheduler that was triggered by numa process migration ie the kernel would move the processes to the core closest to the ram. It happened that in some cases the migrated processes never got scheduled and got stuck forever.

Disabling numa migration removed the problem. I figured out the issue because of the excellent ‘a decade of wasted cores’ paper which essentially said that on ‘big’ machines like ours funky things could happen scheduling wise so started looking at scheduling settings .

The main numa-pinning performance issue I was describing was different though, and like you said came from us needing to change the way the code was written to account for the distance to ram stick. Modern servers will usually let you choose from fully managed ( hope and pray , single zone ) to many zones, and the depending on what you’ve chosen to expose, use it in your code. As always, benchmark benchmarks.

zadikian · 2026-03-04T02:43:25 1772592205

Guessing this is especially hard to automate with peripherals involved. I once had a workload slow severely because it was running on the NUMA node that didn't share memory with the NIC.

consp · 2026-03-04T07:24:30 1772609070

Isn't high grade SSD storage pretty much a memory layer as well these days as the difference is no longer several orders of magnitude in access time and thoughput but only one or two (compared to tha last layer of memory)?

Agingcoder · 2026-03-04T14:54:06 1772636046

Optane was supposed to fill the gap but Intel never found a market for this.

Flash is still extremely slow compared to ram, including modern flash, especially in a world where ram is already very slow and your cpu already keeps waiting for it.

That being said, you should consider ram/flash/spinning to be all part of a storage hierarchy with different constants and tradeoffs ( volatile or not, big or small , fast or slow etc ), and knowing these tradeoffs will help you design simpler and better systems.

fc417fc802 · 2026-03-04T10:06:30 1772618790

Sort of? Relative to 6 or more channels of RAM it's still quite abysmal but perhaps high bandwidth flash will change how things are done.

wmf · 2026-03-03T22:03:21 1772575401

Often the Linux scheduling improvements come a year or two after the chip. Also, Linux makes moment-by-moment scheduling and allocation decisions that are unaware of the big picture of workload requirements.