I had success on a similar problem by allocating native buffers for the matrices and then making a basic CUDA call. The actual compute was 100x faster than my CPU baseline.
The bottleneck, of course, was fetching and loading the relevant data into memory in the first place.
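The native-buffer step above can be sketched in plain Java. This is a minimal illustration, assuming a row-major float matrix handed off to some JNI/CUDA binding (the binding itself is hypothetical and not shown); the point is that `allocateDirect` puts the data off-heap, where native code can read it without a copy.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class NativeMatrixBuffer {
    // Allocates an off-heap buffer holding rows*cols floats, laid out
    // row-major in native byte order, so a native (JNI/CUDA) call can
    // consume it directly rather than copying from the Java heap.
    static FloatBuffer allocateMatrix(int rows, int cols) {
        return ByteBuffer.allocateDirect(rows * cols * Float.BYTES)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer m = allocateMatrix(4, 4);
        for (int i = 0; i < 16; i++) m.put(i, i); // fill with test data
        // A native binding would be handed this buffer's address here;
        // on the Java side we just confirm the element layout.
        System.out.println(m.get(5));
    }
}
```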
The Vector API is really interesting, but it's frustrating that it has taken so long to materialise. Java's inability to properly exploit parallel compute, whether SIMD or GPU, has been a huge factor in pushing it out of the forefront of modern compute.
"the remaining 2% were large batch requests", [which made up 50% of the work] ... who really watches that many shows on Netflix? What was in those batches? If someone is watching that much, why bother with serendipity at all? The most serendipitous thing you could do is shut off their subscription.
Note that they likely mean the list of candidates is large, not the user history. This is an API, so perhaps 2% of client requests used the batch endpoint, which gave the server the opportunity to batch-process those requests.
This is a JVM infrastructure optimization, not a recommendation architecture breakthrough. At Netflix scale, most of the field runs GPU-accelerated ANN search (FAISS, ScaNN) for candidate retrieval.