Hacker News

What's the state of the art in quantization methods these days that one might apply to a model like Llama 3? Any particular literature to read? Priorities differ across methods, of course. Rather than saving space or speeding up computation, I'm interested specifically in static quantization where integer weights multiply integer activations (e.g. 8-bit integers). (As for motivation: such quantization enables proving correct execution of inference in sublinear time, at least asymptotically. I'm talking about ZK tech.)
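For concreteness, here's a minimal sketch of the kind of integer-only pipeline I mean: symmetric per-tensor int8 quantization, a purely integer matmul with int32 accumulation, and a single rescale at the end. Names and the scheme are my own illustration, not any particular library's API:

```python
import numpy as np

def quantize_symmetric_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map a float tensor to int8 with one per-tensor scale (symmetric, zero-point 0)."""
    scale = float(np.max(np.abs(x))) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32)).astype(np.float32)  # weights
a = rng.standard_normal(32).astype(np.float32)        # activations
y_float = W @ a                                       # float reference

# Static int8 path: integer weights times integer activations,
# accumulated in int32, rescaled once by the product of the two scales.
qW, sW = quantize_symmetric_int8(W)
qa, sa = quantize_symmetric_int8(a)
acc = qW.astype(np.int32) @ qa.astype(np.int32)       # pure integer arithmetic
y_int = acc.astype(np.float32) * (sW * sa)

err = np.max(np.abs(y_int - y_float))                 # small quantization error
```

The point is that everything inside the matmul is integer arithmetic (which is what an arithmetic circuit for a ZK proof would encode); the float scales only appear once, outside the hot loop, and could themselves be folded into a fixed-point rescale.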


