Cache-aware prefill–decode disaggregation for 40% faster LLM serving (together.ai)
1 point by roody_wurlitzer 23 days ago

