
Benchmarks published by the company itself should be treated no differently than advertising. For actual signal, check out more independent leaderboards and benchmarks (like HuggingFace, Chatbot Arena, MMLU, AlpacaEval). Of course, even then it is impossible to come up with an objective ranking, since there is no consensus on what to even measure.

