Hacker News

That's for an 8B model.


This is oversimplifying it, but there isn't much more inherent complexity in training an 8B model versus a larger one: just more money, more compute, more data, and more time. Overall, the principles are similar.


Assuming cost scales linearly with parameter count, that's 7.5 figures instead of 6 for an 8x22B model.
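A rough sketch of that back-of-the-envelope estimate. The $500k baseline for the 8B model is an assumption picked purely for illustration (mid-range 6 figures); the only claim from the thread is the linear-in-parameters scaling:

```python
import math

# Assumed baseline: a 6-figure training cost for the 8B model (illustrative only).
base_params = 8e9
base_cost = 5e5  # $500k, an assumed mid-range 6-figure cost

# An 8x22B mixture-of-experts model has 8 * 22B = 176B total parameters.
target_params = 8 * 22e9

# Assuming cost scales linearly with total parameter count:
multiplier = target_params / base_params  # 22x
scaled_cost = base_cost * multiplier      # $11M

figures = math.floor(math.log10(scaled_cost)) + 1
print(f"{multiplier:.0f}x params -> ~${scaled_cost:,.0f} ({figures} figures)")
```

With this baseline the linear assumption lands at 8 figures; a smaller 6-figure baseline (closer to $100k) would land in the 7-figure range, which is roughly where the "7.5 figures" quip comes from.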



