kaibee on April 17, 2024 | on: Mixtral 8x22B
That's for an 8B model.
cptcobalt on April 17, 2024
This is oversimplifying it, but there isn't much more inherent complexity in training an 8B or larger model beyond more money, more compute, more data, and more time. Overall, the principles are similar.
lostmsu on April 17, 2024
Assuming cost grows linearly with the number of parameters, that's 7.5 figures instead of 6 for the 8x22B model.
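A back-of-the-envelope sketch of that scaling, under the same linear-cost assumption: the naive total of 8 × 22B = 176B parameters is an upper bound I'm assuming here (the real total is lower, roughly 141B, since non-expert layers are shared across experts), and the 6-figure baseline for the 8B model comes from the parent comment.

```python
import math

# Sketch of the linear-scaling estimate from the comment above.
# The parameter counts and the linear-cost assumption are mine,
# not exact numbers from the thread.
base_params = 8e9            # the 8B model referenced upthread
moe_total_params = 8 * 22e9  # naive 8x22B total, ~176B upper bound;
                             # shared non-expert layers make the real
                             # total lower (roughly 141B)

scale = moe_total_params / base_params  # ~22x more parameters
added_figures = math.log10(scale)       # ~1.34 extra decimal digits

print(f"scale factor:  {scale:.0f}x")
print(f"added figures: {added_figures:.2f}")
# A 6-figure training run thus becomes a ~7.3-figure one under this
# assumption, in the same ballpark as the "7.5 figures" in the comment.
```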