Hacker News

That's for an 8B model.


This is oversimplifying it, but there isn't much more inherent complexity in training an 8B model versus a larger one: just more money, more compute, more data, and more time. Overall, the principles are similar.


Assuming cost scales linearly with parameter count, that's 7.5 figures instead of 6 for an 8x22B model.
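A rough sketch of that back-of-the-envelope estimate. The $500k baseline for the 8B model is an assumption picked purely for illustration (mid-range 6 figures); the only claim from the thread is the linear-in-parameters scaling:

```python
import math

# Assumed baseline: a 6-figure training cost for the 8B model (illustrative only).
base_params = 8e9
base_cost = 5e5  # $500k, an assumed mid-range 6-figure cost

# An 8x22B mixture-of-experts model has 8 * 22B = 176B total parameters.
target_params = 8 * 22e9

# Assuming cost scales linearly with total parameter count:
multiplier = target_params / base_params  # 22x
scaled_cost = base_cost * multiplier      # $11M

figures = math.floor(math.log10(scaled_cost)) + 1
print(f"{multiplier:.0f}x params -> ~${scaled_cost:,.0f} ({figures} figures)")
```

With this baseline the linear assumption lands at 8 figures; a smaller 6-figure baseline (closer to $100k) would land in the 7-figure range, which is roughly where the "7.5 figures" quip comes from.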



