
80GB at 4-bit.

But because it only activates a few of its experts per token rather than the whole model, it can run on a fast CPU in reasonable time. So 96GB of DDR4 will do. 96GB of DDR5 is better, since CPU inference is mostly limited by memory bandwidth.
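To put rough numbers on "reasonable time": CPU token generation is usually bound by how fast the active weights can be streamed from RAM, so an upper bound is bandwidth divided by the bytes read per token. A back-of-envelope sketch follows. The 39B active-parameter figure is the one Mistral published for Mixtral 8x22B and is assumed to apply here, the 4.5 bits per weight is an assumed allowance for quantization overhead, and the bandwidth numbers are theoretical dual-channel peaks, not measurements.

```python
def max_tokens_per_second(active_params_billion, bits_per_weight, bandwidth_gb_s):
    """Upper bound on decode speed if every active weight is read once per token."""
    # GB of weights touched per generated token
    active_gb = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / active_gb


# Assumed figures (not from the thread): ~39B active params, ~4.5 bits/weight.
ACTIVE_B = 39
BITS = 4.5

# Theoretical dual-channel peaks: DDR4-3200 = 51.2 GB/s, DDR5-5600 = 89.6 GB/s.
for name, bw in [("DDR4-3200 dual channel", 51.2), ("DDR5-5600 dual channel", 89.6)]:
    print(f"{name}: at most {max_tokens_per_second(ACTIVE_B, BITS, bw):.1f} tokens/s")
```

Real throughput will land below these bounds, but the ratio between the two is why DDR5 helps.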



WizardLM-2 8x22b (a fine-tune of the Mixtral 8x22b base model) at 4-bit was only 80GB.
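The 80GB figure is easy to sanity-check with arithmetic. A minimal sketch, assuming the roughly 141B total parameters Mistral published for Mixtral 8x22B and about 4.5 effective bits per weight (4-bit values plus per-block scales, as in llama.cpp's Q4_0 layout); other 4-bit formats will come out somewhat different.

```python
def quantized_size_gb(total_params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return total_params_billion * bits_per_weight / 8


TOTAL_B = 141  # assumed total parameter count, in billions

print(f"pure 4-bit:      {quantized_size_gb(TOTAL_B, 4.0):.1f} GB")
print(f"~4.5 bits/weight: {quantized_size_gb(TOTAL_B, 4.5):.1f} GB")
```

The second figure comes out just under 80GB, which leaves some of a 96GB machine for the KV cache and the OS.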



