We really should see what happens when Project Digits is finally released. Also, I would love it if NVIDIA decided to get into the CPU/GPU + unified memory space.
I can't imagine the M3 Ultra doing well on a model that loads into ~500 GB, but it should be a blast on 70B models (well, at least twice as fast as my M3 Max) or even a heavily quantized 400B model.