Grayskull was made before LLMs were a thing. Their plan is like Groq's: distribute the compute graph across multiple processors to get higher effective memory and throughput by pipelining. But better-ish, since the cards have RAM, so you can fit models on far fewer cards. Grayskull doesn't have this ability. The next-generation Wormhole does, through 100GbE interfaces on the cards.
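To make the "fewer cards" point concrete, here's a rough sketch of splitting a model into pipeline stages, one stage per card. Everything in it is made up for illustration: the `Layer` class, `partition_layers`, and the memory figures are assumptions, not Tenstorrent or Groq numbers or APIs.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    mem_gib: float  # hypothetical memory footprint of this layer


def partition_layers(layers, card_mem_gib):
    """Greedily pack consecutive layers into stages that each fit on one card."""
    stages = [[]]
    used = 0.0
    for layer in layers:
        if layer.mem_gib > card_mem_gib:
            raise ValueError(f"{layer.name} does not fit on a single card")
        # Start a new stage (a new card) when the current one is full.
        if used + layer.mem_gib > card_mem_gib:
            stages.append([])
            used = 0.0
        stages[-1].append(layer)
        used += layer.mem_gib
    return stages


# Hypothetical model: 16 blocks of 1.5 GiB each (24 GiB total).
layers = [Layer(f"block{i}", 1.5) for i in range(16)]

small = partition_layers(layers, card_mem_gib=2.0)   # little memory per card
large = partition_layers(layers, card_mem_gib=12.0)  # card with attached RAM

print(len(small), "cards vs", len(large), "cards")
```

With these invented numbers the low-memory cards need one card per block, while the cards with more RAM need only two. Each stage then hands its activations to the next card, which is where the card-to-card links come in.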
Also, the CPUs on Grayskull are 32-bit. Memory is addressed through the bank address, so it works for now, but they'll have to upgrade to 64-bit soon.
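One way to read the bank addressing point: if every access is a (bank, offset) pair, a 32-bit offset only has to cover a single bank, so total memory can exceed 4 GiB. A small sketch of that idea follows. The bank count and the flat layout are assumptions for illustration, not the actual Grayskull memory map.

```python
OFFSET_BITS = 32  # offset width matching a 32-bit CPU pointer
NUM_BANKS = 8     # hypothetical number of memory banks

BANK_SIZE = 1 << OFFSET_BITS
TOTAL_ADDRESSABLE = NUM_BANKS * BANK_SIZE


def split_address(flat_addr):
    """Split a flat address into (bank, offset), where offset fits in 32 bits."""
    bank, offset = divmod(flat_addr, BANK_SIZE)
    if not 0 <= bank < NUM_BANKS:
        raise ValueError("address outside all banks")
    return bank, offset


def join_address(bank, offset):
    """Rebuild the flat address from a (bank, offset) pair."""
    if not 0 <= bank < NUM_BANKS or not 0 <= offset < BANK_SIZE:
        raise ValueError("bank or offset out of range")
    return bank * BANK_SIZE + offset


# A flat address above 4 GiB still has an offset that fits in 32 bits.
bank, offset = split_address(5 * BANK_SIZE + 0x1000)
print(bank, hex(offset), TOTAL_ADDRESSABLE // 2**30, "GiB addressable")
```

The limit shows up once a single bank, or a single buffer that must be contiguous, needs more than 4 GiB. At that point the offset itself has to grow past 32 bits.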