
If those could be fed equally well (i.e. doubled decode width, reorder buffer size, and register file ports), not much would be different, of course. But from what I understand, that's significantly more expensive: the complexity of some of those things grows at O(n^2) or so (the bypass network, which connects all pipes to all others? picking the ops to run in a given cycle out of the reorder buffer?), vs. O(n) for just increasing width. So we probably won't be seeing 10 SIMD ports anytime soon.
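To put rough numbers on the O(n^2) vs. O(n) point, here is a toy count in Python. It is only a sketch under stated assumptions: the full all-to-all forwarding model, the two operands per port, and the 4-port, 8-lane baseline are all hypothetical and not taken from any real Zen 4 documentation.

```python
def bypass_paths(ports, operands_per_port=2):
    """Forwarding paths if every port's result can feed every operand
    input of every port (assumed full all-to-all bypass)."""
    return ports * ports * operands_per_port


def datapath_lanes(ports, lanes_per_port):
    """Total execution lanes; grows linearly in either argument."""
    return ports * lanes_per_port


# Hypothetical baseline: 4 ports, each 8 x 32-bit lanes wide.
base_ports, base_lanes = 4, 8

# Option A: double the port count at the same width.
more_ports = (bypass_paths(base_ports * 2),
              datapath_lanes(base_ports * 2, base_lanes))

# Option B: keep the port count and double each port's width.
# The number of forwarding paths is unchanged; each path just gets wider.
wider_ports = (bypass_paths(base_ports),
               datapath_lanes(base_ports, base_lanes * 2))

print("more ports  (paths, lanes):", more_ports)
print("wider ports (paths, lanes):", wider_ports)
```

Both options end up with the same number of lanes in this model, but doubling the port count quadruples the number of forwarding paths, while doubling the width leaves the path count alone.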


You are probably right about the bypass network, but I don't see why the ROB or decode would need to increase. Aren't AVX-512 instructions only "split" once they are already at a pipe in Zen 4? Also, my understanding was that the CPU can schedule AVX2 instructions to the upper and lower halves of the 512-bit-wide pipes.


Indeed, Zen 4 splits uops just as it passes them to the pipes, but it is already doing that; adding more ports doesn't mean you can do it twice (without, like, making those ports 128-bit (thus not gaining any throughput), or making a new AVX-1024).

Allowing access to separate halves of the 512-bit pipes makes sense, but that still needs a separate port for each half; otherwise there's nothing to schedule the other half to. uops.info data[0] shows that 256-bit shuffle throughput is indeed double that of 512-bit, but seemingly both still increment either the FP1 or FP2 port (these overlap the regular four ALU port numbers!), so the AVX2 shuffles still have two ports to target.

So Zen 4's (perf-counter-indicated) ports are rather unrelated to the available execution units (not in any way a new concept, but still interesting). That would seem to indicate that perhaps something like "vaddps zmm; vpermd zmm" can manage 1/cycle, while "vaddps ymm; vaddps ymm; vpermd ymm; vpermd ymm" would fight for FP2 (for reference, vaddps uses either FP2 or FP3)? Fun.

[0]: https://uops.info/table.html?search=%22vpermd%20%22&cb_lat=o...



