> They shouldn't have killed an excellent processor (the Alpha)
Parallel Alpha systems are a pain to deal with, because they lack a form of expected synchronization that every other processor has: automatic data dependency barriers. On every other platform, if you initialize or otherwise write to a value, then make a pointer point to that value, you can expect that anyone reading through that pointer gets the initialized/new value. But on Alpha, another CPU can get the new value of the pointer and then the uninitialized/old value of what it points to.
Alpha is the sole reason why the Linux kernel "smp_read_barrier_depends" barrier exists and code has to use it; on every other platform, that barrier is a no-op.
Is there any evidence that not enforcing read-read dependency ordering in hardware was actually crucial to Alpha's performance?
I'd guess that back when the Alpha memory model was designed, multiprocessors were quite rare, and the designers didn't have as clear a picture of the tradeoffs as we do today (not that today's understanding is perfect, just that it's better than what we had 30 years ago). So they chose the weakest model they could come up with in order not to constrain future implementations.