Portable thread libraries have existed since userspace threads were invented. That's really not news. But it's true that the C memory model (per the spec) was underconstrained. So various tricks like volatile had to be used, and when those weren't enough the libraries had to drop down to platform-specific assembly code for things like memory barriers.
The point here is that they're just putting this into the standard and requiring specific semantics. This works because all CPU vendors have settled on a more or less consistent way of doing this -- synchronization is mature technology, basically. But it presumably also means that some older uniprocessor architectures won't be officially supported.
Portable threads libraries worked very well for people who were willing to use mutexes, which are implemented to include the memory barriers required to reason about code. Where it got nasty was when people couldn't afford the performance hit of mutexes (or imagined they couldn't take the performance hit, or just wanted to be ninjas) and tried to write safe code without mutexes by reasoning about how execution might get interleaved. C++ didn't provide as many guarantees about the ordering of side effects as a reasonable person would guess, so that typically didn't work.
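To make the "without mutexes" case concrete, here is a rough sketch of the kind of hand-off people used to attempt with plain variables, written with the C++11 atomics instead. The names (payload, ready, producer, consumer, run_handoff) are made up for illustration. With an ordinary bool flag the compiler or CPU is free to reorder the two writes; the release/acquire pair is what rules that out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain, non-atomic data (hypothetical name)
std::atomic<bool> ready{false};   // flag used to publish the payload

void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish: the write above
                                                   //    cannot be reordered after this
}

int consumer() {
    // Spin until the flag is set. The acquire load pairs with the release
    // store, so once we see true we are guaranteed to see payload == 42.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer and one consumer and returns what the consumer read.
int run_handoff() {
    payload = 0;
    ready.store(false);
    int seen = 0;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Calling run_handoff() from main should give back the published value. Swap the atomic for a plain bool and the program has a data race, which is exactly the situation where interleaving-based reasoning used to fall apart.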
Wise people threw up their hands and said, "I'm not a ninja, so I'll just use mutexes and coarse-grained locking, nothing tricky, and see if it's fast enough," and they were mostly fine. I think that's what most wise people will continue to do, and the people who will get the most use out of the new standard will be people who have very special performance requirements or who write high-performance concurrent libraries (such as container libraries) for mere mortals to use.
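For comparison, the "nothing tricky" approach looks roughly like this with the standard library's std::mutex. This is only a sketch; Counter and run_counter are hypothetical names. One lock guards all the shared state, so there is no ordering to reason about.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// A shared counter protected by a single coarse-grained lock.
struct Counter {
    void add(long n) {
        std::lock_guard<std::mutex> lock(m_);  // unlocks automatically at scope exit
        total_ += n;
    }
    long get() const {
        std::lock_guard<std::mutex> lock(m_);
        return total_;
    }
private:
    mutable std::mutex m_;  // mutable so get() can lock in a const method
    long total_ = 0;
};

// Starts `threads` workers that each add 1 to the counter `iters` times,
// then returns the final total.
long run_counter(int threads, int iters) {
    Counter c;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&c, iters] {
            for (int i = 0; i < iters; ++i) c.add(1);
        });
    }
    for (auto& th : pool) th.join();
    return c.get();
}
```

Every increment takes the lock, so it is slower than a lock-free counter would be, but the total always comes out to threads * iters. Whether that cost matters is the "see if it's fast enough" part.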
I don't see why older platforms can't be supported should the compiler vendor decide it's worth the trouble. Any architecture can be made coherent if you sacrifice performance. The compiler can conform to the standard by being very conservative. Just a question of cost.