You and I have very different definitions of "maximally optimized." I think of eliminating all possible wasted CPU cycles when I hear those words. In comparison, being able to remove spurious copies and atomic accesses is table stakes for claiming that you know a systems programming language. Most use cases of C and C++ today are in that state.
Well, clones can sometimes make things faster: when multiple threads can work independently on their own copies, synchronization will cost far more than a fast, predictable memory-to-memory copy. And the compiler may well be able to elide the copy anyway, since the clone is often only there to satisfy the language's semantics.
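To make that concrete, here is a rough sketch of the two shapes I have in mind, assuming Rust and only the standard library. The function names `sum_with_clones` and `sum_with_mutex` and the toy workload (sum a slice, add the worker id) are made up for illustration. The mutex version stands in for the case where workers would need mutable access to shared data; for purely read-only data you could share a bare `Arc` and skip the lock.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical workload: each worker sums the data and adds its own id.

// Each thread gets its own copy up front; after that there is no shared state.
fn sum_with_clones(data: &[u64], workers: u64) -> Vec<u64> {
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let local = data.to_vec(); // one plain memory-to-memory copy
            thread::spawn(move || local.iter().sum::<u64>() + id)
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

// All threads share one buffer behind a lock and serialize on every access.
fn sum_with_mutex(data: &[u64], workers: u64) -> Vec<u64> {
    let shared = Arc::new(Mutex::new(data.to_vec()));
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let shared = Arc::clone(&shared); // bumps an atomic refcount
            thread::spawn(move || {
                let guard = shared.lock().unwrap(); // workers contend here
                let total = guard.iter().sum::<u64>();
                total + id
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    let cloned = sum_with_clones(&data, 4);
    let locked = sum_with_mutex(&data, 4);
    // Same answers either way; only the coordination cost differs.
    assert_eq!(cloned, locked);
    println!("{:?}", cloned);
}
```

Which one wins depends on the data size and how contended the lock is, so measure before assuming either way.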
You’re right, I was exaggerating for effect. But the point is that we’re talking about learning the language. You don’t have to get it perfect on your first try or do everything the best way when you’re just getting started.