I had the same thoughts reading the book's preface and exploring the author's site (I had to apply a CSS override; my eyes can't seem to handle green on black for long anymore). This thread is another instance. Most of me is just fine with the JVM's performance, and I sometimes have to correct coworkers who worry that splitting up a method for testability and readability is going to be a performance problem. Because of the way the JIT works, with small hot methods being the easiest candidates for inlining, splitting may very well improve performance.
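To make the JIT point concrete, here's a minimal sketch of the kind of split I mean. Everything in it (`SplitDemo`, `isValid`, `square`, the sum-of-squares task) is a made-up illustration, not code from the book or from anyone's codebase. The assumption is HotSpot's usual behavior: small methods that get hot are inlined at the call site, so the split version typically compiles down to roughly the same loop as the monolithic one. I haven't benchmarked this example, so treat it as a shape to test, not a result.

```java
public final class SplitDemo {

    // Monolithic version: filtering and arithmetic live in one loop body.
    static long sumMonolithic(int[] values) {
        long total = 0;
        for (int v : values) {
            if (v < 0) {
                continue; // skip negative inputs
            }
            total += (long) v * v;
        }
        return total;
    }

    // Split version: each helper is tiny, trivially unit-testable, and
    // (assumption) small enough for the JIT to inline once the loop is hot.
    static boolean isValid(int v) {
        return v >= 0;
    }

    static long square(int v) {
        return (long) v * v; // widen first to avoid int overflow
    }

    static long sumSplit(int[] values) {
        long total = 0;
        for (int v : values) {
            if (isValid(v)) {
                total += square(v);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = {3, -1, 4};
        // Both versions compute the same sum of squares of the non-negative values.
        System.out.println(sumMonolithic(data) + " " + sumSplit(data));
    }
}
```

If you want to see what the JIT actually decides instead of taking my word for it, running with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` prints the inlining decisions, and a harness like JMH is the sane way to measure the difference, if any.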
But part of me agrees and wants things to go further, since "machine code" is just an abstraction over the microcode. x86 is too high a layer for what's actually going on in modern CPUs. I hate that to write the most optimal code by hand you have to lay out the assembly in a way that coerces the CPU into doing what you want at the lower layer (and coerce the compiler into laying out the assembly that way too, if you aren't working at the assembly layer). At least FPGAs offer some salvation: you can mostly do what you want without interference from a lower layer. (There are no "pure" FPGAs, though; they contain dedicated hardware such as DSP slices so they can be competitive on common tasks, rather than being nothing but programmable gates.) But then you end up fairly application-specific, instead of having something more general-purpose with lower-level hooks, or a meta protocol that lets you choose the tradeoffs instead of being forced to accept the ones the creators chose. I guess the complaint is that there's a lot of performance left on the table that can't be captured generally.