gcc simvm.c
time ./a.out
857419840
real 0m0.374s
user 0m0.359s
sys 0m0.003s
meanwhile:
gcc -O3 simvm.c -o fast.out
time ./fast.out
857419840
real 0m0.006s
user 0m0.001s
sys 0m0.003s
Comparing optimized compiled code against unoptimized compiled code is worthless. .NET performs optimizations at runtime through its JIT; plain `gcc` without optimization flags does very few.
/me goes to comment on the blog.
edit: bah! Foiled!
>Hmmm, your comment seems a bit spammy. We're not real big on spam around here.
Please go back and try again.
edit2: holy cow, that spam filter hates me. I can't get anything to post...
Hah, nice. Didn't check the resulting code. Know what the other optimization levels do? I don't really know x86 assembler.
I do still side with the optimization though, in that the .NET-VM could do this as well. The loop is pretty simple, and can be optimized away. Why shouldn't it?
Your answer received lots of votes for what are basically assumptions.
The author of the article updated his post for -O3 by using a volatile variable for the for-loop test, which prevents the compiler from optimizing the loop away.
For .NET you can look at the generated assembly, and if it does optimize away the loop you can use the same trick.
No reason to guess. We are software engineers, after all.
"volatile" forces the C compiler to issue load and store instructions for each access (although real-world C compilers are notoriously buggy here), so it makes the loop much slower. That's not a fair comparison.
He wasn't trying to write something faster than C, he was just establishing a good baseline. He makes that clear in the article when he specifically addresses the fact that compiling with optimizations removes the loops.
(I wonder, did the people who upvoted you actually read the article?)
Apologies for the first results: simvm.c had mistakenly failed to compile, so those were from the original loops. I had to modify the code to get it to compile (dropped it to "#define LONG") and here are the results:
This is a compiler, not an interpreter, as it uses .NET's dynamic assembly api to emit .NET IL which is then JIT'd to native code. The benchmark is also likely optimized away by static analysis.
For an actual benchmark of various bytecode interpretation schemes see:
These micro benchmarks also take some care to attempt realistic branch prediction rates.
Running these on current hardware, switch based interpreters still perform quite well. Direct threading gets you slightly more performance. I'd say that sticking to ANSI c and just using switch is a good plan. If you need more performance then you likely should go ahead and implement a native JIT of some sort or use JVM/.NET.
This phrase, what does this mean anymore? Do these things exist in the form of real, non-toy software anymore? What's the difference between compiling to an AST, byte code, or LLVM intermediate? I think these words have more to do with cultural expectations than with VM/compiler technology.
I'm sorry, but there is a qualitative difference between, eg, a Java application running on a modern JVM and a Python application executing through the CPython runtime, neither of which are toys.
(Although it's true that CPython is one of the few extant members of a dying breed.)
The connotations of "compiled" and "interpreted" might have plenty to do with vague expectations, but the words still mean something in the context of language implementation: even though it is reasonable to call the situations you list "compilation", it would not be accurate to call a language that was compiled to LLVM IR, then interpreted with lli, "compiled".
> it would not be accurate to call a language that was compiled to LLVM IR, then interpreted with lli, "compiled".
Why not? I wouldn't see that as being too different from compiling a Java application and running it on "a modern JVM." With most examples of that I've seen, I think I could fairly characterize them as being "compiled."
For some reason, people think of things like MRI 1.8 as being "interpreted" and expect those things to be slow. One could just as well take the same language and run it on a tracing JIT VM (after some considerable engineering). Semantically, the thing would still be "interpreting bytecodes," just doing it in a highly optimized way. Note that the step where MRI reads the source and creates an AST is fundamentally the same as parsing source code and outputting bytecode. There is nothing somehow special or sacred about an intermediate language in the form of bytecode.
Given the compiler/VM technology we have today, even the degree to which things are (or aren't) late-bound is more flexible than in the past.
> people think of things like MRI 1.8 as being "interpreted" and expect those things to be slow. One could just as well take the same language and run it on a tracing JIT VM. (after some considerable engineering)
Off topic, but in case you weren't aware, that "considerable engineering" exists, and is called JRuby. It works (almost) exactly as you describe. And yes, it is rather fast :-)
Regarding your Ruby example and the tracing JIT VM: I think Mozilla's TraceMonkey/JaegerMonkey efforts show that while tracing is indeed a very good optimization technique, it is by no means clear that it performs well on all benchmarks, particularly when compared to V8, which is a traditional method-at-a-time JIT compiler.
While I agree that there is nothing special about bytecode, interpreting it is generally regarded as faster than interpreting an AST. AFAIR, I read a paper about that, too, but cannot for the love of god remember its name/authors; my impression is also that Ruby 1.9 uses a bytecode interpreter (with some optimizations, such as a direct-threaded code interpreter).
> ...particularly when compared to V8, which is a traditional method-at-a-time JIT compiler.
Actually, my understanding is that V8 automatically compiles all JS code on load, it has no interpreter. So in a sense it's "method at a time" (i.e. its methods are compiled as units, it's not a trace system) but it's not traditional, and you could make an argument that it's not even a JIT.
You are right, V8 does not have an interpreter. However, I think it is fairly accurately described as a JIT, because it compiles methods to native code when needed, i.e., before the first invocation/execution of any specific piece of code.
Other approaches, like JVM, are more accurately described as "dynamic compilation" sub-systems of the virtual machine.
At least that's my understanding of things, and I used to lump everything in that area into the "JIT" category, until (quite recently, actually) someone brought this small but important distinction to my attention. I don't see how one could make an argument not classifying V8 as a JIT compilation virtual machine, though...
Note, CPython hasn't strictly been an interpreter for a long, long time. It compiles .py files into .pyc files, then runs them. It also does a fair bit of caching. However, its virtual machine doesn't do much optimization.
Yeah, it pre-allocates lots of common data-structures, and reuses old ones rather than using new ones if possible, and I think it might do some stuff with iterators ... but it's nothing like PyPy or a modern JVM.
Regarding: "words still mean something in the context of language implementation"
Since nobody mentioned it before, the primary advantages of interpreters in the field of programming language design and implementation are:
- ease of implementation
- portability
This has very important implications for interpreter optimizations, too, since most of them are relatively easy to implement and portable. The ease-of-implementation advantage over an optimizing native-code compiler is usually very big, and the same holds true for JIT compilers as well. Portability, again, is a nice thing to have for optimizations, particularly when compared to the portability of JITs, which need a dedicated native-code-generating backend for every supported architecture.
Languages can be both interpreted and compiled. However, there are clear differences between a compiler, and an interpreter. That's the fine print that everyone's missing when talking about the distinction between the two (or lack thereof).
If you mean by "qualitative difference" that they differ in some substantial way, please describe what that difference is, rather than asserting that it exists.
Java interprets bytecodes, Python interprets bytecodes. Java has JIT, Python has a JIT (`import psyco`). Where is the substantial difference between these two?
If all you mean by "qualitative difference" is that they use different bytecodes, then you're not saying very much.
Humans produce sounds, lions produce sounds. Humans have mouths, lions have mouths. Where is the substantial difference between the two?
Even conceptually, interpreters and compilers differ in certain aspects. For me, the most striking one is that compilers do not fully emulate the results of the given program (one could argue that they do to some extent when performing optimizations, like when gcc -O3 transforms the whole program into a single printf() call).
There are many others, but I hope you get the point.
> Humans produce sounds, lions produce sounds. Humans have mouths, lions have mouths. Where is the substantial difference between the two?
I asked you a question regarding your previous claim. You responded with a downvote and a rhetorical counter but, again, no answer.
I'm mildly familiar with the JVM. I'm very familiar with Python's virtual machine implementation. I ask again: what is the substantial difference between the two? Rhetorical wizardry does not impress me: I actually want to know what your answer is. You said that there is a "qualitative difference" between the two and now have replied implying that it's a substantial difference. I don't understand your reticence to give a real answer here.
> Even conceptually, interpreters and compilers differ in certain aspects.
That has absolutely nothing to do with my question, and I don't see how it would substantially differentiate between the two implementations we're discussing.
My apologies for mistaking you for the OP. Still, neither one of you has actually answered the question I asked. You offered an analogy for rhetorical effect; surely you can see how that isn't actually an answer.
> Humans produce sounds, lions produce sounds. Humans have mouths, lions have mouths. Where is the substantial difference between the two?
The endpoints are easy. My point is that there are so many things nowadays which are really somewhat in the middle, and there's no clear line of demarcation.
The VB "interpreter" emits opcodes for the .NET VM. This code is then optimized and JIT'd. This means that both these interpreters run native code in the end, with the difference that the VB version can take advantage of the optimizations implemented by the .NET JIT.
Indeed - frankly when I saw it I was surprised the guy had accepted this as valid in their contest - if he's allowed to use an offboard compiler the 'interpreter' could just as well emit some generated C to a file and pass it through GCC -O3.
His 'VB' emits byte code for the .NET VM which is highly optimized. Also, VB.NET runs on the same VM. VB.NET has a lot of 'features' that are performance nightmares but it's not that slow if you know what to avoid.
Oh, I didn't see that the VB was run on Mono; run that code on a Windows machine and I bet it will beat the -O3-optimized C.