On Languages, VMs, Optimization, and the Way of the World

rayiner · on Aug 4, 2013

One thing that gets lost in these discussions is implementation complexity. The JVM is amazing, but it's also a phenomenally complex piece of code.

My recent interest has been in gradually typed languages: http://ecee.colorado.edu/~siek/gradualtyping.html, http://www.mypy-lang.org.

Gradual typing is somewhat different from optional typing as found in Dylan or Common Lisp, because it offers a consistent type system wherein a program that is fully typed declared will be statically typed. Dylan and Common Lisp compilers use extensive type inference systems to try and eliminate as many dynamic type checks as possible, but type inference is fairly complicated and its hard for the programmer sitting at his keyboard to guess where typechecks will be generated unless he has a lot of familiarity with a particular implementation. With a gradually typed language, you can be pretty sure that if you annotate certain syntactic elements with types, the compiler will generate code that does not check types at runtime. Or, in code that only partially declares types, you can be sure that type errors will ultimately be traceable code that does not have declarations (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142....).

The upside of gradual typing is two-fold: 1) you can generate pretty good code without a very sophisticated compiler; and 2) you can generate pretty good code without having a runtime optimization framework in place (as with the JVM, or the Dart VM or V8).

evincarofautumn · on Aug 4, 2013

We did this in an ActionScript 3 compiler at my last job. The problem is that since code is dynamic by default, you still need the “fallback” operations mentioned in this post, because people can (and do) make use of dynamic operations and you need to preserve behaviour. Also, ActionScript’s type system isn’t expressive enough to actually get to full static typing—there is only one function type, for example, which says nothing about its parameter or return types. But in principle, and in a language that was designed to support it at the outset, it’s a good way to do typing.

rayiner · on Aug 4, 2013

> ActionScript’s type system isn’t expressive enough to actually get to full static typing—there is only one function type, for example, which says nothing about its parameter or return types

Yeah, that's the problem in Dylan too. See: opendylan.org/~hannes/coll.pdf‎

gcr · on Aug 5, 2013

That sounds a lot like Cython, a Python-->C source translator. The more variables you annotate with C types, the more closely the output looks like idiomatic C. If you don't annotate any variables with types, then they're all treated as PyObject with the performance penalties you'd expect.

derleth · on Aug 5, 2013

Common Lisp is always fully typed, in that every object knows its type and the implementation won't allow actions which violate the type signatures involved.

C, with casts, is a lot closer to being optionally typed.

noelwelsh · on Aug 5, 2013

Types are better regarded as a static property of a program. In some (statically) typed language you can declare a type like

  type Index = Int

aliasing Index to the integer type. The representation of Index will be exactly the representation of an Int and a runtime you won't be able to tell them apart. At compile time, however, you won't be able to use an Index where an Int is expected and vice versa.

Common Lisp is better described as a safe language. Programs will only fail in defined ways. C is an unsafe language -- once you go off the rails anything can happen.

azakai · on Aug 4, 2013

The points all sound reasonable. I wonder about the details though, for example

> Dart is dynamically typed (or at least, types are optional and the VM doesn't care about them), but types are of a fixed shape. If programmers can tolerate fixed-shape types, Dart provides a very nice dynamic language that still can achieve the same optimizations as statically-typed Java or statically-compiled C/++.

Again, the principle sounds reasonable - predictable types allow Java/C-like performance - but it would be nice to see this supported by empirical results. For example

http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te... , http://benchmarksgame.alioth.debian.org/u32/benchmark.php?te...

and

http://j15r.com/blog/2013/07/05/Box2d_Addendum

do not support the article's point. Dart's performance there is much like JavaScript's.

The one dynamic language that can in many cases compete with C and Java is Lua, in the LuaJIT implementation. But Lua is very dynamic, much like JavaScript, so again this is a result in the real world that does not seem to add up with the article's point.

m0th87 · on Aug 4, 2013

> You could probably make the claim that static optimization is a halting problem, and dynamic optimization eventually can beat it by definition since it can optimize what the program is actually doing.

It's not as if static type systems and JIT compilation are mutually exclusive concepts, so this argument makes no sense.

If JITing were a panacea, dynamically typed languages would be just as fast as statically typed, and lower-level languages like C would be long dead. This is clearly not the case.

mwcampbell · on Aug 4, 2013

I think his point there is that a JIT compiler can handle dynamism more effectively than an AOT compiler. A corollary is that the more static the language is, the less advantage JIT compilation has over AOT compilation.

PommeDeTerre · on Aug 4, 2013

Part of what I see m0th87 saying, though, is that practice has shown that these claims about JIT compilation handling certain cases more effectively than AOT compilation just don't really hold true.

JIT compilation may be better than some of the other approaches for implementing VMs or interpreters. It may also theoretically have the potential to be better than AOT compilation in some cases, too. But that potential doesn't really matter if we never see such benefits in real systems.

Real-world code written in C, C++, and Fortran and compiled ahead of time still basically always outperform Java bytecode executed by HotSpot, or JavaScript code executed by V8, or Lua code executed by LuaJIT, by a huge margin. This is even after a huge amount of effort has gone into systems like HotSpot and V8, for instance.

The theoretical benefits of JIT compilation do us no good if it's too costly or difficult to build effective implementations of such systems in a timely manner, for example. It does the wider community no good if such optimizations only really apply in a very small number of highly specific cases, as well.

A lot of us have heard these claims about JIT compilation for over a decade now, especially with respect to HotSpot, but we've yet to see these techniques consistently and reliably match AOT compilation in general, never mind outperform it. There is a lot of doubt out there due to this.

simula67 · on Aug 5, 2013

As much as I like C, it would be better if language choice does not depend on the performance it can deliver. It would be nice to see the JVM pull up closer to AOT compiled languages, but as this post painfully exposes, this seems implausible.

nickik · on Aug 5, 2013

I think there where C Jit but the did not win out for some reason. My think is, if we today would try to do a C jit with GCC resources it would probebly be just as fast.

There is simply nobody working on lowlevel language JITs.

cwzwarich · on Aug 4, 2013

One thing that dynamic optimizations have a problem dealing with is eliminating the overhead of indirection in data representation, e.g. from arrays of objects or nested objects. If all of the objects involved have a limited local lifetime, then you can just inline all of the functions involved and apply standard optimization techniques. But if the objects have a more global scope, then a JIT really can't do that much. There is some research into this (see http://ssw.jku.at/Research/Papers/Wimmer08PhD/ for an example), but as far as I know nothing is production quality.

trimbo · on Aug 5, 2013

> Given appropriate dynamic optimizations, there's no reason Java code can't compete with or surpass statically-typed and statically-compiled C/++,

Garbage collection.

I'm not sure I understand completely what he's trying to say though I find it hard to believe an article about "languages, VMs and optimization" doesn't once mention GC. You can heavily optimize any language to JIT, vectorize instructions, use SSE, whatever... but then that pesky garbage collector stops the world and takes 100 milliseconds...

barrkel · on Aug 5, 2013

The primary cost of GC is in memory, not time. GC doesn't need to stop the world; but to be efficient, it needs to be free to generate a lot of garbage to amortize the cost of tracing the live set. This is why GC is asymptotically faster than malloc/free (though not necessarily other approaches, like arenas). GC is also easier, in principle, to parallelize.

nickik · on Aug 5, 2013

> GC doesn't need to stop the world

As far as I know almost all GCs eventually have a full on stop the world case. The only expetion I know of is the azul C4 collecter but I think there the only ones.

barrkel · on Aug 5, 2013

Yes, you are agreeing with me, also mentioning an existence proof.

noelwelsh · on Aug 4, 2013

The important thing is for language users to recognize that nothing is free, and to understand the implications of language features and design decisions they make in their own programs.

I can agree with this statement. Other than that, I have some qualms with the post. For a start, it doesn't consider that compilation takes time. The great thing about JIT compilation is you know everything about the code being run. The bad thing is you have no time to make use of this information.

Then there is this:

Traditionally, static typing was the best way to guarantee we produced good CPU instructions. It gave us a clear picture of the world we could ponder and meditate over, eventually boiling out the secrets of the universe and producing the fastest possible code.

Huh, what? Any compiler author knows that many compiler optimization problems are NP-Complete. For example, graph colouring register allocation is, as the name suggests, NP-complete. You have to do register allocation whether you JIT compile or AOT compile, and you have to do it approximately in both cases. (IIRC typical JIT compilers use a linear scan graph allocation algorithm which is almost as good as graph colouring but much faster.)

Furthermore, in the languages he considers there are loads of optimization opportunities that cannot be statically expressed in the language (e.g. stack allocation of memory, immutability, SIMD operations). You can infer many of these statically, but statically typed languages also benefit from JIT optimization using extra information available at runtime in the same way that dynamically typed languages do.

Finally, a type theorist would have a fit over how he uses the word "typed" (and they would object to this comment as well.) Types don't exist a runtime -- representations do.

Update AOT compilers also allow code to be targeted to specific machines. An obvious example is a Linux distro available in 32-bit and 64-bit versions. GCC allows much finer distinctions than this.

AOT compilers can also make use of runtime information -- so-called "profile guided optimisation."

mwcampbell · on Aug 4, 2013

I think the reason why he mostly limited his comparison to JIT-compiled implementations is that the most popular languages, at least right now, are not very amenable to AOT optimization. So, taking JIT compilation as a given, he explored the language properties that help or hurt optimization.

There are always trade-offs, of course. Sometimes, to gain maximum performance without requiring the overhead of JIT compilation, it makes sense to use C++, so you can be explicit about stack versus heap allocation, boxed versus unboxed values, etc. But I think we can agree that many application developers want to work at a higher level of abstraction while still getting pretty good performance. In that context, one of the languages that he compared is probably a much better fit than C++.

qznc · on Aug 4, 2013

I agree with you, although it comes out unnecessarily rude.

However, JIT does not necessarily mean fast compilation (e.g. Linear Scan). A JIT might identify hot spots in the code and the optimize the hell out of that part with the same optimizations that GCC -O3 uses. Plus, it might have additional runtime information. Advanced JIT engines usually have multiple stages of optimization.

nickik · on Aug 5, 2013

Its a good point about jit time. It is however also true that even if you do it in the background (as for example Azul does, I think) you are not completly free of its problems. It is often hard to firgure out if it is worth the coordination of the background work. Im not complety sure but I think V8 does advanced optimisation on thread.

Also I am sure that Lua Jit does everything on a single thread. LuaJit is a special case most of the time since it works so fundamentally diffrent almost all other high performace jits in the wild. It only has one compiler and still works with a interpreter.

noelwelsh · on Aug 4, 2013

Rudeness is unintentional. It was written in a hurry.

jmatthews · on Aug 5, 2013

Perhaps parallel to your point but not only does compilation take time but development takes even more time.

The article essentially compares mutable and immutable values, and their effect on compilation and cycles. Sure the type system is a subset of the larger mutable/immutable issue.

The reality is that few implementations would use mutable values "under the hood" if they could make the actual use of the language appear to allow mutables.

Stated more simply, In a concurrent world the only benefit of mutable values is the programmers ability to reason about them.

lmm · on Aug 4, 2013

With a sufficiently advanced language (scala), fully static typing is exactly what I as a programmer want - it gives me a lightweight way to make assertions about my values everywhere in my program, which makes debugging a lot easier. The fact that values can change type in Javascript or Ruby I find to be a disadvantage.

sethhochberg · on Aug 5, 2013

I agree. For me, I really do enjoy writing Ruby programs - but when something isn't working as expected and its time to do some serious debugging, I prefer C++ all day.

cromwellian · on Aug 5, 2013

There is also an impact on memory use and startup latency imposed by JITs vs AOT. None of the JS VMs so far seem to do any kind of snapshotting. Since the types collected by runtime profiling are likely to be the same for most runs of a well-behaved app, it seems like you could send down the profile to the client JIT to allow it to converge quicker.

kodablah · on Aug 4, 2013

I have been playing w/ cross compiling a language to Dart. There is no bytecode[1] which actually makes it a tad easier. Dart does give you method_missing semantics w/ noSuchMethod. Granted a runtime method lookup table, if you were building Ruby on Dart, would lose the compilation optimizations on dynamic methods, but you'd still get them w/ the explicitly defined ones (I guess the same way as JRuby). When Dart has more runtime compilation abilities (beyond just spawnUri) I think it will be a great target for dynamic languages.

1 - http://www.dartlang.org/articles/why-not-bytecode/

seanmcdirmid · on Aug 4, 2013

This is similar to the design of the dynamic language runtime under Microsoft's CLR, which is able to accomplish this with bytecode by objects that implement IDynamicMetaObjectProvider.

sanxiyn · on Aug 5, 2013

I also recommend "Why Python, Ruby, and Javascript are slow". https://speakerdeck.com/alex/why-python-ruby-and-javascript-...

Summary: Things that take time are hash table lookups, allocations, copying.