Boxed types in OCaml, and other thoughts

reikonomusha · on July 1, 2023

This code is what OCaml needs to do in order to arithmetic in a function call. This isn't code for "just" x*x, but rather it's code for "multiply two ints passed into a function and return the product as an int".

Implementations like SBCL have a similar overhead for a function call containing arithmetic. However, if you're doing more arithmetic within the function, the bit shifts etc. are only done once as a part of the function's pre/postamble. In fact, fully unboxed 64-bit arithmetic with integers or floats can be done if these values aren't crossing function boundaries or into the garbage-collected heap. (The latter, a memory read/write, is way more costly than a bit shift anyway.)

The term of art is "inlining" or "open-coding" arithmetic. Raw arithmetic can be open-coded, but data formats must be respected across function call boundaries and in the heap. Therefore, if you inline the function containing the arithmetic, you eliminate the pre/postamble overhead too. This is the same as in C-like languages as well, it's just that their pre/postambles are different.

"Block" or "whole-program compilation" can also sometimes eliminate these extra instructions, at some modest expenses (compile wall clock time, implementation complexity, increased size of compilation units, multiple function entry points, modularity, etc.).

OCaml's built-in Flambda [1,2] is a project/technology that will automatically leverage some of the above techniques (especially inlining) to achieve higher performance.

[1] https://v2.ocaml.org/manual/flambda.html

[2] https://blog.janestreet.com/flambda/

WastingMyTime89 · on July 1, 2023

To add to your good comment, doing inlining intelligently can get tricky in language using a lot of polymorphic functions like Ocaml.

To squeeze out performance, you have to detect when polymorphic functions are actually used on int and decide if you want to compile and store a special cased version which avoid the unboxing-boxing shuffle. There is a balance to find between compile time, program size and speed of running code which is fairly hard to get right.

DonPellegrino · on June 30, 2023

This top bit is used by the garbage collector for performance. OCaml's GC is light enough that it simply does not feel like it exists at all. Integers also do not cause allocations.

Jane Street's blog explains the reasoning in detail[0]

[0] https://blog.janestreet.com/what-is-gained-and-lost-with-63-...

mkristiansen · on June 30, 2023

I think this line explains what I was facinated by,

"x * y is translated to CPU instructions (x >> 1) * (y - 1) + 1" namely that it now at least 3 extra highlevel operations to do simple multiplication.

keithalewis · on July 1, 2023

How does paying employees to spend time on this make sense for a hedge fund?

WastingMyTime89 · on July 1, 2023

That wasn’t done by Jane Street. This is part of the language from the start and was introduced by Leroy team at INRIA.

As to the point of studying boxing, well, it’s fairly basic compiler theory. It will be hard to avoid if you have even the slightest interest in PL.

reikonomusha · on July 1, 2023

Optimization of code is valuable in a lot of industries where time-to-complete or time-to-react is proportional to money spent/lost.

keithalewis · on July 1, 2023

Most of their cohorts just use C++ for that. Several Wall Street firms wrote their own language back in the day. Until the business people figured out IT people just liked to write languages and tell happy stories. They are in the money making business, not the language writing business. Division of labor.

I spent 3 years at Morgan Stanley writing A+ and loved it. A+ is dead but kdb lives on, and is widely used at top hedge funds.

ufo · on July 1, 2023

IIRC, some Smalltalks also use a similar trick.

rwmj · on July 1, 2023

This is a better explanation: https://web.archive.org/web/20180403212547/https://caml.inri...

You'd be better off measuring actual performance on real world tasks before inferring anything. Modern CPUs overlap execution of instructions very aggressively.

hayley-patton · on July 1, 2023

You can use fewer instructions if you tag integers with a zero (as SBCL does); then 2x * 2y = 4xy, which we only need to shift right one bit to get 4xy/2 i.e 2xy. But this takes a bit off your fixnum size, so you might shift an operand right, which also is just one instruction (as (2x/2) * 2y = x * 2y = 2xy).

Ericson2314 · on June 30, 2023

Yeah it's pretty bad, but Jane Street is working on adding unboxed types.

mkristiansen · on June 30, 2023

And to be honest, this might be the sorta thing that is optimized away when doing inlining. but it does seem interesting that even when the type is species, and very narrow, the boxed types being onwrapped and rewrapped by the called function, rather than the caller.

As a side note it took me a second to realize that the multiplication being done is not actually xx, but rather (x (x<<1))

zem · on July 1, 2023

if anyone hasn't delved into why, it's because the ocaml integer `x` is represented by the machine integer `(2x + 1)`. so what `x * x` compiles to is `((2x + 1) - 1) * ((2x + 1) / 2) + 1` which is `2x * x + 1` = `2(x * x) + 1`, or the machine integer representing the ocaml integer `(x * x)`.

tempodox · on July 1, 2023

To be precise, it's x * (x << 1) + 1. You need to set bit 0 to signal to the garbage collection that this is not a heap box.

c-cube · on July 1, 2023

Is it really bad? In the grand scheme of things this should have little impact. No branches. The pipeline will do these extra ops really fast. A bigger problem with perf in ocaml is the amount of indirection and the subsequent cache misses, which can amount to hundreds of cycles.

Ericson2314 · on July 2, 2023

The amount of indirection is the same problem as this. This is just a tiny example of it.

Solvency · on July 1, 2023

I'm confused.

In OCaml, the type system is strong and statically typed, ergo the type of a value is known at compile-time.

So...why would it need to do this to check if the type is an integer?

dpwm · on July 1, 2023

For garbage collection, which operates at runtime. Integers are unboxed; every other type requires additional data.

WastingMyTime89 · on July 1, 2023

Polymorphism. This trick allows Ocaml to pass unboxed integer to polymorphic functions.

It’s about compiled code by the way, a lot lower down the stack. That’s the joy of working on a compiler. You deal with both very high level concepts and very low level issues.

rickstanley · on July 1, 2023

Also, just out of curiosity, how would one go about removing out these extra steps? I don't program in OCaml, but I'm fascinated by it.

kaba0 · on July 1, 2023

OCaml uses a very very naive runtime representation for heap-allocated objects, so I guess that’s the reason.

mkristiansen · on June 30, 2023

I came across an interesting fact about OCaml and wrote up my thoughts.

uzerfcwn · on July 1, 2023

FYI some of your code blocks have very low contrast in dark mode: https://imgur.com/zH6iSMA

mkristiansen · on July 3, 2023

This is what I get for experimenting with going back into 'non-dark' modes on my laptop ":) thank you for pointing this out.