Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Boxed types in OCaml, and other thoughts (ajour.io)
58 points by mkristiansen on June 30, 2023 | hide | past | favorite | 25 comments


This code is what OCaml needs to do in order to arithmetic in a function call. This isn't code for "just" x*x, but rather it's code for "multiply two ints passed into a function and return the product as an int".

Implementations like SBCL have a similar overhead for a function call containing arithmetic. However, if you're doing more arithmetic within the function, the bit shifts etc. are only done once as a part of the function's pre/postamble. In fact, fully unboxed 64-bit arithmetic with integers or floats can be done if these values aren't crossing function boundaries or into the garbage-collected heap. (The latter, a memory read/write, is way more costly than a bit shift anyway.)

The term of art is "inlining" or "open-coding" arithmetic. Raw arithmetic can be open-coded, but data formats must be respected across function call boundaries and in the heap. Therefore, if you inline the function containing the arithmetic, you eliminate the pre/postamble overhead too. This is the same as in C-like languages as well, it's just that their pre/postambles are different.

"Block" or "whole-program compilation" can also sometimes eliminate these extra instructions, at some modest expenses (compile wall clock time, implementation complexity, increased size of compilation units, multiple function entry points, modularity, etc.).

OCaml's built-in Flambda [1,2] is a project/technology that will automatically leverage some of the above techniques (especially inlining) to achieve higher performance.

[1] https://v2.ocaml.org/manual/flambda.html

[2] https://blog.janestreet.com/flambda/


To add to your good comment, doing inlining intelligently can get tricky in language using a lot of polymorphic functions like Ocaml.

To squeeze out performance, you have to detect when polymorphic functions are actually used on int and decide if you want to compile and store a special cased version which avoid the unboxing-boxing shuffle. There is a balance to find between compile time, program size and speed of running code which is fairly hard to get right.


This top bit is used by the garbage collector for performance. OCaml's GC is light enough that it simply does not feel like it exists at all. Integers also do not cause allocations.

Jane Street's blog explains the reasoning in detail[0]

[0] https://blog.janestreet.com/what-is-gained-and-lost-with-63-...


I think this line explains what I was facinated by,

"x * y is translated to CPU instructions (x >> 1) * (y - 1) + 1" namely that it now at least 3 extra highlevel operations to do simple multiplication.


How does paying employees to spend time on this make sense for a hedge fund?


That wasn’t done by Jane Street. This is part of the language from the start and was introduced by Leroy team at INRIA.

As to the point of studying boxing, well, it’s fairly basic compiler theory. It will be hard to avoid if you have even the slightest interest in PL.


Optimization of code is valuable in a lot of industries where time-to-complete or time-to-react is proportional to money spent/lost.


Most of their cohorts just use C++ for that. Several Wall Street firms wrote their own language back in the day. Until the business people figured out IT people just liked to write languages and tell happy stories. They are in the money making business, not the language writing business. Division of labor.

I spent 3 years at Morgan Stanley writing A+ and loved it. A+ is dead but kdb lives on, and is widely used at top hedge funds.


IIRC, some Smalltalks also use a similar trick.


This is a better explanation: https://web.archive.org/web/20180403212547/https://caml.inri...

You'd be better off measuring actual performance on real world tasks before inferring anything. Modern CPUs overlap execution of instructions very aggressively.


You can use fewer instructions if you tag integers with a zero (as SBCL does); then 2x * 2y = 4xy, which we only need to shift right one bit to get 4xy/2 i.e 2xy. But this takes a bit off your fixnum size, so you might shift an operand right, which also is just one instruction (as (2x/2) * 2y = x * 2y = 2xy).


Yeah it's pretty bad, but Jane Street is working on adding unboxed types.


And to be honest, this might be the sorta thing that is optimized away when doing inlining. but it does seem interesting that even when the type is species, and very narrow, the boxed types being onwrapped and rewrapped by the called function, rather than the caller.

As a side note it took me a second to realize that the multiplication being done is not actually xx, but rather (x (x<<1))


if anyone hasn't delved into why, it's because the ocaml integer `x` is represented by the machine integer `(2x + 1)`. so what `x * x` compiles to is `((2x + 1) - 1) * ((2x + 1) / 2) + 1` which is `2x * x + 1` = `2(x * x) + 1`, or the machine integer representing the ocaml integer `(x * x)`.


To be precise, it's x * (x << 1) + 1. You need to set bit 0 to signal to the garbage collection that this is not a heap box.


Is it really bad? In the grand scheme of things this should have little impact. No branches. The pipeline will do these extra ops really fast. A bigger problem with perf in ocaml is the amount of indirection and the subsequent cache misses, which can amount to hundreds of cycles.


The amount of indirection is the same problem as this. This is just a tiny example of it.


I'm confused.

In OCaml, the type system is strong and statically typed, ergo the type of a value is known at compile-time.

So...why would it need to do this to check if the type is an integer?


For garbage collection, which operates at runtime. Integers are unboxed; every other type requires additional data.


Polymorphism. This trick allows Ocaml to pass unboxed integer to polymorphic functions.

It’s about compiled code by the way, a lot lower down the stack. That’s the joy of working on a compiler. You deal with both very high level concepts and very low level issues.


Also, just out of curiosity, how would one go about removing out these extra steps? I don't program in OCaml, but I'm fascinated by it.


OCaml uses a very very naive runtime representation for heap-allocated objects, so I guess that’s the reason.


I came across an interesting fact about OCaml and wrote up my thoughts.


FYI some of your code blocks have very low contrast in dark mode: https://imgur.com/zH6iSMA


This is what I get for experimenting with going back into 'non-dark' modes on my laptop ":) thank you for pointing this out.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: