Ask HN: Can an interpreted-only language be implemented in a JITted one?

yorwba · on Feb 25, 2018

Yes, but the JIT may not actually know how to optimize the resulting translated code, if it doesn't have the kinds of patterns the optimizer is looking for.

Based on the page you linked, it seems like there are 4 different kinds of language elements, and parsing depends on which of those kinds something evaluates to. I'd assume that in normal code, most expressions will always evaluate to the same kind, since otherwise the humans writing the code would get confused. In that case, it might be possible to eliminate most of the ambiguity through static type inference.

For programs where you are unable to eliminate that ambiguity at compile time, you could compile code into a series of of function calls that take the relevant runtime parameters and dispatch on the types to perform the appropriate action. If you do that in a way that's similar to the usual type-dispatch in the host language, the JIT compiler will likely optimize it using e.g. a polymorphic inline cache.

I hope that helps and you'll actually see my answer. If you do, I'd like to know what got you interested in that question. (Are you trying to optimize an existing J codebase?)

rscho · on Feb 25, 2018

Thank you for your very informative answer!

My job leads me to write a lot of ad-hoc scripts for medical research. It consists almost solely in chains of extremely boring pandas/R functions which quickly turn into a mess, because the raw data ALWAYS comes in non-regular form, with corner cases and irregularities everywhere that must be regularized before analysis (that is, of course, done with Excel by my boss...)

The key here, is that medical data consists in very varied small datasets in the vast majority of cases (typically < 10^6 cells), so code reusability is very low since the data irregularities are different every time.

My impression is that languages such as J/APL would shine for such applications, since you won't have to remember what your code meant a year later, and allows very concise code compared to the pages of Pandas/R I have to write to handle corner cases. That said, I would also definitely appreciate to freely mix J in my codebase as a DSL rather than count on FFI. I also hope to be able to achieve auto annotation of functions using J code through the nice Racket tools for language transformation. As for performance improvement, I'm not really counting on it since I suspect that the Racket JIT will not be any faster than the J interpreter that is so small that it fits entirely into cache memory.

In summary, big and regular data is all the rage, but let's not forget the armies of people handling small and irregular data :-)

yorwba · on Feb 25, 2018

If something more like APL is within the realm of possibilities, you might want to take a look at the co-dfns compiler (although some of it may be specific to GPU code): https://github.com/Co-dfns/Co-dfns

However, for 10^6 cells, even O(n^2) algorithms should run in a few hours at worst, so performance optimizations might not be too critical. I'd suggest first validating your idea with an existing interpreter. Building your own tools is always fun, but can also be a huge distraction that you probably shouldn't take on unless your existing options are not adequate.

gus_massa · on Feb 25, 2018

Can you give a simple example, that shows the problem parsing J? IIUC the example in the link, the parser has to know the arity of the functions. This can be done in Racket using macros that look like functions. (For example, functions with optional arguments are actually macros that look like functions.)

I'm not sure that JIT is so important here, unless you are planning to generate bytecode on the flight and JIT it.