Hacker News
How Erlang does scheduling (2013) (jlouisramblings.blogspot.com)
126 points by weatherlight on April 27, 2016 | hide | past | favorite | 22 comments


When Quasar was very young we had something very similar to Erlang's reduction-based preemption; we later disabled it in favor of preemption on communication only. The reason is this: suppose you have n cores. Now, suppose you have a single fiber that hogs the core for a while. If this occurrence is rare, reduction/time-slice preemption doesn't really help, because the scheduler has n kernel threads, and one of them would steal the tasks scheduled on the thread running the "runaway" fiber and all is well.

If, on the other hand this behavior is very common, and the number of fibers exhibiting it is k, then if k > n, then you're in trouble no matter what scheduling or preemption you do. k fibers that constantly ask for CPU they can't have (because k > n) means that work is getting delayed and you're well out of soft realtime territory.

So since n is small (say 8-60), and the number of fibers is large (10K-1M or more), the number of fibers that can reasonably often require a lot of CPU (and reduction/time-slice preemption) is very small compared to the total number of fibers, and those would be special, well-known fibers anyway (remember that if a fiber needs a lot of CPU only occasionally, the many-threaded scheduler handles that gracefully even without preemption, as long as most other fibers are well-behaved). In that case, basing a scheduler designed for ~1M fibers around the behavior of ~10 fibers just doesn't make sense. It is far more efficient to simply run that CPU-hungry code in plain-old kernel threads, and let the kernel worry about their scheduling. In Quasar, unlike in Erlang, it's very simple to choose whether code should run in a user-mode-scheduled fiber or in a kernel-scheduled heavyweight thread, and so, after seeing that turning on reduction-based preemption had no positive effect in practice, we turned that feature off.


Yep, always a favorite. The article is by Jesper Louis Andersen.

By the way, Erlang, as of the next release (19), will have dirty schedulers. That will make it even easier to integrate blocking C modules into the Erlang VM. You can do it now, of course, but to do it right you have to roll your own thread + queue + locking to avoid blocking the schedulers.

Here is an article from the same author about it:

https://medium.com/@jlouis666/erlang-dirty-scheduler-overhea...

Check out more of his writings:

https://medium.com/@jlouis666


This is interesting. I have briefly looked at Erlang, and I wonder a few things, for any Erlang gurus:

1) A lot of this sounds like OS style scheduling. Does this reduce complexity (maybe by Erlang dealing with various OS scheduling quirks?) or does this increase complexity because now you have to deal with both the Erlang scheduler and the OS scheduler.

2) It struck me as odd that he said, sending to a larger mailbox takes more resources. Does that mean sending to a mailbox with a larger incoming queue takes more as a way to balance queues or does that mean a mailbox with a greater capacity takes more?

3) The preemption and soft-realtime sound most interesting. Are there foreign function interfaces that could allow you to get Erlang style concurrency with other languages? Python (and a lot of others, but especially Python) is absolutely awful at multitasking due to the Global Interpreter Lock (GIL), with no intention by the developers to fix it. Could you magically add the ability to preemptively slice Python code by adding it to an Erlang framework? Would the overhead of Erlang+Python not be worth it? Usually in a language like Erlang or Python you would call C for speed. Could you call C and get preemption and scheduling on that too?


Not a guru, but:

1. As with any multithreaded application, if your OS scheduler is heavily loaded by other applications, you can experience issues. Erlang is designed to degrade fairly slowly and fairly gracefully before it has to give up. From a complexity standpoint, the programmer doesn't worry about schedulers of either type except in rare cases.

2. Sending to a larger mailbox doesn't take more resources, it takes more reductions. So if a process has a large mailbox, it costs the sender more to add yet another message into the mailbox. This is a form of backpressure and helps slow down the system in a gentle(r) way when a single process is bottlenecking.

3. You can get Erlang style concurrency out of Pony (natively) and JVM languages (Scala/Akka; Quasar), but I doubt you could get it out of Python unless you reimplemented Python on the BEAM or the JVM. Personally, I find working directly in Erlang is fine and I don't need to mix in any Ruby/Python/Perl, which are all slightly less expressive than Erlang.

Calling C code works two ways; you can call it out-of-process via what's called a port, which is basically a unix pipe to your C process; or in-process via a NIF, which is more dangerous because the scheduler has no control or understanding inside your C code. Carefully written NIFs, and NIFs written to take advantage of the dirty scheduling system, can thrive, but it's both an art and a science.
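As a concrete sketch of the out-of-process route (the module name and the use of `cat` as the external program are made up for illustration; any line-oriented executable on the PATH would do):

```erlang
-module(port_demo).
-export([echo/1]).

%% Round-trip one line of input through an external OS process via
%% a port. The external program runs outside the BEAM, so however
%% long it blocks, the Erlang schedulers stay in control; if it
%% crashes, the port closes instead of the VM dying.
echo(Line) when is_binary(Line) ->
    Port = open_port({spawn, "cat"}, [binary, {line, 1024}]),
    port_command(Port, [Line, $\n]),
    receive
        {Port, {data, {eol, Reply}}} ->
            port_close(Port),
            Reply
    after 5000 ->
        port_close(Port),
        {error, timeout}
    end.
```

A NIF, by contrast, runs on the scheduler thread itself, which is exactly why a long-running NIF call (before dirty schedulers) could starve the whole VM.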


I don't want to be pedantic, but I'm going to be pedantic (sorry!)

  You can get Erlang style concurrency out of ... JVM languages (scala/akka; quasar)
This is a little misleading. Yes, Akka gives you actors, and on the surface they look similar to Erlang processes (they have a receive block and a ! function call), but I think the more subtle differences are what make Erlang interesting, and ultimately really pleasant to work with: 1) Akka is not preemptive multitasking. Actors are multiplexed across a thread pool, so it's possible to do blocking IO and inadvertently screw up a bunch of other actors. 2) GC on the JVM applies across the whole VM, rather than per-process as on BEAM.

all sorts of wonderful things fall out of the two properties above, things which you need to work really hard to get in an akka world.

as far as foreign interfaces to python, it's certainly possible with something like jinterface, but I'm not really sure what the point would be. generally you leave erlang behind for similar reasons you might leave python behind (things that are cpu bound). jinterface exists because java does a whole bunch of stuff well that erlang falls down at, so they can complement each other nicely. I'm not sure there are as many of those cases with python.


It is my understanding that Quasar gives you something awfully close to preemptive scheduling. It's not quite as strict as Erlang's reduction based approach, but it essentially provides implicit cooperative scheduling. It allows you to do blocking IO without tying up execution.

On top of quasar, using Azul as the garbage collector will get you a GC without Stop-The-World behavior on the JVM. It's neither open source nor cheap though.

That said, I'm plenty happy with BEAM.


I wouldn't say Pony (which is very interesting!) has the same concurrency style as Erlang. Yes, they both allow sending messages to other processes/actors in an asynchronous (non-blocking) fashion. However, Erlang also supports blocking operations (e.g. through a selective receive) beautifully, and thereby allows you to build blocking APIs on top of non-blocking messaging operations. Pony, on the other hand, is totally async. There is no way to block an actor for a result, e.g. in order to wait for a response. The only way to get some external input (a response) is to give up the current branch of execution (return from the main behavior method) and wait for the next behavior to be executed. Pony is even more strict than other actor frameworks in this regard, because it won't even perform garbage collection inside an actor as long as it is actively running.
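The blocking-on-top-of-async pattern is roughly what gen_server:call/2 does under the hood; a hand-rolled sketch (module and function names made up for illustration):

```erlang
-module(call_demo).
-export([start/0, call/2]).

%% A purely asynchronous server loop: it only ever reacts to messages.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {call, From, Ref, Req} ->
            From ! {reply, Ref, {ok, Req}},
            loop()
    end.

%% A blocking call built from two async sends. The unique reference
%% plus selective receive means the caller sleeps until exactly its
%% own reply arrives, skipping over anything else in its mailbox.
call(Server, Req) ->
    Ref = make_ref(),
    Server ! {call, self(), Ref, Req},
    receive
        {reply, Ref, Result} -> Result
    after 5000 -> exit(timeout)
    end.
```

The caller's code reads sequentially even though every message underneath is asynchronous; that is the selective-receive trick Pony deliberately rules out.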


You can "magically" gain a lot of the things Erlang has, by using the same approach as Erlang.

Python in particular has several great ways to get the benefits of shared nothing, "green thread"/"VM internal processes"/Erlang style concurrency, powered by message passing.

My two favourite ones are:

Stackless Python - it's got a mixed reputation among the subset of Python programmers that know it exists, but if you're curious about Erlang style concurrency, it's at least worth reading about.

The Pulsar framework- https://pythonhosted.org/pulsar/ - it's a Python 3.4/3.5 asyncio powered Erlang style actor model concurrency framework... It's brilliant and deserves much more attention than it gets.


> to get the benefits of shared nothing, "green thread"/"VM internal processes" ... Stackless Python

It is shared-everything, isn't it? Greenlet is the core that was extracted out of Stackless, and both gevent and eventlet are based on it. You can get lightweight concurrency units, sure. That's very far from Erlang.

To put it another way, a thread (even a green one) + a thread-safe queue does not an Erlang make ;-)

But it's a popular trope: usually when frameworks claim "we do Erlang", that's what they mean.

Some of the things Erlang has can be found in various languages and frameworks. The best approximation is actually an operating system. OS processes are closest to how Erlang works: they are isolated from each other, you can have systemd or other tools supervise them, and you can send messages via pipes, Unix sockets, files, etc. But you can't have 1M of them running.

Then, like you mentioned, you can have green threads in other languages. But you can't get low-latency GC, CPU scaling, or isolated memory for each concurrency unit. So not quite there.

> The Pulsar framework- https://pythonhosted.org/pulsar/

That does look pretty cool. I'll have to take a look. Thanks for the link!


Can you tell me why it's brilliant? I've looked it over and as far as I can tell there's nothing new here. Its main claim to fame is that it's built on asyncio / requires Python 3, which makes it dead in the water for my PyPy projects.


The biggest quality-of-life feature of Erlang compared to other "concurrent" systems, like node.js for example, is that you have the comfort of executing blocking calls. Blocking for the caller, that is. No need to pass callbacks. You write your code sequentially (from the caller's perspective) and the magic happens. You can wait as long as you want.


> 1) A lot of this sounds like OS style scheduling.

This is probably a mixed bag. On a machine that's devoted to software in Erlang, you will ideally run one scheduler (OS thread) per core pinned to a core, plus however many dirty/async/io schedulers needed/configured. You'll also have a couple other OS processes running (ntpd, sshd, crond, etc). So the OS will have an easy time with scheduling processes, the majority of the time you'll have one thread to run on the core, or one thread on the core waiting for io / other signals. On the other hand, you do have Erlang doing work that the OS could conceivably do; but I wouldn't really want to put one million threads into an OS scheduler (maybe it works, but I don't know).

> 2) sending to a larger mailbox takes more resources

As felix said, it costs more reductions, but there might be a real cost too; a larger mailbox probably means more senders sending (although it can also just mean slow processing), and if there are more senders sending to the process, there is more contention on the message queue lock, more info on how messages are added to a mailbox here[1]. There's not a concept of mailbox capacity, the queue is a linked list; any mailbox can expand until you run out of swap, or hit the configured memory limit, either way the whole OS process dies[2]. I'd guess the reduction cost for sending a message is a function of the number of messages in the queue, it's possible it's a function of the total size of all the messages, but I don't know if that's tracked.
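You can watch a queue grow from the shell; message_queue_len is one of the items erlang:process_info/2 exposes:

```erlang
%% A process whose receive pattern never matches the messages we
%% send it, so its mailbox only ever grows.
1> Pid = spawn(fun() -> receive quit -> ok end end).
2> [Pid ! {msg, N} || N <- lists:seq(1, 1000)], ok.
ok
3> erlang:process_info(Pid, message_queue_len).
{message_queue_len,1000}
```

Querying `reductions` via the same call on the senders is one way to observe the extra send cost in practice.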

Edit: i meant to address > 3) Are there foreign function interfaces that could allow you to get Erlang style concurrency with other languages

Not that I've seen. You can do NIFs, ports, or port drivers, but that doesn't really get what you're asking for. There are some other languages that run on the BEAM VM, maybe one of the Lisps you could take existing Lisp code and run it, but generally the BEAM languages are not available outside the BEAM. If you wanted to run an existing language within BEAM, that language would need to have hooks to embed in another program, and the hooks would need to let you do a small amount of work at a time; or you'd have to re-implement the language in a BEAM friendly manner.

[1] http://erlang.org/pipermail/erlang-questions/2011-September/...

[2] This isn't great, and it's one of the areas where Erlang's isolation is weak: one Erlang process can eat all of a limited global resource (for example RAM, ports, ets tables, atoms), and the result is often that the whole VM stops. Either it directly exits, or core applications shut down when they hit a failure they can't handle (for example, mnesia really needs to be able to get the ets tables and file ports it requires, and will shut down when it can't; often this results in your application crashing enough times to get shut down).


Really surprised at [2], given how robust BEAM is elsewhere. It seems it would be simple for a supervisor to check sizes and kill processes over a certain size. However, actually thinking about it, hot swapping of code and supervision probably work because all the messages/ports/ets/atoms/results can sit there separately while the process is replaced.

Thanks for the info!


Interestingly, there's discussion going on right now about adding a similar feature: http://erlang.org/pipermail/erlang-questions/2016-April/0889...


Generally, an erlang supervisor is just waiting for processes to die and react to it; I'm not sure it makes sense to also have it taking an active interest in processes, but you could quickly build something that periodically calls erlang:process_info/2 on all processes, and kills them if they're too big (for whatever value of too big).

That would get you protection from one process consuming too much ram, and maybe ports (assuming a process is going to link to its ports, you can examine the links), you could also scan ownership of ets tables. You can't really track which process is using up all the atoms though.
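A minimal sketch of such a sweep (the module name is made up, and not to be confused with OTP's os_mon memsup; erlang:process_info(P, memory) reports the process's footprint in bytes):

```erlang
-module(heap_cop).
-export([sweep/1]).

%% Kill every process whose memory footprint exceeds Limit bytes.
sweep(Limit) ->
    [kill_if_big(P, Limit) || P <- erlang:processes()].

%% process_info/2 returns undefined for a process that died between
%% the processes/0 snapshot and the query, so handle that case.
kill_if_big(P, Limit) ->
    case erlang:process_info(P, memory) of
        {memory, Bytes} when Bytes > Limit ->
            exit(P, kill),
            {killed, P, Bytes};
        _ ->
            ok
    end.
```

Real code would want a whitelist: a sweep like this is just as happy to kill system processes and your own supervisors as the runaway worker.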


> The cores may be bound to schedulers, through the +sbt flag, which means the schedulers will not "jump around" between cores. It only works on modern operating systems, so OSX can't do it, naturally.

That’s a bit out of date.

https://developer.apple.com/library/mac/releasenotes/Perform...


Note that OS X's thread affinity APIs, unlike those of other operating systems, don't allow you to pin threads to specific processors. You can only define an "affinity set" of threads that should be scheduled on the same processor. Similarly, you can spread threads across different processors by assigning them to different affinity sets.

I wrote some code to do this in Firefox for OS X, Linux, and Windows: https://mxr.mozilla.org/mozilla-central/source/xpcom/threads...


How does Erlang's reduction-counting scheduler work with native-compiled code?

I believe Go's scheduler only preempts goroutines at function call entry points. A goroutine in a tight-loop, that doesn't call any other functions, could block the scheduler.


Native code can tell the schedulers how many reductions to charge, but the schedulers are actually cooperative rather than literally preemptive, so poorly written native code can lie, crash, or consume a scheduler outright.


By native code, I meant Erlang bytecode compiled using HiPE or an LLVM-based JIT like [1], not Erlang calling out to NIFs. I assume the generated code would need to insert preemption checks.

[1] http://www.erlang-factory.com/upload/presentations/516/SF-JI...


I believe the JIT work was experimental. Functions compiled with HiPE are still subject to the reduction laws and calling into a HiPE function still provides a preemption opportunity, so HiPE native functions are essentially indistinguishable from erlang functions.


Interesting. Thanks!



