Is Parallel Programming Hard, and, If So, What Can You Do About It? [pdf] (kernel.org)
118 points by poindontcare on April 3, 2015 | 31 comments


Since it's a really big, technical book, here are some quick quotes I thought were useful in general:

"This book is a handbook of widely applicable and heavily used design techniques, rather than a collection of optimal algorithms with tiny areas of applicability."

"As a rough rule of thumb, use the simplest tool that will get the job done. If you can, simply program sequentially. If that is insufficient, try using a shell script to mediate parallelism."

"[...] parallelism is but one potential optimization of many. A successful design needs to focus on the most important optimization. Much though I might wish to claim otherwise, that optimization might or might not be parallelism."

It's a very good handbook for advanced developers who use native languages. It addresses just about any problem you might encounter when dealing with synchronization, locks, etc.

Near page 300 the author writes about higher level languages and clearly states the shortcomings of these new types of programming. Not sure what to think about it. Are we really discussing the usage of high level "scripts" to handle heavy computations in the next 10 years? I'd love to hear from a math or physics Ph.D. about this.

EDIT: I mean, Wolfram is kind of an "ultra high level scripted platform" and seems to work pretty well. Isn't that the future?


Well, scripting languages are actually pretty popular even now. I work in uncertainty quantification for nuclear engineering. One common task is to run the same simulation many times with slightly different parameters. Scripting languages like Python are popular for this and they give you easy parallelism.
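The "run the same simulation many times with slightly different parameters" pattern can be sketched in a few lines of Python with the standard library. The `simulate` function here is a hypothetical stand-in for a real simulation code:

```python
# Hypothetical parameter sweep: run the same "simulation" for many
# parameter values in parallel, one worker per core by default.
from multiprocessing import Pool

def simulate(k):
    # placeholder for an expensive simulation run with parameter k
    return k * k

if __name__ == "__main__":
    params = range(8)
    with Pool() as pool:
        results = pool.map(simulate, params)  # fan out, collect in order
    print(results)
```

This is "embarrassingly parallel": the runs are independent, so `Pool.map` needs no coordination beyond gathering the results.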

The current pattern in research is usually to hammer out a prototype in a scripting language, profile, then migrate hot code paths to Fortran or C++. That's only if necessary though, you can get remarkable performance most of the time just using NumPy and similar.
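The "migrate the hot path" step often isn't needed because vectorizing with NumPy already moves the inner loop into optimized C. A minimal sketch of the before/after, using a dot product as the stand-in hot path:

```python
# Both functions compute the same dot product; the NumPy version
# replaces the interpreted Python loop with a single C-level call.
import numpy as np

def dot_loop(a, b):
    total = 0.0
    for x, y in zip(a, b):   # interpreted, one bytecode loop per element
        total += x * y
    return total

def dot_numpy(a, b):
    return float(np.dot(a, b))  # optimized BLAS/C under the hood

a = np.arange(1000.0)
b = np.arange(1000.0)
assert dot_loop(a, b) == dot_numpy(a, b)
```

Profiling first matters: only rewrite a path like `dot_loop` once it actually shows up as hot.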

I'm a long time user/lover of Mathematica and now what I guess is the Wolfram language, definitely an expert on it. There's one thing I know about it, which is that it isn't the future, at least not in my industry. Open source is surprisingly popular in the NE research community and as long as WL stays closed, it won't be significant.


Bioinformatics largely takes a similar approach. Because the field is rooted in Statistics, statistical theories are often prototyped in GNU R by Ph.Ds without formal computer science backgrounds. While statistically sound, R implementations often lack the performance properties necessary to crunch large datasets. This is where those of us with backgrounds in computation will optimize and parallelize code using lower level languages. There is no shame in starting from a scripting language. I often prototype ideas using R, Perl or Python before finalizing them in C, Java, OpenCL, etc.


You have summoned a physics Ph.D.

I do quite a bit of parallel numeric work, but I'm usually trying to take a calculation measured in hours and change it to one measured in minutes, so I'll defer to my colleagues with the multi-year computations when they inevitably arrive to correct my misunderstandings.

While I haven't used the Wolfram language specifically, I did quite a bit of my thesis in Mathematica 8 and became quite familiar with its abilities and limitations. On the one hand, the high level scripted platform did do an amazing job of parallelizing certain classes of computation with no outside effort. However, when you left those classes of problem, it wasn't really any help.

In my own work, I had a particularly nasty integral that I needed evaluated numerically. The integral itself only took about an hour to run, but I'd need to run it with multiple sets of parameters that could easily eat up a week of computer time. Since I had multiple parameters, Mathematica would intelligently run multiple integrals at the same time, so I could have eight calculations done every hour, instead of one, and the job would take a day to run, instead of a week. However, I ran into a situation where the parameter for each integral depended on the results of the previous one, so running them in parallel was no longer an option. I had to go down to a lower level and speed up the integral itself.

On one level, this should have been trivial. It was a simple, one-dimensional integration, evaluated via Monte-Carlo: let each worker thread pick its own integration points and reduce the results at the end. It's trivially parallelizable. Except Mathematica wouldn't do it. It was brutally resistant to any attempt at speeding up the integration. I eventually wrote my own parallel Monte-Carlo integrator in another language and used that, because the high level scripted platform was too high level for what I wanted to do.
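The "each worker picks its own points, reduce at the end" structure can be sketched as follows (in Python rather than whatever the commenter actually used; `f` is a hypothetical integrand over [0, 1]):

```python
# Parallel Monte-Carlo integration: each worker draws its own sample
# points from an independent random stream, and the partial estimates
# are averaged (the "reduce") at the end.
import random
from multiprocessing import Pool

def f(x):
    return x * x               # hypothetical integrand on [0, 1]

def mc_partial(args):
    seed, n = args
    rng = random.Random(seed)  # independent stream per worker
    return sum(f(rng.random()) for _ in range(n)) / n

def mc_integrate(n_workers=4, n_per_worker=100_000):
    jobs = [(seed, n_per_worker) for seed in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(mc_partial, jobs)
    return sum(partials) / len(partials)  # equal-sized batches: plain average
```

With f(x) = x^2 the true value is 1/3, and the estimate converges as the total sample count grows; the key design point is that workers never communicate until the final reduction.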

My colleagues ran into similar issues. They often had large, time consuming calculations with a recursion somewhere in the process that caused Mathematica to fall back to serial processing. Sometimes I could find a different representation of the problem that sidestepped the recursion. More often, I didn't have the time or domain expertise to force Mathematica to see that the task could be run in parallel.

So, on the question of whether an "ultra high level scripted platform" is the future, my answer is "the future of what?" You can only get to high level by choosing a domain to model, and the higher your level, the more specific the domain. Now, to prevent misunderstanding, I should mention that these are some massive domains that could benefit greatly from a high level, parallel model. I cannot count the number of times I asked "How do I parallelize this trivially parallelizable program" and have not been able to find any answer more specific than "trivially". Yet, even with the advances, the moment that you leave that problem domain, you're still going to be going back to lower level techniques. To steal from the author, we can get big wins by shaving the Mandelbrot set, but there will still be people who need those little side valleys.


> Now, to prevent misunderstanding, I should mention that these are some massive domains that could benefit greatly from a high level, parallel model. I cannot count the number of times I asked "How do I parallelize this trivially parallelizable program" and have not been able to find any answer more specific than "trivially".

This is a large part of my motivation for working on data-parallel languages -- the parallel programming tools we have are generally good at making easy problems hard. I agree that the domain of highly regular data-parallel computation is large (given the attention going to GPGPU), enough so to warrant its own high-level language. APL already has a convenient and flexible (user-level) programming model for this but carries along enough ad hoc limitations and weird corner cases to make a good parallelizing compiler infeasible.

Maybe someone will eventually come up with a general-purpose parallel language that offers a smooth transition from the easy parallelism to the hard (concurrent) parallelism, but so far everything I've seen that accommodates concurrency complicates the non-concurrent cases in doing so.


There is also a 1-column pdf (http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/p...) for those annoyed by the 2-column format. The margins are enormous, but at least you don't have to scroll down then up.


Thanks, my pdf reader can trim margins and make two columns out of one, but it can't make one column out of two.


Which reader is this?


Okular


This is about large-calculation parallelism. It covers bus architectures and language primitives.

There's another whole parallel (multi-threaded) conversation that's not about raw compute speed, but about latency. Does your app keep running when somebody holds a mouse button down? Does the API stay live when data is being crunched? Does your service handle hundreds or thousands of client requests serially or in parallel?

Those problems have completely different solutions from those addressed in the OP. The language-primitive approach doesn't begin to touch the issues involved.


That's not what I see. The book delves into the coordination of shared resources in a shared space (like RAM). This could take the form of task parallelism for speedup, or more likely in this case, task parallelism for concurrency and resource sharing. Since there's so little discussion in the book of issues related to speedup (task decomposition or resource allocation or load balancing), I'm inclined to think it's of most use to someone building a shared memory RDBMS or the like, where semaphores and locks are critical to avoidance of bugs and data corruption.


Those things are of course at the bottom of any multi-task discussion. But they are so far below the actionable layer for service designers as to be moot. For that we need to talk about transactional data, about restartable processes, about context and state and where it's kept, from a language and data structure point of view.


Joe, we normally call that concurrency, not parallelism.

Language primitives do help there as well: we do both types of computations at Graphistry, and for the concurrency side, use promises, FRP, and occasionally, processes.
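The promise/future style for the latency side can be sketched with the standard library (this is a generic illustration, not Graphistry's actual stack; `crunch` is a hypothetical stand-in for heavy work):

```python
# Promise/future sketch: start a slow computation without blocking the
# caller, and attach a callback for when it resolves.
from concurrent.futures import ThreadPoolExecutor

def crunch(n):
    return sum(i * i for i in range(n))  # stand-in for heavy work

executor = ThreadPoolExecutor(max_workers=2)
future = executor.submit(crunch, 1_000)  # returns immediately
future.add_done_callback(lambda f: print("done:", f.result()))
# ...the caller stays free to service other requests here...
result = future.result()                 # block only when the value is needed
executor.shutdown()
```

The point is responsiveness rather than raw speedup: the caller keeps handling events while the work completes elsewhere.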


That's really helpful. Do you have any suggestions on how to learn more about the latter?


I wish I did. I've spent a lifetime coming up with solutions including languages, app environments and service models that abstract away the difficulties. There exist such models now (didn't 15 years ago) that do some of the work, but they're always designed around some particular problem (clustering and provisioning for instance) and do only part of the job.

The whole job, e.g. for a service model, includes RAS stuff like failing over and restarting client requests transparently, WITHOUT requiring the service code itself to deal with any of that. I guess Ehrlang and kin are the most recent take on that, meaning an app environment that completely abstracts the parallelism and fault recovery.


If you mean Erlang, please note it was released in 1986 and open-sourced in 1998. The underlying actor model was introduced by Carl Hewitt et al. in 1973.

http://worrydream.com/refs/Hewitt-ActorModel.pdf


There should be a warning of some sort that this is a 7.5 MB book, not just an article. It just wiped out my mobile roaming.

There's a lot to be said for hiding technical details from the non-technical general public, but there's something to be said for having occasional warnings.


What's next? Stop signs?


So that’s how we arrived at “Caution: Slippery When Wet”…


The hard part of nearly any parallel activity that aims for order over chaos is synchronisation. It strikes me that part of the problem with parallel programming is that the margin for error with regards to synchronisation can be very slim.

Some would say that we need to improve our code to fit into this margin for error, and I can agree with that, but what if we look at the issue from the other direction. What methods could we employ for increasing our margin of error? What might these methods look like?

If we treat the goal as improving adjustment to synchronisation mistakes, we'll first need to know when adjustments should be made. One way to do that is to predict how long each process is likely to take, then measure the success of this model against real world performance. In this way, a system could get better at synchronisation the longer it runs for. It doesn't matter if the original guess was way off, as long as you correct it with feedback.
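The feedback idea above could be sketched as an exponential moving average over observed task durations: start with any guess and let each measured run correct it. All names here are hypothetical, illustrating the comment's suggestion rather than any established scheduler:

```python
# Hypothetical duration predictor: blend each observed run time into the
# current estimate, so scheduling guesses improve the longer the system runs.
class DurationModel:
    def __init__(self, initial_guess, alpha=0.5):
        self.estimate = initial_guess
        self.alpha = alpha  # how strongly new observations correct the model

    def observe(self, actual):
        # exponential moving average: move the estimate toward the measurement
        self.estimate += self.alpha * (actual - self.estimate)
        return self.estimate

model = DurationModel(initial_guess=10.0)
for measured in (2.0, 2.0, 2.0):  # the real runs keep taking ~2s
    model.observe(measured)
# a wildly wrong initial guess has already converged most of the way
assert abs(model.estimate - 2.0) < 1.1
```

As the comment says, the original guess being way off doesn't matter, as long as feedback keeps correcting it.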


> What methods could we employ for increasing our margin of error? What might these methods look like?

These methods and their applications are the field of distributed computing :)


The book is good. BTW, the title is a bit misleading, in my opinion: handling parallelism at a low level is hard, but parallel programming is hard only because the languages we use are not transparent enough, and because many problems are hard to split into parallel tasks.


I've got the book in my queue to read, but parallel programming is hard. I think it pretty much still stands that there is no way to show that parallel code works, other than to do a formal proof of each section of the code.


Might be worth looking at "Doing Hard Time" by Bruce Powell Douglass. It's drenched in "executable UML" flavor but the toolchains that existed before Rose RT included less of that.


Thanks, I'll take a look at that.


You could have done the recent Heterogeneous Parallel Programming MOOC ( https://class.coursera.org/hetero-004 ).


Don't use a language that doesn't provide you with the tools to handle it?

I don't have many issues with concurrent code in Clojure, or C#, or Go.

But in pre-1.8 Java or Python or Ruby or C++, I'm guessing you're going to have a really, really bad time...


Ignoring issues like potential non-reentrance of code and other library issues, C++ is perfectly capable of being used to make programs that use pthreads. It's not a bad time at all...

Python apparently has threading and I've seen "from multiprocessing import ..." in Python before.

Tcl also has threads, or you can just write things to be event driven. I have never used Tcl to implement a multiprocessor solution, but others have. Because sockets are a first-class object in Tcl, that approach is quite fruitful. I use something you could roughly call "coroutines over a network" daily in Tcl.


A huge portion of my concurrent programming experience is in the C realm (not C++, C), and I spent very little time battling the language. I really don't think the language is the biggest factor in concurrent programming, although it can certainly affect the calculus of level-of-effort.

To be honest, and this may just be that I'm somewhat of a C-centric developer, glib gave me everything I needed for concurrent programming, including the development of a massive, successful and fully concurrent power plant SCADA system.


java.util.concurrent has been around since java 1.5


Read some of Joe Armstrong's books. :)



