Concurrent vs Parallel vs Sequential
This blog is intended to contain a series of tutorials and posts about using a concurrency library for Haskell: Communicating Haskell Processes, or CHP for short. CHP is a message-passing library. It is in the same family as MVars and the other concurrency primitives in Haskell (though I hope to show that CHP is more powerful), and it also has some similarity to Erlang. I’m going to be assuming that you, the reader, know Haskell already and are looking for ways to write concurrent programs.
There is often confusion around concurrency, parallelism and the difference between the two. Peyton Jones says in his “Awkward Squad” paper that parallelism is about performance and is deterministic (will always produce the same results), whereas concurrency is about design and is non-deterministic.
As an example, consider the stock market. On any given day there are thousands of traders buying and selling from each other. Each trader is acting broadly sequentially, but the group of them put together forms an interacting concurrent system. If you took the same set of traders and the same stock levels and re-ran the system, you would not be surprised to get a different set of trades during the day. In contrast, consider the matter of producing statistics about a day’s trading. There might be millions of trades to process, so you may consider splitting the work between several computers. But you would expect that no matter how you divided up the work, you should always get the same result. Anything else would indicate a bug! The challenge is how to divide up the work to get it done fastest. This is a parallel processing problem.
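To make the statistics half of that example concrete, here is a minimal sketch using only GHC's base library. The names `Volume`, `chunksOf` and `parSum` are invented for illustration, and a trade is reduced to a bare volume. It totals the day's volume by splitting the trades into chunks and sparking each chunk's sum with `par`. The point is that the answer does not depend on the chunk size; only the running time might. (To see any actual speed-up you would need to compile with `-threaded` and run with `+RTS -N`.)

```haskell
import Data.List (foldl')
import GHC.Conc (par, pseq)

-- Hypothetical stand-in for a trade: just the volume traded.
type Volume = Int

-- Split a list into chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0    = [xs]
  | null xs   = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- Sum each chunk separately, sparking each partial sum so the
-- runtime may evaluate it on another core, then add them up.
parSum :: Int -> [Volume] -> Volume
parSum n = combine . map (foldl' (+) 0) . chunksOf n
  where
    combine []       = 0
    combine (s : ss) =
      let rest = combine ss
      in s `par` (rest `pseq` (s + rest))

main :: IO ()
main = do
  let trades = [1 .. 100000] :: [Volume]
  -- Different ways of dividing the work, same result.
  print (parSum 1000 trades == parSum 7 trades)
```

However the list is divided, `parSum` is a pure function of its input, which is exactly the determinism we expect from a parallel computation.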
CHP blurs the line between the two. It supports concurrent programming, but if you design your concurrent system well, you can usually get parallel speed-up when the program is executed on a multicore machine. Anyone who has played with parallel programming will know that getting such a speed-up is not always easy, and a program designed to go faster on multiple cores or multiple machines may actually go slower than it did on one core! So you may wonder why it is worth bothering with a concurrent programming library: why not program sequentially for now, and try for parallel speed-up later on, if and when you need it? The answer is important:
It is much easier to sequentialise parallel code than it is to parallelise sequential code.
So it is best to start by writing your code as concurrently as possible; if that turns out to slow your program down, you can sequentialise the concurrent code. If you start with sequential code, bolting on parallelism later rarely works well. The functional programming community have been trying to automatically parallelise their programs since the 1980s, and it does not yet seem to have paid off.
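As a rough illustration of why the first direction is the easy one, here is a sketch using plain `forkIO` and MVars from base rather than CHP. The names `inParallel` and `inSequence` are made up for this post and are not CHP's API. Once a program is expressed as a list of independent actions, sequentialising it is a one-word change at the call site.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Run every action in its own lightweight thread and collect the
-- results in the original order. Sketch only: there is no exception
-- handling, so an action that throws leaves its MVar empty.
inParallel :: [IO a] -> IO [a]
inParallel actions = do
  vars <- mapM spawn actions
  mapM takeMVar vars
  where
    spawn act = do
      var <- newEmptyMVar
      _ <- forkIO (act >>= putMVar var)
      return var

-- The sequentialised version: run the same actions one after another.
inSequence :: [IO a] -> IO [a]
inSequence = sequence

main :: IO ()
main = do
  let jobs = [return (n * n) | n <- [1 .. 10 :: Int]]
  a <- inParallel jobs
  b <- inSequence jobs
  print (a == b)
```

Going the other way, from a single sequential loop to independent actions, means first working out which parts do not depend on each other, and that is the hard part.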
When I say that you should start off programming concurrently, this does not mean counting the number of cores you’re likely to have available (2–8, at the time of writing) and trying to split your program into roughly that many concurrent processes. I mean you should write your program with tens, hundreds, thousands, maybe even millions of concurrent processes (depending on what fits your program). If it turns out to be too concurrent, you can sequentialise parts. Write your program with as much concurrency as you can find, then worry about performance later. I hope to induct you into this concurrent mindset with this blog, using CHP in Haskell.
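To give a feel for how cheap this is in Haskell, here is a small sketch, again using only `forkIO` and an MVar from base rather than CHP, with the hypothetical name `sumViaThreads`. It starts ten thousand lightweight threads, each of which sends a single number back to the main thread. The order in which the numbers arrive may differ from run to run, which is the non-determinism of a concurrent system, but the total is always the same.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM)

-- Fork one lightweight thread per number; each thread hands its
-- number to the main thread through a shared MVar.
sumViaThreads :: Int -> IO Int
sumViaThreads n = do
  results <- newEmptyMVar
  forM_ [1 .. n] $ \i -> forkIO (putMVar results i)
  -- Take exactly n values; their arrival order is not fixed.
  values <- replicateM n (takeMVar results)
  return (sum values)

main :: IO ()
main = do
  total <- sumViaThreads 10000
  print (total == sum [1 .. 10000])
```

The thread count here has nothing to do with the number of cores in the machine; it is chosen by what fits the problem.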