CHP vs CML, Forking and Picky Receivers
CHP is a Haskell library that implements message-passing concurrency — and so is CML, a Haskell implementation of the concurrency features of Concurrent ML. Recently, someone asked on reddit what the differences are between the CHP and CML libraries. So in this post, I will perform a comparison by going through the API of the CML Haskell package and seeing how to implement the same functionality in CHP.
CML Events in CHP
CML is organised around Event types. An Event represents a sort of future synchronisation, that can be engaged in using the function sync :: Event a -> IO a. CHP doesn’t have this distinction as such; all events are monadic actions, and you engage in them when you execute them in the CHP monad. So in effect, anything of type Event a in CML is just CHP a in CHP. CML has a choose :: [Event a] -> Event a function for choosing between such events, which is identical to CHP’s function alt :: [CHP a] -> CHP a.
CML has a wrap :: Event a -> (a -> IO b) -> Event b function that allows you to specify a post-synchronisation operation. Given that we have already seen that the Event type is the CHP monad and that we would execute the operation in the CHP monad rather than IO, this operation in CHP would be something like wrapCHP :: CHP a -> (a -> CHP b) -> CHP b. I’m sure many Haskell programmers will recognise this as the type of monadic bind (>>=), and indeed monadic bind has the same semantics of wrap; it can choose and synchronise on the left-hand event, then later execute the right-hand operation with the result.
CML has a guard :: IO (Event a) -> Event a event for specifying a pre-synchronisation operation that is run every time a synchronisation is attempted on the given event. I have not come across this particular feature before, but it would be interesting to see how to achieve a similar effect in CHP if anyone has a use case for the guard function. CML also has a wrapabort :: IO () -> Event a -> Event a that allows you to specify an action to execute if the event is not chosen. Like guard, I am not sure what I might use this function for. It would be relatively easy to wrap up an alt to execute abort actions for non-chosen guards, though.
CML has a “fork and forget” spawning operator, spawn :: IO () -> IO ThreadId, that is really a synonym for forkIO. There are philosophical issues here. Fork and forget, which is present in many concurrency frameworks, can be very useful to set off processes that you just want to set going and not interact with, or to launch daemon processes that should run for the program’s lifetime. One problem with it is that the semantics are not always immediately clear — to what extent are the processes forgotten? For example, what do you think the output of this program will be?
import Control.Concurrent main = forkIO (threadDelay 1000 >> putStrLn "Hello World?")
The question is: will the Haskell program wait for all the forked processes to finish before the main program finishes, i.e. will the text be printed? The answer seems to be: no; on my GHC 6.10.3 system, if I compile the above with -threaded and run it, there is no output, even if I take the delay down to zero. So if you want to wait for a particular forked process to complete before your program ends, you’ll need to add mechanisms to do so yourself. Or rather: when your main program finishes, you had better be sure that all the forked processes have finished, or that you are sure that you don’t care if they are still running.
To help with such issues, CHP offers runParallel, which we’ve seen a lot, but also something that I call scoped forking. There are two basic operations in scoped forking, centred around a monad transformer called ForkingT. (You can use ForkingT on any monad that is a member of the MonadCHP type-class, i.e. that has CHP at the bottom. The most obvious thing is to have the ForkingT CHP monad, but you can also have ForkingT (StateT Int CHP) or similar.) You fork off a process within this transformer using fork :: MonadCHP m => CHP () -> ForkingT m (). This forks off the first parameter, and is an action in the ForkingT transformed monad. You initiate one of these blocks using forking :: MonadCHP m => ForkingT m a -> m a. The important thing is that when the ForkingT block (the first parameter) finishes its own execution, the call to forking does not return until every forked process has also finished. So you can fork off as many processes as you like in the forking block, but at the end, you must wait for them all to finish.
So between runParallel and forking (the only mechanisms for running processes concurrently in CHP), there is always a defined, clear point in your program at which you will wait for any concurrent sub-processes to terminate. When your outer-most runCHP call finishes, it is not possible for these to still be any CHP processes running, so there is no confusion or race hazards involving termination semantics as there is with fork-and-forget functions such as forkIO.
Back to CML — the last item to cover is channels. CML has a type Channel a. It’s not clear from the documentation whether these channels support multiple readers and/or multiple writers, but let’s assume for simplicity that they don’t, and thus that this is the same as CHP’s OneToOneChannel a. There is a channel :: IO (Channel a) function for creating channels which is the same as CHP’s oneToOneChannel :: CHP (OneToOneChannel a). There is also a function to write to a channel, transmit :: Channel a -> a -> Event (), which is the same as CHP’s writeChannel :: Chanout a -> a -> CHP () except that CHP has the concept of channel-ends which CML does not. CHP uses channel-ends to make it clear that when a process is passed a Chanout end, it will be writing to that end of the channel, and cannot read from it. In this way, the directionality of the channels becomes apparent in the types, and there is some static guarantee of the patterns of usage of channels.
CML’s operation for reading from a channel is more interesting: receive :: Channel a -> (a -> Bool) -> Event a. The second parameter is a function that allows you to decide whether you want to engage in reading from the channel, based on the value being offered. Interestingly, the CSP calculus contains this feature, but it is not something I have added to CHP. I believe Erlang effectively also offers something like this (by pattern-matching on messages in a process’s message queue). I can see uses for this, and in CHP you would have to implement something like this by sending values on different channels, and choosing based on the channel the value was being communicated on rather than the value itself. I should perhaps add this to CHP in future, but for now this is a useful CML feature that CHP does not support. Without this aspect, receive is equivalent to CHP’s readChannel :: Chanin a -> CHP a.
The main difference between the features of CML and CHP are that CML is a nice small library, and CHP is a larger library with many more features (barriers, clocks, conjunction, parallel operators, poison, more channel types, traces, and the new higher-level features like the connectable type-class and behaviours). I think CHP subsumes CML for the features I want, but obviously I’m very biased. CML does have the benefit of operating in the IO monad, which can be more convenient than CHP’s eponymous monad. The reason for having the CHP monad is to support poison (more cleanly than using exceptions) and tracing, two features not present in CML.
In terms of the implementation, CHP uses STM and has a lot of logic to support some of the extra features (especially conjunction), but only uses as many threads as you would expect (one per thread of execution). CML uses MVars and has relatively simple algorithms, although they do spawn a lot of threads — from what I can understand of skimming the code, choose spawns off a new thread for every possibility being offered. The transactional events library does something similar. In contrast, CHP spawns off no threads for an alt call, and just performs one (potentially expensive) memory transaction then blocks until it is woken up by another thread offering to synchronise with it. That said, I expect CML may be faster than CHP for common cases.
Differences aside, CML and CHP are similar in their approach to concurrency (I can’t remember if CML was influenced by the CSP calculus, but it certainly has a similar approach to concurrency), so many of the design patterns should transfer between the two libraries — especially if CML’s developers were to add barriers in a future version.