## Progression 0.3: Bar Charts and Normalisation

The CHP posts are currently on a bit of a hiatus while I write up a couple of papers on CHP and prepare to attend the SIGCSE conference as part of my day job. In the meantime, it seems that a few people have picked up and begun using my Progression Haskell benchmarking library. The problem with users is that they want new features. So, spurred on by a couple of emails, I’ve created a new release of the library: version 0.3.

The changes in version 0.3 are all centred around the graphing (which in turn caused changes to the config and command-line options). John Millikin asked if I could support having versions as the groups on the X axis, and benchmarks as the lines — the opposite to the current arrangement. He made the point that with few benchmarks and many versions this would provide a better visualisation. I’ve exposed the function that maps raw data into graphable data in the configuration, and have also added default (exposed) implementations for Progression’s original transformation and the one that John wanted. Furthermore, I added a function that supports normalisation of the data (scaling by the corresponding benchmark times for the latest recorded version — perhaps I should make this configurable, too?). Normalisation should help if the benchmarks take noticeably different amounts of time — it is used by default, but configurable on the command-line.
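The normalisation step itself is simple; as a rough sketch of the idea (with hypothetical names — this is not Progression's actual API), it divides each measurement by the baseline version's time for the same benchmark:

```haskell
import qualified Data.Map as Map

-- Hypothetical sketch of normalisation (not Progression's real code):
-- scale each (version, benchmark, time) measurement by the time the
-- baseline version recorded for that benchmark.
normalise :: Map.Map String Double       -- baseline: benchmark -> time
          -> [(String, String, Double)]  -- (version, benchmark, time)
          -> [(String, String, Double)]
normalise baseline = map scale
  where
    scale (ver, bench, t) =
      case Map.lookup bench baseline of
        Just b | b > 0 -> (ver, bench, t / b)
        _              -> (ver, bench, t)  -- no baseline: leave unscaled
```

With the baseline's own rows included, every baseline point normalises to 1, so benchmarks that take very different absolute times share one readable scale.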

Sebastian Fischer also emailed me to offer his opinion that my line graphs were inferior to bar charts, both in theory and in terms of visibility. He’s right, and I have now corrected this by offering a choice between line graphs and bar graphs — I’ve actually made bar graphs the new default. I also made Progression print out its gnuplot script when it draws a graph — that way, if you know gnuplot and want to tweak something further manually, you can start with that script and work from there.

All of the above is configurable through Progression’s Config data type, and via command-line options. Here is a demonstration of a normalised bar graph, with benchmarks as the major groups and versions as sub-groupings (i.e. a graph that uses the new defaults) — click it for a larger version:

As ever, the new version of Progression is available on Hackage — you can also get the source from its public darcs repository: `darcs get http://patch-tag.com/r/twistedsquare/progression`

## Optimising CHP

Recently I have been optimising the core algorithms in CHP. Barrier synchronisations and channel communications in CHP use the same “event” algorithm that supports all of CHP’s features, including choice between events and conjunctions of events. Conjunction really makes the choice algorithms difficult. Here’s a quick example to illustrate. Some terminology: each process wanting to synchronise is “offering” a set of offers — each offer-set is a choice (disjunction) of offers, only one of which should be completed. Each offer is a conjunction of several events (if you’re not using conjunction, this will be a single event). Here is a test (written in a testing EDSL) from CHP:

```haskell
testD "Links 3" $ do
  [a, b, c, d, e] <- replicateM 5 $ evt 2
  p <- offer [b]
  q <- offer [b & c & d & e]
  r <- offer [c & a, d & a]
  s <- offer [e]
  t <- offer [a]
  always $ ??
```

So each of a–e is a two-party event, and there are five processes making offers (each list is an offer-set, each list item is a different offer, and each offer is a single event or a conjunction of events). What should be the result? The event b can complete if p’s single offer and q’s single offer are taken, but for that to work we’ll also need to be able to complete c, d, and e. If we look at process r, c can complete if a does (so we’ll need process t), but then d can’t complete as well, because only one of r’s offers may complete.

So the answer is that those offer-sets can’t complete yet, and all the processes will have to wait. We want to be able to figure that out quickly, especially in the most common case, where each process is offering just one offer containing just one event (i.e. it wants to perform a single channel communication), and in the other common case, where a process is offering two or three single-event offers. So we want speed — but we also need correctness. Having a bug in such a core algorithm would make CHP very unstable. So far CHP seems to have been correct in its algorithm, which is why I have been hesitant to optimise it before now. Ideally I would like to prove the algorithm correct but, for now, tests will have to do.
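A brute-force reference check makes these semantics concrete. This is only a sketch for illustration — it is not CHP's algorithm, and it makes the simplifying assumption that each event's enrolled parties are exactly the n processes offering it: enumerate every way of committing each process to at most one of its offers, and see whether any selection completes all of its events.

```haskell
import Data.List (group, sort)
import Data.Maybe (catMaybes)

type Event    = Char
type Offer    = [Event]    -- a conjunction of events
type OfferSet = [Offer]    -- a disjunction: one process's alternative offers

-- Brute-force reference check (a sketch, NOT CHP's real algorithm):
-- can some selection of at most one offer per process complete, where
-- every event involved must be chosen by exactly n parties?
resolvable :: Int -> [OfferSet] -> Bool
resolvable n procs = any complete (mapM (\os -> Nothing : map Just os) procs)
  where
    complete sel =
      let events = concat (catMaybes sel)
      in  not (null events)
          && all ((== n) . length) (group (sort events))

-- The "Links 3" example from the text: five 2-party events a..e.
links3 :: Bool
links3 = resolvable 2 [ [['b']]
                      , [['b','c','d','e']]
                      , [['c','a'], ['d','a']]
                      , [['e']]
                      , [['a']]
                      ]
```

As the text argues, `links3` comes out False: every selection leaves some event one party short. The trouble is that this enumeration is exponential in the number of processes, which is exactly why a fast purpose-built algorithm is needed.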

I knew from a recent exchange with a user of the library that CHP was horribly slow when offering choices between appreciable numbers of offers. Ten offers by one process was slow, and twenty would virtually halt the program, which wasn’t right at all. So I set up a simple set of micro-benchmarks: a single channel communication, and others with one “stressed” process offering to communicate with 2, 5, or 10 other processes (all eagerly waiting to communicate with the stressed process, and nothing else). None of these things should be slow. I developed my recently-released Progression library to help me see what difference the changes would make.

#### The Optimisation

The old algorithm would look through all the connected offer-sets available (reading them from all the STM TVars) and then see if it could find a completion. I noticed that the offer-sets were being searched in an arbitrary order, rather than starting with those most likely to complete; a judicious sort at the right point provided a surprising speed-up:
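The exact heuristic isn't something I'll reproduce here, but the shape of the change is tiny — something like sorting the offer-sets so the most constrained ones (fewest alternatives, then fewest events) are searched first (a sketch of the idea only, not CHP's actual code):

```haskell
import Data.List (sortOn)

-- A sketch of the idea (not CHP's actual heuristic): search the
-- offer-sets with the fewest alternative offers, then the fewest
-- events overall, first -- they are the quickest to decide.
orderOfferSets :: [[[event]]] -> [[[event]]]
orderOfferSets = sortOn (\offers -> (length offers, sum (map length offers)))
```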

On the one hand, I’m pleased to have made such a gain so easily, and on the other I’m embarrassed that such a gain was so readily achievable, and hadn’t been done before. But I wasn’t done yet. The algorithm was needlessly expensive, and this resulted from the way I had decomposed the problem. I had one function that read in the value of all the offer-sets (in the STM monad), one that trimmed these offer-sets to iteratively remove offers that couldn’t complete (a pure function), one that searched for remaining possible completions (also pure) and one that wrote the results to the appropriate places (again, in the STM monad).

This approach previously seemed to break the problem down nicely (and separated the effectful parts from the non-effectful parts) but, if you take a step back, it’s horribly slow. All offer-sets are read in, even if the first is ready to complete, so you may touch far more TVars than you need to, and then spend time pruning the offers when the first was obviously ready to complete. So I rewrote the algorithm; I now have two (mutually recursive) functions in the STM monad that combine the reading in of offer-sets with the searching (without explicit iterative pruning). This yielded great benefits over the previous algorithm for larger choices; here is the difference between the three versions so far, on a *logarithmic scale*:
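The structure of that rewrite can be sketched in miniature (using IO instead of STM, and a plain predicate instead of the real completion search — hypothetical names throughout): reading and searching are interleaved, so the first ready offer-set stops the scan before any further shared state is touched.

```haskell
-- Simplified sketch of the rewrite's shape (IO rather than STM, a
-- predicate rather than the real completion check): interleave reading
-- offer-sets with searching, stopping at the first completion found.
searchFirst :: (a -> Maybe b) -> [IO a] -> IO (Maybe b)
searchFirst _     []                    = return Nothing
searchFirst check (readOfferSet : rest) = do
  offerSet <- readOfferSet            -- only read what we actually reach
  case check offerSet of
    Just done -> return (Just done)   -- early exit: later sets never read
    Nothing   -> searchFirst check rest
```

Contrast this with the old read-everything-then-prune pipeline: here the later reads simply never happen once a completion is found.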

(As a side note: what horrific time-complexity must my previous algorithm have had to display this curve on a log scale?!) To my surprise, the algorithm became shorter after I rewrote it to be faster. This was perhaps because I created a new monad to write my algorithm in (a caching, backtracking search monad) which pulled out some of the common logic. With a few extra tweaks, I was able to reduce the running time even further. Here is the final optimisation I settled on, compared to that initial rewrite (back on a linear scale):

See how the performance is now roughly the same for each simpleChoice benchmark, no matter how many offers are being chosen between (because typically in this benchmark, the first one is always ready). I remain a little uneasy about releasing these changes. I have tested the algorithm a lot, but it’s still a rewrite of the core of CHP. I’ll release this soon as CHP 2.0.1 (there is no change in the API, hence the minor version increment). It should be generally faster, but please let me know if you have any odd behaviour that may indicate a bug. (I also plan to do some more optimisation in the future, but this batch will do for now.)

#### Optimisation Techniques

I don’t want to go into all the gory details of what I did to optimise the library, as that would involve explaining the full algorithm — which would be a much longer post than I have time for. The changes that brought the most gain were algorithmic. Inserting the sort to change the search order made a big difference, and rewriting the algorithm to exit the search as soon as possible (touching as few TVars as possible) made most of the remaining difference. After that, I was making changes such as adding a little caching of certain calculations to make sure that events and offers were never unnecessarily processed twice (by using a StateT monad on top of STM). My monad is a backtracking monad (well, sort of — I use an Alternative instance to explicitly denote searching of several possible paths, with something a lot like a MaybeT monad), but the caching is preserved across the backtracking as you would hope. Another change I made (but may revisit) was switching from the standard libraries’ Map and Set to ordered lists serving the same purpose. In my common cases these maps and sets hold 1–3 items, which is why I was able to see some speed-up from a simpler structure. I originally put these ordered lists in a newtype, but it turned out that removing the newtype wrapper also gained me some speed-up.
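For such small sets, the ordered-list representation looks something like this (a sketch of the general technique, not CHP's exact code):

```haskell
-- Sketch of replacing Data.Set with a strictly-ascending list, which
-- can beat the tree-based structure for the 1-3 element sets that are
-- common in CHP's algorithm.
insertOrd :: Ord a => a -> [a] -> [a]
insertOrd x []          = [x]
insertOrd x ys@(y : rest)
  | x < y     = x : ys
  | x == y    = ys                 -- set semantics: no duplicates
  | otherwise = y : insertOrd x rest

memberOrd :: Ord a => a -> [a] -> Bool
memberOrd _ []         = False
memberOrd x (y : rest) = case compare x y of
  LT -> False                      -- passed x's position: early exit
  EQ -> True
  GT -> memberOrd x rest
```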

It has not escaped my notice that supporting conjunction in CHP’s events algorithm is really an instance of the “other” CSP: a Constraint Satisfaction Problem. But it has some interesting performance characteristics (e.g. reading from fewer events can help a lot) and some useful aspects that I can optimise on (e.g. if a process offers event “e” anywhere in its offers, and you choose one of its offers without “e”, then “e” can no longer complete; this is not as obvious as it sounds), which is why I have my own algorithm for resolving the choices.

## Progression version 0.2 released

Version 0.2 of Progression is now available on Hackage. There are a few minor changes; here’s the relevant entry from the ChangeLog:

> Fixed a bug where the arguments to the -c and -p options were not split on commas as the documentation described. Also added settings for: drawing the graph with a logarithmic Y axis, plot size, and the way the benchmarks are sorted on the X axis.

Most changes to Progression are likely to touch the Config types, which will therefore entail a major version number bump (in accordance with the PVP).

## Progression: Supporting Optimisation in Haskell

#### Progression

I have recently been working on optimising CHP. In order to optimise a program, you need a set of benchmarks/tasks for which you want to improve performance. Then, you need to record how long the original version takes to complete the benchmark, record how long a changed version takes to complete the benchmark and compare the two to see if your change was sufficiently beneficial. This process is tedious, and staring at two columns of numbers is not always instructive, so I constructed a small Haskell library to help with optimisation. I’ve called it Progression, and it is now available on Hackage. Progression allows you to specify some benchmarks, run them, and graph their performance against each other; so what you get is a graph like this:

Each line is a different version of my CHP library, and each group of points is a different benchmark. (The versions were made in the order purple, blue, green, red; I ultimately stuck with the green version.)

#### Using Progression

Progression uses the excellent Criterion library for doing the benchmarking, so it is used in a similar fashion to Criterion. You construct a small wrapper program that defines/imports your benchmarks and passes them to the Progression library to be run. For example, here is the end of my file containing my CHP benchmarks:

```haskell
runAll = defaultMain . bgroup "" . map (uncurry bench . second runCHP_)

main = runAll [ ("SimpleChannel",   simpleChannel)
              , ("SimpleChoice 2",  choice 2)
              , ("SimpleChoice 5",  choice 5)
              , ("SimpleChoice 10", choice 10)
              ]
```

My runAll function turns the second part of each pair from a `CHP ()` type into `IO ()`, then I map the (Criterion) function bench over the list, use the (Criterion) function bgroup to join them all together, then pass them to the (Progression) function defaultMain. I compile this file into an executable.

When you run the program, you can either pass in settings via the command-line arguments, or if they are not present, you will be prompted with an interactive tab-completing prompt (thanks to the easy-to-use Haskeline library). There are three main settings:

- which benchmark groups you want to run (if you only have one group, you won’t be prompted),
- what to label the recording you are about to make, and
- which previous labels you want to compare against in the graph.

Once you’ve entered these two or three items, the benchmarks will all be run by Criterion, and the results stored in a CSV file (using the handy txt-sushi library; this way you can easily inspect the data manually, or process it further in a spreadsheet program). That file is then combined with the previously stored CSV files and fed to gnuplot. (Unfortunately the gnuplot binding in Haskell is not in great shape at the moment, so I’m using a plain system call to run it — if you want graphs, make sure you have gnuplot installed on your system.) The program creates a graph like the one shown at the beginning of the post — by default in the file plot.png (but again, configurable by command-line option). If, later on, you want to graph previous results against each other, you can do so by running the program with “-m graph”. On the graph you get points plotted at the means (staggered for each benchmark to aid readability and comparison) and error bars for the bounds that Criterion gives — by default these are the 95% confidence intervals, but that is configurable too (by passing through an option to Criterion).

#### Installing Progression

1. Make sure gnuplot is installed on your system and available in your path. Gnuplot is very likely to be in your package manager if you’re on Linux.

2. `cabal update && cabal install progression`

Note that by default Criterion uses the Chart library for plotting (a feature of Criterion that Progression does not use), which is in turn dependent on gtk2hs, which can be problematic for some people to install. If you get messages about cairo not being installed and you don’t want to install gtk2hs, you can install Criterion without this support for plotting (and thus without the gtk2hs dependency). The command `cabal install criterion -f-Chart` should do this (the minus between f and Chart is crucial), but unfortunately it seems that after doing so you must install Progression manually, by downloading it and running Setup.lhs (I had hoped `cabal install progression` would work, but that seems to attempt to satisfy the Chart dependency even though Criterion is by then installed).

#### Sharing Progression

I’ve now uploaded the progression repository onto patch-tag. You can get a copy using `darcs get http://patch-tag.com/r/twistedsquare/progression` if you want the very latest version or wish to contribute some patches.

#### Issues with the 0.1 release

I realise that the graph is technically invalid. I shouldn’t connect the points with a line because the X-axis is discrete, not continuous. However, without the line (i.e. with just points and error bars) it’s much less readable at a glance, and a bar chart with error bars didn’t seem too readable either when I tried it. The graph display still isn’t perfect though; it works best when you have benchmarks that take roughly similar times, and if you make one huge saving on one benchmark, as I did (a factor of about 100), this throws off the whole display. Normalising all the times, so that one of the versions has all its times normalised to one, would partially fix the problems. Also, if you run a lot of benchmarks, the CSV files do start to litter the directory; I wonder if I should store them in a subdirectory.

#### Pipe Dream

That’s the summary of Progression. I hope that Progression helps people who are optimising Haskell programs (especially alongside profiling and tools such as ThreadScope, which can help pinpoint candidates for optimisation). But the work-flow is not perfect. Often, not all the benchmarks are written and complete when you start benchmarking; you both make changes and add new benchmarks as you work. Currently, when graphing, Progression ignores any benchmarks for which it does not have data for every recording being graphed (this should perhaps be fixed). Ideally, you would re-run the old version of your library/program (for example, the original version before you began optimising) with the benchmarks to get some data. For this, we need access to an old version. Most developers store their projects in version control (and with Haskell, that’s typically darcs), so we could automatically pull out the old version from there rather than making the programmer do it themselves.

So perhaps what we would do is dig through the version history for the latest tag starting “OPT:” (which is the “original” for this current round of optimisation), then re-run that version with the latest benchmarks. In fact, why restrict it to just the original? We could look back to find the last tag with “OPT:”, then work forwards from there, looking for patches, marking out those starting “BENCH:” (the ones that introduce new benchmarks). We would then try and create versions of your program from the OPT tag forwards, with all the different versions since then, but always featuring all the BENCH patches. This would give us complete benchmark data for all old versions, and would also mean you wouldn’t have to tell Progression what the labels are, it could just use the patch names/dates. We could also try different combinations of patches (if you’ve recorded A, B, and C, where B depends on A, try A, A&B, A&B&C, A&C) to see how different independent changes combine together to find the fastest combination.

I’m not sure how much work that would be, but it sounds like it might form quite a useful tool. Or, it might turn out to be too over-the-top and awkward — it might be best to just stick to Progression’s current straightforward design. Comments are welcome below, as are any questions or feedback about the current release of Progression.

## Exploring a Communicating Sequential Processes EDSL

#### Translating Communicating Sequential Processes notation into program code

In my last post I showed the Communicating Sequential Processes (CSP) description of a pipeline of processes; here it is in full:

```
FULL(prev, next) = FULL_WILLING(prev, next) |~| FULL_STATIONARY(prev, next)

FULL_WILLING(prev, next)
  =  prev.empty /\ next.move -> EMPTY(prev, next)
  [] prev.move /\ next.move -> FULL(prev, next)
  [] prev.empty /\ next.canstay -> FULL(prev, next)
  [] prev.canstay /\ next.canstay -> FULL(prev, next)
  [] prev.muststay /\ next.muststay -> FULL(prev, next)

FULL_STATIONARY(prev, next)
  =  prev.empty /\ next.muststay -> FULL(prev, next)
  [] prev.muststay /\ next.muststay -> FULL(prev, next)
  [] prev.canstay /\ next.muststay -> FULL(prev, next)

EMPTY(prev, next)
  =  prev.empty /\ next.empty -> EMPTY(prev, next)
  [] prev.muststay /\ next.empty -> EMPTY(prev, next)
  [] prev.move /\ next.empty -> FULL(prev, next)

GENERATOR(next) = ON(next) ; OFF(next) ; OFF(next) ; GENERATOR(next)
ON(next)  = next.move -> SKIP [] next.canstay -> SKIP
OFF(next) = next.empty -> SKIP

END(prev) = prev.empty -> END(prev) [] prev.muststay -> END(prev) [] prev.move -> END(prev)
```

I can translate that into Haskell code that uses my CHP library fairly mechanically; here is a snippet:

```haskell
site :: (Chanin Platelet,  EnrolledBarrier, EnrolledBarrier, EnrolledBarrier)
     -> (Chanout Platelet, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier)
     -> CHP ()
site (prevMove, prevEmpty, prevCanStay, prevMustStay)
     (nextMove, nextEmpty, nextCanStay, nextMustStay) = empty
  where
    fullWilling platelet = alt
      [ (syncBarrier prevEmpty    <&> writeChannel nextMove platelet) >>  empty
      , (readChannel prevMove     <&> writeChannel nextMove platelet) >>= full . fst
      , (syncBarrier prevEmpty    <&> syncBarrier nextCanStay)  >> full platelet
      , (syncBarrier prevCanStay  <&> syncBarrier nextCanStay)  >> full platelet
      , (syncBarrier prevMustStay <&> syncBarrier nextMustStay) >> full platelet
      ]
```

Thankfully, CHP already allows us to get quite close to the original CSP, but it is still more verbose than the original. I wanted to see if I could get the code looking as much like the original CSP as possible, and this post details how I went about it.

#### Enter EDSLs

Many people have used Haskell as a host language for Embedded Domain Specific Languages. As far as I understand the term, an EDSL is where you craftily warp the host language to look like a completely different language with all sorts of frighteningly clever hacks. (I’m being very disingenuous to all the designers of very sensible and useful EDSLs, but for me the one that always springs to mind is Lennart Augustsson’s BASIC EDSL.) This post is my small foray into EDSLs, to get the CHP implementation looking as much like CSP as possible. I’m not sure that what I have is actually an EDSL, as it’s fairly specific to this particular piece of CSP, but these are at least EDSL-like techniques.

#### Cobbling together a CSP EDSL

I have a semantic hitch from the start. CSP’s “prev.move” can be viewed either as a single event whose name happens to contain a dot, or as a communication of the “move” value on the “prev” channel. If you take the latter view, the CSP code above is choosing between different messages on the same channel; something that is not possible in CHP (CML has it, but CML doesn’t have conjunction). So instead, we will have to implement the code as a group of events, where “prev.move” is a separate event from “prev.canStay”. There are four possible “sub-events” of prev and next (move, canStay, mustStay, empty), so we need a tuple of four events, with a different direction on the move channel for incoming platelets compared to outgoing platelets (CSP doesn’t have this notion of delineating direction); the remainder of the events can be symmetric two-party barriers:

```haskell
type In  = (Chanin Platelet,  EnrolledBarrier, EnrolledBarrier, EnrolledBarrier)
type Out = (Chanout Platelet, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier)
```

Our implementation is going to add important features to the original CSP: we are going to keep track of information about the platelets (so if we are full, we know what we are full with) and we are going to report our status each time frame to a visualisation process. We will come back to that later, but this is why we have channels carrying the Platelet type. To access the latter three events of our quadruples, we use special accessor methods. We pick out the barriers and synchronise on them; CSP does not syntactically separate events and the synchronisation on them (we do in CHP to allow for poisoning, which CSP does not feature), so we shouldn’t here either:

```haskell
canStay, mustStay, empty
  :: (a, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier) -> CHP ()
empty    (_, x, _, _) = syncBarrier x
canStay  (_, _, x, _) = syncBarrier x
mustStay (_, _, _, x) = syncBarrier x
```

We want to be able to refer to the CSP event “next.canStay” using the identical CHP code “next.canStay”. If we look at the types, the first part, next, is our tuple of events, and canStay is an accessor on it. So we want to translate “next.canStay” into “canStay next”, effectively. So we hide the Prelude’s definition of the dot operator and redefine it to do just that:

```haskell
(.) :: a -> (a -> b) -> b
a . b = b a
```
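To see what this buys us, here is the same operator demonstrated with plain Prelude functions standing in for CHP events:

```haskell
import Prelude hiding ((.))

-- Reverse application: with the Prelude's composition operator hidden,
-- "value.function" now reads like field access.
(.) :: a -> (a -> b) -> b
a . b = b a

-- e.g. applying fst and snd "field-style" to a pair:
example :: Int
example = (3, 4).fst + (3, 4).snd
```

The default infixl 9 fixity binds more tightly than arithmetic, so `example` evaluates to 7, just as `next.canStay` applies the canStay accessor to the tuple next.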

These mechanisms work fine for the barriers; we will also need a way to use our channels. The difficulty here is that we want to refer to the channel (which will have a different type at each end), and in the receiving instance access it as an implicitly-synchronising event (like the barriers above), but in the sending instance leave it as some sort of value on which we can then send (borrowing CSP’s “!” operator for sending). So our accessor must be polymorphic both in its argument and its return. Hence we need type-classes, with multiple parameters and functional dependencies:

```haskell
class MoveConnector c r | c -> r where
  move :: (c, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier) -> r

instance MoveConnector (Chanin a) (CHP a) where
  move (c, _, _, _) = readChannel c

instance MoveConnector (Chanout a) (Chanout a) where
  move (c, _, _, _) = c
```

This takes care of all that we need. “prev.move” will be of type `CHP Platelet` and “next.move” will be of type `Chanout Platelet`. Our ! operator is straightforward:

```haskell
(!) :: Chanout a -> a -> CHP ()
(!) = writeChannel
```

The pieces of our jigsaw puzzle are almost there. Haskell’s >> sequence operator is already a close enough stand-in for CSP’s -> prefix operator, and the syntactic difficulties of using [] as the choice operator mean that CHP’s <-> will do instead. We will, however, use the CSPc /\ operator to mean the conjunction of the two events. We will return the value from the first of the two, because we only ever want to save the value we receive from the platelet before us (which always appears on the left-hand side of the conjunction in the CSP):

```haskell
(/\) :: CHP a -> CHP b -> CHP a
(/\) p q = fst <$> (p <&> q)
```

Now we can write our site process:

```haskell
site :: Chanout (Maybe (Platelet, Bool)) -> In -> Out -> CHP ()
site obsOut prev next = emptySite
  where
    emptySite :: CHP ()
    emptySite = obsOut!Nothing >>
          ((prev.empty    /\ next.empty) >>  emptySite)
      <-> ((prev.mustStay /\ next.empty) >>  emptySite)
      <-> ((prev.move     /\ next.empty) >>= full)

    full :: Platelet -> CHP ()
    full platelet = do
      r <- liftIO $ randomRIO (0, 1 :: Double)
      if r > 0.05
        then obsOut!(Just (platelet, True)) >>
                  ((prev.empty    /\ next.move!platelet) >>  emptySite)
              <-> ((prev.move     /\ next.move!platelet) >>= full)
              <-> ((prev.empty    /\ next.canStay)  >> full platelet)
              <-> ((prev.canStay  /\ next.canStay)  >> full platelet)
              <-> ((prev.mustStay /\ next.mustStay) >> full platelet)
        else obsOut!(Just (platelet, False)) >>
                  ((prev.empty    /\ next.mustStay) >> full platelet)
              <-> ((prev.mustStay /\ next.mustStay) >> full platelet)
              <-> ((prev.canStay  /\ next.mustStay) >> full platelet)
```

The generator and end are also short:

```haskell
plateletGenerator :: Out -> CHP ()
plateletGenerator out = forever $ on >> off >> off
  where
    on  = do platelet <- Platelet <$> liftIO (randomRIO (0.5, 1))
             (out.move!platelet) <-> (out.canStay)
    off = out.empty

plateletEnd :: In -> CHP ()
plateletEnd prev = forever $
  (prev.empty) <-> (prev.mustStay) <-> (prev.move >> return ())
```

So we have implemented our CSP model, and made it look quite like the CSP itself. The behaviour is just as seen in previous videos, but now our sticky platelets are implemented ticklessly. There has been previous work by Fred Barnes on compiling CSP that supports many more features of CSP than the quick hacks shown here; this was just a fun exercise to get my code looking more like the CSP it came from.

## Sticky Platelet Pipeline — Finally Tickless

#### Tick and Tickless

A little while ago I posted an example of a simulation, with platelets moving down a pipeline together. When platelets bumped into each other, they stayed stuck together forever in a *clot*, and moved (or stayed put) together. The original code used a tick barrier, as many of my simulation examples do, to keep the simulation in lock-step. An alternative way to keep the simulation in lock-step is to make sure that every process in the pipeline communicates *exactly once* with its neighbours each timestep, which makes the tick barrier redundant. In this post I will show how to make my previous platelet example tickless, using this alternative method. I will be working in the formal Communicating Sequential Processes algebra, CSP (with a small extension); a Haskell implementation will follow in my next post.

#### The Ticking Version, in CSPc

We will start by looking at the ticking (original) version of the simulation in CSP. In fact, I will be using CSPc (CSP with conjunction); CSP doesn’t have the idea of conjunction in it, so I am using /\ as an added conjunction operator (akin to the logical conjunction operator) that conjoins two events into a single event that will occur when, and only when, both of its constituent events occur. All of the processes in this post are site processes: they represent a piece of space in the pipeline that does not move, and may or may not be occupied by a single platelet. A site that is occupied by a platelet is said to be full; otherwise it is empty. Let’s begin with the full site process:

```
FULL(prev, next) = FULL_WILLING(prev, next) |~| FULL_STATIONARY(prev, next)
```

The full process is making an internal choice between being willing to move this time-step and being stationary for the time-step. The internal choice means that FULL will make the decision itself and no other outside process (collectively referred to in CSP as its environment) can influence the decision. In our original Haskell implementation we made the choice randomly; a site would be stationary in 5% of the time-steps, and willing in the other 95% of time-steps. The stationary process refuses to do anything until the end of the time-step at which point it loops round to become the FULL process again:

```
FULL_STATIONARY(prev, next) = tick -> FULL(prev, next)
```

In contrast, the willing process offers two choices. It will move forwards if the process before it in the pipeline signals that it is empty, or if the process before it in the pipeline is willing to move too. (So it will move if the process before it is empty, or full and willing, but not if the process before it is full and stationary.) We specify this using our conjunction operator:

```
FULL_WILLING(prev, next)
  =  prev.move /\ next.move -> tick -> FULL(prev, next)
  [] prev.empty /\ next.move -> tick -> EMPTY(prev, next)
  [] tick -> FULL(prev, next)
```

(/\ binds most tightly above, then ->, and [] binds least tightly.) This model is an accurate rendering of my original CHP program, but it contains a race hazard. It is possible that a site that is willing to move in a time-step does not do so and ticks instead; if all the sites in a clot (a contiguous group of full cells) were willing, they could just tick repeatedly and never move at all. The details of the CHP library’s implementation prevented this occurring in my original CHP program (and hence I did not realise there was a race hazard), but it is a sin to rely on a library’s implementation of synchronisation for your concurrent program to be correct. (If I had been able to model-check this code, I could have discovered this problem; see the summary at the end.) The problem could be removed if we gave priority to the movement event; see the previous discussion of priority on this blog and Gavin Lowe’s paper on implementing priority with a small extension to CSP.

The empty site is as long as the full site, but that is because it is repetitive:

```
EMPTY(prev, next)
  =  prev.move -> (   (next.empty -> tick -> FULL(prev, next))
                   [] (tick -> FULL(prev, next)))
  [] next.empty -> (   (prev.move -> tick -> FULL(prev, next))
                    [] (tick -> EMPTY(prev, next)))
  [] tick -> EMPTY(prev, next)
```

The empty site is willing to optionally accept a movement from behind it in the pipeline and/or signal to the process ahead of it that it is empty, before ticking. Like the full site, this again has the race hazard that it could tick without accepting the movement of a willing platelet from behind it.

To wire up our pipeline, we start with *N* EMPTY sites in a row (with the next event of each connected to the prev event of the following process as you would expect) synchronising on the tick event together with a GENERATOR at the beginning, but the END process does not need to synchronise on the tick event — the latter two processes being defined as follows:

GENERATOR(next) = ON(next) ; OFF(next) ; OFF(next) ; GENERATOR(next)
ON(next)  = next.move -> tick -> SKIP [] tick -> SKIP
OFF(next) = tick -> SKIP
END(prev) = prev.move -> END(prev)

The generator sends out a platelet every three steps (and again has the aforementioned problem that EMPTY and FULL have). The END process doesn’t need to synchronise on the tick event because all it does is synchronise on move as frequently as possible; the FULL process already rate-limits itself to one move event per time-step, so this is acceptable behaviour. In this ticking CSP version, we don’t really need the END process at all, but it’s instructive to include it because our tickless version will need one. The CSP up to this point makes up the full model of our original clotting program (minus the wiring, which isn’t very interesting).

#### The Tickless Version, in CSPc

The design of the tickless version is more complicated than the original version. In a simulation with a tick event, we can use implicit signalling. Each process will offer to perform some actions, then eventually it must agree to synchronise on the tick event (if all the processes didn’t eventually tick, we’d get deadlock!). So you can gain information if you offer a choice of event A with your neighbour, or tick with everyone, and the tick happens. This means that your neighbour did not choose to offer event A to you before it offered tick. We often use this implicit information in the simulation. In the previous platelets code, a full site not willing to move would not offer a movement, and would wait for a tick. A full site willing to move would offer to move with its neighbours, but if tick happened instead, it knew that one of its neighbours was not willing to move, and they had implicitly agreed to stay put by synchronising on tick instead. (The idea is good, but my use of it above led to the problem where willing platelets may not end up moving — this can be solved in the tickless version, though.)

If we want to remove the tick event from our pipeline, we therefore have to add more events between the processes, to allow them to explicitly communicate what they used to implicitly communicate. Peter Welch suggested a possible solution — but he saw that his suggestion had the problem now revealed in the original version, mentioned above. I was able to improve on his idea to remove the problem, and I describe my improvement to his solution here. The tickless version involves introducing two new events (besides move and empty) that indicate further information: canstay and muststay.

If there were only one stay event, then that is all that full stationary sites would offer. But willing full sites would also have to offer to stay, in order to synchronise with stationary neighbours (if one platelet in the clot stays they all must). So all willing sites would offer to stay, and this could allow a clot of willing platelets to agree to stay even though they were all willing to move. This is the same problem as we had in our ticking version. To remove the problem, we differentiate the events offered by a full stationary site (who will offer muststay) and a full willing site (who will offer canstay). Here is how they are used in the new FULL_WILLING process:

FULL_WILLING(prev, next)
  = prev.empty /\ next.move -> EMPTY(prev, next)
  [] prev.move /\ next.move -> FULL(prev, next)
  [] prev.empty /\ next.canstay -> FULL(prev, next)
  [] prev.canstay /\ next.canstay -> FULL(prev, next)
  [] prev.muststay /\ next.muststay -> FULL(prev, next)

The first two cases are just as before, in the original version. The middle case is for when the process is at the beginning of the clot; it synchronises on empty with the process behind it, and canstay with the process ahead of it in the clot. The last two cases can be thought of as perpetuating the canstay/muststay event through the pipeline.

The new FULL_STATIONARY process is as follows:

FULL_STATIONARY(prev, next)
  = prev.empty /\ next.muststay -> FULL(prev, next)
  [] prev.muststay /\ next.muststay -> FULL(prev, next)
  [] prev.canstay /\ next.muststay -> FULL(prev, next)

The first case is for if this process is at the beginning of the clot; it synchronises on empty with the process behind it, and muststay with the process ahead of it. Looking up at the FULL_WILLING process, we can see that any FULL_WILLING process (from the last case) and any FULL_STATIONARY process (from the middle case immediately above) that synchronises on muststay with the process behind it will also synchronise on muststay with the process ahead of it. So if the process at the start of the clot synchronises on muststay, all processes ahead of it in the clot will also synchronise on muststay (by induction).

The third case of the FULL_STATIONARY process indicates that the processes behind the stationary one may offer canstay, and it will then offer muststay to all the processes ahead of it. The canstay event will only be offered from the previous process if it is in the FULL_WILLING state (FULL_STATIONARY only offers muststay to the process ahead of it, and we will see shortly that EMPTY only offers empty to the process ahead of it), which must then synchronise either on canstay with the process behind that (which, again, must be a FULL_WILLING process) or empty (which means it’s the start of the clot). All the full processes after FULL_STATIONARY, following the logic in the previous paragraph, will synchronise on muststay regardless of their state.

The new EMPTY process is as follows:

EMPTY(prev, next)
  = prev.empty /\ next.empty -> EMPTY(prev, next)
  [] prev.muststay /\ next.empty -> EMPTY(prev, next)
  [] prev.move /\ next.empty -> FULL(prev, next)

All cases offer the empty event to the process ahead of it. It will accept from behind: the empty case (when the previous site is empty), the move case (when the previous site is full and able to move forward) and the muststay event (when the previous site is part of a clot that cannot move). It does *not* accept the canstay event, which is crucial, for reasons explained in the next section.

The new ON, OFF and END processes are:

ON(next)  = next.move -> SKIP [] next.canstay -> SKIP
OFF(next) = next.empty -> SKIP
END(prev) = prev.empty -> END(prev) [] prev.muststay -> END(prev) [] prev.move -> END(prev)

You can think of ON as being the “forward half” of a FULL_WILLING site that is receiving empty from behind it; similarly, OFF is the forward half of an EMPTY site and END is the “backward half” of an EMPTY site.

#### Proofs

Since conjunction is an extra feature in CSP, there is no direct model-checking support for it. (We have designed a mapping from CSPc to CSP, but it causes a state-space explosion and does not yet have automated tool support.) Instead, I will offer inductive proofs about clots. By proving a statement for the site at the beginning of a clot, and for any middle/end sites based on the neighbour behind them, we inductively cover all non-empty clots. This can be done by considering the pairings of prev and next events: given the set of events a site synchronises on with its previous neighbour, we can work out what it may offer to its next neighbour.

So, let us consider a clot of willing platelets. The site at the beginning of the clot can only synchronise on prev.empty (as that is all the empty site before it will offer). Therefore the site at the beginning of a clot will only offer move or canstay to the next site. Any middle site that synchronises on move or canstay with the previous site will offer the same thing to the next site. So, inductively, the last site of the clot will only offer move or canstay. We can see that the empty site following the clot will only accept move, not canstay, so a clot of willing sites may only move and may not stay put. This solves the problem that we had with the ticking version, and is why the EMPTY process does not offer to synchronise on canstay. (This result also shows that any run of willing sites at the beginning of a clot will only offer move or canstay to the sites ahead of it.)

Now let us consider a clot with one or more stationary platelets somewhere along its length (but not the beginning). We have seen in the previous paragraph that the willing sites at the beginning of the clot will offer move or canstay to the first stationary site in the clot. This stationary site appearing after these willing sites will only accept canstay, and will then offer muststay ahead of it. We can see that all full sites, stationary and willing, will only synchronise on prev.muststay with next.muststay, so regardless of the stationary/willing state of sites ahead of the first stationary site, muststay will be the event offered at the end of the clot. The empty site will accept this, and so a clot with one or more stationary sites after the beginning will all agree to stay. If a stationary site is at the beginning, it will synchronise on prev.empty and next.muststay, and then the rest of the clot will also synchronise on muststay, so again the clot will stay put. Thus any clot with one or more stationary sites will stay put.
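The inductive argument above can also be checked mechanically, by reading each process’s cases as a transition table from the event synchronised behind a site to the events it may offer ahead. The following Haskell sketch is my own encoding of that table — the names `Event`, `Site`, `offers` and `agreed` are illustrative, not part of CHP or the CSP model:

```haskell
import Data.List (nub)

data Event = Move | Empty | CanStay | MustStay deriving (Eq, Show)
data Site  = Willing | Stationary deriving (Eq, Show)

-- Given the event a full site synchronises on behind it, the events it
-- may offer ahead (read off the FULL_WILLING and FULL_STATIONARY cases).
offers :: Site -> Event -> [Event]
offers Willing    Empty    = [Move, CanStay]
offers Willing    Move     = [Move]
offers Willing    CanStay  = [CanStay]
offers Willing    MustStay = [MustStay]
offers Stationary Empty    = [MustStay]
offers Stationary CanStay  = [MustStay]
offers Stationary MustStay = [MustStay]
offers Stationary Move     = []   -- a stationary site never accepts a move

-- Events a clot may offer to the site ahead of it, given that the site
-- behind the clot is empty (so the first site synchronises on empty).
clotEnd :: [Site] -> [Event]
clotEnd = foldl (\evs s -> nub (concatMap (offers s) evs)) [Empty]

-- The EMPTY process ahead accepts empty, muststay or move -- not canstay.
agreed :: [Site] -> [Event]
agreed clot = filter (`elem` [Empty, MustStay, Move]) (clotEnd clot)

main :: IO ()
main = do
  print (agreed [Willing, Willing, Willing])     -- [Move]: canstay is filtered out
  print (agreed [Willing, Stationary, Willing])  -- [MustStay]
```

An all-willing clot ends up agreeing only on move (matching the first proof), and any clot containing a stationary site agrees only on muststay (matching the second).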

#### Summary

So we have a system, expressed in a few lines of CSPc, of a sticky platelet simulation without a tick event. I will show a translation to CHP in the next post, which works the same as the original version (minus the potential problem). This work is interesting from a formal perspective because we have no direct support to model check this CSP, due to the conjunction extension. We have devised a mapping from CSPc to CSP, but it generates a large number of events; I believe it would be on the order of 4^N for this problem. We don’t have a tool to generate the CSP just yet, and even if we could, I suspect the model checker may choke on that size of problem. However, by taking advantage of conceptual features of the simulation, namely clots, I was able to perform some inductive reasoning about the problem. The reasoning was aided by the neat symmetry of the problem; each site in the pipeline offered a pair of (prev, next) events in conjunction, which could be thought of as a sort of normal form.

From a practical perspective, it can be seen that this is not really a very concurrent simulation. The chained conjunction along the pipeline means that all processes must resolve their actions for the time-step together, and really the power of the simulation is in the resolution of all the choices (which could be written as a single sequential/functional piece of code: transforming the list of all platelets at time-step T to the list of all platelets at time-step T+1), not in the concurrency of the sites. The advantage of the way we constructed our solution is that we have encoded the behaviour of each site by only referring to the two neighbours of a site. These are local rules that, when resolved for all sites, produce the emergent behaviour of sticky platelets bumping, forming clots, and advancing down the pipeline together. There is no global visibility of the system in our code (only in the run-time system) to complicate things. This investigation of emergent behaviour is part of the ongoing CoSMoS research project that uses process-oriented technologies to produce this kind of simulation, and which builds on the original work of the TUNA project from which this blood clotting example is taken.
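That single sequential transformation can be sketched in a few lines of Haskell. This is my own illustration, not code from the simulation: a pipeline is a list of cells (Nothing for an empty site, Just a platelet with a fixed willingness), and a clot advances one cell exactly when all its platelets are willing and there is space ahead; the end of the list behaves like the END process, always accepting a move. (In the real simulation, willingness would be re-decided each step, perhaps randomly; here it is frozen into the state for simplicity.)

```haskell
import Data.List  (groupBy)
import Data.Maybe (isJust)

data Platelet = Willing | Stationary deriving (Eq, Show)

-- One time-step of the whole pipeline, resolved sequentially.
-- Movement is towards the end of the list.
step :: [Maybe Platelet] -> [Maybe Platelet]
step = concat . go . groupBy (\a b -> isJust a == isJust b)
  where
    go []       = []
    go (g : gs)
      | not (isClot g)          = g : go gs  -- a run of empty cells: unchanged
      | any (/= Just Willing) g = g : go gs  -- clot with a stationary platelet: stays
      | otherwise = case gs of               -- all-willing clot: moves forward
          []              -> [Nothing : init g]  -- at the end: leading platelet exits
          ((_ : e) : gs') -> (Nothing : g ++ e) : go gs'
          ([] : _)        -> error "groupBy never yields empty groups"
    isClot = isJust . head

main :: IO ()
main = mapM_ print (take 4 (iterate step start))
  where
    -- the willing pair advances until it sticks to the stationary platelet
    start = [Just Willing, Just Willing, Nothing, Nothing, Just Stationary]
```

Because `groupBy` merges adjacent full cells into one group, a willing clot that reaches a stationary platelet automatically becomes part of a larger clot that stays put — the “sticky” behaviour falls out of the grouping.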

## The Process Composition Monad

#### Process Wiring

Process-oriented programming, as embodied in the CHP library, implements programs as parallel compositions of components. It’s a pleasantly recursive concept; the components themselves are usually parallel compositions of further components. These parallel compositions involve connecting up (or wiring up, to use a more tangible metaphor) the processes so that they can communicate with each other. This can sometimes look a bit spaghetti-like; at worst, it involves declaring all the channels and barriers (effectively naming every wire in the system!) and passing them to the right place, and enrolling on the barriers at the right point. One wrong use of a name (e.g. this recent problem on haskell-cafe, which can all too easily happen) can ruin your program, and the type system can’t protect you if the channels carry the same type.

One way to safe-guard against mis-wiring is to capture common wiring patterns in combinators. Just as you should never write a function that alters every item in a list using explicit recursion (you should use map instead), you should never wire up a pipeline long-hand, and should instead use the connectable combinators. Similarly, there are enrolling helper functions that can ease the use of barriers.

#### Problems with Combinators

Sometimes the combinators in CHP can remain more awkward than I would like. For example, CHP has an `enrollAll` function. For this post, I’m going to simplify it to have this type:

enrollAll :: Barrier -> [EnrolledBarrier -> CHP a] -> CHP [a]

It takes a barrier, then enrolls all the processes in the list on the barrier and runs them all in parallel, returning a list of results. Now imagine that you have two barriers (`a` and `b`), and a list of processes (`ps`) of type `[EnrolledBarrier -> EnrolledBarrier -> CHP a]` and you want to enroll all the processes once on each barrier. It turns out you can’t do this with `enrollAll`; the function doesn’t compose. So I added `enrollAllT`:

enrollAllT :: Barrier -> [EnrolledBarrier -> a] -> ([a] -> CHP b) -> CHP b

This function does compose; we can now run our processes using one of these two lines:

enrollAllT a ps (enrollAll b)
enrollAllT a ps (\ps' -> enrollAllT b ps' runParallel)

There are several other functions like `enrollAll` (such as the function for wiring up a pipeline of processes) that shared the same problem of not allowing repeated composition, and that could have similar changes made to fix it. This seems to be a common pattern (partially compose some processes, then delegate to a further composition function), and so I set about trying to capture it somehow. This being Haskell, it was no great surprise that I ended up with a monad.
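The shape of that pattern can be seen in a toy analogue that runs in plain IO — `provideAll` and `runAll` below are my own stand-ins (plain values standing in for barriers, and `sequence` standing in for `runParallel`), not CHP functions:

```haskell
-- Toy analogue of enrollAllT: supply one resource to every process in
-- the list, then delegate the partially-applied processes to a continuation.
provideAll :: r -> [r -> a] -> ([a] -> IO b) -> IO b
provideAll r ps k = k (map ($ r) ps)

-- Stand-in for runParallel: just run the actions in sequence.
runAll :: [IO a] -> IO [a]
runAll = sequence

-- Two "processes", each still waiting for two resources.
procs :: [Int -> Int -> IO Int]
procs = [\x y -> return (x + y), \x y -> return (x * y)]

main :: IO ()
main = provideAll 3 procs (\ps' -> provideAll 4 ps' runAll) >>= print
       -- prints [7,12]
```

Because `provideAll` hands the partially-applied processes to a continuation rather than running them itself, a second `provideAll` (or any other finishing step) slots straight in — exactly the composition that `enrollAll` lacked and `enrollAllT` provides.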

#### The Composed monad

I came up with the `Composed` monad (in lieu of a better name). Its definition can be written as:

newtype Composed a = Composed { runWith :: forall b. (a -> CHP b) -> CHP b }

I found this slightly confusing at first; I didn’t expect the type `a`, the return type of the monadic action, to be on the left of the `runWith` function looking like a parameter. What this type says, though, is that if you give me a `Composed a` block and a function that can take its return type `a` and turn it into a `CHP` action returning type `b` (i.e. give me: `a -> CHP b`), I will give back a parcelled up `CHP b` action, by feeding my return value to the function you give me. The monad instance is as follows:
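To see the monad in isolation, here is a standalone sketch with IO standing in for CHP, and a bracket-style resource standing in for barrier enrollment — `withResource` and `liftIO'` are my own illustrative names, not CHP functions. (Recent GHCs also require Functor and Applicative instances, which I have filled in via `ap`.)

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Exception (bracket)
import Control.Monad (ap)

newtype Composed a = Composed { runWith :: forall b. (a -> IO b) -> IO b }

instance Functor Composed where
  fmap f m = Composed (\k -> m `runWith` (k . f))
instance Applicative Composed where
  pure x  = Composed ($ x)
  (<*>)   = ap
instance Monad Composed where
  m >>= f = Composed (\k -> m `runWith` (\x -> f x `runWith` k))

-- Lifting an action, in the same shape as the MonadCHP instance:
liftIO' :: IO a -> Composed a
liftIO' x = Composed (x >>=)

-- Stand-in for enrollR: everything later in the monadic sequence runs
-- inside the bracket, mirroring how an enrollment scopes over the rest.
withResource :: String -> Composed String
withResource name =
  Composed (bracket (putStrLn ("acquire " ++ name) >> return name)
                    (\n -> putStrLn ("release " ++ n)))

main :: IO ()
main = (do r1 <- withResource "a"
           r2 <- withResource "b"
           liftIO' (putStrLn (r1 ++ " and " ++ r2 ++ " in scope")))
       `runWith` return
```

Running this prints the acquires, the body, then the releases in reverse order — the continuation passed to `runWith` is threaded inside both brackets, which is exactly the scoping behaviour described for enrollment below.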

instance Monad Composed where
  return x = Composed ($ x)
  m >>= f  = Composed (\r -> m `runWith` ((`runWith` r) . f))

This monad crops up all over the place. It can be written as `forall b. ContT b CHP a`, the continuation-passing monad transformer on top of CHP. (Dan Piponi has explained why ContT’s cousin Cont has been called the mother of all monads.) Edward Kmett, in turn, labels this the Codensity monad of CHP. (I’m not sure if the continuation-passing monad is usually used in the way I end up using it here, though.) The monad can be executed with `runWith`, or with a slightly easier helper function:

run :: Composed [CHP a] -> CHP [a]
run p = p `runWith` runParallel

The `Composed` monad is also an instance of `MonadCHP`, meaning you can lift CHP actions into it:

instance MonadCHP Composed where
  liftCHP x = Composed (x >>=)

#### The Two Styles of Use

I can use this new monad in two different styles. One is to return enrolled items from function calls, using functions such as:

enrollR :: Barrier -> Composed EnrolledBarrier
enrollR b = Composed (enroll b)

(Point-free style is not possible due to the rank-2 type.) You could then write programs such as:

do a <- liftCHP newBarrier
   b <- liftCHP newBarrier
   a0 <- enrollR a
   b0 <- enrollR b
   b1 <- enrollR b
   return [p a0 b0, q b1]

The order in the monad here represents a sort of scoping. When you enroll on the barrier using enrollR, any items later in the monadic “sequence” are within the scope of that enrollment. (After I implemented this, I saw similar code in Maciej Piechotka’s aforementioned haskell-cafe post, so perhaps this is a common way to handle this idiom via a monad.)

That is a fairly simple use of the monad, where the functions return enrolled items and then we wire up. We can also support a slightly different style, with functions such as this:

enrollAllR :: Barrier -> [EnrolledBarrier -> a] -> Composed [a]
enrollAllR b ps = Composed (enrollAllT b ps)

Here, the function does not return a list of enrolled barriers, but instead the list of partially wired up processes (of some type `a`, which is typically something along the lines of `EnrolledBarrier -> Chanout Int -> CHP String`, i.e. a CHP process requiring further parameters). So our earlier `enrollAllT a ps (enrollAll b)` line is written in this monad as:

run (enrollAllR a ps >>= enrollAllR b)

Or, for those who like point-free style:

run (enrollAllR b <=< enrollAllR a $ ps)

#### Summary

The difference can perhaps best be summarised by showing the types of two versions of `enrollR`:

enrollR  :: Barrier -> Composed EnrolledBarrier
enrollRS :: Barrier -> (EnrolledBarrier -> a) -> Composed a

The first style returns the enrolled barrier to use later on, whereas the latter takes some sort of process and passes the enrolled barrier into it, returning the process, now with one more argument taken care of. I am still exploring which style is best. The first style seems easier for simple functions, but for more complex functions the latter seems more elegant. I also have slight reservations about how easy it will be for users to get their head around the use of this monad, in either style. Comments are welcome.