Archive
Transposing Sequential and Parallel Composition
Recently, I spent some time discussing traces with Marc Smith (we previously wrote two papers on the matter). Something that came up is the idea of representing a sequential composition of parallel processes as a parallel composition of sequential processes. Let me explain…
We, as programmers, often start off reasoning sequentially and then add on reasoning about parallel processes later — usually with much difficulty. Think of producing a control-flow graph for an imperative program, for example. If you have a sequential program, it is usually quite straightforward to form a flow graph for the program. When you try to add parallel processes, you need multiple flow graphs side by side. And they will interact, so really you need to link them together at the points of interaction. It quickly gets quite involved, and leaves you wishing that you didn’t have to involve concurrency.
What Marc and I found is the opposite: our reasoning was straightforward for parallel processes, but sequential composition was a headache. We were trying to represent the trace of a program as a tree, with each node representing either a parallel composition or a process without parallelism. In fact, due to the associativity and commutativity of parallel composition, this type of tree can always be flattened into an unstructured bag of parallel processes:

We then wanted to link processes in the tree that communicated with each other, to work out information about dependence and to look at neighbourhoods of connected processes. That was all fine, but gets quickly difficult when you need to consider sequential composition, specifically the sequential composition of parallel compositions.
Imagine that one of the tree nodes is first a program without parallel composition, and later becomes the parallel composition of two sub-processes, i.e. P = Q;(R || S). We don’t have a way to represent sequence in our neat parallel trees above, and this causes our representation to become awkward. We can’t have a simple ordered sequential list of trees, because if two processes have sequence information, they can proceed in either order, so really we would need a sequential lattice of trees, which is getting nasty.
At this point we wished we didn’t have to deal with sequential composition — at least, not sequential composition that can contain parallel composition. And via a small transformation, it turns out that we don’t. Consider the process: a; ((b; b’) || c); d, where everything in there is assumed to be a communication. We can refactor this into a set of parallel processes (P1 || P2 || P3):
P1 = a; par1start ; par1end ; d
P2 = par1start ; b; b’; par1end
P3 = par1start ; c ; par1end

The key insight is really this: if you are running multiple processes in parallel, it is the same if you start them at the outset of the program, then synchronise with them to start them, as if you set them going when you need them. So even though the process doing c won’t happen until a has, we don’t wait to set that process off partway through the program. We set that process (P3 above) off straight away, then send it a message (par1start) when we want to really set it going. The par1end message makes sure that P1 doesn’t do anything more until P2 and P3 have completed: all three processes synchronise together on the par1start and par1end messages.
With this transformation, we never have any parallel composition inside sequential composition in our traces. All we have is one giant parallel composition of every process in the system, where each process is a sequential composition of events. There is no more nesting than that; we have flattened the process network into an unstructured bag of lists of events. In fact, we can now represent the processes as a graph (with links formed by the communications: the dotted lines in the diagram above). The semantics of the process are unchanged, and all the information about the ordering dependencies is still there if you want it (embodied in the unique parstart/parend communications: the italicised events in the graph above). This form of representation actually makes it really nice to track dependencies and relations throughout the process network, by following the communication links.
Addendum
You could even go one step further, and transform the system into a collection of parallel processes, where each process had at most three events in sequence, where at most one of those would be an event from the original program, i.e. the above would become:
P1A = a ; seq1
P1B = seq1; par1start ; seq2
P1C = seq2; par1end; seq3
P1D = seq3; d
P2A = par1start; b; seq4
P2B = seq4; b’ ; par1end
P3 = par1start ; c ; par1end
I’m not sure how useful this is, though.
Visualising Your Process Network
One advantage of CHP’s process-oriented style of programming is that it lends itself well to visualisation. I don’t make as much use of it on this site as I should, but some previous posts show diagrams with processes (as nodes) joined by channels (as edges). I already have some tracing support in CHP, and I had wondered if this could provide a quick and dirty way to graph a process network. This post shows the results of adding a little support to CHP for process visualisation, which I’m now debating whether or not to keep.
It turned out to be slightly better to add a new trace type (the process-graph trace) to CHP than to build a graph from the existing structural traces. So from the user’s point of view, all that would be added to the library was a new trace type; no changes to an existing program would be required other than switching tracing on and using the process-graph as the trace type. (I also added a labelling function for processes, that works with both structural and process-graph traces.)
Hooking the process-graph trace into the existing tracing infrastructure was so straightforward that it’s not really worth writing about. Unexpectedly, the most difficult part of making this functionality useful was finding something that can draw the process networks well. Process networks tend to be hierarchical, in that each process (node) may contain a sub-network, composed to any depth. These processes are connected via edges. My first inclination for graph visualisation was to reach for graphviz. So I wrote some code to spit out the graphviz file. Graphviz has a dot tool for drawing graphs in lines with rank. On a simple benchmark called numbers, it does quite well:
For larger examples, or those with cycles, its rank-basis can be awkward. Here’s the dining philosophers, rendered in a slightly unusual manner:
Most of the other graphviz tools can’t deal with hierarchical graphs — only the fdp tool can. This seemed promising; a force-directed approach that supported sub-graphs. Here’s the numbers example from above:
Clumsy. And here’s the dining philosophers:
Eugh! This is a tiny bit better when you liberate it from its subgraph (at least, it might be halfway decent if not for the node labels):
But there is a clear solution for the dining philosophers (a star topology with the security guard at its centre), so if fdp makes a hash of this graph, I don’t hold out much hope for more complex process networks. I don’t want to write my own tool for this, but I may if I have to (I’d actually quite like an interactive viewer so that you can drag things around like you can in prefuse, and perhaps involve animation of the network). Any suggestions for beating fdp into shape, or for alternative tools would be most welcome!
Observed Occurrences Only
I said that this technique was quick and dirty. The main flaw (or is it a feature?) is that only the observationally determinable process network is graphed. If you pass a channel to two processes who never communicate on it, that link won’t show up in the graph. The graph is a trace: a record of what happened, not a blueprint of what might have happened. This is not an invalid approach, but I expect it is not what most people want: most would like to see their process network drawn as it is in their program design with all channels shown, not just those that were used. But that tool is more complicated, and will have to wait for another day. (You can extend this hack to do something more like that if you use an annotation and generic programming to capture all processes’ parameters and look through them for channel ends…)
Change and Movement Over Time
The fallacy of drawing the process network in a single image is that it is not static for the duration of the program. Processes can start and end, come and go. Not only that, but CHP channels can be freely communicated down other channels, meaning that programs can dynamically rewire their topology at run-time (I should write some posts about that at some point). So really a proper tool should include support for visualising the process network as it changes over time. But again, that will have to wait for another day.
Concurrent Testing and Tracing: Useful Output for Test Failures
Programmers write testcases for programs to check that the programs work correctly. When a test-case fails, a programmer wants to know two things: first, why specifically did the test fail (which assertion did not hold?), and secondly, what is wrong with the program (or the test case!)?
Last Friday I released CHP 1.5.0, which adds better support for testing, tracing, and the combination of the two. Tracing refers in theoretical terms to recording a computation’s history: practically, it means keeping a log of all the CHP synchronisation events (e.g. channel communications) that happened in your program. CHP supports several types of tracing, but for this post I will use the simplest type: a CSP trace, which is just a linear sequence of events that occurred.
Let’s explore the combination of testing and tracing in practice. Previously I used an example involving a four-item pipeline to find the mode (most common element) of a list. Let’s use this example again, but since this is a post about testing, we’ll introduce an error: we will forget to include the sort process at the front of the pipeline.
To use the tracing framework, you have to augment your code slightly to label parts of your code. This involves using arrLabel instead of arr, and *<<<* instead of <<< (see the remarks at the end of this post for why, and for a way to shorten the tedious labelling):
badMode = (arrLabel "group" group) *>>>* (arrLabel "maxByLength" maxByLength)
*>>>* (arrLabel "head" head)
where
maxByLength = maximumBy (comparing length)
We can then join this pipeline onto one that takes a string, and splits it into words — making a pipeline that looks for the most frequent word:
mostFreqWord = (arrLabel "makeLower" makeLower) *>>>* (arrLabel "words" words) *>>>* badMode
where
makeLower = map toLower
Finally, we can provide a simple test sentence for the pipeline that should pick out the most frequent word, “the”:
testFreqWord = testCHPInOut (const (== "the")) (runPipelineLabel mostFreqWord) "The quick brown fox jumps over the lazy dog" main :: IO () main = runTestTT testFreqWord >> return ()
If we run this output, what do we get? Well, the test case fails, so we are told that, but we are also automatically given the trace of the failing test case:
### Failure: testCHP Failure; trace: < _c4, makeLower->words."the quick brown fox jumps over the lazy dog", words->group.["the","quick","brown","fox","jumps","over","the","lazy","dog"], group->maxByLength.[["the"],["quick"],["brown"],["fox"],["jumps"],["over"],["the"],["lazy"],["dog"]], maxByLength->head.["dog"], _c5 > Cases: 1 Tried: 1 Errors: 0 Failures: 1
So how do we read this trace? The trace is delimited by angle brackets, and is a comma-separated sequence of events (I’ll look to tighten up the grammar in future to allow for automated parsing). The _c4 is an unnamed event at the start (this is the channel used in the test framework to inject the data). Then the following events are channel communications of the form channel-name.data-value. So makeLower->words is the channel name, and "the quick brown fox jumps over the lazy dog" is the data value. The channel names have been automatically generated based on the two processes that the channel connects, and the data value has been printed out using show. We can use this trace to immediately see which values were passed between each process pair, and thus track down the problem. You can see from looking at the trace that group is not grouping the two “the”s together, which is because the list is not sorted. We can easily fix this by putting in the missing sort process. The trace with sort added now works as we would expect:
< _c5, makeLower->words."the quick brown fox jumps over the lazy dog", words->sort.["the","quick","brown","fox","jumps","over","the","lazy","dog"], sort->group.["brown","dog","fox","jumps","lazy","over","quick","the","the"], group->maxByLength.[["brown"],["dog"],["fox"],["jumps"],["lazy"],["over"],["quick"],["the","the"]], maxByLength->head.["the","the"], _c6 >
Tracing is useful for tracking down problems and strange behaviours in your system, so integrating it with the test framework made a lot of sense. You can trace any program, not just test-cases, but I will explore that at a future date. You may observe that tracing (at least a CSP trace) is not too far different from adding print (or Debug.Trace.trace) statements around your program. The main differences are:
- Channels are named when you join together processes, so you don’t have to modify the definition of the original processes to get intelligible trace output (unless you want to trace the internal channels).
- Tracing can be turned on or off at run-time (by changing the top-level call to runCHP), rather than at compile-time with pre-processor macros — but if tracing is switched off, you do not incur any overhead.
- The order in the trace is guaranteed to reflect the real ordering of the channel communications, so you can’t end up with an unrepresentative order that usually happens when you put print statements in a concurrent program.
Remarks: There are several side notes to add to this post. Many readers will probably recoil at my introduction of a *>>>* combinator. This has type:
(*>>>*) :: Show b => ProcessPipelineLabel a b -> ProcessPipelineLabel b c -> ProcessPipelineLabel a c
The show constraint is the reason that I could not declare an arrow instance — so any fix to this problem had to involve declaring a similar combinator outside the arrow, unless I’ve missed another approach.
The naming of the processes can be simplified with the use of a pre-processor. Here are the same process declarations, named using the C pre-processor:
#define PR(p) (arrLabel #p p)
badMode = PR(group) *>>>* PR(maxByLength) *>>>* PR(head)
where
maxByLength = maximumBy (comparing length)
mostFreqWord = PR(makeLower) *>>>* PR(words) *>>>* badMode
where
makeLower = map toLower
Unfortunately, GHC with the -cpp flag doesn’t support this stringize operator so you have to run the C pre-processor on the file first as part of your build process before feeding it to GHC, or use the -F flag with a wrapper script.
Showing the data being passed down the channel obviously forces the data, which may potentially affect the behaviour. Whether you show the data in the trace can be turned on and off — and you can also supply a custom function to show your data if, for example, you are passing around infinite lists (you could just print the first three items). I’ve tried to give the gist here rather than all the details, but the CHP documentation explains the different ways you can record a channel communication in a trace.
I will present this example at the fringe session at the CPA 2009 conference this evening here in Eindhoven. If anyone is interested, last year’s paper on tracing is available in my list of publications, with this year’s publication to be added shortly.




