## Processing without Buffering

While I was giving a talk last week, someone asked about obeying the arrow laws with CHP processes. The long and the short of it is that the arrow laws require `arr id >>> arr id` to be equivalent to `arr id`. But with the instance I have in chp-plus, the former is two processes and the latter is one, and each process introduces one place of buffering. If you put in the same input you’ll get the same output with both, but with different buffering behaviour (this is already pointed out in the documentation). The same issue came up again later in the day with process fusion (more on that another time).
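For arrows with no observable internal state, the law is easy to check by running both sides. The following is a minimal sketch using only `Control.Arrow` from base, not chp-plus; the names `twoSteps`, `oneStep` and `agreeOn` are made up for illustration.

```haskell
import Control.Arrow (Kleisli (..), arr, (>>>))

-- Both sides of the law, instantiated at the Kleisli arrow of Maybe.
twoSteps, oneStep :: Kleisli Maybe Int Int
twoSteps = arr id >>> arr id
oneStep  = arr id

-- Running the two arrows on the same input gives the same result.
agreeOn :: Int -> Bool
agreeOn x = runKleisli twoSteps x == runKleisli oneStep x
```

For processes, the outputs agree in the same way; the difference lies only in how far the writer can run ahead of the reader.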

#### Extended Inputs

The way to resolve the problem of how much buffering to introduce is quite simple in retrospect: don’t introduce any buffering, anywhere. Then there is no issue of adding or removing it with the laws. We can easily eliminate buffering in CHP using extended input. Extended input means that when you read from a channel, you keep the writer waiting longer than usual, while the reader performs an extra action. The API in CHP is quite straightforward:

    extReadChannel :: ReadableChannel r => r a -> (a -> CHP b) -> CHP b

The function takes an incoming channel end, an action to perform with the value, and then performs an extended input using your given action. Note that if you need the read value after the action, you’ll have to return it yourself as part of the “`b`” type.
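To make the timing concrete, here is a small model of extended input built from `MVar`s in base. It is only a sketch of the idea, not how CHP implements channels: `SyncChan`, `writeSync`, `extReadSync`, `readSync` and `demoOrder` are hypothetical names, and the model assumes exactly one writer and one reader per channel.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Hypothetical model of a synchronous channel: a data slot plus an
-- acknowledgement slot. Assumes a single writer and a single reader.
data SyncChan a = SyncChan (MVar a) (MVar ())

newSyncChan :: IO (SyncChan a)
newSyncChan = SyncChan <$> newEmptyMVar <*> newEmptyMVar

-- The writer offers a value, then blocks until the reader acknowledges it.
writeSync :: SyncChan a -> a -> IO ()
writeSync (SyncChan dat ack) x = putMVar dat x >> takeMVar ack

-- Extended read: run the action on the value before releasing the writer.
extReadSync :: SyncChan a -> (a -> IO b) -> IO b
extReadSync (SyncChan dat ack) action = do
  x <- takeMVar dat
  r <- action x
  putMVar ack ()
  return r

-- Ordinary read: an extended read whose action does nothing extra.
readSync :: SyncChan a -> IO a
readSync chan = extReadSync chan return

-- Records the order of events for one extended communication.
demoOrder :: IO [String]
demoOrder = do
  chan <- newSyncChan
  logVar <- newMVar []
  done <- newEmptyMVar
  let note s = modifyMVar_ logVar (return . (s :))
  _ <- forkIO $ do
    writeSync chan (42 :: Int)
    note "writer released"
    putMVar done ()
  _ <- extReadSync chan (\x -> note ("extended action on " ++ show x))
  takeMVar done
  reverse <$> readMVar logVar
```

In this model the extended action is always logged before the writer is released, because the acknowledgement is only sent once the action has finished.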

Here is the relevant extended identity process, compared to the identity process (minus poison handling):

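    import Control.Concurrent.CHP
    import Control.Monad (forever)
    import Prelude hiding (id)  -- the id below is the identity process
    id, extReadId :: Chanin a -> Chanout a -> CHP ()
    id input output = forever (readChannel input >>= writeChannel output)
    extReadId input output = forever (extReadChannel input (writeChannel output))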

They share the same type but have slightly different behaviour. The key thing is that two of these extended identity processes composed next to each other have the same behaviour as one of these extended identity processes: `extReadId <=> extReadId = extReadId`. So if we build our arrow instance on these, we can avoid the buffering problem and fully obey the arrow laws.
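The difference in buffering can be observed in a toy model. This sketch uses `MVar`s from base rather than CHP, so every name in it (`SyncChan`, `idProc`, `extIdProc`, `writeCompletes`) is hypothetical, and the 0.2 second timeout is an arbitrary choice. It asks one question: if nothing ever reads the far end of the process, can a write to the near end still complete?

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forever)
import Data.Maybe (isJust)
import System.Timeout (timeout)

-- Hypothetical synchronous channel: data slot plus acknowledgement slot.
data SyncChan a = SyncChan (MVar a) (MVar ())

newSyncChan :: IO (SyncChan a)
newSyncChan = SyncChan <$> newEmptyMVar <*> newEmptyMVar

-- The writer blocks until the reader acknowledges the value.
writeSync :: SyncChan a -> a -> IO ()
writeSync (SyncChan dat ack) x = putMVar dat x >> takeMVar ack

-- The reader acknowledges only after its action has finished.
extReadSync :: SyncChan a -> (a -> IO b) -> IO b
extReadSync (SyncChan dat ack) action = do
  x <- takeMVar dat
  r <- action x
  putMVar ack ()
  return r

readSync :: SyncChan a -> IO a
readSync chan = extReadSync chan return

-- Buffering identity: releases its writer before passing the value on.
idProc :: SyncChan a -> SyncChan a -> IO ()
idProc input output = forever (readSync input >>= writeSync output)

-- Extended identity: holds its writer until the value has been passed on.
extIdProc :: SyncChan a -> SyncChan a -> IO ()
extIdProc input output = forever (extReadSync input (writeSync output))

-- Does a write complete when nothing ever reads the process's output?
writeCompletes :: (SyncChan Int -> SyncChan Int -> IO ()) -> IO Bool
writeCompletes process = do
  input <- newSyncChan
  output <- newSyncChan
  _ <- forkIO (process input output)
  isJust <$> timeout 200000 (writeSync input 1)
```

In this model `writeCompletes idProc` succeeds, because the process holds one value on the writer's behalf, while `writeCompletes extIdProc` times out, because the writer is only released once a downstream reader has taken the value.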

#### Proof

This claim of identical behaviour can actually be proved using the FDR model checker for CSP. FDR doesn’t know about extended inputs, but an extended input can be formalised using a channel communication followed by an acknowledgement; the extended action takes place before the acknowledgement is sent. So here is the extended identity process from above in FDR script, parameterised by two pairs of data and ack channels:

    EXTID(left, leftack, right, rightack) =
      left?x -> right!x -> rightack?ack -> leftack!ack ->
      EXTID(left, leftack, right, rightack)

FDR has the awkwardness that all channels must be declared at the top level, so that is where we declare the extra “middle” channels used to compose two of these processes:

    datatype ACK = ack
    channel midack2C : ACK
    channel mid2C
    EXTID2(left, leftack, right, rightack) =
      EXTID(left, leftack, mid2C, midack2C)
        [| {| mid2C, midack2C |} |]
      EXTID(mid2C, midack2C, right, rightack)
        \ {| mid2C, midack2C |}

The two composed processes synchronise together on `mid2C` and `midack2C` (but not the other channels). We then hide (backslash is hide) these events so that they are not visible outside. Without this hiding we would not have equivalence because some outside process could synchronise on `mid2C` and break everything; hiding makes sure these events are contained. All we then need are some more channel declarations and our assertions of equality:

    channel leftackC, rightackC : ACK
    channel leftC, rightC
    assert EXTID(leftC, leftackC, rightC, rightackC) [FD= EXTID2(leftC, leftackC, rightC, rightackC)
    assert EXTID2(leftC, leftackC, rightC, rightackC) [FD= EXTID(leftC, leftackC, rightC, rightackC)

The last two lines express refinement: `P [FD= Q` asserts that `Q` refines `P` in the failures-divergences model, meaning that every behaviour of `Q` is also a behaviour of `P`. Refinement is anti-symmetric, so asserting it in both directions like this asserts equality of the two processes, which is a strong relationship. Putting all the above together we can run FDR and get the vital output:

    This FDR release is for academic teaching and research purposes only.
    For any other use, please contact Formal Systems (Europe) Ltd at
    enquiries@fsel.com to obtain a commercial licence.

    Checking EXTID(leftC,leftackC,rightC,rightackC) [FD= EXTID2(leftC,leftackC,rightC,rightackC)
    ...snip
    true
    Checking EXTID2(leftC,leftackC,rightC,rightackC) [FD= EXTID(leftC,leftackC,rightC,rightackC)
    ...snip
    true

Inductively, if composing two extended identity processes is the same as one extended identity process, then any chain of N extended identity processes is equivalent to a single one. The map process (on which the arrow instance is based) has the same communication pattern, so the result applies there too if we construct an extended map process: `extMap f <=> extMap g = extMap (g . f)`, since values pass through `f` first and then `g`.
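On plain lists, the corresponding property is the familiar map fusion law. The sketch below is only an analogy in pure Haskell (the names `twoStages` and `oneStage` are made up here); it shows that two mapping stages in sequence compute the same thing as one stage that applies the composed function, with the first function applied sitting on the right of the dot.

```haskell
-- Two separate passes over the data, like two map processes in a chain:
-- f is applied first, then g.
twoStages :: (a -> b) -> (b -> c) -> [a] -> [c]
twoStages f g = map g . map f

-- One fused pass, like a single map process.
oneStage :: (a -> b) -> (b -> c) -> [a] -> [c]
oneStage f g = map (g . f)
```

For lists the two are indistinguishable; for processes they are indistinguishable only when the stages add no buffering, which is what the extended versions provide.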

## Functions into processes, using arrows

Pointfree notation is often the most elegant way to write a function in Haskell. Put simply, any time you write code such as:

    foo x = f (g (h x))

Or, if you are a dollar fan:

    foo x = f $ g $ h x

You can rewrite it as:

    foo = f . g . h

Consider this example of a function composition from Neil Mitchell that finds the mode (most common element) of a list:

    import Data.List (group, maximumBy, sort)
    import Data.Ord (comparing)

    mostCommon :: Ord a => [a] -> a
    mostCommon = head . maximumBy (comparing length) . group . sort

A nice composition of four functions. Each function in that pipeline of functions is taking a single input and producing a single output, which is then fed into the next function. This pipeline of pure functions is analogous to a pipeline of communicating processes — each taking a single input and sending on a single output to the next process. So is there an easy way of converting such function pipelines into process pipelines? The answer is yes — by using arrows.

Even if you are a Haskell programmer, you may not be familiar with arrows. They can be used to express these input and output compositions. We can convert our function pipeline to use arrow notation by just changing the composition operator:

    import Control.Arrow ((<<<))

    mostCommonArr1 :: Ord a => [a] -> a
    mostCommonArr1 = head <<< maximumBy (comparing length) <<< group <<< sort

This works because, by good design or happy accident, the function type `(->)` is itself an instance of `Arrow`, so a plain function needs no special annotation to become part of an arrow. If we want to be more general, we must use the `arr` function to convert pure functions into arrows:

    import Control.Arrow (arr, (<<<))

    mostCommonArr2 :: Ord a => [a] -> a
    mostCommonArr2 = arr head <<< arr (maximumBy (comparing length)) <<< arr group <<< arr sort
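What the `arr` version buys is visible once the type signature is loosened. As a sketch using only base (the names `mostCommonGen`, `asFunction` and `asKleisli` are illustrative, not from any library), the same expression can be given a type that works for any arrow and then be used both as a function and as a Kleisli arrow over `IO`:

```haskell
import Control.Arrow (Arrow, Kleisli (..), arr, (<<<))
import Data.List (group, maximumBy, sort)
import Data.Ord (comparing)

-- The same pipeline, polymorphic in the arrow type p.
mostCommonGen :: (Arrow p, Ord a) => p [a] a
mostCommonGen =
  arr head <<< arr (maximumBy (comparing length)) <<< arr group <<< arr sort

-- Instantiated at the function arrow.
asFunction :: Ord a => [a] -> a
asFunction = mostCommonGen

-- Instantiated at the Kleisli arrow of IO.
asKleisli :: Ord a => [a] -> IO a
asKleisli = runKleisli mostCommonGen
```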

A bit more cumbersome perhaps, but we haven’t changed the original too much — the pipeline of the four functions is still visibly there. Now that we have our function pipeline expressed in terms of arrows, changing to a process pipeline is a relatively simple matter. You’ll need to import the Control.Concurrent.CHP.Arrow module, and then re-type the pipeline to be a CHP process with channels, and stick runPipeline on the front:

    mostCommonArrProc :: Ord a => Chanin [a] -> Chanout a -> CHP ()
    mostCommonArrProc = runPipeline $ arr head <<< arr (maximumBy (comparing length)) <<< arr group <<< arr sort

And that’s it. `mostCommonArrProc` is now a pipeline of four communicating processes, one per function, wrapped up into one. It will sit there waiting to be sent a list of items, and once it has been, it will output the most common element of the list. So we’ve re-used our simple pure-function pipeline as a pipeline of communicating processes, with only a little change in notation.
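For readers without CHP installed, the shape of such a process pipeline can be imitated with threads and `MVar`s from base. This is a rough analogue only, not what `runPipeline` does internally: `stage` and `mostCommonThreads` are hypothetical names, and each `MVar` here acts as a one-place buffer rather than a synchronous channel.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forever)
import Data.List (group, maximumBy, sort)
import Data.Ord (comparing)

-- One stage: repeatedly take an input, apply a pure function, pass it on.
stage :: (a -> b) -> MVar a -> MVar b -> IO ()
stage f input output = forever (takeMVar input >>= putMVar output . f)

-- Four stages wired in sequence; three run in new threads and the last
-- one runs in the calling thread, so this never returns.
mostCommonThreads :: Ord a => MVar [a] -> MVar a -> IO ()
mostCommonThreads input output = do
  sorted  <- newEmptyMVar
  grouped <- newEmptyMVar
  longest <- newEmptyMVar
  _ <- forkIO (stage sort input sorted)
  _ <- forkIO (stage group sorted grouped)
  _ <- forkIO (stage (maximumBy (comparing length)) grouped longest)
  stage head longest output
```

To use it, fork `mostCommonThreads input output` in its own thread, then `putMVar input` a list and `takeMVar output` to collect its most common element.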

*Note: Since base-4, Arrow has become based on Category, which means you can actually express the function using the original dot composition, like so:*

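    import Prelude hiding ((.))    -- Prelude's (.) only composes functions
    import Control.Category ((.))

    mostCommonCatProc :: Ord a => Chanin [a] -> Chanout a -> CHP ()
    mostCommonCatProc = runPipeline $ arr head . arr (maximumBy (comparing length)) . arr group . arr sort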

*I prefer the `<<<` notation of the arrows, as I think it better expresses a pipeline of processes, and it doesn’t require base-4.*