Archive

Author Archive

Accelerate!

March 22, 2010 Leave a comment

I’ve just released chp-2.1.0 and chp-plus-1.1.0. The former features several rounds of optimisation, and the addition of priority for channels and barriers. The latter features a few new instances for Connectable that involve shared channels.

I’ve also now created a public darcs repository on patch-tag that holds the CHP code:

darcs get http://patch-tag.com/r/twistedsquare/chp

While I’m tidying things up: I wired up the blog to a twitter account a while ago and then never told anyone about it. If you want to follow the blog through twitter (I don’t yet use the account for much more than that), you can do so.

Categories: Uncategorized

Processing without Buffering

March 22, 2010 Leave a comment

While giving a talk last week, someone asked about obeying the arrow laws with CHP processes. The long and the short of it is that the arrow laws mean that arr id >>> arr id should be equivalent to arr id. But with the instance I have in chp-plus, the former is two processes and the latter one — and each process introduces one place of buffering. If you put in the same input you’ll get the same output with both, but with different buffering behaviour (this is already pointed out in the documentation). The same issue came up again later in the day with process fusion (more on that another time).

Extended Inputs

The way to resolve the problem of how much buffering to introduce is quite simple in retrospect: don’t introduce any buffering, anywhere. Then there is no issue of adding or removing it with the laws. We can easily eliminate buffering in CHP using extended input. Extended input means that when you read from a channel, you keep the writer waiting longer than usual, while the reader performs an extra action. The API in CHP is quite straightforward:

extReadChannel :: ReadableChannel r => r a -> (a -> CHP b) -> CHP b

The function takes an incoming channel end, an action to perform with the value, and then performs an extended input using your given action. Note that if you need the read value after the action, you’ll have to return it yourself as part of the “b” type.

Here is the relevant extended identity process, compared to the identity process (minus poison handling):

id, extReadId :: Chanin a -> Chanout a -> CHP ()
id = forever (readChannel input >>= writeChannel output)
extReadId = forever (extReadChannel input (writeChannel output))

They share the same type but have slightly different behaviour. The key thing is that two of these extended identity processes composed next to each other have the same behaviour as one of these extended identity processes: extReadId <=> extReadId = extReadId. So if we build our arrow instance on these, we can avoid the buffering problem and fully obey the arrow laws.

Proof

This claim of identical behaviour can actually be proved using the FDR model checker for CSP. FDR doesn’t know about extended inputs, but an extended input can be formalised using a channel communication followed by an acknowledgement; the extended action takes place before the acknowledgement is sent. So here is the extended identity process from above in FDR script, parameterised by two pairs of data and ack channels:

EXTID(left, leftack, right, rightack)
  = left?x -> right!x -> rightack?ack -> leftack!ack
    -> EXTID(left, leftack, right, rightack)

FDR has an awkwardness that all its channels must be declared at the top-level, hence we declare the extra “middle” channels used to compose two of these processes at the top level:

datatype ACK = ack
channel midack2C : ACK
channel mid2C

EXTID2(left, leftack, right, rightack)
  = EXTID(left, leftack, mid2C, midack2C)
    [| {| mid2C, midack2C |} |]
    EXTID(mid2C, midack2C, right, rightack)
    \ {| mid2C, midack2C |}

The two composed processes synchronise together on mid2C and midack2C (but not the other channels). We then hide (backslash is hide) these events so that they are not visible outside. Without this hiding we would not have equivalence because some outside process could synchronise on mid2C and break everything; hiding makes sure these events are contained. All we then need are some more channel declarations and our assertions of equality:

channel leftackC, rightackC : ACK
channel leftC, rightC

assert EXTID(leftC, leftackC, rightC, rightackC)
    [FD= EXTID2(leftC, leftackC, rightC, rightackC)
assert EXTID2(leftC, leftackC, rightC, rightackC)
    [FD= EXTID(leftC, leftackC, rightC, rightackC)

The last two lines express refinement; you can think of this as expressing that one process is as general in its behaviour as another. The refinement operator is anti-symmetric, so expressing it twice in different directions like this is asserting equality of the two processes — a strong relationship. Putting all the above together we can run FDR and get the vital output:

This FDR release is for academic teaching and research purposes only.
For any other use, please contact Formal Systems (Europe) Ltd at
enquiries@fsel.com to obtain a commercial licence.

Checking EXTID(leftC,leftackC,rightC,rightackC) [FD= EXTID2(leftC,leftackC,rightC,rightackC)
...snip
true

Checking EXTID2(leftC,leftackC,rightC,rightackC) [FD= EXTID(leftC,leftackC,rightC,rightackC)
...snip
true

Inductively, if composing two extended identity processes together is the same as one extended identity process, any chain of N extended identity processes is equivalent to a single one. And the map process (on which the arrow instance is based) has the same communication pattern, so the result applies there too if we construct an extended map process: extMap f <=> extMap g = extMap (f.g)

Categories: Uncategorized Tags: , , , ,

Upcoming Talk

March 11, 2010 1 comment

Next Monday morning I’m going to be in Portland (Oregon), and will be giving a talk at Galois on the CHP library. On the off-chance that there are any readers from that area: the talk is open to all, so feel free to come along. It will be an introduction to the library and some of its capabilities.

Update: the talk seemed to go well; here are the Galois talk slides (with notes). I’ve added various notes amongst the slides to help it be understandable without me gesturing at them. The chats afterwards with some of the Galois guys were very useful and interesting, and have given me a lot to think (and blog) about in CHP.

Categories: Uncategorized

Synchronous Channels using MVars

March 8, 2010 2 comments

An MVar in Haskell is a shared variable that is either full, or empty. Trying to write to a full one, or read from an empty one, will cause you to block. It can be used as a one-place buffered asynchronous channel. Consider if you didn’t want choice, or conjunction or any of the fancy features of CHP, but you do want to build a synchronous channel using MVars. How would you do it?

MVars — The Obvious Way

There is a very straightforward way to turn asynchronous channels into synchronous channels: you form a pair of channels, and use one for the writer to send the reader the data, and the other for the reader to send the writer an acknowledgement:

data SimpleChannel a = SimpleChannel (MVar a) (MVar ())

newSimpleChannel :: IO (SimpleChannel a)
newSimpleChannel = liftM2 SimpleChannel newEmptyMVar newEmptyMVar

writeSimpleChannel :: SimpleChannel a -> a -> IO ()
writeSimpleChannel (SimpleChannel msg ack) x
  = putMVar msg x >> takeMVar ack

readSimpleChannel :: SimpleChannel a -> IO a
readSimpleChannel (SimpleChannel msg ack)
  = takeMVar msg <* putMVar ack ()

Let’s assume that context-switching is the major cost in these algorithms, and examine how many times a process must block in the above algorithm. We know that this must be at least one; whoever arrives first will have to block to wait for the second participant.

We’ll start with what happens if the writer arrives first. The writer arrives, puts the value into the data MVar, then blocks waiting for the ack. The reader arrives, takes the data and sends the ack, at which point the writer wakes up. So in this case: one block.

If the reader arrives first, it will block waiting for the data MVar. The writer will arrive, put the value into the data MVar, then block waiting for the ack. The reader will wake up, take the data and send the ack; then the writer will wake up. So here we had two blocks. The writer blocking is unnecessary; if the reader was already there waiting, there is no need for the writer to wait for an ack, it could just deposit the value and go — if it knew the reader was there.

MVars — The Faster Way

There are several ways to remove that second block. One way is to have an MVar as a sort of status MVar. When the reader or writer arrives, they try to put into this MVar. If they succeed, they are first and they wait on a second MVar. If they fail, they are second and act accordingly, emptying the status variable and waking up the first party:

data FastChannel a = FastChannel (MVar (Maybe a)) (MVar ()) (MVar a)

newFastChannel :: IO (FastChannel a)
newFastChannel = liftM3 FastChannel newEmptyMVar newEmptyMVar newEmptyMVar

writeFastChannel :: FastChannel a -> a -> IO ()
writeFastChannel (FastChannel sts ack msg) x
  = do first <- tryPutMVar sts (Just x)
       if first
         then takeMVar ack -- will block
         else takeMVar sts >> putMVar msg x

readFastChannel :: FastChannel a -> IO a
readFastChannel (FastChannel sts ack msg)
  = do first <- tryPutMVar sts Nothing
       if first
         then takeMVar msg -- will block
         else (fromJust <$> takeMVar sts) <* putMVar ack ()

This version is, in my benchmarks, twice as fast as the first version, which suggests that context-switching really is the expensive part of these algorithms. In fact, I started out with the first version in thi spost, but CHP’s more featured and complex algorithms were coming out faster because I only ever block once. It was only when I improved the MVar version to the second one above that the results were as I expected.

Categories: Uncategorized

Choice over Events using STM

March 4, 2010 Leave a comment

I’m currently writing a paper on CHP’s performance for conjunction, which I have been optimising recently. The problem with a new feature like conjunction is that there is nothing else to benchmark it against! But I can benchmark the effect that supporting conjunction has on performance for simple channel communications and other things that don’t feature conjunction.

Two of my comparisons are simple synchronous channels based on MVars and STM. These don’t support choice between events — you can’t choose between writing to two synchronous channels built on MVars or STM without more machinery on top. But they are fast. Another comparison is the CML package, which does support such choice between events — the performance of CML merits its own post some time (in short: fine normally, but terrible if you use its choice operator a lot — unless I have made a mistake in the benchmark).

I also wanted to benchmark an implementation that supported choice but not conjunction, based on STM. Version 1.0.0 of CHP fulfils this criteria, but was badly designed and totally unoptimised — and I know from my recent optimisations how bad the performance might be. So I constructed an optimised version of channels with choice but no conjunction. I was surprised at how short the algorithm was, and realised that it could be explained in a blog post. So here it is.

Implementing Event Choice with STM

Let’s be clear on the problem, first. I want an Event type such that I can say “wait for event A or event B, whichever happens first, but only engage in one of them”. Then I want someone else to be able to concurrently say “wait for event B or event C or event D, whichever happens first, but only engage in one of them” and have the algorithm resolve it all. STM doesn’t achieve this by itself; you can use orElse to choose between writing to two variables, but that doesn’t suffice for multiple people engaging in events with each other.

We begin with a helper function — one of those functions that is general enough that it might almost feature in a standard library. Here it is:

-- | Executes the actions until it finds one that returns True (at which point
-- it will execute no further actions).  Returns True if an action did, False
-- if none of them did.
anyM :: Monad m => [m Bool] -> m Bool
anyM = foldM orM False
  where
    orM True _ = return True
    orM False m = m

Next we’ll declare our data-types. Our Event contains a constant enrollment count (the number of processes required to synchronise together), and a transactional variable holding a list of current offers, each with an associated STM action. An offer is in turn a list of events which uses a ThreadId as a unique identifier; think of an offer as saying: I offer to engage in exactly one of the events in the list, and I’m waiting until I do:

data Offer = Offer { offerThreadId :: ThreadId, offerEvents :: [Event] }
instance Eq Offer where (==) = (==) `on` offerThreadId

data Event = Event { enrollCount :: Int, offersTV :: TVar [(STM (), Offer)] }

Adding an offer to an event is as simple as adding it to the list of offers. My modifyTVar' function has type (a -> a) -> TVar a -> STM () and applies the modification function to the contents of the TVar, but it adds a little strictness that helps performance:

recordOffer :: Offer -> (STM (), Event) -> STM ()
recordOffer o (act, e) = modifyTVar' ((act, o):) (offersTV e)

We also define a function for checking if an event is able to complete (when we are making an offer on it). This takes an event, and an action to perform if the event can complete. It then reads the current offers from the event’s transactional variable. If the enrollment count is equal to the number of current offers plus one (the offer now being made), it can complete. Completion involves performing all the associated STM actions, and then using a revoke function to remove the offers (which have now chosen this event, say: A) from all the events that they had offered on (e.g. event A, event B, event C):

checkComplete :: (STM (), Event) -> STM Bool
checkComplete (act, e)
  = do offers <- readTVar (offersTV e)
       if enrollCount e /= length offers + 1
         then return False
         else do sequence_ (act : map fst offers)
                 mapM_ (revoke . snd) offers
                 return True

revoke :: Offer -> STM ()
revoke off = mapM_ (modifyTVar' removeUs . offersTV) (offerEvents off)
  where
    removeUs = filter ((/= off) . snd)

We only require one further function. This function, offerAll, handles the creation of an offer, checks if any of the events in the offer can complete immediately, and otherwise records the offers in the event then waits for one of them to be completed by a later participants. It must use two transactions for this; one to make the offers (this transaction needs to finish for it to be visible to the other participants) and one to wait for an event to be completed. A crucial part of the function is not just knowing that an offer completed, but also knowing which one. For this we construct a TVar of our own into which a result can be written. This starts off as Nothing, and we later wait for it to become a Just value. We augment the user-supplied action-on-completion to also write into this TVar. The design of the algorithm as a whole ensures that this variable will only ever be written to once. Here is offerAll:

offerAll :: [(STM (), Event, a)] -> IO a
offerAll off
  = do tid <- myThreadId
       rtv <- atomically $ checkAll tid
       atomically $ readTVar rtv >>= maybe retry return    
  where
    checkAll tid
      = do rtv <- newTVar Nothing
           let offer = [(act >> writeTVar rtv (Just x), e) | (act, e, x) <- off]
           complete <- anyM (map checkComplete offer)
           unless complete $
              mapM_ (recordOffer (Offer tid [e | (_, e, _) <- off])) offer
           return rtv

This is all that is needed for events with choice at both ends. You call offerAll with a list of offers and it gives you back the value you associated with that offer.

The Public API

To wrap this into a communication channel with a CML-like API, we wrap it up as follows. First we declare an SEvent type (named after CML, hence the re-use of the term event for another meaning) that represents a synchronisation action; this is a list (of choices), each containing an internal event, an action to perform during the completion of the offer, and one to perform afterwards that will yield a return value (which we can use for a Functor instance):

data SEvent a = SEvent { sEvent :: [((STM (), STM a), Event)] }

instance Functor SEvent where
  fmap f (SEvent es) = SEvent [((dur, fmap f aft), e) | ((dur, aft), e) <- es]

choose :: [SEvent a] -> SEvent a
choose = SEvent . concatMap sEvent

You can see that the choose function simply joins lists of choices together. We define our synchronisation function using offerAll, which will return the corresponding afterwards-STM-action for the chosen event, which we then execute using atomically:

sync :: SEvent a -> IO a
sync (SEvent es) = offerAll [(dur,e,aft) | ((dur,aft),e) <- es] >>= atomically

Finally we can define a type for a synchronous communication channel, SChannel that joins together an event (the internal kind) and a transactional variable for passing the value:

data SChannel a = SChannel Event (TVar a)

send :: SChannel a -> a -> SEvent ()
send (SChannel e ctv) x = SEvent [((writeTVar ctv x, return ()), e)]

recv :: SChannel a -> SEvent a
recv (SChannel e ctv) = SEvent [((return (), readTVar ctv), e)]

The send function puts the value to send into the variable during the original event completion, and then afterwards the reader takes the value out of the variable at its leisure. (The implementation assumes that the same participants will use the channel each time; an extra level of indirection could be added to make the implementation more flexible in this regard.)

The code in this post provides nearly the same functionality as the CML library, but my tests indicate it is faster. I have now uploaded this code (with some documentation) as the sync package on Hackage. This provides a useful “lite” alternative to CHP that runs in the IO monad, and an alternative implementation of most of the features of the CML package.

Categories: Uncategorized Tags: , , ,

Progression 0.3: Bar Charts and Normalisation

March 2, 2010 4 comments

The CHP posts are currently on a bit of a hiatus while I write up a couple of papers on CHP, and prepare for going to the SIGCSE conference as part of my day job. In the mean time, it seems that a few people have picked up and begun using my Progression Haskell benchmarking library. The problem with users is that they want new features. So, spurred on by a couple of emails, I’ve created a new release of the library: version 0.3.

The changes in version 0.3 are all centred around the graphing (which in turn caused changes to the config and command-line options). John Millikin asked if I could support having versions as the groups on the X axis, and benchmarks as the lines — the opposite to the current arrangement. He made the point that with few benchmarks and many versions this would provide a better visualisation. I’ve exposed the function that maps raw data into graphable data in the configuration, and have also added default (exposed) implementations for Progression’s original transformation and the one that John wanted. Furthermore, I added a function that supports normalisation of the data (scaling by the corresponding benchmark times for the latest recorded version — perhaps I should make this configurable, too?). Normalisation should help if the benchmarks take noticeably different amounts of time — it is used by default, but configurable on the command-line.

Sebastian Fischer also emailed me to offer his opinion that my line graphs were inferior to bar charts, both in theory and in terms of visibility. He’s right, and I have now corrected this by offering a choice between line graphs and bar graphs — I’ve actually made bar graphs the new default. I also made Progression print out its gnuplot script when it draws a graph — that way, if you know gnuplot and want to tweak something further manually, you can start with that script and work from there.

All of the above is configurable through Progression’s Config data type, and via command-line options. Here is a demonstration of a normalised bar graph, with benchmarks as the major groups and versions as sub-groupings (i.e. a graph that uses the new defaults) — click it for a larger version:

As ever, the new version of Progression is available on Hackage — you can also get the source from its public darcs repository: darcs get http://patch-tag.com/r/twistedsquare/progression

Categories: Benchmarking Tags:

Optimising CHP

February 11, 2010 1 comment

Recently I have been optimising the core algorithms in CHP. Barrier synchronisations and channel communications in CHP use the same “event” algorithm that supports all of CHP’s features, including choice between events and conjunctions of events. Conjunction really makes the choice algorithms difficult. Here’s a quick example to illustrate. Some terminology: each process wanting to synchronise is “offering” a set of offers — each offer-set is a choice (disjunction) of offers, only one of which should be completed. Each offer is a conjunction of a several events (if you’re not using conjunction, this will be a single event). Here is a test (written in a testing EDSL) from CHP:

       testD "Links 3" $ do
         [a, b, c, d, e] <- replicateM 5 $ evt 2
         p <- offer [b]
         q <- offer [b&c&d&e]
         r <- offer [c&a, d&a]
         s <- offer [e]
         t <- offer [a]
         always $ ??

So each of a–e is a two-party event, and there five processes making offers (each list is an offer-set, each list item is a different offer, each offer is a single event or conjunction of them). What should be the result? The event b can complete if p’s single offer and q’s single offer are taken, but for that to work, we’ll also need to be able to complete c, d, and e, in that case. If we look at process r, c can complete if a does (so we’ll need process t), but then d can’t complete as well, because only one of r’s offers may complete.

So the answer is that those offer-sets can’t complete yet, and all processes will have to wait. We want to be able to figure that out quickly, especially in the most common-case where each process is offering just one offer, with just one event (i.e. they want to perform a single channel communication), and the other common case where a process is offering two or three single-event offers. So we want speed — but we also need correctness. Having a bug in such a core algorithm would make CHP very unstable. So far it seems that CHP has been correct in its algorithm, which is why I have been hesistant to optimise it before now. Ideally I would like to prove the algorithm correct but, for now, tests will have to do.

I knew from a recent exchange with a user of the library that CHP was horribly slow when offering choices between appreciable numbers of offers. Ten offers by one process was slow, and twenty would virtually halt the program, which wasn’t right at all. So I set up a simple set of micro-benchmarks; a single channel communication, and others with one “stressed” process offering to communicate with 2, 5, or 10 other processes (who were all eagerly waiting to communicate to the stressed process, and nothing else). None of these things should be slow. I developed my recently-released Progression library to help me see what difference the changes would make.

The Optimisation

The old algorithm would look through all the connected offer-sets available (reading them from all the STM TVars) and then see if it could find a completion. I noticed that the offer-sets were being searched in an arbitrary order, rather than by the most likely to complete; a judicious sort at the right point provided surprising speed-up:

On the one hand, I’m pleased to have made such a gain so easily, and on the other I’m embarrassed that such a gain was so readily achievable, and hadn’t been done before. But I wasn’t done yet. The algorithm was needlessly expensive, and this resulted from the way I had decomposed the problem. I had one function that read in the value of all the offer-sets (in the STM monad), one that trimmed these offer-sets to iteratively remove offers that couldn’t complete (a pure function), one that searched for remaining possible completions (also pure) and one that wrote the results to the appropriate places (again, in the STM monad).

This approach previously seemed to break the problem down nicely (and separated the effectful parts from the non-effectful parts), but it’s horribly slow if you take a step back. All offer-sets are read in, even if the first is ready to complete. So you may touch far more TVars than you needed to, then spend time pruning the offers when the first was obviously ready to complete. So I rewrote the algorithm; I now have two (mutually recursive) functions in the STM monad that combine the reading in of offer-sets and the searching (without explicit iterative pruning). This yielded great benefits over the previous algorithm for larger choices; here is the difference between the three versions so far on a logarithmic scale:

(As a side note: what horrific time-complexity must my previous algorithm have had to display this curve on a log-scale?!) To my surprise, the details of the algorithm became shorter after I rewrote it to be faster. This was perhaps because I created a new monad to write my algorithm in (a caching, backtracking search monad) which pulled out some of the common logic. With a few extra tweaks, I was able to reduce the speed even further. Here is the final optimisation I settled on, compare to that initial rewrite (back on a linear scale):

See how the performance is now roughly the same for each simpleChoice benchmark, no matter how many offers are being chosen between (because typically in this benchmark, the first one is always ready). I remain a little uneasy about releasing these changes. I have tested the algorithm a lot, but it’s still a rewrite of the core of CHP. I’ll release this soon as CHP 2.0.1 (there is no change in the API, hence the minor version increment). It should be generally faster, but please let me know if you have any odd behaviour that may indicate a bug. (I also plan to do some more optimisation in the future, but this batch will do for now.)

Optimisation Techniques

I don’t want to go into all the gory details of what I did to optimise the library, as that would involve explaining the full algorithm — which be a much longer post than I have time for. The changes that brought the most gain were algorithmic. Inserting the sort to change the search order made a big difference, and rewriting the algorithm to exit the search as soon as possible (and touching as few TVars as possible) made most of the remaining difference. After that, I was making changes such as adding a little caching of certain calculations to make sure that events and offers were never unnecessarily processed twice (by using a StateT monad on top of STM). My monad is a backtracking monad (well, sort of — I use an Applicative Alternative instance to explicitly denote searching of several possible paths, with something a lot like a MaybeT monad) but the caching is preserved across the backtracking as you would hope. Another change I made (but may revisit) was switching from using Map and Set from the standard libraries to using ordered lists for the same purpose. In my common cases these maps and sets will be 1-3 items, which is why I was able to see some speed-up from a simpler structure. I originally put these ordered lists in a newtype, but it turned out that removing the newtype wrapper also gained me some speed-up.

It has not escaped my notice that supporting conjunction in CHP’s events algorithm is really an instance of the “other” CSP: a Constraint Satisfaction Problem. But it does have some interesting performance characteristics (e.g. reading from less events can help a lot) and some useful aspects that I can optimise on (e.g. if one process was offering event “e” anywhere in its offers, and you choose one without “e”, “e” can no longer complete; this is not as obvious as it sounds), which is why I have my own algorithm for resolving the choices.

Progression version 0.2 released

February 8, 2010 Leave a comment

Version 0.2 of Progression is now available on Hackage. There’s a few minor changes; here’s the relevant entry from the ChangeLog:

Fixed a bug where the argument to the -c and -p options were not split on commas as the documentation described. Also added in settings for: drawing the graph with a logarithmic Y axis, plot size, and the way the benchmarks are sorted on the X axis.

Most changes to Progression are likely to touch the Config types, which will therefore entail a major version number bump (in accordance with the PVP).

Categories: Uncategorized Tags:

Progression: Supporting Optimisation in Haskell

February 4, 2010 8 comments

Progression

I have recently been working on optimising CHP. In order to optimise a program, you need a set of benchmarks/tasks for which you want to improve performance. Then, you need to record how long the original version takes to complete the benchmark, record how long a changed version takes to complete the benchmark and compare the two to see if your change was sufficiently beneficial. This process is tedious, and staring at two columns of numbers is not always instructive, so I constructed a small Haskell library to help with optimisation. I’ve called it Progression, and it is now available on Hackage. Progression allows you to specify some benchmarks, run them, and graph their performance against each other; so what you get is a graph like this:

Each line is a different version of my CHP library, and each group of points is a different benchmark. (The versions were made in the order purple, blue, green, red; I ultimately stuck with the green version.)

Using Progression

Progression uses the excellent Criterion library for doing the benchmarking, so it is used in a similar fashion to Criterion. You construct a small wrapper program that defines/imports your benchmarks and passes them to the Progression library to be run. For example, here is the end of my file containing my CHP benchmarks:

runAll = defaultMain . bgroup "" . map (uncurry bench . second runCHP_)
main = runAll [("SimpleChannel", simpleChannel)
              ,("SimpleChoice 2", choice 2)
              ,("SimpleChoice 5", choice 5)
              ,("SimpleChoice 10", choice 10)
              ]

My runAll function turns the second part of each pair from a CHP () type into IO (), then I map the (Criterion) function bench over the list, use the (Criterion) function bgroup to join them all together, then pass them to the (Progression) function defaultMain. I compile this file into an executable.

When you run the program, you can either pass in settings via the command-line arguments, or if they are not present, you will be prompted with an interactive tab-completing prompt (thanks to the easy-to-use Haskeline library). There are three main settings:

  • which benchmark groups you want to run (if you only have one group, you won’t be prompted),
  • what to label the recording you are about to make, and
  • which previous labels you want to compare against in the graph.

Once you’ve entered these two or three items, the benchmarks will all be run by Criterion, the results stored into a CSV file (using the handy txt-sushi library; this way you can easily manually inspect the data or further process it in a spreadsheet program), which is then combined with the previous stored CSV files and fed to GNUplot. (Unfortunately the gnuplot binding in Haskell is not in great shape at the moment, so I’m using a plain system call to run it — if you want graphs, make sure you have GNUplot installed on your system.) The program creates a graph like the one shown at the beginning of the post — by default in the file plot.png (but again, configurable by command-line option). If, later on, you want to just graph previous results against each other, you can do that by running the program with “-m graph”. On the graph you get points plotted at the means (staggered for each benchmark to aid readability and comparison) and error bars for the bounds that Criterion gives — by default these are the 95% confidence intervals, but that is configurable, too (by passing through an option to Criterion).

Installing Progression

1. Make sure gnuplot is installed on your system and available in your path. Gnuplot is very likely to be in your package manager if you’re on Linux.
2. cabal update && cabal install progression

Note that by default, Criterion uses the Chart library for plotting (a feature of Criterion that Progression does not use), which is in turn dependent on gtk2hs, which can be problematic for some people to install. If you get messages about cairo not being installed and you don’t want to install gtk2hs, you can install Criterion without this support for plotting (and thus without the gtk2hs dependency). The command cabal install criterion -f-Chart should do this (the minus between f and Chart is crucial), but unfortunately it seems after that, that you must install progression manually by downloading it and running Setup.lhs (I had hoped cabal install progression would work but that seems to attempt to satisfy the Chart dependency even though criterion is by then installed).

Sharing Progression

I’ve now uploaded the progression repository onto patch-tag. You can get a copy using darcs get http://patch-tag.com/r/twistedsquare/progression if you want the very latest version or wish to contribute some patches.

Issues with the 0.1 release

I realise that the graph is technically invalid. I shouldn’t connect the points with a line because the X-axis is discrete, not continuous. However, without the line (i.e. with just points and error bars) it’s much less readable at a glance, and a bar chart with error bars didn’t seem too readable either when I tried it. The graph display still isn’t perfect though; it works best when you have benchmarks that take roughly similar times, and if you make one huge saving on one benchmark, as I did (a factor of about 100), this throws off the whole display. Normalising all the times, so that one of the versions has all its times normalised to one, would partially fix the problems. Also, if you run a lot of benchmarks, the CSV files do start to litter the directory; I wonder if I should store them in a subdirectory.

Pipe Dream

That’s the summary of Progression. I hope that Progression helps people who are optimising Haskell programs (especially alongside profiling and tools such as ThreadScope, that can help pinpoint possible candidates for optimisation). But the work-flow is not perfect. Often, not all the benchmarks are written and complete when you start benchmarking; you both make changes and add new benchmarks as you work. Currently, when graphing, Progression ignores any benchmarks for which it does not have data for every recording being graphed (this should perhaps be fixed). Ideally, you would re-run the old version of your library/program (for example, the original version before you began optimising) with the benchmarks to get some data. For this, we need access to an old version. All developers store their projects in version control (and with Haskell, that’s typically darcs) so we could automatically pull out the old version from there rather than making the programmer do it themselves.

So perhaps what we would do is dig through the version history for the latest tag starting “OPT:” (which is the “original” for this current round of optimisation), then re-run that version with the latest benchmarks. In fact, why restrict it to just the original? We could look back to find the last tag with “OPT:”, then work forwards from there, looking for patches, marking out those starting “BENCH:” (the ones that introduce new benchmarks). We would then try and create versions of your program from the OPT tag forwards, with all the different versions since then, but always featuring all the BENCH patches. This would give us complete benchmark data for all old versions, and would also mean you wouldn’t have to tell Progression what the labels are, it could just use the patch names/dates. We could also try different combinations of patches (if you’ve recorded A, B, and C, where B depends on A, try A, A&B, A&B&C, A&C) to see how different independent changes combine together to find the fastest combination.

I’m not sure how much work that would be, but it sounds like it might form quite a useful tool. Or, it might turn out to be too over-the-top and awkward — it might be best to just stick to Progression’s current straightforward design. Comments are welcome below, as are any questions or feedback about the current release of Progression.

Exploring a Communicating Sequential Processes EDSL

January 27, 2010 2 comments

Translating Communicating Sequential Processes notation into program code

In my last post I showed the Communicating Sequential Processes (CSP) description of a pipeline of processes; here it is in full:

FULL(prev, next)
  = FULL_WILLING(prev, next) |~| FULL_STATIONARY(prev, next)

FULL_WILLING(prev, next)
  = prev.empty /\ next.move -> EMPTY (prev, next)
     [] prev.move /\ next.move -> FULL (prev, next)
     [] prev.empty /\ next.canstay -> FULL (prev, next)
     [] prev.canstay /\ next.canstay -> FULL (prev, next)
     [] prev.muststay /\ next.muststay -> FULL (prev, next)

FULL_STATIONARY(prev, next)
     prev.empty /\ next.muststay -> FULL (prev, next)
     [] prev.muststay /\ next.muststay -> FULL (prev, next)
     [] prev.canstay /\ next.muststay -> FULL (prev, next) 


EMPTY(prev, next)
  = prev.empty /\ next.empty -> EMPTY (prev, next)
  [] prev.muststay /\ next.empty -> EMPTY (prev, next)
  [] prev.move /\ next.empty -> FULL (prev, next) 

GENERATOR(next) = ON(next) ; OFF(next) ; OFF(next) ; GENERATOR(next)
ON(next) = next.move -> SKIP [] next.canstay -> SKIP
OFF(next) = next.empty -> SKIP

END(prev)
  = prev.empty -> END(prev)
  [] prev.muststay -> END(prev)
  [] prev.move -> END(prev)

I can translate that into Haskell code that uses my CHP library fairly mechanically; here is a snippet:

site :: (Chanin Platelet, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier)
     -> (Chanout Platelet, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier)
     -> CHP ()
site (prevMove, prevEmpty, prevCanStay, prevMustStay)
     (nextMove, nextEmpty, nextCanStay, nextMustStay)
  = empty
 where
 fullWilling platelet = alt
  [(syncBarrier prevEmpty <&> writeChannel nextMove platelet) >> empty
  ,(readChannel prevMove <&> writeChannel nextMove platelet) >>= full . fst
  ,(syncBarrier prevEmpty <&> syncBarrier nextCanStay) >> full platelet
  ,(syncBarrier prevCanStay <&> syncBarrier nextCanStay) >> full platelet
  ,(syncBarrier prevMustStay <&> syncBarrier nextMustStay) >> full platelet
  ]

Thankfully, CHP already allows us to get quite close to the original CSP, but it is still more verbose than the original. I wanted to see if I could get the code looking as much like the original CSP as possible, and this post details how I went about it.

Enter EDSLs

Many people have used Haskell as a host language for Embedded Domain Specific Languages. As far as I understand the term, an EDSL is where you craftily warp the host language to look like a completely different language with all sorts of frighteningly clever hacks. (I’m being very disingenuous to all the designers of very sensible and useful EDSLs, but for me the one that always springs to mind is Lennart Augustsson’s BASIC EDSL.) This post is my small foray into EDSLs, to get the CHP implementation looking as much like CSP as possible. I’m not sure that what I have is actually an EDSL, as it’s fairly specific to this particular piece of CSP, but these are at least EDSL-like techniques.

Cobbling together a CSP EDSL

I have a semantic hitch from the start. CSP’s “prev.move” can be viewed as a communication of the “move” value on the “prev” channel. If you take the latter view, the CSP code above is choosing between different messages on the same channel; something that is not possible in CHP (CML has it, but CML doesn’t have conjunction). So instead, we will have to implement the code as a group of events, where “prev.move” is a separate event to “prev.canStay”. There are four possible “sub-events” of prev and next (move, canStay, mustStay, empty) so we need a tuple of four events, with a different direction on the move channel for incoming platelets compared to outgoing platelets (CSP doesn’t have this notion of delineating direction); the remainder of the events can be symmetric two-party barriers:

type In = (Chanin Platelet, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier)
type Out = (Chanout Platelet, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier)

Our implementation is going to add important features to the original CSP: we are going to keep track of information about the platelets (so if we are full, we know what we are full with) and we are going to report our status each time frame to a visualisation process. We will come back to that later, but this is why we have channels carrying the Platelet type. To access the latter three events of our quadruples, we use special accessor methods. We pick out the barriers and synchronise on them; CSP does not syntactically separate events and the synchronisation on them (we do in CHP to allow for poisoning, which CSP does not feature), so we shouldn’t here either:

canStay, mustStay, empty ::
  (a, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier) -> CHP ()
empty (_, x, _, _) = syncBarrier x
canStay (_, _, x, _) = syncBarrier x
mustStay (_, _, _, x) = syncBarrier x

We want to be able to refer to the CSP event “next.canStay” using the identical CHP code “next.canStay”. If we look at the types, the first part, next, is our tuple of events, and canStay is an accessor on it. So we want to translate “next.canStay” into “canStay next”, effectively. So we hide the Prelude’s definition of the dot operator and redefine it to do just that:

(.) :: a -> (a -> b) -> b
a . b = b a

These mechanisms work fine for the barriers; we will also need a way to use our channels. The difficulty here is that we want to refer to the channel (which will have a different type at each end) and in one the receiving instance access it as an implicitly-synchronising event (like the barriers above), and in the sending instance to leave it is some sort of value that we can then send a value on (borrowing CSP’s “!” operator for sending). So our accessor must be polymorphic both in its argument and return. Hence we need type-classes, with multiple parameters and functional dependencies:

class MoveConnector c r | c -> r where
  move :: (c, EnrolledBarrier, EnrolledBarrier, EnrolledBarrier) -> r

instance MoveConnector (Chanin a) (CHP a) where
  move (c, _, _, _) = readChannel c

instance MoveConnector (Chanout a) (Chanout a) where
  move (c, _, _, _) = c

This takes care of all that we need. “prev.move” will be of type CHP Platelet and “next.move” will be of type Chanout Platelet. Our ! operator is straightforward:

(!) :: Chanout a -> a -> CHP ()
(!) = writeChannel

The pieces of our jigsaw puzzle are almost there. Haskell’s >> sequence operator is already a close enough stand-in for CSP’s -> prefix operator, and the syntactic difficulties of using [] as the choice operator mean that CHP’s <-> will do instead. We will, however, use the CSPc /\ operator to mean the conjunction of the two events. We will return the value from the first of the two, because we only ever want to save the value we receive from the platelet before us (which always appears on the left-hand side of the conjunction in the CSP):

(/\) :: CHP a -> CHP b -> CHP a
(/\) p q = fst <$> (p <&> q)

Now we can write our site process:

site :: Chanout (Maybe (Platelet, Bool)) -> In -> Out -> CHP ()
site obsOut prev next = emptySite
  where
    emptySite :: CHP ()
    emptySite = obsOut!Nothing >>
                ((prev.empty /\ next.empty) >> emptySite)
                <-> ((prev.mustStay /\ next.empty) >> emptySite)
                <-> ((prev.move /\ next.empty) >>= full)

    full :: Platelet -> CHP ()
    full platelet
      = do r <- liftIO $ randomRIO (0, 1::Double)
           if r > 0.05
             then obsOut!(Just (platelet, True)) >>
                  ((prev.empty /\ next.move!platelet) >> emptySite)
                  <-> ((prev.move /\ next.move!platelet) >>= full)
                  <-> ((prev.empty /\ next.canStay) >> full platelet)
                  <-> ((prev.canStay /\ next.canStay) >> full platelet)
                  <-> ((prev.mustStay /\ next.mustStay) >> full platelet)
             else obsOut!(Just (platelet, False)) >>
                  ((prev.empty /\ next.mustStay) >> full platelet)
                  <-> ((prev.mustStay /\ next.mustStay) >> full platelet)
                  <-> ((prev.canStay /\ next.mustStay) >> full platelet)

The generator and end are also short:

plateletGenerator :: Out -> CHP ()
plateletGenerator out = forever $ on >> off >> off
  where
    on = do platelet <- Platelet <$> (liftIO $ randomRIO (0.5, 1))
            (out.move!platelet) <-> (out.canStay)
    off = out.empty

plateletEnd :: In -> CHP ()
plateletEnd prev = forever $ (prev.empty) <-> (prev.mustStay)
                             <-> (prev.move >> return ())

So we have implemented our CSP model, and made it look quite like the CSP itself. The behaviour is just as seen on previous videos, but now our sticky platelets are implemented tickless. There has been work before on compiling CSP by Fred Barnes, that supports a lot more features of CSP than the quick hacks shown here; this was just a fun exercise to get my code looking more like the CSP that it came from.

Categories: Uncategorized Tags:
Follow

Get every new post delivered to your Inbox.