Home > Benchmarking, Tools > Benchmarking STM with Criterion

Benchmarking STM with Criterion

The CHP library is built on top of Software Transactional Memory (STM). Ever since I started this blog with the boids example, I have been looking to optimise CHP, part of which thus involves using STM as efficiently as possible. Looking at the code, I realised that my (long-running) transactions often write to a transactional variable early in a transaction, then later in the transaction they read back from the variable and write to it again. According to the STM paper (section 6.1), such behaviour should result in a local caching of the value, and ultimately there will only be one read from the variable to begin with, and one read-and-write to the variable during the commit. But presumably the lookup in this local cache will come at a cost (which I could perhaps avoid).

I constructed a toy benchmark to test this. One version of my program (named modFunc) reads from a transactional variable, applies two functions, and writes the value back. The other version (named modRW) reads, applies one function and writes back — then reads, applies the second function and writes this value back. This should hopefully illustrate the difference between reading the variable once, and writing+re-reading the variable.

To time this difference, I will use Bryan O’Sullivan’s new criterion benchmarking library, which performs some statistics and produces some nice blogable graphs. Here is the full code to benchmark modRW against modFunc:

import Control.Arrow (second)
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (replicateM_, when)
import Control.Parallel.Strategies (NFData, rnf, (>|), using)
import Criterion.Main (bench, defaultMain)

data Foo = Foo Int Integer

instance NFData Foo where rnf (Foo a b) = rnf a >| rnf b

f, g :: Foo -> Foo
f (Foo a b) = Foo (succ a) b `using` rnf
g (Foo a b) = Foo a (pred b) `using` rnf

modFunc, modRW :: TVar Foo -> STM ()
modFunc tv = readTVar tv >>= writeTVar tv . f . g
modRW tv = do readTVar tv >>= writeTVar tv . f
              readTVar tv >>= writeTVar tv . g

t :: Int -> (TVar Foo -> STM ()) -> IO ()
t n mf = do tv <- atomically $ newTVar $ Foo 0 0
            tve <- atomically $ newTVar n
            let end = atomically $ readTVar tve >>= writeTVar tve . pred
                task = replicateM_ 1000000 (atomically $ mf tv)
            replicateM_ n $ forkIO $ task >> end
            atomically $ readTVar tve >>= flip when retry . (/= 0)

main :: IO ()
main = defaultMain $ map (uncurry bench . second (uncurry t))
  [ ("modFunc-1", (1, modFunc))
  , ("modRW-1", (1, modRW))

I ran the benchmarks on a 64-bit Debian machine with GHC 6.10.3 (criterion doesn’t yet work on 6.12 because its dependency uvector doesn’t). The results are below:



Note that the times are for a million iterations — so read that axis as nanoseconds, not milliseconds, for an individual call of modFunc/modRW. The version that reads and writes the variable twice is about 1.5x as slow as the version that reads and writes once. Further tests indicate that this factor increases towards 2x if you have multiple variables involved in the transaction. The transaction is still quite fast in absolute terms though — mainly this was a chance to try out the criterion benchmarking tool. I hope to use it more in future to benchmark parts of CHP’s core.

Categories: Benchmarking, Tools
  1. BadAxes
    October 21, 2009 at 6:30 pm

    Scale your Y-axes consistently! Why has this practice of inconsistent axes become the norm on the Haskell blogs lately for comparing performance? Don’t blame the plotting package — if it can’t do axes consistently, use a different package that does!

    • October 21, 2009 at 9:10 pm

      That’s funny: I patched criterion to get consistent X axes on the charts for this post. I didn’t see that the Y axis was particularly important on these graphs, but I guess I should make a similar change for the Y axis.

    • Cale Gibbard
      October 21, 2009 at 10:04 pm

      While I agree with you that it’d (usually) be nice, it’s not as important in these cases as it’d be with most graphs, since these are (estimates of) probability density functions, and so the thing you really care about is just the general shape: where the peaks occur, and how spread out it is.

      The fact that you know beforehand that it’s supposed to integrate to 1 gives you a certain amount of scale invariance in the Y direction. If it happened to integrate to anything else, you would just scale it in the Y direction to correct for that.

  2. S
    October 22, 2009 at 4:36 pm

    You’ve switched the order of f and g in your two tests. I think you meant to hold that order the same right?

    • October 22, 2009 at 7:01 pm

      Well spotted. I did mean to hold the order the same — although I don’t think it should make any difference to the results

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: