Benchmarking STM with Criterion
The CHP library is built on top of Software Transactional Memory (STM). Ever since I started this blog with the boids example, I have been looking to optimise CHP, part of which thus involves using STM as efficiently as possible. Looking at the code, I realised that my (long-running) transactions often write to a transactional variable early in a transaction, then later in the transaction they read back from the variable and write to it again. According to the STM paper (section 6.1), such behaviour should result in a local caching of the value, and ultimately there will only be one read from the variable to begin with, and one read-and-write to the variable during the commit. But presumably the lookup in this local cache will come at a cost (which I could perhaps avoid).
I constructed a toy benchmark to test this. One version of my program (named modFunc) reads from a transactional variable, applies two functions, and writes the value back. The other version (named modRW) reads, applies one function and writes back — then reads, applies the second function and writes this value back. This should hopefully illustrate the difference between reading the variable once, and writing+re-reading the variable.
To time this difference, I will use Bryan O’Sullivan’s new criterion benchmarking library, which performs some statistics and produces some nice blogable graphs. Here is the full code to benchmark modRW against modFunc:
import Control.Arrow (second) import Control.Concurrent (forkIO) import Control.Concurrent.STM import Control.Monad (replicateM_, when) import Control.Parallel.Strategies (NFData, rnf, (>|), using) import Criterion.Main (bench, defaultMain) data Foo = Foo Int Integer instance NFData Foo where rnf (Foo a b) = rnf a >| rnf b f, g :: Foo -> Foo f (Foo a b) = Foo (succ a) b `using` rnf g (Foo a b) = Foo a (pred b) `using` rnf modFunc, modRW :: TVar Foo -> STM () modFunc tv = readTVar tv >>= writeTVar tv . f . g modRW tv = do readTVar tv >>= writeTVar tv . f readTVar tv >>= writeTVar tv . g t :: Int -> (TVar Foo -> STM ()) -> IO () t n mf = do tv <- atomically $ newTVar $ Foo 0 0 tve <- atomically $ newTVar n let end = atomically $ readTVar tve >>= writeTVar tve . pred task = replicateM_ 1000000 (atomically $ mf tv) replicateM_ n $ forkIO $ task >> end atomically $ readTVar tve >>= flip when retry . (/= 0) main :: IO () main = defaultMain $ map (uncurry bench . second (uncurry t)) [ ("modFunc-1", (1, modFunc)) , ("modRW-1", (1, modRW)) ]
I ran the benchmarks on a 64-bit Debian machine with GHC 6.10.3 (criterion doesn’t yet work on 6.12 because its dependency uvector doesn’t). The results are below:
Note that the times are for a million iterations — so read that axis as nanoseconds, not milliseconds, for an individual call of modFunc/modRW. The version that reads and writes the variable twice is about 1.5x as slow as the version that reads and writes once. Further tests indicate that this factor increases towards 2x if you have multiple variables involved in the transaction. The transaction is still quite fast in absolute terms though — mainly this was a chance to try out the criterion benchmarking tool. I hope to use it more in future to benchmark parts of CHP’s core.