Notes on DoublePlay: Parallelizing Sequential Logging and Replay

The paper [1] won a Best Paper Award at ASPLOS 2011 (ASPLOS = Architectural Support for Programming Languages and Operating Systems). Its ancestry is a long series of papers on deterministic replay, plus a different series of papers on operating system support for speculation:

The lineage, following the edges of the original diagram (asterisks mark best paper awards):

- Deterministic replay: ReVirt (2002) → OS Support (2003) → BackTracker (2003*) → Time-Traveling Virtual Machines (2005*) → ExtraVirt (2005) → SubVirt (2005) → SMP-ReVirt (2008)
- OS speculation: BlueFS (2004) → Speculator (2005*) → xsyncfs (2006*)
- Both: Speck (2008), drawing on SubVirt and Speculator; Respec (2010), drawing on SMP-ReVirt and Speck; and DoublePlay (2011*), drawing on Speck and Respec.

We read the paper as an introduction to two interesting and increasingly used operating systems techniques, deterministic replay and OS speculation; because it is new; and for the sweet technical idea that is the paper’s main contribution, namely uniparallelism.

For your information, here’s how DoublePlay fits in with the prior work in the diagram above:

Deterministic replay

Deterministic replay is the ability to exactly reproduce an execution of a system. One can replay the execution of a virtual machine, a process, a group of processes, anything.

Deterministic replay of software systems is in some ways easy, since most machine instructions have deterministic effects. (addl $1, %eax depends only on the value of %eax.) The main requirement is to record and replay all nondeterministic events, so that these events happen at exactly the same times and in exactly the same ways during replay as they did originally. In single-threaded code, the only nondeterministic events are system call return values and signal delivery, and these happen rarely enough that logging them is pretty cheap. (The DoublePlay paper considers single-threaded replay a solved problem.) But multithreaded code, in which multiple threads concurrently access shared memory, presents a huge challenge. Memory shared between concurrent threads forms a very high-bandwidth nondeterministic communication channel. (It’s nondeterministic because different threads can run at slightly different speeds, and unpredictable factors like bus design can affect which of several simultaneous modifications to a memory address will “win.”) Logging this channel is super expensive.
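To make the single-threaded case concrete, here is a minimal Python sketch of the record/replay idea (the names and structure are mine, not DoublePlay’s): log each nondeterministic result during recording, then substitute the logged values on replay.

```python
import random

class Recorder:
    """Record mode: log every nondeterministic result."""
    def __init__(self):
        self.log = []
    def call(self, fn, *args):
        result = fn(*args)      # actually perform the nondeterministic call
        self.log.append(result)
        return result

class Replayer:
    """Replay mode: return logged results instead of re-executing."""
    def __init__(self, log):
        self.log = iter(log)
    def call(self, fn, *args):
        return next(self.log)   # deterministic: replay the recorded value

def program(env):
    # All nondeterminism is routed through env.call; the rest of the
    # program is deterministic, so replay reproduces it exactly.
    total = 0
    for _ in range(3):
        total += env.call(random.randrange, 10)
    return total

rec = Recorder()
original = program(rec)
replayed = program(Replayer(rec.log))
assert original == replayed
```

Because the log contains every nondeterministic input, the replayed run is forced through the same execution as the original, whatever `random.randrange` produced.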

Why would anyone care about deterministic replay? Fundamentally for debugging, but there are other reasons. Deterministic replay is a primitive useful in several contexts. For instance, imagine replaying a system under an augmented virtual machine—with more security checks, say. The deterministic replay guarantee tells you that any security bugs found actually happened on the original execution. The “time-traveling virtual machines” work [2] uses deterministic replay to debug operating systems with features like “single-step backwards”.

OS speculation

Operating system speculation is the ability for processes to enter speculative mode, a state analogous to the middle of a database transaction in which the process’s external effects are temporarily buffered. Each OS speculation is eventually committed or aborted. On commit, the relevant processes leave speculative mode and their effects become permanent—for example, any file changes may be sent to the disk. On abort, however, all of the affected processes’ actions are undone. Any forked processes are obliterated; any disk writes are thrown away; any pending network packets are junked; and any signals delivered to other processes are “undelivered”, by rolling those receiving processes back to a checkpointed state immediately before the signal.
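The commit/abort semantics above can be modeled in a few lines of Python. This is a toy, hypothetical API, not Speculator’s real kernel interface: external effects are buffered while speculative, commit releases them, and abort restores the checkpoint and junks them.

```python
class Speculation:
    """Toy model of OS speculation: external effects are buffered until
    commit; abort rolls state back to the checkpoint."""
    def __init__(self, state):
        self.state = state
        self.checkpoint = dict(state)  # snapshot taken on entering speculation
        self.pending = []              # buffered disk writes / network packets

    def external_write(self, data):
        self.pending.append(data)      # held back while speculative

    def commit(self):
        flushed, self.pending = self.pending, []
        self.checkpoint = dict(self.state)
        return flushed                 # effects become permanent / visible

    def abort(self):
        self.pending.clear()                 # pending effects are thrown away
        self.state.clear()
        self.state.update(self.checkpoint)   # roll back to the checkpoint

# Abort: nothing escapes, state is restored.
st = {"x": 1}
spec = Speculation(st)
st["x"] = 99
spec.external_write("packet")
spec.abort()
assert st == {"x": 1} and spec.pending == []

# Commit: buffered effects are released.
spec2 = Speculation({"x": 1})
spec2.external_write("block 7")
assert spec2.commit() == ["block 7"]
```

The real system must of course also undo forks and undeliver signals by rolling back the receiving processes, which this sketch ignores.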

Speculation is a long-known performance improvement technique and has been implemented many times in different contexts. The version relevant here was originally developed to improve distributed file system performance [3]. It’s really well done: first off, it’s in a conventional OS kernel, which is hard, and secondly, it handles many things (like signals) previous systems did not.

Like deterministic replay, speculation is a primitive useful in several contexts. It may be even more widely applicable than replay: it is used in replay systems, to speed up distributed file systems, to make synchronous I/O appear faster, and elsewhere.

As you read these papers be aware of the new primitives that you could use in your own work!

Approach

DoublePlay’s goal is efficient deterministic replay of multithreaded systems. The big problem they need to solve is how to log the high-bandwidth shared memory channel. Their solution is a clever variant on earlier ideas. Specifically, they transform the multithreaded code to a form with lower-bandwidth nondeterminism, record that form, and then check that the result has the same observable effects as the original code. This is called uniparallelism.

Why do they need speculative execution? The goal, remember, is efficiency. They don’t want to slow down the original execution. But uniparallelism slows down execution a lot, since it reduces nondeterminism bandwidth by running threads sequentially (interleaved). A uniparallel execution of n concurrent threads might run n times slower.

Speed can be recovered if we run and record many uniparallel executions in parallel with the original code. The uniparallel executions turn into checks. At the end of an epoch, the uniparallel (“epoch-parallel”) execution’s results are compared with a checkpointed version of the original execution’s results. (The original execution is now far ahead.) If the results are the same, all is well: the uniparallel execution, when replayed deterministically, will produce the same effects as the ongoing truly-parallel execution. If they’re not the same, though, there’s a problem. We can fix the problem by rolling back to before the uniparallel checkpoint and trying again. This, though, might never make progress. The authors instead implement forward recovery, which adopts the uniparallel execution’s results as the truth. (Why is this OK?) The truly-parallel execution must be killed (rolled back) and then restarted with a copy of the uniparallel execution’s state.
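The epoch loop with forward recovery can be sketched as follows. Structure and names are mine; `parallel_run` models the fast, racy multithreaded execution, while `uniparallel_run` models the time-sliced execution whose thread schedule is cheap to log and deterministic to replay.

```python
def record_epochs(epochs, parallel_run, uniparallel_run):
    """Epoch-by-epoch recording with forward recovery (a sketch, not
    DoublePlay's kernel implementation)."""
    state = {}
    for epoch in epochs:
        fast = parallel_run(dict(state), epoch)        # races ahead
        checked = uniparallel_run(dict(state), epoch)  # logged, replayable
        if fast != checked:
            # Forward recovery: adopt the uniparallel result as the truth
            # rather than rolling back and retrying, which might never
            # converge; the parallel execution restarts from this state.
            fast = checked
        state = fast
    return state

def uni(state, epoch):
    state["sum"] = state.get("sum", 0) + epoch
    return state

def par(state, epoch):
    s = uni(state, epoch)
    if epoch == 2:          # simulate a divergent, racy epoch
        s["sum"] += 1000
    return s

# Epoch 2 diverges; forward recovery keeps the uniparallel result.
assert record_epochs([1, 2, 3], par, uni) == {"sum": 6}
```

In the real system the two executions run concurrently on spare cores and the comparison covers system call results and memory state at epoch boundaries; here both are collapsed into a dictionary comparison.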

Uniparallelism is a clever idea (related most to the ideas behind Speck). Even better for us, it combines interesting primitives (speculation, programs as state machines, deterministic execution) to get a cool system.

DoublePlay vs. Respec

Respec could support offline replay, despite DoublePlay’s claim. (“When requested, Respec can optionally save information to enable an offline replay of the recorded process.” [p83, 4]) In offline mode, Respec would log a checksum of each thread’s state (memory, registers, etc.) after each epoch; during replay, Respec would verify these checksums, with a rollback and retry in case of divergence.
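Respec’s offline mode, as described, amounts to the following loop (a sketch under my own naming; Respec actually checksums each thread’s registers and memory in the kernel):

```python
import hashlib

def checksum(state):
    # Stand-in for Respec's per-epoch checksum of each thread's state.
    return hashlib.md5(repr(sorted(state.items())).encode()).hexdigest()

def record(epochs, run):
    """Record phase: log a checksum after each epoch."""
    state, sums = {}, []
    for e in epochs:
        state = run(dict(state), e)
        sums.append(checksum(state))
    return sums

def offline_replay(epochs, run, sums, max_retries=100):
    """Replay phase: verify checksums, rolling back and retrying an epoch
    on divergence. Respec itself has no retry bound, which is exactly the
    unbounded-replay-time concern discussed below."""
    state = {}
    for e, expected in zip(epochs, sums):
        for _ in range(max_retries):
            candidate = run(dict(state), e)     # retry from the checkpoint
            if checksum(candidate) == expected:
                state = candidate
                break                           # epoch verified; move on
        else:
            raise RuntimeError("replay diverged too many times")
    return state

def run(state, e):          # a toy, deterministic epoch body
    state[e] = e * e
    return state

sums = record([1, 2, 3], run)
final = offline_replay([1, 2, 3], run, sums)
assert final == {1: 1, 2: 4, 3: 9}
```

With a genuinely nondeterministic `run`, each retry could produce a different interleaving, so the inner loop may spin many times before a candidate matches the recorded checksum.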

The theoretical difference between Respec and DoublePlay is that DoublePlay can always precisely replay any recorded execution in bounded time. (And DoublePlay can record any execution in bounded time.) Respec may find replay difficult: if the replayed execution diverges from the recorded execution, Respec must roll back and retry that epoch, possibly unboundedly many times. Note that this is the same problem DoublePlay faced before the forward recovery optimization.

Possibly-unbounded replay time seems infinitely bad. However, we could view this difference as quantitative, rather than qualitative; and rather than eliminating Respec because it might diverge, we could evaluate how often it diverges, or how many rollbacks are required in practice. Respec points out that “since the recorded process has been replayed successfully at least once [i.e., during the record phase], it is likely that offline replay will eventually succeed, although it may require a number of rollbacks and retries.” [4]

The quantitative performance difference between Respec and DoublePlay is not evaluated. Respec does not evaluate offline replay directly, and the benchmarks in the two papers can’t be compared directly: they overlap, but appear to use different problem sizes, for example.

We can guess how a comparison might go. Respec must calculate MD5 checksums during the record phase, which DoublePlay need not; this will add some cost. Frequent rollbacks caused by replay divergence could make Respec much slower than DoublePlay, but DoublePlay’s evaluation shows only a modest benefit from the forward recovery optimization (Table 2), which addresses exactly the replay-divergence problem. Respec by default features parallel replay, which is faster than uniparallel replay by a factor of n for n threads (though uniparallel replay could be sped up). Thus, it is possible that for online replay (or even, possibly, offline!) Respec is as fast or faster.

Nevertheless, the DoublePlay authors believe that DoublePlay will be slightly faster than or equivalent to Respec for most applications, as long as rollbacks are rare (personal communication). They also emphasize the risk that a Respec-recorded execution might, due to bad luck, be impossible to replay.

Whether or not it makes a difference for common-case multithreaded replay, the uniparallelism idea is cool enough and evocative enough to discuss and understand.

References

  1. “DoublePlay: Parallelizing sequential logging and replay”, Kaushik Veeraraghavan, Dongyoon Lee, Benjamin Wester, Peter M. Chen, Jason Flinn, and Satish Narayanasamy, in Proc. ASPLOS XVI, Mar. 2011 (ACM Digital Library)

  2. “Debugging operating systems with time-traveling virtual machines”, Samuel T. King, George W. Dunlap, and Peter M. Chen, in Proc. USENIX 2005 Annual Technical Conference. (ACM Digital Library)

  3. “Speculative execution in a distributed file system”, Edmund B. Nightingale, Peter M. Chen, and Jason Flinn, in Proc. SOSP ’05. (ACM Digital Library)

  4. “Respec: efficient online multiprocessor replay via speculation and external determinism”, Dongyoon Lee, Benjamin Wester, Kaushik Veeraraghavan, Satish Narayanasamy, Peter M. Chen, and Jason Flinn, in Proc. ASPLOS XV, 2010. (ACM Digital Library)