Notes on Singularity: Rethinking the Software Stack

This paper [1] is a good summary of the Singularity project, a Microsoft Research operating system architecture that attracted a lot of attention in the mid–2000s. It’s thought-provoking, and though direct impact has been limited, some of the ideas behind Singularity have reportedly made it into products and checking systems.

Singularity has a goal and several mechanisms. But as with many complex software projects, not all of its mechanisms directly address that goal.

Goal

The primary Singularity goal is reliability and robustness (in their terms, “dependability and trustworthiness”). In the early 2000s Microsoft Windows was considered a ridiculously, untenably, and inevitably unreliable system; in people’s minds Windows = “blue screen of death” = endless Internet Explorer bugs and other security holes. It turns out that was temporary. Later versions of Windows (Windows Vista, Windows 7) have become more reliable and more secure, rather than just bigger. Bill Gates in 2002 wrote a memo called “Trustworthy Computing” that made security Microsoft’s highest priority; despite some skepticism at the time, a cultural shift, plus better tools (think software checkers, some integrated with Microsoft’s internal compilers), appears to have changed things. But think of Singularity as a thought experiment. What if a conventional macrokernel design, like Windows, could never be made truly reliable and robust? We’d need another OS design, built from scratch. What would it look like?

Programming language mechanisms

Singularity’s basic approach is to get provable reliability and robustness, in the form of soundness—provided by our friends in the programming languages community. The vast majority of Singularity’s kernel is written in Sing#, a memory-safe programming language derived from C#. (Memory safety means that every pointer dereference goes to a memory object with the correct type. C is not memory safe, since we can fabricate a pointer from an integer by casting, and since we can cast any pointer type to any other. Generally null safety is considered separate from memory safety: a memory-safe language might allow the programmer to dereference a null pointer, although generally the language will turn such a dereference into an explicit exception.) But Singularity’s language integration goes beyond memory safety. The Sing# programming language was extended in several directions to make certain programming errors simply impossible (it “eliminates many preventable defects”). The biggest example is inter-process communication, or contract-based channels. IPC is defined by state-machine-like contracts whose specifications are verified by the compiler. This ensures that every process has explicit code to handle every possible message (“the use of sound program verification tools further guarantees that entire classes of programmer errors are removed from the system early in the development cycle”).
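
To make the contract idea concrete, here is a loose sketch in Rust rather than Sing# (the names are mine, loosely echoing the paper’s NIC driver example; Rust enums and typestate stand in for Sing#’s contract syntax). Messages form a closed set, an exhaustive match forces the process to handle every possible message, and giving each protocol state its own type restricts which messages are legal next.

    // Sketch only: Rust, not Sing#. All names invented for illustration.

    // Messages the device side sends in its start state.
    enum StartMsg {
        DeviceInfo { mtu: usize },
    }

    // Messages the client may send once configuration has begun.
    enum ConfigureMsg {
        RegisterForEvents,
        SetParameters { promiscuous: bool },
    }

    // Typestate: each contract state is a distinct type, so only the
    // transitions the contract allows will compile.
    struct Start;
    struct Configuring;

    impl Start {
        fn handle(self, msg: StartMsg) -> Configuring {
            match msg {
                // No wildcard arm: adding a message to StartMsg without
                // handling it here becomes a compile-time error.
                StartMsg::DeviceInfo { mtu } => {
                    println!("device reports MTU {mtu}");
                    Configuring
                }
            }
        }
    }

    impl Configuring {
        fn handle(self, msg: ConfigureMsg) -> Configuring {
            match msg {
                ConfigureMsg::RegisterForEvents => self,
                ConfigureMsg::SetParameters { promiscuous } => {
                    println!("promiscuous = {promiscuous}");
                    self
                }
            }
        }
    }

    fn main() {
        let start = Start;
        let configuring = start.handle(StartMsg::DeviceInfo { mtu: 1500 });
        configuring.handle(ConfigureMsg::RegisterForEvents);
    }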

(Integrating the system and the language was powerful, but meant that the Singularity team had to maintain an advanced compiler—making it much harder for others within and outside of Microsoft to use and build on the Singularity system. This is a common and often-unremarked problem with integrated approaches.)

Singularity also addresses robustness by seriously limiting what processes can do. Singularity processes are sealed: they cannot load libraries dynamically, modify their own code, or share memory with other processes. These are serious limitations; just-in-time compilation, for example, is impossible in a sealed process. The implicit argument is that dynamically linked libraries, self-modifying code, and shared memory are inherently dangerous and should be eliminated. But another argument is that code without these features is much easier to statically analyze. (“Some benefits of this sealed process architecture are: improved program analysis by tools” [p1, 2]) It’s not clear which of these arguments led to the development of sealed processes.

Finally, Singularity’s manifest-based programs bring type checking to process creation. A manifest defines a bunch of checkable program properties that the Singularity kernel can verify before starting a process. For example, the manifest says what ABI versions a program needs, what IPC interfaces are required, what other processes must be started, and so forth. The kernel can check “type and memory safety, absence of privileged-mode instructions, conformance to channel contracts,” and other, more specific properties, such as “that [a device] driver will not access hardware used by a previously installed device driver.” [p4, 1]
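
Here is a rough sketch, in Rust rather than Singularity’s actual manifest format, of the kinds of claims a manifest might record and how the kernel could check one of them before creating the process. All field and function names are invented for illustration.

    // Hypothetical sketch of facts a manifest might declare; names are mine,
    // not Singularity's.
    struct Manifest {
        abi_version: (u32, u32),          // ABI the program was built against
        exported_contracts: Vec<String>,  // channel contracts this process implements
        imported_contracts: Vec<String>,  // contracts it expects others to provide
        dependent_processes: Vec<String>, // processes that must be started first
        hardware_resources: Vec<String>,  // e.g. I/O ranges a driver claims
    }

    // The kernel can refuse to create a process whose claims conflict, e.g. a
    // driver whose hardware overlaps an already-installed driver's hardware.
    fn conflicts(new: &Manifest, installed: &[Manifest]) -> bool {
        installed.iter().any(|m| {
            m.hardware_resources
                .iter()
                .any(|r| new.hardware_resources.contains(r))
        })
    }

    fn main() {
        let nic = Manifest {
            abi_version: (1, 0),
            exported_contracts: vec!["NicDevice".into()],
            imported_contracts: vec![],
            dependent_processes: vec![],
            hardware_resources: vec!["ioport:0x3000-0x301f".into()],
        };
        println!("ABI {}.{}", nic.abi_version.0, nic.abi_version.1);
        assert!(!conflicts(&nic, &[]));
    }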

Evaluation

At this point the Singularity project gets a little weird. Most systems papers need evaluation sections, but “dependability is difficult to measure in a research prototype” [p1, 1]. What is easy to measure? Performance. And that is what they measure.

But why measure performance? Singularity aims for “no-worse-than” measurements—to show that Singularity’s limitations do not unduly hurt performance. For instance, the abstract contains this classic “no-worse-than” line: “[T]he first macrobenchmarks for a sealed-process operating system and applications … show that [such a] system can achieve performance competitive with highly-tuned, commercial, open-process systems.” [p1, 2]

(Actually, “competitive with” or “comparable with” are near meaningless phrases: any two systems compete, and any two numbers can be compared! But these phrases are commonly used to describe “no-worse-than” measurements, and I’m ashamed to say I’ve used them myself. A better alternative is to be specific: “better than” or “within 5% of the latency of.”)

Performance measurements are inevitable (no significantly-worse-performing OS architecture deserves consideration) and unfortunate (measurements should concentrate on the system’s most important goal, which is dependability). But for me the Singularity approach to performance is problematic: the authors seem to care about performance too much. SIPs are one example.

Evaluating Software Isolated Processes (SIPs)

Singularity processes are generally isolated only by software. They are called Software Isolated Processes, in fact. Most processes run in the same address space as the kernel. Software verification and language safety ensure that SIP code can’t abuse the kernel privilege under which it runs.

Why do this? Why not use hardware isolation as well as software isolation, for defense in depth against verification bugs? The answer seems to be performance.

Hardware isolation ain’t free. Kernel crossings, which require special instructions (like interrupts and/or sysenter), are much more expensive than simple function calls. Hardware virtual memory, which is irrelevant if you trust your memory-safe language, introduces a TLB and associated costs. So Singularity systems recover some performance lost to sealing and garbage collection by collocating processes with kernel code in a single privileged address space, and then optimizing accordingly. (“Singularity takes advantage of this safe in-lining to optimize channel communication and the performance of language runtimes and garbage collectors in SIPs.” [p5, 1])
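
To put a rough number on the kernel-crossing point, here is a tiny microbenchmark sketch (my own code, Rust, Unix-only; the absolute numbers vary a lot by machine and kernel). Each read() of /dev/null is a kernel crossing; each call to the no-op function is not.

    // Rough sketch: compare a user-level function call against a kernel
    // crossing (a read() system call on /dev/null). Unix-only.
    use std::fs::File;
    use std::io::Read;
    use std::time::Instant;

    #[inline(never)]
    fn nop(x: u64) -> u64 {
        std::hint::black_box(x)
    }

    fn main() -> std::io::Result<()> {
        const N: u64 = 1_000_000;

        let t = Instant::now();
        let mut acc = 0u64;
        for i in 0..N {
            acc += nop(i); // plain function call, no kernel involvement
        }
        let call_ns = t.elapsed().as_nanos() as f64 / N as f64;
        std::hint::black_box(acc);

        let mut null = File::open("/dev/null")?;
        let mut buf = [0u8; 1];
        let t = Instant::now();
        for _ in 0..N {
            let _ = null.read(&mut buf)?; // one kernel crossing per iteration
        }
        let sys_ns = t.elapsed().as_nanos() as f64 / N as f64;

        println!("function call: {call_ns:.1} ns, system call: {sys_ns:.1} ns");
        Ok(())
    }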

Awesomely, Singularity is flexible enough to evaluate many different levels of hardware isolation [3]. Figure 5 shows the result: adding hardware isolation and additional kernel crossings can make a Singularity system 37.7% slower at a macrobenchmark. But so what? Singularity is supposed to be robust, not fast. And this benchmark is limited. Singularity’s IPC mechanisms, which don’t change over the benchmark, are designed for the same-address-space mode. Even the “no runtime checks” code runs a garbage collector. For these and other reasons I doubt that running a conventional C server in the same address space as a conventional kernel would lead to 37% performance gains. Figure 5 is interesting, but should be narrowly construed.

To break it down:

SIP memory management: Page-disjoint heaps

Sing# is a garbage collected language, and Singularity is a garbage collected operating system. (Garbage collection is the most robust and well-known mechanism to provide memory safety.) Additionally, in the default mode, all processes cohabit the same address space. So you might expect all processes to share a single garbage collector. They don’t, and this is one of the more unusual and interesting design decisions in the Singularity system.

Each Singularity process has its own page-disjoint heap. That is, no process can ever access objects in another process’s heap, and the heaps are disjoint at the level of pages, not objects. All of process A’s objects live on process A’s pages, which are disjoint from any other process’s pages. (Page-disjointness is enforced by the Singularity kernel and verifier, not necessarily the MMU.)

Why page-disjoint heaps? The papers give a nice set of reasons, which together are pretty convincing: each process can run its own garbage collector, chosen and scheduled to suit that process, without coordinating with or pausing any other process; a collector only ever scans its own process’s pages; and when a process exits, the kernel can reclaim all of its pages at once, without running anyone’s collector.
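
A toy model of the page-ownership invariant (my own sketch, not Singularity code): the kernel tracks one owner per page, an access check reduces to an ownership check, and reclaiming an exited process’s memory is a single sweep over the page map.

    // Sketch: heaps are disjoint at page granularity, so the kernel can track
    // one owner per page and reclaim an exited process's memory wholesale,
    // without running anyone's garbage collector.
    use std::collections::HashMap;

    type Pid = u32;
    type PageNum = usize;

    #[derive(Default)]
    struct PageMap {
        owner: HashMap<PageNum, Pid>,
        next_free: PageNum,
    }

    impl PageMap {
        fn alloc_page(&mut self, pid: Pid) -> PageNum {
            let page = self.next_free;
            self.next_free += 1;
            self.owner.insert(page, pid);
            page
        }

        // The invariant the kernel and verifier maintain: a process may only
        // touch pages it owns, so objects in other heaps are unreachable.
        fn may_access(&self, pid: Pid, page: PageNum) -> bool {
            self.owner.get(&page) == Some(&pid)
        }

        // When a SIP exits, all of its pages return to the allocator at once;
        // other processes' heaps are untouched.
        fn reclaim(&mut self, pid: Pid) -> usize {
            let before = self.owner.len();
            self.owner.retain(|_, owner| *owner != pid);
            before - self.owner.len()
        }
    }

    fn main() {
        let mut pages = PageMap::default();
        let a = pages.alloc_page(1);
        let b = pages.alloc_page(2);
        assert!(pages.may_access(1, a) && !pages.may_access(1, b));
        assert_eq!(pages.reclaim(1), 1);
    }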

SIP memory management: Exchange heap

But given disjoint heaps, how can one process send a message to another?

Since normal heap data can’t be shared, a separate, explicitly-managed memory area called the exchange heap is used for message passing. Exchange heap objects must have an exchangeable type.

“Exchangeable types encompass … all values that can be sent from one process to another. They consist of scalars [e.g. numbers], rep structs (structs of exchangeable types), and pointers to exchangeable types. Pointers can either point to a single exchangeable value or to a vector of values.” [p179, 4] Channel endpoints are also exchangeable [p3, 1].

Exchangeable objects are thus relatively simple—think flat objects, or objects with pointers to simpler objects, such as a “packet” type that points to an array of bytes. (It is not clear from the papers whether recursive data structures are exchangeable, or more specifically, whether processes can construct circular structures in the exchange heap, since the exchange heap is reference counted [p6, 1].)
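
In Rust-ish terms (a sketch, not Sing# syntax; “rep struct” is the paper’s term, the type names are mine), exchangeable data looks roughly like this: flat structs of scalars, plus owned pointers to exchangeable values or vectors of them, and nothing that reaches back into a process’s private garbage-collected heap.

    // Sketch of the shape of exchangeable data, in Rust terms rather than Sing#.
    struct PacketHeader {   // a "rep struct": only exchangeable fields
        src_port: u16,
        dst_port: u16,
        length: u32,
    }

    struct Packet {
        header: PacketHeader,
        payload: Box<[u8]>, // pointer to a vector of exchangeable values
    }

    fn main() {
        let p = Packet {
            header: PacketHeader { src_port: 80, dst_port: 54321, length: 4 },
            payload: vec![0xde, 0xad, 0xbe, 0xef].into_boxed_slice(),
        };
        println!("ports {} to {}, {} byte payload",
                 p.header.src_port, p.header.dst_port, p.payload.len());
    }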

The kernel is ultimately responsible for managing the exchange heap’s memory; for example, it garbage collects the exchange heap to eliminate objects held by exited SIPs. But recall that for robustness, Singularity also prevents processes from simultaneously accessing objects in shared memory. Regular heaps are pagewise disjoint, but the exchange heap is explicitly designed for inter-process communication. How can Singularity prevent shared memory access in the exchange heap?

The answer: a fancy type system. Sing# was extended to support a linear type discipline for exchange heap objects. Linear types ensure that each process can have at most one pointer to an exchange heap object at a time. When a process sends a message, the type of the send “system call” forces the sending process to lose that sole pointer to the message. As a result, and because of memory safety, the process also loses the ability to modify the message, and each exchange heap object is accessible to at most one process at a time. The linear type discipline also facilitates explicit allocation and deallocation operations for exchange heap objects, new and delete, which quickly recycle unneeded exchange heap memory.
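
Rust’s move semantics are a close cousin of this linear discipline, so a small Rust sketch (my analogy, not Singularity’s API) can show the key property: after a send, the sender has no remaining pointer to the message, and the heap payload is never copied.

    // Sketch of linear transfer via Rust moves: after send(), the sender no
    // longer owns any pointer to the message, so it cannot read or modify it.
    use std::sync::mpsc;
    use std::thread;

    struct Packet {
        payload: Box<[u8]>,
    }

    fn main() {
        let (tx, rx) = mpsc::channel::<Packet>();

        let receiver = thread::spawn(move || {
            let p = rx.recv().unwrap();
            println!("received {} bytes", p.payload.len());
        });

        let msg = Packet { payload: vec![1, 2, 3].into_boxed_slice() };
        tx.send(msg).unwrap(); // ownership moves into the channel
        // println!("{}", msg.payload.len()); // compile error: `msg` was moved

        receiver.join().unwrap();
    }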

Linear types are cool and useful to enforce the no-shared-memory invariant. But why implement the exchange heap, rather than a simpler mechanism that avoids shared memory, such as message copying or kernel buffering? A not-perfectly-convincing reason, as above: performance. The exchange heap allows one Singularity process to send a message to another without copying; in the simplest case a single pointer to an exchange heap object will be transmitted. This can look great on microbenchmarks [4]. But…

Verification

SIP safety depends on some trusted code and some untrusted code. The trusted code includes the verifier itself, parts of the kernel, and any unsafe code that runs on behalf of the SIP, including the SIP’s garbage collector and memory allocator. The SIP’s process code is untrusted, and therefore Singularity must actively verify that it obeys Singularity’s invariants: type and memory safety, absence of privileged-mode instructions, and conformance to channel contracts.

Singularity also verifies other properties that aren’t as safety sensitive.

Verification happens like this. The Sing# compiler compiles source code to an intermediate bytecode language, MSIL. At SIP install time (as a SIP is started), the verifier checks the bytecodes; simultaneously, a bytecode compiler, Bartok, generates machine code from the bytecodes (possibly interleaving that machine code with trusted machine code, such as the GC’s). At runtime, only that machine code is active.
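
As pseudocode-in-Rust (every type and function name here is invented; this only restates the structure of the pipeline), install time is roughly “verify the untrusted bytecode, then let a trusted compiler produce the machine code that actually runs”:

    // Pseudocode sketch of install time. The verifier and bytecode compiler
    // are trusted; the MSIL they consume is not.
    struct Msil(Vec<u8>);        // untrusted bytecode
    struct MachineCode(Vec<u8>); // what actually runs

    #[derive(Debug)]
    struct VerifyError(&'static str);

    fn verify(_bytecode: &Msil) -> Result<(), VerifyError> {
        // check type and memory safety, absence of privileged instructions,
        // conformance to channel contracts, ...
        Ok(())
    }

    fn compile(bytecode: &Msil) -> MachineCode {
        // generate native code, interleaving trusted runtime pieces such as
        // the garbage collector's allocation and barrier sequences
        MachineCode(bytecode.0.clone())
    }

    fn install(bytecode: Msil) -> Result<MachineCode, VerifyError> {
        verify(&bytecode)?; // refuse to install anything that fails
        Ok(compile(&bytecode))
    }

    fn main() {
        let image = install(Msil(vec![])).expect("verification failed");
        println!("installed {} bytes of machine code", image.0.len());
    }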

The paper claims future work will push Singularity verification further, with the nice goal of requiring less trust. For instance, typed assembly language (TAL) would let the verifier check the generated machine code itself, rather than trusting the bytecode compiler; and a type system has already been developed that can be used to write type-safe garbage collectors.

Safety without verification?

One dimension that I wish Singularity had investigated more: enforcing safety in different ways for different isolation mechanisms. For example, why not let a SIP that’s in a separate, isolated address space dynamically load code, use its own, unsafe GC, or even use an unsafe language? The kernel could check and enforce type safety on message transmit, or simply copy messages between the process and the type-safe exchange heap. A SIP’s manifest could declare the level of isolation it required.
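
A sketch of what that might look like (entirely my own invention, nothing from the papers): the isolation level declared in the manifest selects the kernel’s message-delivery strategy.

    // Hypothetical: the manifest declares how a SIP is isolated, and the
    // kernel picks a message-passing strategy to match.
    enum Isolation {
        VerifiedLanguage,     // verified Sing#: transfer exchange-heap ownership, no copy
        HardwareAddressSpace, // unverified code: kernel checks and copies the message
    }

    // Stand-in for a typed message; a real kernel would validate it against
    // the channel contract before handing it to an unverified receiver.
    struct Message(Vec<u8>);

    fn deliver(msg: Message, receiver: &Isolation) -> Message {
        match receiver {
            // A verified receiver can take ownership of the buffer directly.
            Isolation::VerifiedLanguage => msg,
            // An unverified receiver gets a checked copy in its own address space.
            Isolation::HardwareAddressSpace => Message(msg.0.clone()),
        }
    }

    fn main() {
        let fast = deliver(Message(b"link up".to_vec()), &Isolation::VerifiedLanguage);
        let safe = deliver(Message(b"link up".to_vec()), &Isolation::HardwareAddressSpace);
        println!("{} and {} bytes delivered", fast.0.len(), safe.0.len());
    }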

Contract-based channels

Channels are like type-safe pipes. “A channel is a bi-directional [lossless, in-order] message conduit with exactly two endpoints.” [p3, 1] Each endpoint is sort of like a pipe file descriptor, except that pipes handle byte streams (channels handle complex, type-safe message protocols) and pipe file descriptors can be shared by multiple processes (each channel endpoint is owned by exactly one thread at a time).

We’ve discussed channel contracts in the context of type safety, but the two Listings in Section 2.2 are worth considering. Note how new channels may be passed over old ones (see NicEvents.Exp:READY in the text and in Listing 1’s in message RegisterForEvents).
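
Rust’s standard channels make a handy analogy for that trick (a sketch, not Singularity’s API): an endpoint is an owned value, so it can itself travel inside a message to set up a new conversation.

    // Sketch: sending a channel endpoint over an existing channel, analogous
    // to passing a NicEvents endpoint inside RegisterForEvents.
    use std::sync::mpsc::{channel, Sender};
    use std::thread;

    enum Request {
        // the message carries the endpoint the server should use for events
        RegisterForEvents(Sender<String>),
    }

    fn main() {
        let (req_tx, req_rx) = channel::<Request>();

        let server = thread::spawn(move || match req_rx.recv().unwrap() {
            Request::RegisterForEvents(events) => {
                events.send("link up".to_string()).unwrap();
            }
        });

        // Client: create a fresh event channel and hand one endpoint over.
        let (event_tx, event_rx) = channel::<String>();
        req_tx.send(Request::RegisterForEvents(event_tx)).unwrap();
        println!("event: {}", event_rx.recv().unwrap());

        server.join().unwrap();
    }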

Questions and notes


  1. “Singularity: Rethinking the Software Stack”, Galen C. Hunt and James R. Larus, ACM SIGOPS Operating Systems Review 41(2), Apr. 2007, pp.37–49. (Via Microsoft Research)

  2. “Sealing OS Processes to Improve Dependability and Safety”, Galen Hunt, Chris Hawblitzel, Orion Hodson, James Larus, Bjarne Steensgaard, and Ted Wobber, in Proc. EuroSys ’07, Mar. 2007. (Via Microsoft Research)

  3. “Deconstructing Process Isolation”, Mark Aiken, Manuel Fähndrich, Chris Hawblitzel, Galen Hunt, and James R. Larus, in Proc. ACM SIGPLAN Workshop on Memory Systems Performance and Correctness ’06, Oct. 2006. (Via Microsoft Research)

  4. “Language Support for Fast and Reliable Message-based Communication in Singularity OS”, Manuel Fähndrich, Mark Aiken, Chris Hawblitzel, Orion Hodson, Galen Hunt, James R. Larus, and Steven Levi, in Proc. EuroSys 2006, Apr. 2006. (Via Microsoft Research)