This paper [1] is a good summary of the Singularity project, a Microsoft Research operating system architecture that attracted a lot of attention in the mid-2000s. It’s thought-provoking, and though its direct impact has been limited, some of the ideas behind Singularity have reportedly made it into products and checking systems.
Singularity has a goal and several mechanisms. But as with many complex software projects, not all of the mechanisms directly address the goal.
The primary Singularity goal is reliability and robustness (in their terms, “dependability and trustworthiness”). In the early 2000s Microsoft Windows was considered a ridiculously, untenably, and inevitably unreliable system; in people’s minds Windows = “blue screen of death” = endless Internet Explorer bugs and other security holes. It turns out that was temporary. Later versions of Windows (Windows Vista, Windows 7) have become more reliable and more secure, rather than just bigger. Bill Gates in 2002 wrote a memo called “Trustworthy Computing” that made security Microsoft’s highest priority; despite some skepticism at the time, a cultural shift, plus better tools (think software checkers, some integrated with Microsoft’s internal compilers), appears to have changed things. But think of Singularity as a thought experiment. What if a conventional macrokernel design, like Windows, could never be made truly reliable and robust? We’d need another OS design, built from scratch. What would it look like?
Singularity’s basic approach is to get provable reliability and robustness, in the form of soundness, provided by our friends in the programming languages community. The vast majority of Singularity’s kernel is written in Sing#, a memory-safe programming language derived from C#. (Memory safety means that every pointer dereference goes to a memory object with the correct type. C is not memory safe, since we can fabricate a pointer from an integer by casting, and since we can cast any pointer type to any other. Null safety is generally considered separate from memory safety: a memory-safe language might allow the programmer to dereference a null pointer, although the language will usually turn such a dereference into an explicit exception.) But Singularity’s language integration goes beyond memory safety. The Sing# programming language was extended in several directions to make certain programming errors simply impossible (it “eliminates many preventable defects”). The biggest example is inter-process communication through contract-based channels. IPC is defined by state-machine-like contracts whose specifications are verified by the compiler. This ensures that every process has explicit code to handle every possible message (“the use of sound program verification tools further guarantees that entire classes of programmer errors are removed from the system early in the development cycle”).
(Integrating the system and the language was powerful, but meant that the Singularity team had to maintain an advanced compiler—making it much harder for others within and outside of Microsoft to use and build on the Singularity system. This is a common and often-unremarked problem with integrated approaches.)
Singularity also addresses robustness by seriously limiting what processes can do. Singularity processes are sealed. They cannot load libraries dynamically, modify their own code, or share memory with other processes. Some serious limitations: just-in-time compilation is impossible in sealed processes, for example. The implicit argument is that dynamically linked libraries, self-modifying code, and shared memory are inherently dangerous and should be eliminated. But another argument is that code without these features is much easier to statically analyze. (“Some benefits of this sealed process architecture are: improved program analysis by tools” [p1, 2]) It’s not clear which of these arguments led to the development of sealed processes.
Finally, Singularity’s manifest-based programs bring type checking to process creation. A manifest defines a bunch of checkable program properties that the Singularity kernel can verify before starting a process. For example, the manifest says what ABI versions a program needs, what IPC interfaces are required, what other processes must be started, and so forth. The kernel can check “type and memory safety, absence of privileged-mode instructions, conformance to channel contracts,” and other, more specific properties, such as “that [a device] driver will not access hardware used by a previously installed device driver.” [p4, 1]
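To make install-time checking concrete, here is a minimal Python sketch. It is a simplified stand-in, not Singularity’s actual manifest format or kernel interface. `Manifest`, `Kernel`, their fields, and the idea of modeling “hardware used by a previously installed device driver” as overlapping I/O port ranges are all hypothetical. The point is only that the kernel can refuse to start a process by looking at declared properties before any of its code runs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Manifest:
    """Hypothetical, simplified stand-in for a Singularity manifest."""
    name: str
    abi_version: int
    required_contracts: set = field(default_factory=set)
    io_ports: Optional[range] = None  # hardware a driver claims (assumed model)


@dataclass
class Kernel:
    """Hypothetical kernel state consulted before a process may start."""
    abi_versions: set
    exported_contracts: set
    claimed_ports: dict = field(default_factory=dict)  # driver name -> range

    def check(self, m: Manifest) -> list:
        """Return every problem found; an empty list means the manifest passes."""
        errors = []
        if m.abi_version not in self.abi_versions:
            errors.append(f"unsupported ABI version {m.abi_version}")
        missing = m.required_contracts - self.exported_contracts
        if missing:
            errors.append(f"missing contracts: {sorted(missing)}")
        if m.io_ports is not None:
            for owner, ports in self.claimed_ports.items():
                # Half-open ranges overlap iff each starts before the other ends.
                if m.io_ports.start < ports.stop and ports.start < m.io_ports.stop:
                    errors.append(f"I/O ports conflict with driver {owner}")
        return errors

    def install(self, m: Manifest) -> bool:
        """Refuse to start a process whose manifest fails any check."""
        if self.check(m):
            return False
        if m.io_ports is not None:
            self.claimed_ports[m.name] = m.io_ports
        return True


kernel = Kernel(abi_versions={1, 2}, exported_contracts={"NicEvents"})
print(kernel.install(Manifest("nic0", 2, {"NicEvents"}, range(0x300, 0x320))))  # True
print(kernel.install(Manifest("nic1", 2, set(), range(0x310, 0x318))))          # False
```

The second driver is rejected because its port range overlaps one already claimed by `nic0`. No code from either driver ran to discover this.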
At this point the Singularity project gets a little weird. Most systems papers need evaluation sections, but “dependability is difficult to measure in a research prototype” [p1, 1]. What is easy to measure? Performance. And that is what they measure.
But why measure performance? Singularity aims for “no-worse-than” measurements—to show that Singularity’s limitations do not unduly hurt performance. For instance, the abstract contains this classic “no-worse-than” line: “[T]he first macrobenchmarks for a sealed-process operating system and applications … show that [such a] system can achieve performance competitive with highly-tuned, commercial, open-process systems.” [p1, 2]
(Actually, “competitive with” or “comparable with” are near meaningless phrases: any two systems compete, and any two numbers can be compared! But these phrases are commonly used to describe “no-worse-than” measurements, and I’m ashamed to say I’ve used them myself. A better alternative is to be specific: “better than” or “within 5% of the latency of.”)
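Being specific mostly takes a little arithmetic. The Python helpers below are hypothetical, and the numbers in the usage lines are made up for illustration. They turn a baseline and a measurement into an explicit “N% worse/better” statement, and they handle the fact that for latency a larger number is worse, while for throughput a smaller one is.

```python
def percent_worse(baseline: float, measured: float, lower_is_better: bool = True) -> float:
    """How much worse `measured` is than `baseline`, in percent (negative = better)."""
    change = (measured - baseline) / baseline * 100.0
    # For latency, bigger is worse; for throughput, smaller is worse.
    return change if lower_is_better else -change


def describe(baseline: float, measured: float, lower_is_better: bool = True) -> str:
    """Phrase a comparison specifically instead of saying 'competitive with'."""
    worse = percent_worse(baseline, measured, lower_is_better)
    if worse > 0:
        return f"{worse:.1f}% worse than baseline"
    if worse < 0:
        return f"{-worse:.1f}% better than baseline"
    return "same as baseline"


def within(baseline: float, measured: float, pct: float, lower_is_better: bool = True) -> bool:
    """True if `measured` is at most `pct` percent worse than `baseline`."""
    return percent_worse(baseline, measured, lower_is_better) <= pct


# Hypothetical numbers: 200 us baseline latency, 208 us on the new system.
print(describe(200.0, 208.0))      # 4.0% worse than baseline
print(within(200.0, 208.0, 5.0))   # True
```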
Performance measurements are inevitable (no significantly-worse-performing OS architecture deserves consideration) and unfortunate (measurements should concentrate on the system’s most important goal, which is dependability). But for me the Singularity approach to performance is problematic: the authors seem to care about performance too much. SIPs are one example.
Singularity processes are generally isolated only by software. They are called Software Isolated Processes, in fact. Most processes run in the same address space as the kernel. Software verification and language safety ensure that SIP code can’t abuse the kernel privilege under which it runs.
Why do this? Why not use hardware isolation as well as software isolation, for defense in depth against verification bugs? The answer seems to be performance.
Hardware isolation ain’t free. Kernel crossings, which require special instructions (like interrupts and/or sysenter), are much more expensive than simple function calls. Hardware virtual memory, which is irrelevant if you trust your memory-safe language, introduces a TLB and associated costs. So Singularity systems recover some performance lost to sealing and garbage collection by collocating processes with kernel code in a single privileged address space, and then optimizing accordingly. (“Singularity takes advantage of this safe in-lining to optimize channel communication and the performance of language runtimes and garbage collectors in SIPs.” [p5, 1])
Awesomely, Singularity is flexible enough to evaluate many different levels of hardware isolation [3]. Figure 5 shows the result: adding hardware isolation and additional kernel crossings can make a Singularity system 37.7% slower on a macrobenchmark. But so what? Singularity is supposed to be robust, not fast. And this benchmark is limited. Singularity’s IPC mechanisms, which stay the same across all the benchmark configurations, are designed for the same-address-space mode. Even the “no runtime checks” code runs a garbage collector. For these and other reasons I doubt that running a conventional C server in the same address space as a conventional kernel would lead to 37% performance gains. Figure 5 is interesting, but should be narrowly construed.
To break it down:
(… PTE_PS and PTE_G.) Why would these alternatives perform better than 4 KB pages?
… lcr3.
… (… syscall/interrupts), which is expensive.

Sing# is a garbage-collected language, and Singularity is a garbage-collected operating system. (Garbage collection is the most robust and well-known mechanism to provide memory safety.) Additionally, in the default mode, all processes cohabit the same address space. So you might expect all processes to share a single garbage collector. They don’t, and this is one of the more unusual and interesting design decisions in the Singularity system.
Each Singularity process has its own page-disjoint heap. That is, no process can ever access objects in another process’s heap, and the heaps are disjoint at the level of pages, not objects. All of process A’s objects live on process A’s pages, which are disjoint from any other process’s pages. (Page-disjointness is enforced by the Singularity kernel and verifier, not necessarily the MMU.)
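A page-disjoint heap is easy to model. In the minimal Python sketch below, all names are hypothetical and 4 KB pages are assumed. The “kernel” records an owner for every page, and each process gets a bump allocator that only ever places objects on its own pages. The sketch enforces disjointness with a plain ownership table, not the MMU, which mirrors the note above. One property that such a layout makes cheap is reclaiming a dead process’s memory. As `reclaim` shows, that needs only the page table, not a scan of objects.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageOwnership:
    """Kernel-side record of which process owns each page (hypothetical model)."""

    def __init__(self):
        self.owner = {}      # page number -> pid
        self.next_page = 0   # pages are never reused here, for simplicity

    def grant_page(self, pid):
        page = self.next_page
        self.next_page += 1
        self.owner[page] = pid
        return page

    def reclaim(self, pid):
        """On process exit, free its pages without examining a single object."""
        freed = [p for p, o in self.owner.items() if o == pid]
        for p in freed:
            del self.owner[p]
        return freed


class ProcessHeap:
    """Bump allocator that places a process's objects only on that process's pages."""

    def __init__(self, pid, pages):
        self.pid, self.pages = pid, pages
        self.cur_page, self.offset = None, PAGE_SIZE  # forces a grant on first alloc

    def alloc(self, size):
        if not 0 < size <= PAGE_SIZE:
            raise ValueError("sketch only handles objects that fit in one page")
        if self.offset + size > PAGE_SIZE:
            self.cur_page = self.pages.grant_page(self.pid)
            self.offset = 0
        addr = self.cur_page * PAGE_SIZE + self.offset
        self.offset += size
        return addr

    def owns(self, addr):
        return self.pages.owner.get(addr // PAGE_SIZE) == self.pid


pages = PageOwnership()
a, b = ProcessHeap(1, pages), ProcessHeap(2, pages)
x, y = a.alloc(64), b.alloc(64)
print(a.owns(x), a.owns(y))   # True False
print(pages.reclaim(1))       # [0]
```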
Why page-disjoint heaps? A nice set of reasons, which together are pretty convincing:
But given disjoint heaps, how can one process send a message to another?
Since normal heap data can’t be shared, a separate, explicitly-managed memory area called the exchange heap is used for message passing. Exchange heap objects must have an exchangeable type.
“Exchangeable types encompass …. all values that can be sent from one process to another. They consist of scalars [e.g. numbers], rep structs (structs of exchangeable types), and pointers to exchangeable types. Pointers can either point to a single exchangeable value or to a vector of values.” [p179, 4] Channel endpoints are also exchangeable [p3, 1].
Exchangeable objects are thus relatively simple—think flat objects, or objects with pointers to simpler objects, such as a “packet” type that points to an array of bytes. (It is not clear from the papers whether recursive data structures are exchangeable, or more specifically, whether processes can construct circular structures in the exchange heap, since the exchange heap is reference counted [p6, 1].)
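The quoted definition is structural and recursive, so it can be written as a small checker. This Python sketch is hypothetical: the type classes are made-up stand-ins for Sing# types, and Sing# performs the real check at compile time. It follows the quoted rules (scalars, rep structs of exchangeable types, pointers to one exchangeable value or to a vector of them) and the note that channel endpoints are exchangeable. It deliberately does not model recursive types, since the papers leave that case unclear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:       # numbers, booleans, and similar plain values
    name: str


@dataclass(frozen=True)
class RepStruct:    # a struct; exchangeable only if every field is
    name: str
    fields: tuple   # ((field name, field type), ...)


@dataclass(frozen=True)
class ExPtr:        # pointer to one exchangeable value, or to a vector of them
    target: object
    vector: bool = False


@dataclass(frozen=True)
class Endpoint:     # channel endpoint; also exchangeable [p3, 1]
    contract: str


@dataclass(frozen=True)
class GcRef:        # reference to an object on a process's private GC heap
    cls: str


def exchangeable(t) -> bool:
    """Structural check following the definition quoted from [p179, 4]."""
    if isinstance(t, (Scalar, Endpoint)):
        return True
    if isinstance(t, RepStruct):
        return all(exchangeable(ft) for _, ft in t.fields)
    if isinstance(t, ExPtr):
        return exchangeable(t.target)
    return False  # anything else, including GcRef, stays in its process


byte = Scalar("byte")
packet = RepStruct("Packet", (("length", Scalar("int")), ("data", ExPtr(byte, vector=True))))
leaky = RepStruct("Leaky", (("name", GcRef("string")),))
print(exchangeable(packet), exchangeable(leaky))  # True False
```

The hypothetical `Packet` type matches the “packet points to an array of bytes” example. A struct that holds even one private-heap reference is rejected, because sending it would leak a pointer into another process’s heap.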
The kernel is ultimately responsible for managing the exchange heap’s memory; for example, it garbage collects the exchange heap to eliminate objects held by exited SIPs. But recall that for robustness, Singularity also prevents processes from simultaneously accessing objects in shared memory. Regular heaps are pagewise disjoint, but the exchange heap is explicitly designed for inter-process communication. How can Singularity prevent shared memory access in the exchange heap?
The answer: a fancy type system. Sing# was extended to support a linear type discipline for exchange heap objects. Linear types ensure that each process can have at most one pointer to an exchange heap object at a time. When a process sends a message, the type of the send “system call” forces the sending process to lose that sole pointer to the message. As a result, and because of memory safety, the process also loses the ability to modify the message, and each exchange heap object is accessible to at most one process at a time. The linear type discipline also facilitates explicit allocation and deallocation operations for exchange heap objects, new and delete, which quickly recycle unneeded exchange heap memory.
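Python has no linear types, so the sketch below is weaker than Sing#. It enforces the one-owner rule at run time instead of at compile time: `send` and `delete` consume the handle, and any later use raises an error. `ExRef`, `Channel`, and `UseAfterTransfer` are hypothetical names. Unlike Sing#, nothing here stops code from keeping a second Python reference to the underlying value. The static type system rules that out, and it does so with no runtime cost.

```python
from collections import deque


class UseAfterTransfer(Exception):
    """Raised when code touches a handle it no longer owns."""


class ExRef:
    """Unique handle to an exchange-heap object (runtime stand-in for a linear type)."""

    def __init__(self, value):
        self._value = value
        self._live = True

    def get(self):
        if not self._live:
            raise UseAfterTransfer("handle was sent or deleted")
        return self._value

    def set(self, value):
        self.get()  # same liveness check as get()
        self._value = value

    def _consume(self):
        """Kill this handle and return its contents; used by send and delete."""
        value = self.get()
        self._live, self._value = False, None
        return value

    def delete(self):
        """Explicit deallocation, like Sing#'s `delete`: the handle is consumed."""
        self._consume()


class Channel:
    """One-directional message queue; real Singularity channels are bidirectional."""

    def __init__(self):
        self._queue = deque()

    def send(self, ref):
        # The sender's handle dies here, so only the receiver can reach the message.
        self._queue.append(ExRef(ref._consume()))

    def receive(self):
        return self._queue.popleft()


ch = Channel()
msg = ExRef(bytearray(b"packet"))
ch.send(msg)
try:
    msg.get()
except UseAfterTransfer as e:
    print("sender blocked:", e)
got = ch.receive()
print(got.get().decode())  # packet
got.delete()
```

Note that `send` passes the object itself and never copies it. This is the zero-copy property that motivates the exchange heap.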
Linear types are cool and useful to enforce the no-shared-memory invariant. But why implement the exchange heap, rather than a simpler mechanism that avoids shared memory, such as message copying or kernel buffering? A not-perfectly-convincing reason, as above: performance. The exchange heap allows one Singularity process to send a message to another without copying; in the simplest case a single pointer to an exchange heap object will be transmitted. This can look great on microbenchmarks [4]. But…
SIP safety depends on some trusted code and some untrusted code. The trusted code includes the verifier itself, parts of the kernel, and any unsafe code that runs on behalf of the SIP, including the SIP’s garbage collector and memory allocator. The SIP’s process code is untrusted, and therefore Singularity must actively verify that it obeys Singularity’s invariants. Safety requires these checks:
Singularity also verifies other properties that aren’t as safety sensitive.
… “state X { M1? → M2? → M3! → X; }”, which shows that B (the exporting end) must receive both M1 and M2 before sending M3.

Verification happens like this. The Sing# compiler compiles source code to an intermediate bytecode language, MSIL. At SIP install time (as a SIP is started), the verifier checks the bytecodes; simultaneously, a bytecode compiler, Bartok, generates machine code from the bytecodes (possibly interleaving that machine code with trusted machine code, such as the GC). At runtime, only the generated machine code runs.
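The contract “state X { M1? → M2? → M3! → X; }” is just a small state machine. The Sing# compiler checks statically that every code path through a process follows it. The Python sketch below does something weaker: it monitors a single observed message trace at run time. The transition-table encoding, the intermediate state names `S1` and `S2`, and the helper names are all hypothetical.

```python
# Hypothetical encoding of the contract  state X { M1? → M2? → M3! → X; }
# from the exporting end B's point of view: "?" = B receives, "!" = B sends.
# S1 and S2 name the unnamed intermediate states; those names are made up.
CONTRACT = {
    ("X", "M1?"): "S1",
    ("S1", "M2?"): "S2",
    ("S2", "M3!"): "X",
}


class ContractViolation(Exception):
    """A message arrived or was sent in a state that does not allow it."""


class ContractMonitor:
    def __init__(self, transitions, start):
        self.transitions = transitions
        self.state = start

    def step(self, event):
        next_state = self.transitions.get((self.state, event))
        if next_state is None:
            raise ContractViolation(f"{event} not allowed in state {self.state}")
        self.state = next_state


def conforms(trace, transitions=CONTRACT, start="X"):
    """Check one observed message trace against the contract."""
    monitor = ContractMonitor(transitions, start)
    try:
        for event in trace:
            monitor.step(event)
    except ContractViolation:
        return False
    return True


print(conforms(["M1?", "M2?", "M3!", "M1?"]))  # True
print(conforms(["M1?", "M3!"]))                # False: M3 sent before M2 received
```

A static checker explores every path through the code against the same table, which is why a contract error shows up at compile time rather than as a runtime exception.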
… (… Cell<T> wrapper for exchange heap pointers [4], and the bounds checks described as the “Safe Code Tax”), but in general, run-time verification is expensive. If verification can be done statically, the runtime cost is zero.

The paper claims future work will push Singularity verification further, with the nice goal of requiring less trust. In addition to TAL (typed assembly language), a type system has already been developed that can be used to write type-safe garbage collectors.
One dimension that I wish Singularity had investigated more: enforcing safety in different ways for different isolation mechanisms. For example, why not let a SIP that’s in a separate, isolated address space dynamically load code, use its own, unsafe GC, or even use an unsafe language? The kernel could check and enforce type safety on message transmit, or simply copy messages between the process and the type-safe exchange heap. A SIP’s manifest could declare the level of isolation it required.
Channels are like type-safe pipes. “A channel is a bi-directional [lossless, in-order] message conduit with exactly two endpoints.” [p3, 1] Each endpoint is sort of like a pipe file descriptor, except that pipes handle byte streams (channels handle complex, type-safe message protocols) and pipe file descriptors can be shared by multiple processes (each channel endpoint is owned by exactly one thread at a time).
We’ve discussed channel contracts in the context of type safety, but the two Listings in Section 2.2 are worth considering. Note how new channels may be passed over old ones (see NicEvents.Exp:READY in the text and in Listing 1’s in message RegisterForEvents).
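Passing a channel over a channel is easier to see with ownership made explicit. In this hypothetical Python sketch, the thread names, `Endpoint`, and `new_channel` are made up, and contracts and the exchange heap are left out. Each endpoint has exactly one owning thread. Sending an endpoint as a message moves its ownership to the receiver, loosely in the spirit of handing over an endpoint in RegisterForEvents.

```python
from collections import deque


class OwnershipError(Exception):
    """Raised when a thread uses an endpoint it does not own."""


class Endpoint:
    """One end of a bidirectional channel, owned by exactly one thread at a time."""

    def __init__(self, owner):
        self.owner = owner
        self.peer = None
        self.inbox = deque()

    def _check(self, thread):
        if thread != self.owner:
            raise OwnershipError(f"{thread} does not own this endpoint")

    def send(self, thread, msg):
        self._check(thread)
        if isinstance(msg, Endpoint):
            msg._check(thread)   # can only pass along an endpoint you own
            msg.owner = None     # in flight: no thread owns it
        self.peer.inbox.append(msg)

    def receive(self, thread):
        self._check(thread)
        msg = self.inbox.popleft()
        if isinstance(msg, Endpoint):
            msg.owner = thread   # ownership moves to the receiver
        return msg


def new_channel(owner_a, owner_b):
    """Create a connected pair of endpoints."""
    a, b = Endpoint(owner_a), Endpoint(owner_b)
    a.peer, b.peer = b, a
    return a, b


# "app" makes a fresh channel and hands one end to "driver" over an existing one.
ctl_app, ctl_drv = new_channel("app", "driver")
ev_app, ev_drv = new_channel("app", "app")
ctl_app.send("app", ev_drv)
events = ctl_drv.receive("driver")
events.send("driver", "link up")
print(ev_app.receive("app"))  # link up
```

After the handoff, the app can no longer send on `ev_drv`. That is the channel-level version of the exchange heap’s one-owner rule.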
Relate Singularity message contracts with the TCP Robustness Principle (“TCP implementations should follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others”). Do message contracts indicate a different view of software engineering than the principle? Does the different implementation context require a different approach?
How bad is it that sealed Singularity processes cannot implement a JIT?
Singularity is a fascinating combination of programming language and systems contributions. When did the OS drive the language tools, and when did language tools drive the OS design?
192 system calls! (There’s no right number, but Singularity clearly isn’t a minimal microkernel.)
The garbage collector is 48% of the unsafe code! (But there’s not much unsafe code.)
Compile-time reflection, as used in manifest-based configuration, is something in between reflection and a limited form of dynamic linking. The syntax is odd, but does lead to some inlining opportunities. Overall not a very convincing language extension.
“Singularity abandoned application and driver compatibility to explore new design options. This choice has been a double-edged sword …. we have been forced to rewrite or port every line of code in the Singularity system. We would not suggest this approach for every project, but we believe it was the correct choice for Singularity. The payoff from the research freedom has been worth the cost.” [p11, 1] This is inspirational! Also, MSR has a lot of resources.
“Singularity: Rethinking the Software Stack”, Galen C. Hunt and James R. Larus, ACM SIGOPS Operating Systems Review 41(2), Apr. 2007, pp. 37–49. (Via Microsoft Research)
“Sealing OS Processes to Improve Dependability and Safety”, Galen Hunt, Chris Hawblitzel, Orion Hodson, James Larus, Bjarne Steensgaard, and Ted Wobber, in Proc. EuroSys ’07, Mar. 2007. (Via Microsoft Research)
“Deconstructing Process Isolation”, Mark Aiken, Manuel Fähndrich, Chris Hawblitzel, Galen Hunt, and James R. Larus, in Proc. ACM SIGPLAN Workshop on Memory Systems Performance and Correctness ’06, Oct. 2006. (Via Microsoft Research)
“Language Support for Fast and Reliable Message-based Communication in Singularity OS”, Manuel Fähndrich, Mark Aiken, Chris Hawblitzel, Orion Hodson, Galen Hunt, James R. Larus, and Steven Levi, in Proc. EuroSys 2006, Apr. 2006. (Via Microsoft Research)