CS 260r, Spring 2015: Topics and Close Readings in Computer Systems

Spring 2015—MW 1–2:30—Maxwell Dworkin 221—Eddie Kohler

Piazza signup • Presenting papers • Upcoming papers

CS 260r is a research seminar in computer systems. That means it’s a relatively informal course in which we learn about current research in the computer systems area, and hopefully create some research of our own. The work of the course consists of reading and presenting research papers; some homeworks; and a final project, which can be done in groups.

Each time the course is offered we have a new topic. This year the topic is building reliable systems code; we’ll try to better understand modern techniques for making systems software more robust. These techniques include, but are not limited to:

Ad hoc static code analyses and tools.
Model checkers.
Code verification.
Machine learning techniques applied to code analysis.
Undefined behavior detectors.
Testing frameworks.
Record-and-replay techniques.

We will read papers by many authors, but I hope we investigate several research groups’ work in depth, possibly including those of Dawson Engler, John Regehr, Andrea and Remzi Arpaci-Dusseau, Yuanyuan Zhou, Shan Lu, Luis Ceze, Jason Flinn and Peter Chen, Michael Ernst, Martin Rinard, George Candea, and Nickolai Zeldovich.

The goal of the seminar is to understand the best current research on systems software robustness. Students should also learn how to read research papers actively—that is, how to work past initial unfamiliarity with a research area toward a good understanding of its main research approaches and topics, using different kinds of reading and thinking.

Schedule

The course schedule will be posted piecemeal as we go and is subject to discussion and change based on class interests.

W 1/28 Ignoring failure First, let’s read something fun:

Enhancing Server Availability and Security Through Failure-Oblivious Computing, Martin Rinard, Cristian Cadar, Daniel Dumitran, Daniel M. Roy, Tudor Leu, and William S. Beebee, Jr., Proc. OSDI 2004. ** Commentary: The Gamble of Failure-Oblivious Computing; KHB: Failure-Oblivious Computing

M 2/2 Integer overflow Before you read these papers, take John Regehr’s Integer Quiz!

Understanding Integer Overflow in C/C++, Will Dietz, Peng Li, John Regehr, and Vikram Adve, Proc. ICSI 2012. (Dan King)
Improving Integer Security for Systems with KINT, Xi Wang, Haogang Chen, Zhihao Jia, Nickolai Zeldovich, and M. Frans Kaashoek, Proc. OSDI 2012. (Hannah Blumberg)
- Related: Integer Quiz Rationale, Catching Integer Errors with Clang, How Should You Write a Fast Integer Overflow Check?, We Need Hardware Traps for Integer Overflow.
Question: Describe at least three ways that KINT and/or IOC might fail to detect an erroneous integer overflow.

W 2/4 Symbolic execution

KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs, Cristian Cadar, Daniel Dunbar, and Dawson Engler, Proc. OSDI 2008. (Lily Tsai)
- Strongly recommended related paper: Symbolic Execution for Software Testing: Three Decades Later, Cristian Cadar and Koushik Sen, Comm. ACM 56(2), Feb. 2013, 82–90. The PDF version is more readable (find it from the linked page).
Related: KLEE Project.
Question: Some programming styles may be harder for KLEE to deal with than others. Give an example programming style that might cause KLEE to get poor coverage, and describe why. Give either something general, like “object oriented programming,” or something specific, like a snippet of code.

M 2/9, delayed to W 2/11 Undefined behavior, symbolic test generation

Towards optimization-safe systems: analyzing the impact of undefined behavior, Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama, Proc. SOSP ’13 (Oreoluwa Babarinsa)
Practical, low-effort equivalence verification of real code (another PDF link), David A. Ramos and Dawson R. Engler, Proc. CAV ’11 (Pulak Goyal)
Questions:
- STACK: Give examples of (1) an error that KINT would catch, but STACK would not; (2) an error that STACK would catch, but KINT would not; (3) an error that both tools would catch.
- Optional bonus question: I believe there are at least two bugs in STACK’s Fig. 3. Can you find them?
- UC-KLEE: What does UC-KLEE gain by initializing memory lazily? Why not just initialize memory in response to malloc() calls?

W 2/18 Reliability design patterns: Philosophy

Crash-Only Software, George Candea and Armando Fox, Proc. HotOS IX (Nikhil Benesch)
Microreboot – A Technique for Cheap Recovery, George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, Armando Fox, Proc. OSDI ’04

M 2/23

MINIX 3: a highly reliable, self-repairing operating system, Jorrit N. Herder, Herbert Bos, Ben Gras, Philip Homburg, and Andrew S. Tanenbaum, ACM SIGOPS Operating Systems Review 40(3), Jul. 2006 (Todd Lubin)
From L3 to seL4: What have we learnt in 20 years of microkernels?, Kevin Elphinstone and Gernot Heiser, Proc. SOSP ’13 (Korey Tucker)
Question: Compare and contrast the ways Minix and seL4 handle (1) IPC message formats (e.g., are variable-length messages allowed?), and (2) process hierarchies. Do the systems make similar or different choices? How do the authors justify their choices?

W 2/25

Improving the Reliability of Commodity Operating Systems, Michael M. Swift, Brian N. Bershad, and Henry M. Levy, Proc. SOSP 2003 (Alex)
SECONDARILY: The Design and Implementation of Microdrivers, Vinod Ganapathy, Matthew J. Renzelmann, Arini Balakrishnan, Michael M. Swift, and Somesh Jha, Proc. ASPLOS ’08 (Zehra)
Question: The protection mechanisms in Nooks, like those in Minix, are designed to protect “against bugs but not against malicious code.” So do Minix and Nooks provide qualitatively equivalent protection mechanisms, or does one provide more protection than the other? If yes, mention specific potential bugs that would be handled differently by the two systems.

F 2/27

Thorough Static Analysis of Device Drivers, Thomas Ball, Ella Bounimova, Byron Cook, Vladimir Levin, Jakob Lichtenberg, Con McGarvey, Bohus Ondrusek, Sriram K. Rajamani, and Abdullah Ustuner, Proc. Eurosys ’06 (Anitha)
Question: Contrast SDV’s counterexample-guided abstraction refinement approach with KLEE’s symbolic execution engine. Give at least one similarity (of approach, of result, of type of bug that can be found, etc.) and at least two differences.

W 3/4

Triage: Diagnosing Production Run Failures at the User’s Site, Joseph Tucek, Shan Lu, Chengdu Huang, Spiros Xanthos and Yuanyuan Zhou, Proc. SOSP ’07 (Paola)
Pip: Detecting the Unexpected in Distributed Systems, Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat, Proc. NSDI ’06
Question: Describe how Triage-style diagnosis could be combined with Pip-style failure detection, including at least one significant challenge to this combination.

M 3/9

Finding Bugs in Web Applications Using Dynamic Test Generation and Explicit-State Model Checking, Shay Artzi, Adam Kiezun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, and Michael D. Ernst, IEEE Trans. Soft. Eng. 36(4), July/Aug. 2010 (Zach)
Building Secure High-Performance Web Services with OKWS, Maxwell Krohn, Proc. USENIX ATC 2004 (Jefferson)
Question: (1) How would you deploy PHP in the OKWS framework? What OKWS security benefits would you lose thereby? (2) Describe a security bug that Apollo could find, that would not be eliminated by the OKWS architecture.

W 3/11

Isolating cause-effect chains from computer programs, Andreas Zeller, Proc. SIGSOFT ’02/FSE-10 (Aaron)
Swarm Testing, Alex Groce, Chaoqiang Zhang, Eric Eide, Yang Chen, and John Regehr, Proc. ISSTA 2012 (Yihe)
- What is feedback-directed random testing?: Feedback-directed Random Test Generation, Carlos Pacheco, Shuvendu K. Lahiri, Michael D. Ernst, and Thomas Ball, Proc. ICSE ’07
- What is a mutant?: Is mutation an appropriate tool for testing experiments?, J.H. Andrews, L.C. Briand, and Y. Labiche, Proc. ICSE ’05
Question: Write out a semi-formal (or formal) argument that justifies the following statement on Page 3 of Zeller: “In the best case, we have no unresolved test outcomes…[t]hen, the number of tests t is limited by t≤log₂(|c_x|)—basically, we obtain a binary search.”

W 3/25

Moving Fast with Software Verification, Cristiano Calcagno, Dino Distefano, Jeremy Dubreil, Dominik Gabi, Pieter Hooimeijer, Martino Luca, Peter O’Hearn, Irene Papakonstantinou, Jim Purbrick, and Dulma Rodriguez, Proc. NASA Formal Method Symposium (Lily)
Automatic Inference of Necessary Preconditions, Patrick Cousot, Radhia Cousot, Manuel Fähndrich, and Francesco Logozzo, Proc. Verification, Model Checking, and Abstract Interpretation 2013
Question: Write a sample function for which INFER, as in use at Facebook, would detect a bug that would not be ruled out by any Clousot-style necessary precondition. (This code doesn’t need to compile—you can just make something up, and pseudocode is ok.)

M 4/6

Automatic Detection of Floating-Point Exceptions, Earl T. Barr, Thanh Vo, Vu Le, and Zhendong Su, Proc. POPL ’13 (BHKM group)
Differential Symbolic Execution, Suzette Person, Matthew B. Dwyer, Sebastian Elbaum, and Corina S. Pǎsǎreanu, Proc. SIGSOFT ’08 (BLT group)
Original paper: A Framework for Automated Testing of JavaScript Web Applications, Shai Artzi, Julian Dolby, Simon Holm Jensen, Anders Møller, and Frank Tip, Proc. ICSE ’11 (BLT group)

W 4/8

Z3-str: A Z3-Based String Solver for Web Application Analysis, Yunhui Zheng, Xiangyu Zhang, and Vijay Ganesh (String-Strung)
Verifying Resilient Software, P.E. Black and P.J. Windley (Webslingers)
Axiomatic Semantics Verification of a Secure Web Server, Ph.D. dissertation, Paul E. Black, section 7 only (Webslingers)

M 4/13

AVIO: Detecting Atomicity Violations via Access Interleaving Invariants, Shan Lu, Joseph Tucek, Feng Qin and Yuanyuan Zhou, Proc. ASPLOS ’06 (Learning)
Efficient System-Enforced Deterministic Parallelism, Amittai Aviram, Shu-Chun Weng, Sen Hu, Bryan Ford, Proc. OSDI ’10 (N.B.)

W 4/22

Case Studies and Tools for Contract Specifications, Todd W. Schiller, Kellen Donohue, Forrest Coward, Michael D. Ernst, Proc. ICSE ’14

CS 260r, Spring 2015: Topics and Close Readings in Computer Systems

Schedule

Exercises

Project