Section 4 – CS 161 2018

Linux bugs: “An Empirical Study of Operating Systems Errors.” Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. In Proc. SOSP 2001. Link

We present a study of operating system errors found by automatic, static, compiler analysis applied to the Linux and OpenBSD kernels. Our approach differs from previous studies that consider errors found by manual inspection of logs, testing, and surveys because static analysis is applied uniformly to the entire kernel source, though our approach necessarily considers a less comprehensive variety of errors than previous studies. In addition, automation allows us to track errors over multiple versions of the kernel source to estimate how long errors remain in the system before they are fixed.

We found that device drivers have error rates up to three to seven times higher than the rest of the kernel. We found that the largest quartile of functions have error rates two to six times higher than the smallest quartile. We found that the newest quartile of files have error rates up to twice that of the oldest quartile, which provides evidence that code “hardens” over time. Finally, we found that bugs remain in the Linux kernel an average of 1.8 years before being fixed.

Finding file system bugs: “Using Model Checking to Find Serious File System Errors.” Junfeng Yang, Paul Twohey, Dawson Engler, and Madanlal Musuvathi. In Proc. OSDI 2004. Link

This paper shows how to use model checking to find serious errors in file systems. Model checking is a formal verification technique tuned for finding corner-case errors by comprehensively exploring the state spaces defined by a system. File systems have two dynamics that make them attractive for such an approach. First, their errors are some of the most serious, since they can destroy persistent data and lead to unrecoverable corruption. Second, traditional testing needs an impractical, exponential number of test cases to check that the system will recover if it crashes at any point during execution. Model checking employs a variety of state-reducing techniques that allow it to explore such vast state spaces efficiently.

We built a system, FiSC, for model checking file systems. We applied it to three widely-used, heavily-tested file systems: ext3, JFS, and ReiserFS. We found serious bugs in all of them, 32 in total. Most have led to patches within a day of diagnosis. For each file system, FiSC found demonstrable events leading to the unrecoverable destruction of metadata and entire directories, including the file system root directory “/”.

Section 4: Kernel and file system bugs