Read "An Investigation of the Therac-25 Accidents", by Nancy Leveson and Clark S. Turner.
The Therac-25 was an early radiation therapy machine. A simple concurrency bug in its control and user-interface software led, between 1985 and 1987, to at least six massive radiation overdoses. Leveson and Turner's thorough and masterful analysis of the accidents, and of the software process that made them possible, has made this incident one of the most cited of the (unfortunately many) cases where software problems have led to loss of life.
Why read this paper in an OS course? Because the Therac-25's problems were operating systems problems: concurrency control, race conditions, hardware-software interfaces. Because it's still easy to make similar mistakes, even in more modern systems. Because we engineers and scientists must learn from past mistakes to avoid repeating them. Because it's a pointed illustration of systems complexity -- the complexity inherent in a large system, and the complexity that the rest of this course will try to give you tools to manage. And because it's a great paper.
Think this 20-year-old problem is irrelevant? Think again.
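If "race condition" still feels abstract, here is a minimal sketch of the general check-then-act pattern. It is not the Therac-25's code: the `Console` class, the mode names, and every function below are hypothetical, invented only for illustration. One task reads a shared setting and later acts on it; an edit that lands between the read and the write leaves the system configured for a value the operator no longer wants. Holding a lock across both steps forces the edit to wait.

```python
import threading


class Console:
    """Hypothetical shared state: the requested mode and what the hardware was set up for."""

    def __init__(self):
        self.lock = threading.Lock()
        self.mode = "xray"
        self.configured_for = None  # None means "setup must be (re)done"

    def is_consistent(self):
        # Safe states: no setup recorded yet, or setup matches the requested mode.
        return self.configured_for in (None, self.mode)


def operator_edit(console, new_mode):
    """Operator changes the mode; any earlier setup is invalidated."""
    with console.lock:
        console.mode = new_mode
        console.configured_for = None


def setup_unsafe(console, between):
    snapshot = console.mode              # check: read the shared mode
    between()                            # an edit slips in here
    console.configured_for = snapshot    # act: uses a stale snapshot


def setup_locked(console, between):
    with console.lock:                   # read and write form one critical section
        snapshot = console.mode
        between()                        # an edit started here must wait for the lock
        console.configured_for = snapshot


def demo():
    # Unsafe version: the edit runs in the gap between the read and the write.
    racy = Console()
    setup_unsafe(racy, lambda: operator_edit(racy, "electron"))

    # Locked version: the editing thread starts inside the critical section,
    # blocks on the lock, and only runs once setup has finished.
    safe = Console()
    editor = threading.Thread(target=operator_edit, args=(safe, "electron"))
    setup_locked(safe, editor.start)
    editor.join()
    return racy, safe


if __name__ == "__main__":
    racy, safe = demo()
    print("unsafe consistent:", racy.is_consistent())
    print("locked consistent:", safe.is_consistent())
```

In the unsafe run, the console ends up requesting one mode while recording a setup for the other. In the locked run, the edit is applied after setup completes and marks the setup as stale, so nothing proceeds on outdated information. As you read the paper, keep in mind that a lock is only one piece of the story.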
Many parts of a complex system's development process can contribute to a catastrophic failure. Choose one of the following areas and describe how a problem in this area ultimately contributed to the Therac-25 accidents. Be sure to justify your answer using information from the Leveson and Turner paper.
- Design
- Implementation
- Testing
- Oversight
Also, discuss a technical idea or technique from class that could have helped the Therac-25 programmer build a less buggy operating system. Be specific! Talk about what the technique does, and why that would help.
Your response must be typed or word-processed, not handwritten. It should fit on one side of one sheet of paper, using 11- or 12-point fonts and generous margins. Put your name at the top of the page. I value quality, clarity and conciseness (brevity: saying exactly what you need to say, and no more), not exhaustive completeness. If you can't fit everything you want to say into one page, then cut out the boring stuff until you have one page's worth of material. You won't be able to address every issue on one page, so choose your best arguments and best evidence. Responses longer than one page will get no credit.
Here are two good reports from last quarter's class, written in response to a similar (but not identical) question: one from Jon Binney and one from Ali Mojibi. Take a look! (Of course, don't copy.)
Get started reading the paper early! It is long, but clear. Don't be afraid of technical terms like "gantry" or "collimator". If you don't want to look them up, just read on and try to get the important meaning from context.
This assignment was partially adapted from a one-pager handed out in MIT's 6.033.