Formal Specification, Verification, and Implementation of Fault-Tolerant Systems
- PDF – short version (12 pages, 2 columns)
- PDF – long version (16 pages, 1 column)
- DSN publication talking about both the overall methodology and the system side of our approach
- Virtual machines running Nuprl to browse the library; in particular, the specifications of the 2/3 consensus protocol.
Distributed programs are known to be extremely difficult to implement, test, verify, and maintain. This is due in part to the large number of possible unforeseen interactions among components, and to the difficulty of precisely specifying what the programs should accomplish in a formal language that is intuitively clear to the programmers. We discuss here a methodology that has proven itself in building a state of the art implementation of Multi-Paxos and other distributed protocols used in a deployed database system. This article focuses on the basic ideas of formal EventML programming illustrated by implementing a fault-tolerant consensus protocol and showing how we prove its safety properties with the Nuprl proof assistant.