Abstract
DFSCQ is the first file system that (1) provides a precise specification for fsync and fdatasync, which allow applications to achieve high performance and crash safety, and (2) provides a machine-checked proof that its implementation meets this specification. DFSCQ's specification captures the behavior of sophisticated optimizations, including log-bypass writes, and DFSCQ's proof rules out some of the common bugs in file-system implementations despite the complex optimizations. The key challenge in building DFSCQ is to write a specification for the file system and its internal implementation without exposing internal file-system details. DFSCQ introduces a metadata-prefix specification that captures the properties of fsync and fdatasync, which roughly follows the behavior of Linux ext4. This specification uses a notion of tree sequences---logical sequences of file-system tree states---to succinctly describe the possible states after a crash and how data writes can be reordered with respect to metadata updates. This helps application developers prove the crash safety of their own applications, avoiding application-level bugs such as forgetting to invoke fsync on both the file and the containing directory. An evaluation shows that DFSCQ achieves 103 MB/s on large file writes to an SSD and durably creates small files at a rate of 1,618 files per second. This is slower than Linux ext4 (which achieves 295 MB/s for large file writes and 4,977 files/s for small file creation) but much faster than two recent verified file systems, Yggdrasil and FSCQ. Evaluation results from application-level benchmarks, including TPC-C on SQLite, mirror these microbenchmarks.
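The application-level bug mentioned above (forgetting to fsync the containing directory) can be made concrete with a short sketch. The following is not code from DFSCQ; it is a minimal illustration of the standard POSIX pattern for durably creating a file, with invented helper and path names:

```python
# Sketch of the crash-safety pattern that DFSCQ's metadata-prefix
# specification lets applications reason about: to make a newly created
# file durable, fsync both the file AND its parent directory.
# Function and path names here are illustrative, not from the paper.
import os


def durable_create(dirpath: str, name: str, data: bytes) -> None:
    path = os.path.join(dirpath, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # flush the file's data blocks to disk
    finally:
        os.close(fd)

    # Without this second fsync, the directory entry naming the new file
    # may be lost after a crash even though its data reached disk.
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)  # flush the parent directory's metadata
    finally:
        os.close(dfd)
```

Proving that a pattern like this is sufficient is exactly the kind of application-level reasoning the tree-sequence specification is designed to support.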
Highlights
File systems achieve high I/O performance and crash safety by implementing sophisticated optimizations to increase disk throughput
The widely used Linux ext4 is an example of an I/O-efficient file system; the above optimizations allow it to batch many writes into a single I/O operation and to reduce the number of disk-write barriers that flush data to disk [33, 56]
We report the results of several benchmarks on top of DFSCQ, compared to Linux ext4 and two recent verified file systems
Summary
File systems achieve high I/O performance and crash safety by implementing sophisticated optimizations to increase disk throughput. These optimizations include deferring writing buffered data to persistent storage, grouping many transactions into a single I/O operation, checksumming journal entries, and bypassing the write-ahead log when writing to file data blocks. The widely used Linux ext4 is an example of an I/O-efficient file system; the above optimizations allow it to batch many writes into a single I/O operation and to reduce the number of disk-write barriers that flush data to disk [33, 56]. These optimizations complicate a file system's implementation. It took 6 years for ext4 developers to realize that two optimizations (data writes that bypass the journal and journal checksumming), taken together, can lead to disclosure of previously deleted data after a crash [30]. File systems typically delay the reuse of freed disk blocks until in-memory transactions are flushed to disk.
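Two of the optimizations named above, group commit and journal checksumming, can be sketched together. This is a toy model, not DFSCQ or ext4 code, and all names in it are invented; it shows why a per-batch checksum lets recovery detect a torn journal write:

```python
# Toy sketch of group commit (batching several transactions into one
# journal write) and checksummed journal entries, which let the commit
# record be written in the same I/O as the data it covers.
import struct
import zlib


class Journal:
    def __init__(self):
        self.pending = []  # transactions buffered in memory (deferred writes)
        self.disk = []     # stand-in for the on-disk journal

    def add_transaction(self, updates: bytes) -> None:
        self.pending.append(updates)

    def group_commit(self) -> None:
        # One "I/O": all pending transactions plus a commit record whose
        # checksum lets recovery detect a torn or partial journal write.
        payload = b"".join(self.pending)
        commit = struct.pack("<I", zlib.crc32(payload))
        self.disk.append(payload + commit)
        self.pending.clear()

    def recover_last(self):
        # Replay the last batch only if its checksum matches; a mismatch
        # means the batch never fully reached disk and must be discarded.
        blob = self.disk[-1]
        payload, commit = blob[:-4], blob[-4:]
        if struct.pack("<I", zlib.crc32(payload)) == commit:
            return payload
        return None
```

The ext4 bug cited above arose because data writes that bypass this journal can land on disk while a checksummed-but-torn journal batch is discarded during recovery, leaving stale block contents visible.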