Tom Ridge
2015-09-21
(with Thomas Tuerk, David Sheets, Andrea Giugliano, Anil Madhavapeddy, Peter Sewell;
talk at departmental Industrial Advisory Board; 20 mins talk+? mins qus)
Everyone knows specifications are important; how else do you know what something is supposed to do?
So, where are these specifications?
In edge cases, we know file systems behave quite differently
The question now is: how do they behave differently? Some individuals (e.g. file system developers) may know some of the truth (e.g. about their file system); no-one knows the full truth.
Who said this (c. 2009)?
Speaking with my 'git' hat on, I can tell that
- git was designed to have almost minimal requirements from the filesystem, and to not do anything even half-way clever.
- despite that, we've hit an absolute metric sh*tload of filesystem bugs and misfeatures...
...we found a real data-loss CIFS bug thanks to that. Afaik, the bug has been there for a year and half. Don't tell me nobody uses cifs...
Before that, we had cross-directory rename bugs. Or the inexplicable "pread() doesn't work correctly on HP-UX". Or the "readdir() returns the same entry multiple times" bug. And all of this without ever doing anything even remotely odd. No file locking, no rewriting of old files, no lseek()ing in directories, no nothing...
...But I also think that the "we write meta-data synchronously, but then the actual data shows up at some random later time" is just crazy talk. That's simply insane. It guarantees that there will be huge windows of times where data simply will be lost if something bad happens...
...The point here? Sometimes those filesystem people who say "you must use fsync() to get well-defined semantics" are the same people who SCREWED IT UP SO DAMN BADLY THAT FSYNC ISN'T ACTUALLY REALISTICALLY USEABLE!
And more recently http://www.itworld.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html
A test oracle (based directly on the spec) that can determine whether any observed trace of a real-world system conforms to the spec
Test scripts mostly generated automatically, with additional hand-written scripts
Test scripts are fed to a test executor which executes the scripts on a real-world libc+OS+filesystem stack
Results are recorded as traces
Traces are checked by SibylFS (the spec) to determine whether they represent allowable behaviour or not; the output is in the form of a checked trace
A test script is essentially a list of libc calls to execute on a real-world system
@type script
#####################################
# Test link___link_nonempty_dir1__d2__sl_dotdot_d2___nonempty_dir1__nonexist_4
#####################################
mkdir "empty_dir1" 0o777
mkdir "empty_dir2" 0o777
mkdir "nonempty_dir1" 0o777
mkdir "nonempty_dir1/d2" 0o777
open "nonempty_dir1/d2/f3.txt" [O_CREAT;O_WRONLY] 0o666
write! (FD 3) "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor inc" 83
close (FD 3)
... // further setup commands
link "nonempty_dir1/d2/sl_dotdot_d2" "nonempty_dir1/nonexist_4"
...
A trace records the libc calls from the test script, and the responses received from the real-world system
# processing file 'link___link_nonempty_dir1__d2__sl_dotdot_d2___nonempty_dir1__nonexist_4-int.trace' ...
@type trace
#####################################
# Test link___link_nonempty_dir1__d2__sl_dotdot_d2___nonempty_dir1__nonexist_4
#####################################
5: mkdir "empty_dir1" 0o777
Tau
RV_none
6: mkdir "empty_dir2" 0o777
Tau
RV_none
7: mkdir "nonempty_dir1" 0o777
Tau
RV_none
8: mkdir "nonempty_dir1/d2" 0o777
Tau
RV_none
9: open "nonempty_dir1/d2/f3.txt" [O_CREAT;O_WRONLY] 0o666
Tau
RV_num(3)
10: write! (FD 3) "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor inc" 83
Tau
RV_num(83)
...
28: link "nonempty_dir1/d2/sl_dotdot_d2" "nonempty_dir1/nonexist_4"
Tau
RV_none
...
We found the following sorts of bugs
Error codes are quite often non-POSIX
Path resolution, particularly when a trailing slash is involved, is variable, non-POSIX
Treatment of paths referencing symlinks, particularly when the path ends in a trailing slash, is highly variable
Various overlay filesystems, and FUSE filesystems, mess up things like permissions
More serious: posixovl/VFAT (posix emulation on top of VFAT) gets link count wrong when rename overwrites a file that is linked elsewhere; possible to get to a state where the filesystem contains no files, but there is no free space (space leak)
OpenZFS on Linux 3.13.0-34 (Ubuntu Trusty): files opened with O_APPEND
would not seek to the end of the file before write
or pwrite
(probably causing applications that use this functionality to fail)
OpenZFS on OS/X: possible to execute a sequence of calls which leads to the calling process hanging using 100% CPU, unresponsive to signals; volume cannot be unmounted, machine cannot be shut down; force unmounting may cause storage device to become unusable until next system restart
Permissions: the Linux implementation of permissions should give the same behaviour from one kernel version to the next; however, we found test scripts involving file and directory access that failed on Linux kernel 3.13 and succeeded on 3.14 which we believe is due to a buggy 3.13 implementation of permissions in edge cases
Huge number of tests, mostly automatically generated; tested on a large number of stacks
SibylFS is very efficient at checking tests
Excellent trace acceptance and coverage figures
In edge cases there are numerous differences between POSIX, Linux, Mac and FreeBSD; most are not very interesting, but the spec gives a complete description of them; the testing even uncovered some relatively serious bugs (which we didn't expect)
The real result of testing is that we have confidence in the spec
Documentation (formal counterpart to POSIX)
Testing existing/new filesystems (particularly non-traditional filesystems)
Reference specification for verified filesystem implementations
Reference model (e.g. for systems research - a lot of papers construct their own little models of parts of the filesystem)
Basis for formal proof (Thomas Tuerk did some initial work on this in Isabelle/HOL) e.g. of properties of filesystems, or as a basis for soundness proofs of program logics etc.
To provide a POSIX compatiblity layer e.g. on Windows, or non-traditional settings such as Mirage
For analysis of applications: e.g. do any applications use libc calls in edge cases where the behaviour of Linux and Mac differs?
For analysis of filesystems: e.g. do Linux and Mac behave the same when restricted to some subset of libc?
To identify gaps in existing test suites
etc
Lots of possibilities. In the short term, we hope for take-up from industry.
We are currently working on a verified file system implementation. Compared to existing file systems, we hope for: