The problem: specifications

Everyone knows specifications are important; how else do you know what something is supposed to do?
So, where are these specifications?
For file systems:
- POSIX online spec eg http://pubs.opengroup.org/onlinepubs/9699919799/
- other informal specifications (Linux Standard Base, libc, muslc etc)
- man pages (Linux, Mac, FreeBSD... all subtly different, even between distributions of same OS)
- implementations (presumably you can read the source code and discover "the truth" for a particular impl)
- test suites (for a small number of test cases eg 50 for rename in POSIX test suite)
Problems:
- lots of specifications - which is "right"?
- specs are mostly informal, and so (almost inevitably) ambiguous, incomplete, plain wrong
- kernel code is formal, but unreadable unless you are a kernel coder, and impractical as a spec
- (perhaps most important) Does an implementation actually meet the specification? Unfortunately not possible to use existing specs to test an implementation directly

The problem: implementations and specifications

Do file system implementations meet the specifications? NO (none of them as far as we can tell!)
- for "typical" use cases, they mostly behave sensibly; in edge cases things go wrong
- => the specifications are not much use in these edge cases

Another problem: how do implementations differ?

In edge cases, we know file systems behave quite differently
The question now is: how do they behave differently? Some individuals (e.g. file system developers) may know some of the truth (e.g. about their file system); no-one knows the full truth.

Do specifications really matter? Do file systems matter?

Who said this (c. 2009)?

Speaking with my 'git' hat on, I can tell that

git was designed to have almost minimal requirements from the filesystem, and to not do anything even half-way clever.

despite that, we've hit an absolute metric sh*tload of filesystem bugs and misfeatures...

...we found a real data-loss CIFS bug thanks to that. Afaik, the bug has been there for a year and half. Don't tell me nobody uses cifs...

Before that, we had cross-directory rename bugs. Or the inexplicable "pread() doesn't work correctly on HP-UX". Or the "readdir() returns the same entry multiple times" bug. And all of this without ever doing anything even remotely odd. No file locking, no rewriting of old files, no lseek()ing in directories, no nothing...

...But I also think that the "we write meta-data synchronously, but then the actual data shows up at some random later time" is just crazy talk. That's simply insane. It guarantees that there will be huge windows of times where data simply will be lost if something bad happens...

...The point here? Sometimes those filesystem people who say "you must use fsync() to get well-defined semantics" are the same people who SCREWED IT UP SO DAMN BADLY THAT FSYNC ISN'T ACTUALLY REALISTICALLY USEABLE!

And more recently http://www.itworld.com/article/2868393/linus-torvalds-apples-hfs-is-probably-the-worst-file-system-ever.html

OK, so what should a specification (in general) look like?

Precise, unambiguous, readable, maintainable, comphrehensive, detailed
- also loose: the spec should precisely and unambiguously describe any looseness
Usable as a test oracle
- ideally it should be easy to use the spec directly to test conformance of real-world implementations

What have we done?

A formal specification of filesystem behaviour (with variants for POSIX, Linux, Mac, FreeBSD)
- precise, unambiguous, readable, maintainable, comphrehensive, detailed
A test oracle (based directly on the spec) that can determine whether any observed trace of a real-world system conforms to the spec
The test oracle enables exhaustive testing. We also provide an extensive test suite of >20k scripts, and the results of running those tests on >40 different combinations of libc+OS+filesystem
- 20k test cases (and acceptable results) probably infeasible with hand-written test cases
- impossible to compare results at this scale (800k traces) without an oracle

How does everything fit together?

Next few slides show the system structure, with excerpts from the main files involved

System structure

Test scripts mostly generated automatically, with additional hand-written scripts
Test scripts are fed to a test executor which executes the scripts on a real-world libc+OS+filesystem stack
Results are recorded as traces
Traces are checked by SibylFS (the spec) to determine whether they represent allowable behaviour or not; the output is in the form of a checked trace

Example test script

A test script is essentially a list of libc calls to execute on a real-world system

 @type script
 #####################################
 # Test link___link_nonempty_dir1__d2__sl_dotdot_d2___nonempty_dir1__nonexist_4
 #####################################
 mkdir "empty_dir1" 0o777
 mkdir "empty_dir2" 0o777
 mkdir "nonempty_dir1" 0o777
 mkdir "nonempty_dir1/d2" 0o777
 open "nonempty_dir1/d2/f3.txt" [O_CREAT;O_WRONLY] 0o666
 write! (FD 3) "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor inc" 83
 close (FD 3)
 ... // further setup commands
 link "nonempty_dir1/d2/sl_dotdot_d2" "nonempty_dir1/nonexist_4"
 ...

Example trace (behaviour of real-world system)

A trace records the libc calls from the test script, and the responses received from the real-world system

# processing file 'link___link_nonempty_dir1__d2__sl_dotdot_d2___nonempty_dir1__nonexist_4-int.trace' ...
        @type trace
        #####################################
        # Test link___link_nonempty_dir1__d2__sl_dotdot_d2___nonempty_dir1__nonexist_4
        #####################################
     5: mkdir "empty_dir1" 0o777
        Tau
        RV_none
     6: mkdir "empty_dir2" 0o777
        Tau
        RV_none
     7: mkdir "nonempty_dir1" 0o777
        Tau
        RV_none
     8: mkdir "nonempty_dir1/d2" 0o777
        Tau
        RV_none
     9: open "nonempty_dir1/d2/f3.txt" [O_CREAT;O_WRONLY] 0o666
        Tau
        RV_num(3)
    10: write! (FD 3) "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor inc" 83
        Tau
        RV_num(83)

...

    28: link "nonempty_dir1/d2/sl_dotdot_d2" "nonempty_dir1/nonexist_4"
        Tau
        RV_none

...

Example checked trace

See e.g. here (local) or here (online)

Test results

We found the following sorts of bugs

Errors (and ambiguities etc) in the specifications (including POSIX, man pages etc)
Errors and deviations in implementations
Errors in our spec (which we then fixed of course!)

Test results, stats

Trace acceptance
- Linux: 21061 accepted by spec; 9 rejected (21070 total)
- Mac: 21036 accepted by spec, 34 rejected (21070 total); FreeBSD similar
- essentially no barrier to getting 100% trace acceptance
Coverage: >98% (of the spec)
- by way of comparison, a paper (Groce et al., 2007, "Randomized differential testing...") from NASA scientists that applied randomized testing to a model of a filesystem achieved 89% coverage

Test results, strange behaviours

Error codes are quite often non-POSIX
Path resolution, particularly when a trailing slash is involved, is variable, non-POSIX
Treatment of paths referencing symlinks, particularly when the path ends in a trailing slash, is highly variable
Various overlay filesystems, and FUSE filesystems, mess up things like permissions
More serious: posixovl/VFAT (posix emulation on top of VFAT) gets link count wrong when rename overwrites a file that is linked elsewhere; possible to get to a state where the filesystem contains no files, but there is no free space (space leak)
OpenZFS on Linux 3.13.0-34 (Ubuntu Trusty): files opened with O_APPEND would not seek to the end of the file before write or pwrite (probably causing applications that use this functionality to fail)
OpenZFS on OS/X: possible to execute a sequence of calls which leads to the calling process hanging using 100% CPU, unresponsive to signals; volume cannot be unmounted, machine cannot be shut down; force unmounting may cause storage device to become unusable until next system restart
Permissions: the Linux implementation of permissions should give the same behaviour from one kernel version to the next; however, we found test scripts involving file and directory access that failed on Linux kernel 3.13 and succeeded on 3.14 which we believe is due to a buggy 3.13 implementation of permissions in edge cases

A FreeBSD bug

An important invariant
- if there is an error when executing a file system function, then the state of the file system is unchanged
- => e.g. so if I try to create a file using open and I get an error, I don't have to clear up after myself
- this invariant appears to hold for POSIX, Linux and Mac
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202892 (2015-09-04!)
- opening a file: symlink deleted, file created, error returned
- => here, this invariant is broken on FreeBSD

Testing summary

Huge number of tests, mostly automatically generated; tested on a large number of stacks
SibylFS is very efficient at checking tests
Excellent trace acceptance and coverage figures
In edge cases there are numerous differences between POSIX, Linux, Mac and FreeBSD; most are not very interesting, but the spec gives a complete description of them; the testing even uncovered some relatively serious bugs (which we didn't expect)
The real result of testing is that we have confidence in the spec

A virtuous circle

Is our spec/ testing perfect?
Almost certainly not, but in many ways it improves on the status quo
And there is a virtuous circle (next page)

Possible uses of the spec

Documentation (formal counterpart to POSIX)
Testing existing/new filesystems (particularly non-traditional filesystems)
Reference specification for verified filesystem implementations
Reference model (e.g. for systems research - a lot of papers construct their own little models of parts of the filesystem)
Basis for formal proof (Thomas Tuerk did some initial work on this in Isabelle/HOL) e.g. of properties of filesystems, or as a basis for soundness proofs of program logics etc.
To provide a POSIX compatiblity layer e.g. on Windows, or non-traditional settings such as Mirage
For analysis of applications: e.g. do any applications use libc calls in edge cases where the behaviour of Linux and Mac differs?
For analysis of filesystems: e.g. do Linux and Mac behave the same when restricted to some subset of libc?
To identify gaps in existing test suites
etc

sibylfs.io

The problem

The problem: specifications

The problem: implementations and specifications

Another problem: how do implementations differ?

Do specifications really matter? Do file systems matter?

OK, so what should a specification (in general) look like?

sibylfs.io

What have we done?

How does everything fit together?

System structure

Example test script

Example trace (behaviour of real-world system)

Example checked trace

Testing

Test results

Test results, stats

Test results, strange behaviours

A FreeBSD bug

Testing summary

A virtuous circle

A virtuous circle

Conclusion

Possible uses of the spec

Future work

Thanks for listening. Any questions?