Date: 2018-07-09
Categories: tools; software
Backup tools
I thought it might be worth recording the current technologies I favour. Let's start with backup...
NOTE this is written from the point of view of someone working on multiple single-user Linux machines, in different physical locations.
Old solution: Btrfs
I have been using Btrfs for a while. It seems reasonable, but is apparently still not mature. I liked it principally for its very fast send/receive functionality, which enables incredibly quick incremental backups.
However, the process feels a bit flaky, and I don't like being dependent on Btrfs for incremental backups.
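For concreteness, here is a minimal sketch of the kind of snapshot-and-send workflow I mean, driven from Python. The subvolume, snapshot and mount paths are all illustrative, and it assumes read-only snapshots plus a second Btrfs filesystem to receive into.

```python
# Sketch: take a read-only snapshot, then send only the delta against the
# previous snapshot to a backup filesystem. All paths are illustrative.
import subprocess

def btrfs_incremental_backup(parent_snap, new_snap, source, backup_dir):
    # New read-only snapshot of the source subvolume.
    subprocess.run(["btrfs", "subvolume", "snapshot", "-r", source, new_snap],
                   check=True)
    # Send only the difference between the parent and the new snapshot,
    # and receive it on the backup filesystem.
    send = subprocess.Popen(["btrfs", "send", "-p", parent_snap, new_snap],
                            stdout=subprocess.PIPE)
    subprocess.run(["btrfs", "receive", backup_dir], stdin=send.stdout, check=True)
    send.stdout.close()
    send.wait()

btrfs_incremental_backup("/data/.snapshots/2018-07-08", "/data/.snapshots/2018-07-09",
                         "/data", "/mnt/backup")
```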
Git-annex
git-annex is a great tool which I have been using for many years to archive old data that is no longer modified. It essentially manages file metadata, including recording in which locations copies of a file can be found.
However, it is not ideal for backup because it doesn't naturally deal with named backups and the like. Its design (based on symlinks) means that it is not particularly good at handling files that are modified. That said, it is a very good choice for replicating large numbers of large files across machines, and keeping track of where all the copies are.
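A minimal sketch of the typical workflow, driven from Python; the repository path, the remote name "nas", and the directory names are all illustrative.

```python
# Sketch: archive a directory of large files with git-annex, replicate it
# to another machine, and ask where the copies are. The repository path
# and the remote name "nas" are illustrative.
import subprocess

REPO = "/home/me/archive"

def run(*cmd):
    subprocess.run(cmd, cwd=REPO, check=True)

run("git", "init")
run("git", "annex", "init", "laptop")              # describe this repository
run("git", "annex", "add", "photos/2018")          # content -> annex; symlinks are committed
run("git", "commit", "-m", "archive 2018 photos")
run("git", "annex", "copy", "--to", "nas", "photos/2018")   # replicate to the "nas" remote
run("git", "annex", "whereis", "photos/2018")      # which repositories hold copies?
```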
Borg backup
borg backup is a great tool for creating backups of a particular directory. The backups are stored as named archives in a repository; the repository contains a small amount of "live" data (essentially an index), together with immutable files which contain the actual data (new files are created, but as far as I can see existing files are not altered after creation).
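A minimal sketch of creating named, dated archives; the repository path and the choice of repokey encryption are illustrative.

```python
# Sketch: create dated archives of a home directory in a borg repository.
# The repository path is illustrative.
import subprocess
from datetime import date

REPO = "/mnt/backup/borg-repo"

# One-off: initialise the repository (here with repokey encryption).
subprocess.run(["borg", "init", "--encryption=repokey", REPO], check=True)

# Each backup is a named archive inside the repository.
subprocess.run(["borg", "create", f"{REPO}::home-{date.today().isoformat()}", "/home/me"],
               check=True)

# List the archives (i.e. the named backups).
subprocess.run(["borg", "list", REPO], check=True)
```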
Git-annex + borg
The combination of git-annex and borg is, for me, the best backup system currently available. The idea is to use borg for the backups themselves, and git-annex to replicate the backups across machines and keep track of where all the data is.
borg has several files that are repeatedly rewritten (e.g. the index files), and git-annex doesn't handle these well. The solution is to add these files to the underlying git repository rather than to git-annex. This means that the index files are handled by git, whereas all the large data files are handled by git-annex. This combination works perfectly!
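A minimal sketch of what recording a new backup looks like under this scheme. The paths, the remote name, and the "add everything else to plain git" step are illustrative, and the exact set of small, rewritten files depends on the borg version.

```python
# Sketch: the borg repository lives inside a git-annex repository. After each
# `borg create`, the large immutable segment files under data/ go into the
# annex, while the small repeatedly-rewritten files (index, hints, config, ...)
# go into plain git. Paths and the remote name are illustrative.
import subprocess

BORG_REPO = "/home/me/backups/borg-repo"   # also the root of a git-annex repository

def run(*cmd):
    subprocess.run(cmd, cwd=BORG_REPO, check=True)

run("git", "annex", "add", "data")          # immutable data files -> git-annex
run("git", "add", "-A", ".")                # remaining small files (index etc.) -> plain git
run("git", "commit", "-m", "backup 2018-07-09")
run("git", "annex", "sync", "--content", "othermachine")   # replicate content + metadata
```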