Date: 2018-07-09
Categories: tools; software

Backup tools

I thought it might be worth recording the current technologies I favour. Let's start with backup...

NOTE: this is written from the point of view of someone working on multiple single-user Linux machines in different physical locations.

Old solution: Btrfs

I have been using Btrfs for a while. It seems reasonable, but is apparently still not mature. I liked it principally for its very fast send/receive functionality, which enables incredibly quick incremental backups.

However, the process feels a bit flaky, and I don't like being dependent on Btrfs for incremental backups.
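For reference, an incremental send/receive backup boils down to taking a read-only snapshot and then sending it relative to an earlier snapshot. The sketch below only builds the command strings; the function name and all paths are made up for illustration, and it assumes the parent snapshot already exists on both the source and the destination.

```python
import shlex
from typing import List, Optional


def btrfs_backup_commands(
    subvolume: str,
    snapshot: str,
    destination: str,
    parent: Optional[str] = None,
) -> List[str]:
    """Build the shell commands for one snapshot-and-send step (hypothetical helper)."""
    sub, snap, dest = (shlex.quote(p) for p in (subvolume, snapshot, destination))
    # A snapshot has to be read-only (-r) before it can be sent.
    commands = [f"btrfs subvolume snapshot -r {sub} {snap}"]
    if parent is None:
        # First backup: the whole snapshot is sent.
        send = f"btrfs send {snap}"
    else:
        # Later backups: only the difference from the parent snapshot is sent.
        send = f"btrfs send -p {shlex.quote(parent)} {snap}"
    commands.append(f"{send} | btrfs receive {dest}")
    return commands


if __name__ == "__main__":
    for line in btrfs_backup_commands(
        "/home", "/snapshots/home-2", "/mnt/backup", parent="/snapshots/home-1"
    ):
        print(line)
```

The commands are printed rather than run, since they need root and a real Btrfs filesystem.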

Git-annex

git-annex is a great tool which I have been using for many years to archive old data that is no longer modified. It essentially manages the metadata of files (including in which locations copies of a file may be found).

However, it is not ideal for backup, because it has no natural notion of named backups. Its design (based on symlinks) also means that it is not particularly good at handling files that are modified. It is nevertheless a very good choice for replicating large numbers of large files across machines, and for keeping track of where all the copies are.
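As a rough illustration of the archive-and-track workflow, the sketch below lists the git-annex commands involved in archiving one path, copying it to another location, and asking where the copies live. The helper name, the path and the remote name are hypothetical; the function only builds the argument lists and does not run anything.

```python
from typing import List


def annex_archive_commands(path: str, remote: str) -> List[List[str]]:
    """Build the commands to archive `path` and replicate it to `remote` (hypothetical helper)."""
    return [
        # Hand the file contents to git-annex; git itself only tracks a symlink.
        ["git", "annex", "add", path],
        ["git", "commit", "-m", f"Archive {path}"],
        # Send a copy of the contents to another repository.
        ["git", "annex", "copy", path, "--to", remote],
        # Report which repositories currently hold a copy.
        ["git", "annex", "whereis", path],
    ]


if __name__ == "__main__":
    for argv in annex_archive_commands("photos/2017", "usbdrive"):
        print(" ".join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` inside an existing git-annex repository.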

Borg backup

borg backup is a great tool for creating backups of a particular directory. The backups are stored as named archives inside a repository. The repository contains a small amount of "live" data (essentially an index), together with immutable files which contain the data (new files are created, but as far as I can see existing files are not altered after creation).
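A minimal sketch of creating one dated archive, assuming the borg 1.x command-line syntax (`REPO::ARCHIVE`) and a repository that has already been initialised with `borg init`. The helper name, the `label` parameter and the paths are hypothetical.

```python
from datetime import date
from typing import List


def borg_create_command(repo: str, source: str, label: str, day: date) -> List[str]:
    """Build a `borg create` command for a dated archive (hypothetical helper, borg 1.x syntax)."""
    # Archive names must be unique within a repository, so include the date.
    archive = f"{repo}::{label}-{day.isoformat()}"
    return ["borg", "create", "--stats", archive, source]


if __name__ == "__main__":
    argv = borg_create_command("/mnt/backup/borg", "/home/me", "home", date.today())
    print(" ".join(argv))
```

The existing archives in a repository can then be listed with `borg list` followed by the repository path.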

Git-annex + borg

The combination of git-annex and borg is, for me, the best backup system currently available. The idea is to use borg for the backups, and git-annex to handle the task of replicating backups across machines, and keeping track of where all the data is.

borg has several files that are repeatedly rewritten (e.g. index files), and git-annex doesn't handle these well. The solution is to add these files to the underlying git repository, rather than to git-annex. This means that the index files are handled by git, whereas all the large data files are handled by git-annex. This combination works perfectly!
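The split described above can be sketched as a small planning step. This assumes the borg 1.x repository layout, where the data files live under a `data/` directory and everything else (`config`, `index.N`, `hints.N` and so on) is small and rewritten in place; the function names and example file names are hypothetical, and the code only builds the commands.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def split_borg_repo_files(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Decide which repository files go to git and which to git-annex."""
    plan: Dict[str, List[str]] = {"git": [], "annex": []}
    for path in paths:
        parts = PurePosixPath(path).parts
        # Assumption: only files under data/ are large and never rewritten.
        target = "annex" if parts and parts[0] == "data" else "git"
        plan[target].append(path)
    return plan


def add_commands(plan: Dict[str, List[str]]) -> List[List[str]]:
    """Turn the plan into `git add` / `git annex add` argument lists."""
    commands = []
    if plan["git"]:
        commands.append(["git", "add", *plan["git"]])
    if plan["annex"]:
        commands.append(["git", "annex", "add", *plan["annex"]])
    return commands


if __name__ == "__main__":
    files = ["config", "index.42", "hints.42", "data/0/41", "data/0/42"]
    for argv in add_commands(split_borg_repo_files(files)):
        print(" ".join(argv))
```

git-annex also has an `annex.largefiles` setting that can automate this kind of split; it is not shown here.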