Tarsnap - Tips

Available tips

Here are some tips:

Including and excluding files
What to back up
Don't use your own compression
How to back up "live" files and filesystems
Printing statistics for all archives
Copying an archive
"Deleting" files from an archive
Write-only keys
Monitoring stats without access to root's cachedir
Checking which file Tarsnap is processing
Cleanly stopping a Tarsnap upload
Cleanly pausing a Tarsnap operation
Receiving emails from making backups

Including and excluding files

You can select which files are included in backups, either globally or on a per-directory or per-file basis.

What to back up

Tarsnap can be used to back up your entire system by pointing it to /, but this can lead to unnecessarily large backups. We can't tell you exactly what to back up, but we can give you some questions and advice to consider.

The fundamental question is: "if I lost this data, how much effort would it take to recover it?"

Operating system files

If you are running a standard operating system like Ubuntu x.y or FreeBSD z.w, then it might not be necessary to back up those files — they are easy to reinstall.
For example, consider a software developer who runs Ubuntu and only uses software from the repositories. She keeps a USB drive of the latest Ubuntu installation image, never modifies files in /etc, and maintains a script which lists all the packages she's installed. In this situation, the only files she would consider backing up are in /home since recovering everything else is trivial.
On the other hand, if you are running a heavily-tweaked operating system with a great deal of manually-installed software, then perhaps it would be a good idea to back up everything — although this will incur a large initial archive, future archives will be much smaller thanks to deduplication.
For example, consider an old lab computer with a version of UNIX from 1994, running software which communicates with rare equipment whose manufacturer went bankrupt fifteen years ago. Generations of PhD students have passed on knowledge of how to use the machine, but reinstalling it from scratch is all but impossible now.

Temporary files

By their very definition, temporary files are not worth backing up. However, simply excluding *tmp* may cause problems; for example, it would affect tmpfile_handler.c and webapp_add_user.tmpl. We therefore recommend:
```
exclude */tmp/*
```
There are a few other common patterns for files and directories that are likely not worth backing up: memory dumps, the OS X user cache directory, and the GNOME virtual filesystem:
```
exclude *.core
exclude /Users/*/Library/Caches
exclude /home/*/.gvfs/
```
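The difference between the broad and the narrow pattern can be checked with a short shell sketch. It assumes that shell `case` globbing is a close enough stand-in for Tarsnap's own exclude matching for these examples; the `verdict` function and the sample paths are hypothetical.

```shell
#!/bin/sh
# Report whether a path would match an exclude-style glob.
# Assumption: shell 'case' globbing stands in for Tarsnap's own matcher.
verdict() {
    # $1 = pattern, $2 = path
    case "$2" in
        $1) echo "excluded" ;;
        *)  echo "kept" ;;
    esac
}

# Hypothetical sample paths, compared under both patterns.
for path in src/tmpfile_handler.c \
            templates/webapp_add_user.tmpl \
            var/tmp/session.dat
do
    printf '%-32s *tmp*: %-8s  */tmp/*: %s\n' "$path" \
        "$(verdict '*tmp*' "$path")" "$(verdict '*/tmp/*' "$path")"
done
```

Under the broad pattern all three paths are excluded; under the narrow pattern only the file inside a directory named tmp is.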

Don't use your own compression

Don't apply any compression (gzip, bzip2, zip, tar.gz, etc.) to your data — Tarsnap itself will compress data after it performs deduplication. If you compress a file, it will interfere with deduplication, meaning that if you change that file and re-compress it, Tarsnap will upload (almost) all of that compressed file again. It is more efficient (and saves you money!) to let Tarsnap handle the deduplication and compression.

GNU gzip has a --rsyncable option which attempts to compress a file while retaining some amount of deduplication ability. This is not as efficient as using Tarsnap's normal deduplication and compression, but if you must use your own compression for files which are likely to be modified, using --rsyncable could lessen the increase in storage space that you would otherwise incur.

How to back up "live" files and filesystems

For normal home use, Tarsnap needs no special handling. But when used on a production server, there are a few additional considerations.

Backing up a database

Don't attempt to back up a "live" database. Use your database software to create a static dump (ideally a text file).
Don't compress the database dump. Tarsnap's deduplication (followed by its compression) is more efficient than attempting to compress the database dump directly.
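
The two points above could be combined into a small helper along these lines. This is only a sketch: `pg_dump` is used as one example of a dump tool, and the `dump_and_archive` function, the `DUMP_CMD` and `TARSNAP` override variables, and the archive naming scheme are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: dump a database to an uncompressed text file, then archive the dump.
# DUMP_CMD and TARSNAP are hypothetical override hooks; pg_dump is one example.

dump_and_archive() {
    # $1 = database name, $2 = directory to hold the dump
    db=$1
    dumpdir=$2
    dumpfile="${dumpdir}/${db}.sql"

    # Write a plain-text dump; do not pipe it through gzip.
    ${DUMP_CMD:-pg_dump} "$db" > "$dumpfile" || return 1

    # Archive the static dump rather than the live database files.
    ${TARSNAP:-tarsnap} -c -f "${db}-$(date +%Y-%m-%d_%H-%M-%S)" "$dumpfile"
}
```

A call such as `dump_and_archive mydb /var/backups` would then produce one archive per run, each containing the latest dump.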

Backing up a live filesystem snapshot

A filesystem snapshot is a copy of a filesystem at a particular time. This allows an "atomic" archive to be created, even if the filesystem modifies files while Tarsnap is running. For an example of the problems which can arise from "non-atomic" archives, consider the following directory layout:

a/
b/lots-of-data.dat
c/important-file.txt

Suppose that Tarsnap processed the directories in the order a, b, c; however, while Tarsnap was archiving b, the user moved c/important-file.txt into a. When Tarsnap read c, important-file.txt was no longer in that directory, so it was not archived at all! A similar scenario (in which Tarsnap processed the directories in reverse order) could result in important-file.txt being archived in both directories. The same concerns apply to a single file being modified in place: we could end up with half of the old version of the file and half of the new version.

In order to avoid these problems, filesystem snapshots were developed in the 1990s. Given a normal read-write (RW) filesystem, a filesystem (FS) snapshot is created as a read-only (RO) filesystem, then deleted once the archive is created:

filesystem (RW) --+--> use as normal
                  |
           (create FS snapshot)
                  |
                  +--> FS snapshot (RO) --> run tarsnap --> delete FS snapshot

Please consult the documentation for your operating system or filesystem management software to learn how to create and delete filesystem snapshots.

There is a potential race condition when backing up from a filesystem snapshot which could result in Tarsnap being unable to detect some modifications to a file. If you would like to run Tarsnap on filesystem snapshots, please read about the --snaptime argument.
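
The snapshot workflow might be scripted roughly as follows. This is a sketch under stated assumptions: `create_snapshot` and `delete_snapshot` are hypothetical placeholders for whatever your operating system provides, `TARSNAP` is a hypothetical override variable, and the reference file passed to --snaptime is created just before the snapshot so that its modification time predates it.

```shell
#!/bin/sh
# Sketch: archive from a read-only snapshot, passing a reference file
# to --snaptime. create_snapshot and delete_snapshot are hypothetical
# placeholders; replace them with your platform's snapshot commands.

create_snapshot() { echo "define create_snapshot for your system" >&2; return 1; }
delete_snapshot() { echo "define delete_snapshot for your system" >&2; return 1; }

backup_from_snapshot() {
    # $1 = archive name, $2 = path where the snapshot is mounted
    marker=$(mktemp) || return 1     # created before the snapshot exists

    if ! create_snapshot; then
        rm -f "$marker"
        return 1
    fi

    ${TARSNAP:-tarsnap} -c -f "$1" --snaptime "$marker" "$2"
    status=$?

    # Remove the snapshot and marker whether or not the archive succeeded.
    delete_snapshot
    rm -f "$marker"
    return "$status"
}
```

Cleaning up regardless of the archive's exit status keeps stale snapshots from accumulating after a failed run.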

Printing statistics for all archives

You can print statistics about all archives with:

tarsnap --print-stats -f '*'

Copying an archive

If you wish to create an identical copy of an archive (for example, having identical archives named backup-2016-01-01-daily, backup-2016-01-01-weekly, backup-2016-01-01-monthly), we recommend that you use:

tarsnap -c -f backup-2016-01-01-weekly @@backup-2016-01-01-daily

This will download some metadata (the list of blocks in backup-2016-01-01-daily), but that uses relatively little bandwidth.

Alternatively, you could simply create a second archive right after creating the first one; our deduplication algorithm will ensure that no data will be uploaded unnecessarily. This avoids downloading metadata, but if your files have changed between creating the first and second (or subsequent) archives, then your "copy" would not be an exact copy.
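
For the daily, weekly, and monthly naming used in the example above, the copy command could be wrapped in a loop. The sketch below is illustrative only; the function names and the `TARSNAP` override variable are assumptions.

```shell
#!/bin/sh
# Sketch: make weekly and monthly copies of a daily archive.
# TARSNAP is a hypothetical override hook; names follow the example above.

copy_archive() {
    # $1 = existing archive, $2 = name for the copy
    ${TARSNAP:-tarsnap} -c -f "$2" "@@$1"
}

promote_daily() {
    # $1 = date stamp such as 2016-01-01; remaining args = suffixes to create
    stamp=$1
    shift
    for suffix in "$@"; do
        copy_archive "backup-${stamp}-daily" "backup-${stamp}-${suffix}" || return 1
    done
}
```

For instance, `promote_daily 2016-01-01 weekly monthly` would issue one copy command per suffix and stop at the first failure.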

"Deleting" files from an archive

Tarsnap archives are read-only (or immutable), so there's no way to remove files from archives. Archives are both encrypted and signed, so there's no way for the Tarsnap service to even find the data you want to remove from an archive, never mind removing it and leaving you with a valid archive.

However, we can imitate deleting file(s) by copying an archive while excluding certain files, then removing the old archive:

tarsnap -c -f newarchive --exclude badstuff @@oldarchive

This will create a new archive from the old archive but without the excluded files — and it does so in an efficient way, downloading archive metadata but adding references to blocks of data without needing to download the entire archive. Once you've created an archive without the unwanted file(s), delete the old archive:

tarsnap -d -f oldarchive

Note that the --exclude option can be tricky, so we recommend trying this first with --dry-run -v so you can make sure that you're getting the right files excluded.
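
One way to keep the preview and the real operation in step is to wrap both in functions that take the same arguments. This is a sketch, not a prescribed procedure; the function names and the `TARSNAP` override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: "delete" files by copying an archive without them.
# TARSNAP is a hypothetical override hook.

preview_prune() {
    # $1 = old archive, $2 = new archive, $3 = exclude pattern
    # Shows which files would be kept, without storing anything.
    ${TARSNAP:-tarsnap} --dry-run -v -c -f "$2" --exclude "$3" "@@$1"
}

apply_prune() {
    # Same arguments; deletes the old archive only if the copy succeeded.
    ${TARSNAP:-tarsnap} -c -f "$2" --exclude "$3" "@@$1" || return 1
    ${TARSNAP:-tarsnap} -d -f "$1"
}
```

Running `preview_prune oldarchive newarchive badstuff` and inspecting the list before `apply_prune` with the same arguments reduces the risk of deleting the old archive after a mistaken exclude.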

Write-only keys

tarsnap-keymgmt creates new keys with (optionally) reduced permissions. In particular, it can create "write-only" keys (sometimes called "encryption keys"):

tarsnap-keymgmt --outkeyfile write-only-key.txt -w ~/tarsnap-main-key-file.txt

This allows you to create a write-only key (without a passphrase) which is used to create archives automatically, and a key with more permissions (requiring a passphrase) which is used for restoring or deleting archives.

Since Tarsnap archives are immutable, this protects your archived data even if an intruder breaks into your server: She would be able to halt your backups (or add new archives with faulty data), but not delete your existing archives.

If you want to keep your full keys on a different (more secure) system and only use them there, or use keys on multiple systems for any other reason, things get more complicated.

For additional security, you may wish to store your master keys on a different (more secure) system, or even keep them offline. Whatever method you choose to use to protect your master keys, please ensure that you can always retrieve them when needed.

Warning

If you create a write-only key and lose the master key, you will not be able to retrieve your data.
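
As an illustration, the key could be generated once and then passed to unattended backups with --keyfile. The sketch below assumes hypothetical function names and hypothetical `KEYMGMT` and `TARSNAP` override variables; all paths are examples.

```shell
#!/bin/sh
# Sketch: derive a write-only key and use it for unattended backups.
# KEYMGMT and TARSNAP are hypothetical override hooks; paths are examples.

make_write_only_key() {
    # $1 = master key file, $2 = output path for the write-only key
    ${KEYMGMT:-tarsnap-keymgmt} --outkeyfile "$2" -w "$1"
}

backup_with_key() {
    # $1 = key file, $2 = archive name, remaining args = paths to archive
    key=$1
    name=$2
    shift 2
    ${TARSNAP:-tarsnap} --keyfile "$key" -c -f "$name" "$@"
}
```

Only the write-only key needs to stay on the server; restoring or deleting archives would still use the master key from wherever you keep it.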

Monitoring stats without access to root's cachedir

Many users run tarsnap as the root user, but this means that they cannot monitor their usage with an unprivileged user account. There are two ways to allow user username to view the statistics:

As root, configure the system with:

touch /var/log/tarsnap-output.log

chown username /var/log/tarsnap-output.log

Then redirect the statistics to a file when creating an archive:

tarsnap --print-stats -c OPTIONS >/var/log/tarsnap-output.log 2>&1

If you use sudo, you can allow an unprivileged user account to run a specific command as root by adding this to your sudoers file:
```
username    ALL = (root) NOPASSWD: /usr/local/bin/tarsnap --print-stats
```
(adjust the directory as appropriate)

Checking which file Tarsnap is processing

The tarsnap binary responds to the SIGUSR1 POSIX signal (and SIGINFO on platforms which support it) by printing the current file. For example, running this command in one terminal:

tarsnap --dry-run -c ~/src/tarsnap

Then entering this command in a different terminal:

killall -SIGUSR1 tarsnap

Will produce output similar to this in the first terminal:

adding home/td/src/tarsnap/build/tarsnap-keygen (196608 / 637689 bytes)

On BSD systems (including OS X), a SIGINFO can be sent to the foreground process by pressing ^T (CTRL-T) in its terminal.

Cleanly stopping a Tarsnap upload

If you use ^C (CTRL-C) to stop Tarsnap uploading a new archive, you will lose progress back to the last checkpoint. To stop cleanly, use ^\ (CTRL-\) or send the SIGQUIT signal to tell Tarsnap to create a truncated archive. The truncated archive will have ".part" appended to its name.

Cleanly pausing a Tarsnap operation

You can suspend any tarsnap operation by pressing ^Z (CTRL-Z) which sends a SIGTSTP signal, or by sending a SIGSTOP or SIGTSTP signal via kill.

When tarsnap resumes (via the fg or bg commands, or sending it the SIGCONT signal), it will reconnect to the Tarsnap server (if necessary).

Receiving emails from making backups

If you are running the /root/tarsnap-backup.sh backup script described in Simple usage, then you may wish to modify it to send you an email with the status, particularly if you are running it automatically with cron:

#!/bin/sh

# User variables
email=my_email@example.net
tarsnap_output_filename=/tmp/tarsnap-output-temporary.log
dirs="/home /etc"

# Run backup (${dirs} is left unquoted so that it splits into separate paths)
tarsnap -c \
    -f "$(uname -n)-$(date +%Y-%m-%d_%H-%M-%S)" \
    ${dirs} >"${tarsnap_output_filename}" 2>&1

# Send email, choosing the subject from tarsnap's exit status
if [ $? -eq 0 ]; then
	subject="Tarsnap backup success"
else
	subject="Tarsnap backup FAILURE"
fi
mail -s "${subject}" "${email}" < "${tarsnap_output_filename}"

# Clean up
rm "${tarsnap_output_filename}"

Naturally, you will want to modify the email variable and the directory name(s) in the dirs variable.

Setting up command-line `mail`

Sending email with mail will only function if you have configured a Mail Transfer Agent (MTA) such as sendmail, postfix, or ssmtp. If your system does not have an MTA already set up, then we recommend trying ssmtp (also known as sSMTP), which is an extremely simple MTA and thus is easier to configure.

Privacy and `print-stats`

If you have print-stats enabled in your tarsnap config file, the emails will include that information. However, the simple mail utility will send the email unencrypted, so that info would be available to anybody who intercepts your emails. This could be avoided by adding --no-print-stats to the script, or using a more sophisticated command-line mailer which encrypts email.
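
A variant of the backup script with statistics suppressed might look like the sketch below. It is not the script from Simple usage: the `backup_and_mail` function, the `TARSNAP` and `MAILER` override variables, and the use of `mktemp` for the temporary file are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: run the backup with statistics suppressed, then mail the output.
# TARSNAP and MAILER are hypothetical override hooks; the address and
# directories passed in are placeholders.

backup_and_mail() {
    # $1 = recipient address, remaining args = directories to archive
    email=$1
    shift
    output=$(mktemp) || return 1

    ${TARSNAP:-tarsnap} --no-print-stats -c \
        -f "$(uname -n)-$(date +%Y-%m-%d_%H-%M-%S)" \
        "$@" > "$output" 2>&1
    status=$?

    if [ "$status" -eq 0 ]; then
        subject="Tarsnap backup success"
    else
        subject="Tarsnap backup FAILURE"
    fi
    ${MAILER:-mail} -s "$subject" "$email" < "$output"

    rm -f "$output"
    return "$status"
}
```

A cron job could then call `backup_and_mail my_email@example.net /home /etc`; the email still reports success or failure, but no longer carries the archive statistics.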

More information

There are many other options available with tarsnap. All of the information on this page, and more, can be found in the man pages.