Available tips
Here are some tips:
- Including and excluding files
- What to back up
- Don't use your own compression
- How to back up "live" files and filesystems
- Printing statistics for all archives
- Copying an archive
- "Deleting" files from an archive
- Write-only keys
- Monitoring stats without access to root's cachedir
- Checking which file Tarsnap is processing
- Cleanly stopping a Tarsnap upload
- Cleanly pausing a Tarsnap operation
- Receiving emails from making backups
Including and excluding files
You can select which files are included in backups, either globally or on a per-directory or per-file basis.
What to back up
Tarsnap can be used to back up your entire system by pointing it to /, but this can lead to unnecessarily large backups. We can't tell you exactly what to back up, but we can give you some questions and advice to consider.
The fundamental question is: "if I lost this data, how much effort would it take to recover it?"
Operating system files
- If you are running a standard operating system like Ubuntu x.y or FreeBSD z.w, then it might not be necessary to back up those files — they are easy to reinstall.
- On the other hand, if you are running a heavily-tweaked operating system with a great deal of manually-installed software, then perhaps it would be a good idea to back up everything — although this will incur a large initial archive, future archives will be much smaller thanks to deduplication.
Temporary files
- By their very definition, temporary files are not worth backing up. However, simply excluding *tmp* may cause problems; for example, it would affect tmpfile_handler.c and webapp_add_user.tmpl. We therefore recommend:
  exclude */tmp/*
- There are a few other common patterns for files and directories that are likely not worth backing up: memory dumps, the OSX user cache directory, and the GNOME virtual filesystem:
  exclude *.core
  exclude /Users/*/Library/Caches
  exclude /home/*/.gvfs/
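To see why the narrower pattern is safer, the following sketch tests a few paths against both patterns using the shell's own `case` globbing. This only approximates Tarsnap's matching (we assume that in both, `*` may also match `/`), and the `matches` helper is a hypothetical name used for illustration.

```shell
#!/bin/sh
# Report whether a path matches a glob pattern, using sh "case" matching
# as a stand-in for Tarsnap's exclude matching (an approximation).
matches() {
    # $1 = pattern, $2 = path
    case "$2" in
        $1) echo "excluded" ;;   # unquoted so that $1 is treated as a pattern
        *)  echo "kept" ;;
    esac
}

matches '*tmp*'   'src/tmpfile_handler.c'   # the broad pattern catches a source file
matches '*/tmp/*' 'src/tmpfile_handler.c'   # the narrow pattern leaves it alone
matches '*/tmp/*' 'var/tmp/scratch.dat'     # but still catches files under a tmp directory
```

Before relying on any pattern, it is still worth checking it against your real data with a dry run.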
Don't use your own compression
Don't apply any compression (gzip, bzip2, zip, tar.gz, etc.) to your data — Tarsnap itself will compress data after it performs deduplication. If you compress a file, it will interfere with deduplication, meaning that if you change that file and re-compress it, Tarsnap will upload (almost) all of that compressed file again. It is more efficient (and saves you money!) to let Tarsnap handle the deduplication and compression.
GNU gzip has a --rsyncable option which attempts to compress a file while retaining some amount of deduplication ability. This is not as efficient as using Tarsnap's normal deduplication and compression, but if you must use your own compression for files which are likely to be modified, using --rsyncable could lessen the increase in storage space that you would otherwise incur.
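The effect of compression on deduplication can be observed locally without Tarsnap. The sketch below changes a single byte in a generated text file, compresses both versions with plain gzip, and counts how many byte positions differ before and after compression. The file names and the `count_diffs` helper are hypothetical, and the byte-by-byte comparison is only a rough proxy for what a deduplicating tool sees.

```shell
#!/bin/sh
# Count how many byte positions differ between two files.
count_diffs() {
    n=$(cmp -l "$1" "$2" 2>/dev/null | wc -l)
    echo $((n))
}

workdir=$(mktemp -d)
seq 1 20000 > "$workdir/v1.txt"
# Replace line 5 ("5") with "X": a one-byte change.
sed '5s/.*/X/' "$workdir/v1.txt" > "$workdir/v2.txt"

# Compress both versions; -n omits the name and timestamp so that only
# the file content influences the output.
gzip -n -c "$workdir/v1.txt" > "$workdir/v1.txt.gz"
gzip -n -c "$workdir/v2.txt" > "$workdir/v2.txt.gz"

raw_diffs=$(count_diffs "$workdir/v1.txt" "$workdir/v2.txt")
gz_diffs=$(count_diffs "$workdir/v1.txt.gz" "$workdir/v2.txt.gz")
echo "differing bytes, uncompressed:    $raw_diffs"
echo "differing bytes, gzip-compressed: $gz_diffs"

rm -r "$workdir"
```

The uncompressed files differ in one position, while the compressed files typically differ in many more, which is why a small edit to a pre-compressed file tends to be uploaded again almost in full.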
How to back up "live" files and filesystems
For normal home use, Tarsnap needs no special handling. But when used on a production server, there are a few additional considerations.
Backing up a database
- Don't attempt to back up a "live" database. Use your database software to create a static dump (ideally a text file).
- Don't compress the database dump. Tarsnap's deduplication (followed by its compression) is more efficient than attempting to compress the database dump directly.
Backing up a live filesystem snapshot
A filesystem snapshot is a copy of a filesystem at a particular time. This allows an "atomic" archive to be created, even if the filesystem modifies files while Tarsnap is running. For an example of the problems which can arise from "non-atomic" archives, consider the following directory layout:
a/
b/lots-of-data.dat
c/important-file.txt
Suppose that Tarsnap processed the directories in the order a, b, c; however, while Tarsnap was archiving b, the user moved c/important-file.txt into a. When Tarsnap read c, the important-file.txt was not in that directory, so important-file.txt was not archived at all! A similar scenario (in which Tarsnap processed the directories in reverse order) could result in important-file.txt being archived in both directories. These concerns can apply to a single file being modified in place: we could end up with half of an old version of the file, and half of the new version.
To avoid these problems, filesystem snapshots were developed. Given a normal read-write (RW) filesystem, a filesystem (FS) snapshot is created as a read-only (RO) filesystem, then deleted once the archive is created:
filesystem (RW) --+--> use as normal
                  |
         (create FS snapshot)
                  |
                  +--> FS snapshot (RO) --> run tarsnap --> delete FS snapshot
Please consult the documentation for your operating system or filesystem management software to learn how to create and delete filesystem snapshots.
There is a potential race condition when backing up from a filesystem snapshot which could result in Tarsnap being unable to detect some modifications to a file. If you would like to run Tarsnap on filesystem snapshots, please read about the --snaptime argument.
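The snapshot workflow could be scripted roughly as follows. This is a sketch under stated assumptions, not a recommended implementation: `create_snapshot` and `delete_snapshot` are hypothetical placeholders for your platform's real snapshot commands, and `backup_from_snapshot` is an illustrative name. It relies on --snaptime taking a file whose modification time is earlier than the snapshot's creation, so the marker file is created first.

```shell
#!/bin/sh
# Hypothetical placeholders: replace the bodies with the snapshot
# commands documented for your operating system or volume manager.
create_snapshot() { echo "creating snapshot (placeholder)"; }
delete_snapshot() { echo "deleting snapshot (placeholder)"; }

# Archive a snapshot that is readable at the given path.
# $1 = archive name, $2 = directory where the snapshot is mounted
backup_from_snapshot() {
    # The marker file must be older than the snapshot, so create it first.
    marker=$(mktemp) || return 1
    create_snapshot || { rm -f "$marker"; return 1; }

    # Archive the snapshot's contents, passing the marker to --snaptime.
    tarsnap -c --snaptime "$marker" -f "$1" -C "$2" .
    status=$?

    # Always clean up, then report tarsnap's exit status.
    delete_snapshot
    rm -f "$marker"
    return "$status"
}
```

A call such as `backup_from_snapshot "$(uname -n)-$(date +%Y-%m-%d)" /mnt/snapshot` would then archive the snapshot's contents, where `/mnt/snapshot` is an assumed mount point.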
Printing statistics for all archives
You can print statistics about all archives with:
tarsnap --print-stats -f '*'
Copying an archive
If you wish to create an identical copy of an archive (for example, having identical archives named backup-2016-01-01-daily, backup-2016-01-01-weekly, backup-2016-01-01-monthly), we recommend that you use:
tarsnap -c -f backup-2016-01-01-weekly @@backup-2016-01-01-daily
This will download some metadata (the list of blocks in backup-2016-01-01-daily), but this is a relatively small amount of bandwidth.
Alternatively, you could simply create a second archive right after creating the first one; our deduplication algorithm will ensure that no data will be uploaded unnecessarily. This avoids downloading metadata, but if your files have changed between creating the first and second (or subsequent) archives, then your "copy" would not be an exact copy.
"Deleting" files from an archive
Tarsnap archives are read-only (or immutable), so there's no way to remove files from archives. Archives are both encrypted and signed, so there's no way for the Tarsnap service to even find the data you want to remove from an archive, never mind removing it and leaving you with a valid archive.
However, we can imitate deleting file(s) by copying an archive while excluding certain files, then removing the old archive:
tarsnap -c -f newarchive --exclude badstuff @@oldarchive
This will create a new archive from the old archive but without the excluded files — and it does so in an efficient way, downloading archive metadata but adding references to blocks of data without needing to download the entire archive. Once you've created an archive without the unwanted file(s), delete the old archive:
tarsnap -d -f oldarchive
Note that the --exclude option can be tricky, so we recommend trying this first with --dry-run -v so you can make sure that you're getting the right files excluded.
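The copy-then-delete steps can be wrapped so that the old archive is deleted only if the new one was created successfully. The following is a minimal sketch; `remove_from_archive` is a hypothetical helper name, and the archive names and pattern are whatever you pass in.

```shell
#!/bin/sh
# Copy an archive minus the files matching a pattern, then delete the
# original only if the copy was created successfully.
# $1 = old archive, $2 = new archive, $3 = exclude pattern
remove_from_archive() {
    tarsnap -c -f "$2" --exclude "$3" "@@$1" || {
        echo "copy failed; keeping $1" >&2
        return 1
    }
    tarsnap -d -f "$1"
}
```

For example, `remove_from_archive oldarchive newarchive badstuff` runs the two commands shown above in sequence. Because the deletion is irreversible, preview the exclusion first with `tarsnap --dry-run -v -c -f newarchive --exclude badstuff @@oldarchive`.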
Write-only keys
tarsnap-keymgmt creates new keys with (optionally) reduced permissions. In particular, it can create "write-only" keys (sometimes called "encryption keys"):
tarsnap-keymgmt --outkeyfile write-only-key.txt -w ~/tarsnap-main-key-file.txt
This allows you to create a write-only key (without a passphrase) which is used to create archives automatically, and a key with more permissions (requiring a passphrase) which is used for restoring or deleting archives.
Since Tarsnap archives are immutable, this protects your archived data even if an intruder breaks into your server: She would be able to halt your backups (or add new archives with faulty data), but not delete your existing archives.
If you want to keep your full keys on a different (more secure) system and only use them there, or use keys on multiple systems for any other reason, things get more complicated.
For additional security, you may wish to store your master keys on a different (more secure) system, or even keep them offline. Whatever method you choose to use to protect your master keys, please ensure that you can always retrieve them when needed.
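An automated backup can then point Tarsnap at the write-only key with the --keyfile option. This is a sketch only: `backup_with_key` is a hypothetical helper, and the key location in the usage note is an assumption.

```shell
#!/bin/sh
# Create an archive using a specific key file.
# $1 = key file, $2 = archive name, remaining arguments = paths to back up
backup_with_key() {
    keyfile=$1
    archive=$2
    shift 2
    tarsnap --keyfile "$keyfile" -c -f "$archive" "$@"
}
```

A cron job might call `backup_with_key /root/write-only-key.txt "$(uname -n)-$(date +%Y-%m-%d)" /home /etc`, while restoring or deleting archives would use the passphrase-protected key instead.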
Monitoring stats without access to root's cachedir
Many users run tarsnap as the root user, but this means that they cannot monitor their usage with an unprivileged user account. There are two ways to allow user username to view the statistics:
- As root, configure the system with:
  touch /var/log/tarsnap-output.log
  chown username /var/log/tarsnap-output.log
  Then redirect the statistics to a file when creating an archive:
  tarsnap --print-stats -c OPTIONS >/var/log/tarsnap-output.log
- If you use sudo, you can allow an unprivileged user account to run a specific command as root by adding this to your sudoers file:
  username ALL = (root) NOPASSWD: /usr/local/bin/tarsnap --print-stats
  (adjust the directory as appropriate)
Checking which file Tarsnap is processing
The tarsnap binary responds to the SIGUSR1 POSIX signal (and SIGINFO on platforms which support it) by printing the current file. For example, running this command in one terminal:
tarsnap --dry-run -c ~/src/tarsnap
Then entering this command in a different terminal:
killall -SIGUSR1 tarsnap
Will produce output similar to this in the first terminal:
adding home/td/src/tarsnap/build/tarsnap-keygen (196608 / 637689 bytes)
On BSD systems (including OS X), a SIGINFO can be sent to the active terminal by pressing ^T (CTRL-T).
Cleanly stopping a Tarsnap upload
If you use ^C (CTRL-C) to stop Tarsnap while it is uploading a new archive, you will lose progress back to the last checkpoint. To stop cleanly, use ^\ (CTRL-\) or send the SIGQUIT signal to tell Tarsnap to create a truncated archive. The truncated archive will have ".part" appended to its name.
Cleanly pausing a Tarsnap operation
You can suspend any tarsnap operation by pressing ^Z (CTRL-Z) which sends a SIGTSTP signal, or by sending a SIGSTOP or SIGTSTP signal via kill.
When tarsnap resumes (via the fg or bg commands, or sending it the SIGCONT signal), it will reconnect to the Tarsnap server (if necessary).
Receiving emails from making backups
If you are running the /root/tarsnap-backup.sh backup script described in Simple usage, then you may wish to modify it to send you an email with the status, particularly if you are running it automatically with cron:
#!/bin/sh

# User variables
email=my_email@example.net
tarsnap_output_filename=/tmp/tarsnap-output-temporary.log
dirs="/home /etc"

# Run backup
tarsnap -c \
	-f "$(uname -n)-$(date +%Y-%m-%d_%H-%M-%S)" \
	${dirs} >${tarsnap_output_filename} 2>&1

# Send email
if [ $? -eq 0 ]; then
	subject="Tarsnap backup success"
else
	subject="Tarsnap backup FAILURE"
fi
mail -s "${subject}" ${email} < ${tarsnap_output_filename}

# Clean up
rm ${tarsnap_output_filename}
Naturally, you will want to modify the email variable and the directory name(s) in the dirs variable.
Setting up command-line mail
mail will only function if you have configured a Mail Transfer Agent (MTA) such as sendmail, postfix, or ssmtp. If your system does not have an MTA already set up, then we recommend trying ssmtp (also known as sSMTP), which is an extremely simple MTA and thus is easier to configure.
Privacy and print-stats
If you have print-stats enabled in your tarsnap config file, the emails will include that information. However, the simple mail utility will send the email unencrypted, so that info would be available to anybody who intercepts your emails. This could be avoided by adding --no-print-stats to the script, or using a more sophisticated command-line mailer which encrypts email.
More information
There are many other options available with tarsnap. All of the information on this page, and more, can be found in the man pages.