Improve the speed of some Tarsnap operations
Here are some tips to improve Tarsnap's speed:
Restore large archives faster
Tarsnap extract performance is currently latency-bound; the latency in question is client→EC2→S3, and the EC2→S3 step is about 50 ms.
The best workaround right now is to run parallel extracts: if you can split your data between multiple archives, or use the --include and --exclude options so that each tarsnap -x command extracts a different subset of the files, you should be able to use more bandwidth.
This process has been automated in at least one third-party tool.
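As a rough sketch (the archive name and the 'home/*' pattern below are placeholders for your own archive and paths), two tarsnap -x processes can each restore a disjoint subset of the same archive in parallel:

# Extract two disjoint subsets of one archive in parallel.
tarsnap -x -f mycomputer-2015-08-14_08-22-34 --include 'home/*' &
tarsnap -x -f mycomputer-2015-08-14_08-22-34 --exclude 'home/*' &
wait    # wait for both extracts to finish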
Restore a single file faster
Tarsnap is built on top of the tar utility, which allows an archive to contain multiple copies of the same file (originally so that an updated version could be appended to a tape). Due to this functionality, Tarsnap needs to scan the entire archive to see if there are other (later) copies of any file it is extracting.
If you have a large archive and know that you only have a single copy of the file that you wish to restore, we recommend that you use the --fast-read command-line option, which stops reading the archive as soon as it has extracted the file.
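For example (the archive and file names here are placeholders), restoring a single file with --fast-read looks like this:

# Stop reading the archive as soon as the requested file has been extracted.
tarsnap -x --fast-read \
    -f mycomputer-2015-08-14_08-22-34 \
    home/user/important.txt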
Delete multiple archives faster
Multiple archives can be deleted with the same command; this is usually faster (and never slower) than using individual delete commands:
tarsnap -d \
    -f mycomputer-2015-08-07_13-52-46 \
    -f mycomputer-2015-08-09_19-37-20 \
    -f mycomputer-2015-08-14_08-22-34
In particular, deleting multiple archives at once allows tarsnap to cache metadata rather than downloading it multiple times. The speed-up therefore depends on how similar the archives are; if the archives are completely different then it will not save any time.
For optimal cache performance, sort the list of archives so that archives which share a lot of their contents are deleted consecutively. In most cases, this is the same as sorting the archive names alphabetically.
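As a rough sketch of putting these pieces together (the 'mycomputer-' prefix is a placeholder, and the archive names are assumed to contain no whitespace), a sorted multi-delete can be built from the output of --list-archives:

# List archives, keep the ones matching a prefix, sort them, and
# delete them with tarsnap -d. (xargs may split a very long list
# across several invocations, which still works correctly.)
tarsnap --list-archives |
    grep '^mycomputer-' |
    sort |
    sed 's/^/-f /' |
    xargs tarsnap -d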