Frequently Asked Questions about Tarsnap
Questions are split into four sections: getting started, legal/accounting/administrative, technical details, and often requested features. If you have a question that is not answered here, please contact us.
Getting started
- I tried to register but I never received the confirmation email. What's wrong?
- How can I point the configure script at locally installed libraries?
- I've been using Tarsnap for a few hours, but I can't see any usage shown on the web interface. What's going on?
- Does Tarsnap run on Windows?
Legal / accounting / administrative
- What happens when my account runs out of money?
- Could you send me a monthly account summary?
- Why did my daily storage usage cost change when I haven't uploaded or deleted any data?
- Do you accept bitcoins as payment?
- I've forgotten my tarsnap account password; how can I reset it?
- I've lost the key file for a machine; how can I delete its data so that I'm not stuck paying for it forever?
- I've lost the key file for a machine (or I have the key file, but I've forgotten the passphrase I set on it); how can I read my archives?
Technical details
- If Tarsnap costs $0.25 / GB of storage, how is it possible to store "archives adding up to several terabytes" while paying less than $10/month?
- Since the cost depends on "encoded bytes", how can I predict how much Tarsnap will cost before signing up?
- Is Tarsnap storage reliable?
- Why doesn't Tarsnap --list-archives print archives in alphabetical (or chronological) order?
- Can I move my Tarsnap setup to a new computer?
- How can I investigate network problems?
- What does "Pathname in pax header can't be converted to current locale" mean?
Requested features
- Why don't you automatically redirect HTTP links to HTTPS?
- Is there an option to avoid storing data on US servers?
- Is there an option to print which files have been modified since the previous backup?
- My computer automatically runs tarsnap once a day, but sometimes there's no new data to back up. Can I avoid creating an archive in this case?
- Why doesn't Tarsnap use AWS Glacier?
Getting started
I tried to register but I never received the confirmation email. What's wrong?
It's probably stuck in a spam filter somewhere. Tarsnap sends email via Amazon SES, which is usually successful in delivering email (more so than when Tarsnap mail was sent directly, at least); but convincing everyone to accept email from you is incredibly difficult these days.
If you can't convince your spam filters to let Tarsnap email through, try registering from an email address at a different domain.
How can I point the configure script at locally installed libraries?
If you installed libraries in a non-standard location (such as $HOME/.local/), we recommend that you modify your $HOME/.profile as follows:
export C_INCLUDE_PATH=$HOME/.local/include/:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=$HOME/.local/include/:$CPLUS_INCLUDE_PATH
export LIBRARY_PATH=$HOME/.local/lib/:$LIBRARY_PATH
export LD_LIBRARY_PATH=$HOME/.local/lib:$LD_LIBRARY_PATH
Change $HOME/.local/ as appropriate, then reload your profile with:
. $HOME/.profile
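One caveat with the export lines above: if a variable was previously unset, the new value ends in a colon, and GCC documents that an empty element in C_INCLUDE_PATH or CPLUS_INCLUDE_PATH means the current working directory. The following is a minimal sketch of a helper that avoids the empty element, assuming a POSIX shell. The name prepend_path is hypothetical and not part of Tarsnap.

```shell
# prepend_path VAR DIR: prepend DIR to the colon-separated list in VAR,
# without adding an empty element when VAR is unset or empty.
# (Hypothetical helper for illustration; not shipped with Tarsnap.)
prepend_path() {
    eval "_old=\${$1:-}"
    if [ -n "$_old" ]; then
        eval "$1=\$2:\$_old"
    else
        eval "$1=\$2"
    fi
    export "$1"
}

prepend_path C_INCLUDE_PATH "$HOME/.local/include"
prepend_path CPLUS_INCLUDE_PATH "$HOME/.local/include"
prepend_path LIBRARY_PATH "$HOME/.local/lib"
prepend_path LD_LIBRARY_PATH "$HOME/.local/lib"
```

As before, change $HOME/.local as appropriate for your installation.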
I've been using Tarsnap for a few hours, but I can't see any usage shown on the web interface. What's going on?
All Tarsnap accounting is currently done daily at approximately midnight UTC. Wait until after that point and your usage should be visible. (Payments, however, should show up immediately.)
Does Tarsnap run on Windows?
Only via Cygwin or Windows Subsystem for Linux.
Legal / accounting / administrative
What happens when my account runs out of money?
You will be sent an email when your account balance falls below 7 days' worth of storage costs, warning you that you should add more money to your account soon. If your account balance falls below zero, you will lose access to Tarsnap, an email will be sent to inform you of this, and a 7-day countdown will start; if your account balance is still below zero after 7 days, your account may be deleted (along with any data you have stored) at our discretion. (If you can't add money yet but will be able to later, contact us and explain the situation. We're reasonable people and simply knowing that you're alive and haven't forgotten that you were using Tarsnap is very helpful.)
Could you send me a monthly account summary?
Certainly! Monthly invoices are sent to all Canadians for legal reasons; non-Canadians can receive them if they contact us. We do require you to provide a (physical) mailing address, as the invoice needs to show why you are not paying Canadian tax. (Canadians using Tarsnap are already required to provide their name and mailing address.)
Why did my daily storage usage cost change when I haven't uploaded or deleted any data?
Tarsnap's storage cost is priced per month, and different months have different numbers of days. In addition, although the pricing is defined in picodollars ($10^-12), Tarsnap computes the cost per byte-day of storage for each month in attodollars ($10^-18).
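To illustrate the effect of month length, here is a rough sketch of the arithmetic. It assumes a rate of 250 picodollars per byte-month (which is what $0.25 / GB works out to with 1 GB = 10^9 bytes) and uses truncating integer division; the rounding Tarsnap actually applies is not stated here, so treat the exact digits as illustrative.

```shell
# Assumption: 250 picodollars per byte-month = 250,000,000 attodollars.
RATE_ATTO_PER_BYTE_MONTH=250000000

# byte_day_rate DAYS: per-byte-day rate in whole attodollars for a month
# with DAYS days. Integer division truncates; Tarsnap's real rounding
# may differ.
byte_day_rate() {
    echo $(( RATE_ATTO_PER_BYTE_MONTH / $1 ))
}

for days in 28 30 31; do
    echo "$days days: $(byte_day_rate "$days") attodollars per byte-day"
done
```

Under these assumptions the rate is 8928571 attodollars per byte-day in a 28-day month, 8333333 in a 30-day month, and 8064516 in a 31-day month, so the same stored data costs a slightly different amount per day from one month to the next.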
Do you accept bitcoins as payment?
Not any more. For more details, see the email announcement.
I've forgotten my tarsnap account password; how can I reset it?
Please contact the author.
I've lost the key file for a machine; how can I delete its data so that I'm not stuck paying for it forever?
Please contact the author.
I've lost the key file for a machine (or I have the key file, but I've forgotten the passphrase I set on it); how can I read my archives?
You can't. Your key file contains the only copy of the cryptographic keys needed to decrypt your data; if you lose them there is no way to get your data back.
Technical details
If Tarsnap costs $0.25 / GB of storage, how is it possible to store "archives adding up to several terabytes" while paying less than $10/month?
Please see our page about deduplication efficiency.
Since the cost depends on "encoded bytes", how can I predict how much Tarsnap will cost before signing up?
Starting with tarsnap 1.0.36, you can test the deduplication and compression without an account:
tarsnap --dry-run --no-default-config --print-stats --humanize-numbers -c /MY/DATADIR
This will produce output in the form:
tarsnap: Performing dry-run archival without keys (sizes may be slightly inaccurate)
tarsnap: Removing leading '/' from member names
                                       Total size  Compressed size
All archives                               2.2 GB           1.8 GB
  (unique data)                            2.1 GB           1.7 GB
This archive                               2.2 GB           1.8 GB
New data                                   2.1 GB           1.7 GB
The value that matters for the cost is the "(unique data)" row of the "Compressed size" column, which represents the "encoded bytes" that are stored on the Tarsnap servers. In the above example, this is 1.7 GB, so it will cost approximately $0.43 (= 1.7 * $0.25) to upload the data, and $0.43 per month for storage.
Note that deduplication is most effective when creating multiple snapshots (e.g., daily backups), so it will not help much for the initial snapshot. We have a few examples of deduplication with multiple snapshots.
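The estimate can be scripted. The sketch below assumes the humanized statistics report sizes in GB, as in the example output, and a price of $0.25 per GB; the function name estimate_monthly_cost is hypothetical. With smaller data sets the unit may be MB or kB, in which case the multiplier would need adjusting.

```shell
# Reads --print-stats output on stdin and prints the estimated storage
# cost in dollars per month.
# Assumptions: sizes are in GB (as in the example), the compressed size
# of the "(unique data)" row is the fifth whitespace-separated field,
# and the price is $0.25 per GB.
estimate_monthly_cost() {
    awk '/\(unique data\)/ { printf "%.3f\n", $5 * 0.25 }'
}

# Demonstration using the "(unique data)" row from the example output:
printf '  (unique data)  2.1 GB  1.7 GB\n' | estimate_monthly_cost

# Usage sketch, assuming the statistics are written to standard error:
#   tarsnap --dry-run --no-default-config --print-stats \
#       --humanize-numbers -c /MY/DATADIR 2>&1 | estimate_monthly_cost
```

The demonstration line prints 0.425, matching the approximately $0.43 worked out by hand.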
Is Tarsnap storage reliable?
Yes. Data archived via Tarsnap is stored on the Amazon S3 storage service (the original version, not the "reduced redundancy" version introduced in 2010).
Why doesn't Tarsnap --list-archives print archives in alphabetical (or chronological) order?
The archive metadata which contains Tarsnap archive names and creation times is encrypted, so it's impossible for the Tarsnap client code to figure out in what order the archives should be listed until it downloads and decrypts the metadata. Once it has done so, it might as well just print out the information immediately; if you want a particular order, sort(1) is your friend.
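For alphabetical order, piping the output through sort is enough (tarsnap --list-archives | sort). For chronological order, the sketch below assumes that tarsnap --list-archives -v prints each archive name followed by a tab and a creation time that sorts correctly as text (for example YYYY-MM-DD HH:MM:SS); check the output of your tarsnap version before relying on it. The function name is hypothetical.

```shell
# Sort "name<TAB>creation time" lines from stdin by the creation time.
# Assumption: the timestamp format sorts chronologically as plain text.
sort_by_creation_time() {
    LC_ALL=C sort -t "$(printf '\t')" -k 2
}

# Usage sketch:
#   tarsnap --list-archives -v | sort_by_creation_time
```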
Can I move my Tarsnap setup to a new computer?
Yes, no problem! Tarsnap doesn't care about the physical hardware; only the data, key file, and cache directory that it is given to work with.
To confirm that everything is set up, we recommend that after you have copied your data to the new system:
- Create an archive on the old system with:
  tarsnap -c --print-stats MY_OPTIONS
- Transfer the cache directory to the new system
- Simulate creating an archive on the new system:
  tarsnap -c --dry-run --print-stats MY_OPTIONS
The "new data" size should be quite small (consisting of archive metadata), and the "this archive" size should be approximately the same as the old statistics (machines can present metadata in a slightly different manner, and can list files within a directory in a new order which could alter the compression efficiency).
How can I investigate network problems?
We have a series of tips about debugging Tarsnap network problems.
What does "Pathname in pax header can't be converted to current locale" mean?
This message arises if you created an archive that contains filenames with characters that can't be represented in the current locale. We recommend that if you would like to use non-ASCII characters, your locale should support UTF-8.
For example, consider this archive:
tarsnap -tf kana
kana/
kana/kana.txt
kana/カナ.txt
With an environment which cannot print Japanese characters, we get:
LANG=C tarsnap -tf kana
kana/
kana/kana.txt
tarsnap: Pathname in pax header can't be converted to current locale.
kana/\343\202\253\343\203\212.txt
tarsnap: Error exit delayed from previous errors.
Requested features
Why don't you automatically redirect HTTP links to HTTPS?
Not redirecting most of our static pages was a deliberate choice:
- The tarsnap website redirects to HTTPS for pages where integrity is important (e.g. logins and downloads).
- HTTPS does not provide any confidentiality for static pages; it's trivial to see the hostname being accessed and simple traffic analysis reveals which page within the website is being loaded as well.
- An attacker who can hijack an unencrypted HTTP session could prevent any redirection to HTTPS anyway.
- TLS stacks and web browsers are notoriously buggy, and I want the tarsnap website to impose the fewest requirements possible. If so inclined, you can read the tarsnap website with:
  printf "GET / HTTP/1.0\r\n\r\n" | nc tarsnap.com 80
Is there an option to avoid storing data on US servers?
Due to concerns about the privacy of personal data and industrial espionage, some organizations would prefer (or even mandate) that their data not be stored on servers which reside in the United States of America.
From a purely technical standpoint, there is no benefit to this. Tarsnap encrypts all data before it leaves a computer. It therefore does not matter where that data is stored; attackers (be they criminals or governments) cannot decrypt the data. However, we realize that while the technical staff in an organization may understand our encryption, policy-makers may still be hesitant, especially if they face personal liability if their customers' data is stolen. Adding this feature could also make some security checklists and paperwork easier to fill out; for example "is any data stored outside of the EU?".
We are hoping to add the option to avoid US-based servers, but any change to our infrastructure and customer data must be handled extremely carefully. At the moment we are not announcing any estimated time of completion for this feature.
Is there an option to print which files have been modified since the previous backup?
Not directly. Because Tarsnap's deduplication happens after files have been squished together into a tar stream and that tar stream has been split into blocks, it is not feasible to track backwards to figure out which file a particular new block came from. For that matter, you can get blocks which contain pieces from several different files, or blocks of data could appear in multiple files.
A more technical answer is that the deduplication is done at a different layer from the crawl-a-directory-tree-and-generate-a-tarball code. In essence the layers are:
- bsdtar code, which crawls a directory tree and feeds files to the
- libarchive code, which generates a stream of tar and feeds it to the
- multitape code, which splits the stream into several sub-streams and uses the
- chunkifier code, which splits each sub-stream into chunks and sends them to the
- chunk deduplication code, which looks at each chunk to decide if it's new.
(And underneath this all is the transactional storage layer, the request protocol layer, the network connection protocol layer, and the underlying non-blocking network I/O code.)
Feeding information back from the chunk deduplication code up to the bsdtar code is theoretically possible, but the necessary code would be complex and would risk introducing bugs.
One trick for tracking the changed files (other than the obvious "find . -mtime -1d", assuming daily backups) is to run tarsnap with a small value for --maxbw-rate (e.g., --maxbw-rate 50000) and then send it a SIGUSR1 every second to check which file Tarsnap is processing. This will prompt tarsnap to repeatedly print its current progress, and when it slows down dramatically you've found a place where it is finding lots of new data which it needs to upload.
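The signalling part of this trick could be sketched as follows, assuming a POSIX shell. The helper name poke_every_second is hypothetical, and the archive name and path in the usage comment are placeholders.

```shell
# poke_every_second PID: send SIGUSR1 to PID once per second until the
# process no longer exists. (Hypothetical helper, not part of Tarsnap.)
poke_every_second() {
    _pid=$1
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$_pid" 2>/dev/null; do
        kill -USR1 "$_pid" 2>/dev/null
        sleep 1
    done
}

# Usage sketch (archive name and path are placeholders):
#   tarsnap -c --maxbw-rate 50000 -f example-archive /MY/DATADIR &
#   poke_every_second $!
```

Only point this at a tarsnap process: for programs that do not handle it, the default action of SIGUSR1 is to terminate the process.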
My computer automatically runs tarsnap once a day, but sometimes there's no new data to back up. Can I avoid creating an archive in this case?
This is not supported by the tarsnap binary. It could be achieved with an additional shell script, but we don't recommend it:
- Thanks to Tarsnap's deduplication, the amount of waste is extremely small: it only needs to store a tiny amount of additional metadata. For an archive with no new user data, the non-deduplicated metadata is approximately 1 kB + 1 byte per file + 1 byte per MB of data. In most cases, this is less than 10 kB, which at $0.25 / GB costs about $0.0000025 per month to store.
- There is a potential danger in disrupting a regular schedule of archives: how can you distinguish between "no archives because there was no new data" and "no archives because the internet connection failed" weeks or months after the fact? It can be very reassuring to see daily archives in tarsnap --list-archives.
We therefore recommend that if your data changes enough that automatic daily archives are useful, don't worry about a few days' worth of "unnecessary" archives. Naturally, if your data changes very infrequently, you may prefer to use weekly, monthly, or manually-triggered backups.
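To get a feel for how small the overhead of an "unnecessary" archive is, the metadata formula above can be evaluated directly. This is a rough sketch: it takes 1 kB as 1000 bytes and assumes a price of 250 picodollars per byte-month (that is, $0.25 per 10^9 bytes). The function name and the file counts are made up for illustration.

```shell
# metadata_overhead FILES MEGABYTES: approximate bytes of new metadata for
# an archive with no new user data: 1 kB + 1 byte/file + 1 byte/MB.
# Assumption: 1 kB is taken as 1000 bytes.
metadata_overhead() {
    echo $(( 1000 + $1 + $2 ))
}

# Hypothetical data set: 5000 files totaling 2000 MB.
# Assumption: 250 picodollars per byte-month.
bytes=$(metadata_overhead 5000 2000)
echo "$bytes bytes, about $(( bytes * 250 )) picodollars per month"
```

For this hypothetical data set the result is 8000 bytes and 2000000 picodollars, i.e. about $0.000002 per month for each extra archive.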
Why doesn't Tarsnap use AWS Glacier?
Amazon Glacier is a cloud storage service aimed at backing up large amounts of data where it may be accessed slowly and infrequently. However, we cannot store some files in this "cold storage" and other files in the regular Amazon S3 service while retaining Tarsnap's deduplication abilities. It would theoretically be possible to mark all the files stored with a particular key as being frozen ("glaciated"?), but the implementation would require reworking a great deal of the Tarsnap server code. We would like to support this, but at the moment we are not announcing any estimated time of completion for this feature.
For more details, see why Tarsnap doesn't use Glacier.