How does the Tarsnap client create an archive?
Consider a typical backup scenario:
- Read files: The client begins by reading data from the hard disk with code from libarchive.
- Segment: The data from each file is split into context-dependent blocks or "chunks". The mathematics behind context-dependent chunkification is discussed in "From bsdtar to tarsnap: Building an online backup service."
- Deduplicate: The client recognizes chunks that it has already uploaded to the server. A simplified version of this is explained in "How does deduplication work?"
- Compress: Each block is compressed with the DEFLATE algorithm from zlib. Given the size of chunks produced by the segmentation stage, there is no significant difference in compression ratio between the DEFLATE and LZMA algorithms, but LZMA is significantly slower.
- Encrypt: Each compressed block is encrypted with your private keys so that nobody else can access your unencrypted data.
- Upload: The "encoded" bytes (encrypted, compressed, new data) are uploaded to the Tarsnap service.
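
The segment, deduplicate, compress, and encrypt steps above can be illustrated with a minimal Python sketch. It is not Tarsnap's actual code: the rolling-sum chunker, the constants `WINDOW`, `MASK`, and `MIN_CHUNK`, the use of plain SHA-256 as a chunk identifier, and the function names `split_chunks` and `encode_new_chunks` are all assumptions made for illustration. Encryption is left to a caller-supplied `encrypt` function, since the real key handling is outside the scope of this sketch. Only `zlib.compress` (DEFLATE) corresponds directly to a library named in the text.

```python
import hashlib
import zlib

# Illustrative constants only; Tarsnap's real parameters differ.
WINDOW = 16     # number of trailing bytes covered by the rolling sum
MASK = 0x3F     # a boundary is declared when (rolling & MASK) == MASK
MIN_CHUNK = 16  # never emit a chunk shorter than this (except the last)


def split_chunks(data: bytes) -> list:
    """Split data at positions chosen by its content, not by fixed offsets.

    A toy rolling sum over the last WINDOW bytes stands in for the
    real context-dependent chunking algorithm.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        if i + 1 - start >= MIN_CHUNK and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the final chunk
    return chunks


def encode_new_chunks(data: bytes, seen_ids: set, encrypt) -> list:
    """Return the encoded form of every chunk not already in seen_ids.

    seen_ids models the client's record of chunks the server already has.
    encrypt is a hypothetical callable (bytes -> bytes) supplied by the caller.
    """
    uploads = []
    for chunk in split_chunks(data):
        chunk_id = hashlib.sha256(chunk).digest()  # assumed identifier scheme
        if chunk_id in seen_ids:
            continue  # deduplicated: this chunk was uploaded before
        seen_ids.add(chunk_id)
        uploads.append(encrypt(zlib.compress(chunk)))  # compress, then encrypt
    return uploads
```

Calling `encode_new_chunks` twice with the same data and the same `seen_ids` set yields an empty list the second time, which mirrors how a repeated backup of unchanged files uploads no new chunk data.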