FileId – Tahoe-LAFS

Version 1 (modified by warner, at 2007-04-26T22:39:19Z)
explain fileids and how they are used

The "file id" is a tagged hash of the plaintext of the file to be uploaded. It serves as the final end-to-end integrity check of the user's data, and detects problems anywhere in the processing chain: corrupted shares, broken erasure coding, or the wrong decryption key.
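
As a rough illustration of what "tagged hash of the plaintext" can mean, here is a minimal sketch. It assumes SHA-256 and a length-prefixed tag; the tag string `FILEID_TAG` and the helper names `tagged_hash` and `file_id` are hypothetical, since this page does not say which hash function or tag format is actually used.

```python
import hashlib

# Hypothetical tag: the real tag string is not given on this page.
FILEID_TAG = b"example-fileid-tag-v1"


def tagged_hash(tag: bytes, data: bytes) -> bytes:
    """Hash `data` under a domain-separating `tag`.

    The tag is length-prefixed so the boundary between tag and data
    is unambiguous. SHA-256 is an assumption made for this sketch.
    """
    h = hashlib.sha256()
    h.update(b"%d:%s," % (len(tag), tag))
    h.update(data)
    return h.digest()


def file_id(plaintext: bytes) -> bytes:
    """Compute an illustrative file id over the whole plaintext."""
    return tagged_hash(FILEID_TAG, plaintext)
```

The tag keeps a hash computed for one purpose from being mistaken for a hash computed for another, even when both are taken over the same bytes.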

Because it is a hash of the entire file, download-time verification cannot be completed until the entire file has been retrieved. For large files that are being streamed, this only provides an after-the-fact check: "sorry, but the data that you just finished downloading and viewing was corrupted".
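
The after-the-fact nature of this check can be sketched with a generator that hands each chunk to the consumer as it arrives and only compares the digest once the stream ends. This is an illustration under assumptions: `stream_and_verify` and `CorruptedFileError` are hypothetical names, and plain SHA-256 stands in for the tagged hash to keep the example short.

```python
import hashlib
from typing import Iterable, Iterator


class CorruptedFileError(Exception):
    """Raised once the whole stream is consumed and its hash does not match."""


def stream_and_verify(chunks: Iterable[bytes], expected_digest: bytes) -> Iterator[bytes]:
    """Yield chunks as they arrive, then check the whole-file hash.

    The consumer has already seen every chunk by the time the check runs,
    so a mismatch can only be reported after the fact.
    """
    h = hashlib.sha256()  # assumption: untagged SHA-256, for brevity
    for chunk in chunks:
        h.update(chunk)
        yield chunk  # delivered before it can be verified
    if h.digest() != expected_digest:
        raise CorruptedFileError("downloaded data does not match the file id")
```

A caller iterating over this generator receives all of the data, good or bad, and only learns of corruption from the exception raised at the very end.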

Because it is a hash of the plaintext, it can only be verified by someone with full read access to the file.

It is not the only such integrity check. The VerifierId is a tagged hash of the crypttext: this allows people without read access to the file data to verify the crypttext against errors in share storage or erasure coding (but not in encryption). The FileVerifier and FileRepairer are given the VerifierId but not the FileId, so that they can check the file's integrity without being able to read the data.
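
The difference in coverage between the two hashes can be sketched as follows. Everything here is an assumption made for illustration: the tags, the helper names, and especially `toy_cipher`, which is a throwaway XOR keystream standing in for real encryption and must not be used to protect data.

```python
import hashlib


def tagged_hash(tag: bytes, data: bytes) -> bytes:
    """Length-prefixed tagged SHA-256 (an assumption, not the real format)."""
    return hashlib.sha256(b"%d:%s," % (len(tag), tag) + data).digest()


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher, for illustration only. Encrypt == decrypt."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def verifier_check(crypttext: bytes, verifier_id: bytes) -> bool:
    """Check usable without the key: covers storage and erasure coding only."""
    return tagged_hash(b"example-verifierid", crypttext) == verifier_id


def reader_check(key: bytes, crypttext: bytes, file_id: bytes) -> bool:
    """Check that needs the key: also catches a wrong decryption key."""
    plaintext = toy_cipher(key, crypttext)
    return tagged_hash(b"example-fileid", plaintext) == file_id
```

With the wrong key, `verifier_check` still passes because the crypttext is intact, while `reader_check` fails. That is the gap the file id closes.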

The root hash of the share Merkle tree is the third integrity check. It allows individual blocks of each share to be validated incrementally, one block at a time, so that validated data can be delivered in small pieces. The share hashes only cover the shares, so they will detect corrupted shares, but will not detect problems in the erasure decoding or decryption.
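
Block-at-a-time validation against a Merkle root can be sketched like this. It is a generic Merkle tree, not the actual share layout: the function names, the `leaf:`/`node:` prefixes, SHA-256, and the rule of duplicating the last node on odd-sized levels are all assumptions chosen to keep the example short.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks: List[bytes]) -> List[List[bytes]]:
    """Return the tree levels, leaves first and root last (needs >= 1 block)."""
    level = [_h(b"leaf:" + b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # simplification: duplicate last node
        level = [_h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof_for(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        # A missing sibling means this node was paired with a copy of itself.
        proof.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return proof


def verify_block(block: bytes, index: int, proof: List[bytes], root: bytes) -> bool:
    """Validate one block against the root without seeing any other block."""
    node = _h(b"leaf:" + block)
    for sibling in proof:
        if index % 2 == 0:
            node = _h(b"node:" + node + sibling)
        else:
            node = _h(b"node:" + sibling + node)
        index //= 2
    return node == root
```

A downloader holding only the trusted root can accept or reject each block as it arrives, given the block and its short list of sibling hashes. As the paragraph above notes, this says nothing about whether decoding or decryption later goes wrong.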
