[tahoe-dev] blocks instead of files?

Wed Mar 10 11:44:26 PST 2010

Jody Harris writes:

> Has the Tahoe Dev ever discussed the possibility of encrypting and storing
> blocks instead of files?

I don't know if the Tahoe-LAFS developers ever considered it, but this is
how Octavia works.

http://wieldysoftware.com/octavia

>  - you would never hit the large memory overhead problems seen with 100 MB
> and larger files.
>  - processing and storage would just take the file one chunk at a time
>  - retrieval and decryption (seems) that it would stream nicely since
> allocation of the next block(s) could take place while the current block is
> streaming out.

Yes.
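Here is a minimal Python sketch of the bounded-memory point: read a file one block at a time, so only one block is ever held in memory. The name iter_blocks and the 64 KiB default are hypothetical, not Octavia's. Per-block encryption is left as a comment because Python's standard library has no block cipher.

```python
import io
from typing import BinaryIO, Iterator


def iter_blocks(f: BinaryIO, block_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive blocks of at most block_size bytes from f.

    Only one block is held in memory at a time, so memory use stays
    bounded by block_size regardless of the total file size.
    """
    while True:
        block = f.read(block_size)
        if not block:  # EOF
            return
        yield block


if __name__ == "__main__":
    data = io.BytesIO(b"abcdefghij")
    for blk in iter_blocks(data, block_size=4):
        # A real client would encrypt and upload each block here.
        print(blk)
```

Because iter_blocks is a generator, the caller can encrypt and send block N while block N+1 is still being read.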

Also, you can parallelize reads and writes across many servers at a block
level of granularity.
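Block-level parallelism can be sketched as fanning blocks out to several servers at once. This assumes a hypothetical model in which each "server" is just a callable that stores one block, with blocks assigned round-robin. Octavia's actual placement policy is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical: a "server" is any callable that stores one block.
Store = Callable[[bytes], None]


def put_blocks_parallel(blocks: List[bytes], servers: List[Store],
                        max_workers: int = 8) -> List[int]:
    """Send each block to a server chosen round-robin, concurrently.

    Returns the index of the server that received each block.
    """
    assignment = [i % len(servers) for i in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(servers[s], b) for b, s in zip(blocks, assignment)]
        for fut in futures:
            fut.result()  # re-raise any error from a storage call
    return assignment


if __name__ == "__main__":
    stores: List[List[bytes]] = [[], [], []]
    placed = put_blocks_parallel([b"a", b"b", b"c", b"d"], [s.append for s in stores])
    print(placed)  # [0, 1, 2, 0]
```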

http://wieldysoftware.com/octavia/protocol-overview.html

Octavia clients choose their preferred block size(s); a block is "uniquely"
identified by its (size, SHA-512d(block data)) tuple.
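A sketch of computing that identifier follows. It assumes that "SHA-512d" means SHA-512 applied twice, SHA-512(SHA-512(data)), by analogy with the better-known SHA-256d. That reading is an assumption, not something taken from the Octavia spec.

```python
import hashlib
from typing import Tuple


def sha512d(data: bytes) -> bytes:
    """ASSUMPTION: SHA-512d = SHA-512 applied twice."""
    return hashlib.sha512(hashlib.sha512(data).digest()).digest()


def block_id(block: bytes) -> Tuple[int, bytes]:
    """Identify a block by its (size, SHA-512d(block data)) tuple."""
    return (len(block), sha512d(block))


if __name__ == "__main__":
    size, digest = block_id(b"hello")
    print(size, digest.hex()[:16])
```

Including the size in the tuple lets a server reject a block whose length doesn't match before it spends time hashing it.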

I hypothesize that fitting a block plus all 8va and UDP protocol overhead
into a single MTU-sized packet might be ideal. Of course, not all your
communications with all your servers will use the same MTU, and there is no
particularly large internet-guaranteed MTU (it can be as low as ~500 bytes
or less IIRC). I assume this MTU optimization is likely to be most effective
in the storage LAN deployment scenario (we are all on 10 Gb Ethernet and we
all use jumbo frames; otherwise we are on the internet and we just cope as
best we can).
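The block-size arithmetic is simple: subtract the IP header (20 bytes for IPv4 without options, 40 for IPv6), the 8-byte UDP header, and the 8va header from the MTU. The 8va header size isn't given here, so the 100 bytes used below is a hypothetical placeholder.

```python
IPV4_HEADER = 20  # minimum IPv4 header, no options
IPV6_HEADER = 40  # fixed IPv6 header
UDP_HEADER = 8


def max_block_payload(mtu: int, proto_overhead: int,
                      ip_header: int = IPV4_HEADER) -> int:
    """Largest block that fits in one unfragmented datagram."""
    payload = mtu - ip_header - UDP_HEADER - proto_overhead
    if payload <= 0:
        raise ValueError("MTU too small for headers")
    return payload


if __name__ == "__main__":
    OCTAVIA_OVERHEAD = 100  # HYPOTHETICAL 8va header size
    print(max_block_payload(1500, OCTAVIA_OVERHEAD))  # standard Ethernet: 1372
    print(max_block_payload(9000, OCTAVIA_OVERHEAD))  # jumbo frames: 8872
```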

Using UDP also means I pay no connection setup or maintenance costs, which
could be especially important for high-volume servers.
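To illustrate the "no connection setup" point, here is a toy request/reply over loopback UDP. Each side just calls sendto/recvfrom, with no handshake and no per-client connection state on the server. The echo exchange is hypothetical and is not the 8va wire protocol. A real client must also handle lost or reordered datagrams itself.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram to a loopback 'server' and read its echo.

    Hypothetical toy exchange; no connection is ever established.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))  # OS picks a free port
        server.settimeout(2.0)
        client.settimeout(2.0)
        client.sendto(payload, server.getsockname())
        data, addr = server.recvfrom(65535)
        server.sendto(data, addr)  # reply straight to the sender's address
        reply, _ = client.recvfrom(65535)
        return reply
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(udp_roundtrip(b"GET block"))
```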

I also wrote up the protocol as a Google Protocol Buffers specification;
this has the benefit of being easy to read and of generating read/write code
for me. I haven't uploaded that Hg repository yet, but look for it soon. It's
part of an upcoming FUSE + C implementation.