﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
599	"maybe add share-metadata: ""where-are-the-other-shares"" hints"	warner		"An idea that we've kicked around before came back to me today, in a different
form. What if each share (living on some server somewhere), in addition to
the data necessary to recover the file (blocks and hashes and signatures),
also contained hints about the locations of other shares? These hints could
take the form of publicly visible FURLs of the storage servers that are
holding the other shares.

These ""hints"" would not be authoritative, because shares may have been
deleted, copied, or regenerated without necessarily updating all the other
shares. But, assuming that shares tend to stick around for a while, these
hints could provide a high-probability way to locate all the other shares.

The basic addition would be an extra method on the {{{WriteBucket}}} object,
something like {{{set_share_location_hints}}}, which accepts a list of FURLs,
and stores it next to the share. On the retrieval side, the {{{get_buckets}}}
call would return one extra value: the list of hints.
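
As a rough sketch of what that storage-side API might look like: the in-memory {{{StorageServer}}} class, the {{{allocate_bucket}}} helper, and the (shares, hints) return shape are assumptions made for illustration, not the existing storage interface.

```python
class WriteBucket:
    '''Hypothetical in-memory stand-in for the server-side write bucket.'''

    def __init__(self):
        self.data = b''
        self.location_hints = []

    def write(self, data):
        self.data += data

    def set_share_location_hints(self, furls):
        # Store the hints next to the share; they are advisory only.
        self.location_hints = list(furls)


class StorageServer:
    '''Hypothetical server holding buckets keyed by (storage_index, sharenum).'''

    def __init__(self):
        self.buckets = {}

    def allocate_bucket(self, storage_index, sharenum):
        bucket = WriteBucket()
        self.buckets[(storage_index, sharenum)] = bucket
        return bucket

    def get_buckets(self, storage_index):
        # Return the shares held here plus the de-duplicated union of their hints.
        found = {}
        hints = []
        for (si, sharenum), bucket in self.buckets.items():
            if si != storage_index:
                continue
            found[sharenum] = bucket
            for furl in bucket.location_hints:
                if furl not in hints:
                    hints.append(furl)
        return found, hints
```

Returning the hints alongside the buckets keeps the lookup to a single round trip, at the cost of changing the return shape that existing callers expect.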

The download algorithm (the new state-machine oriented one that we're
thinking of building, to help with #287 and #193) would be changed. The first
phase of the download process is responsible for acquiring {{{ReadBucket}}}
objects, each of which connects to some remote server. This phase would be
changed to use the regular ""Tahoe-2"" peer-selection algorithm to find a batch
of servers (perhaps 5 servers) to whom {{{get_buckets}}} messages will be
sent in parallel. Each time a response comes back, a new message will be sent
to the next unasked peer in the permuted list. But if a response includes
hints about the locations of other shares, then all of those hinted servers
will be asked. The download process can begin once at least ""k"" shares have
been located.
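
The share-location phase described above could be sketched roughly as follows. This is a simplified, sequential model: a queue stands in for the parallel in-flight {{{get_buckets}}} messages, and the {{{query}}} callback, the {{{locate_shares}}} name, and the use of plain server ids instead of FURLs are all assumptions for illustration.

```python
from collections import deque


def locate_shares(permuted_servers, query, k, batch_size=5):
    '''Return {sharenum: server_id} once at least k distinct shares are found.

    permuted_servers: server ids in permuted (peer-selection) order.
    query(server_id): returns (set_of_sharenums, list_of_hinted_server_ids).
    '''
    unasked = deque(permuted_servers)
    asked = set()
    located = {}
    # Start with a batch taken from the front of the permuted list.
    pending = deque()
    while unasked and len(pending) < batch_size:
        pending.append(unasked.popleft())

    while pending and len(located) < k:
        server = pending.popleft()
        if server in asked:
            continue
        asked.add(server)
        sharenums, hinted = query(server)
        for num in sharenums:
            located.setdefault(num, server)
        # Hinted servers jump the queue: all of them get asked next.
        for hint in hinted:
            if hint not in asked and hint not in pending:
                pending.appendleft(hint)
        # Each response also triggers one query to the next unasked peer.
        while unasked:
            nxt = unasked.popleft()
            if nxt not in asked and nxt not in pending:
                pending.append(nxt)
                break
    return located
```

In this model a hint that points at a server far down the permuted list is followed immediately, which is where the savings over a plain walk of the list would come from.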

For mutable files, this increases the chance that we'll locate all shares of
the file, which helps reduce the danger of accidental rollback.

The hints can contain FURLs, but in general only the tubid portion will be
used. The client will look in its list of connections to see if it has any
which connect to the same tubid as in the hint. This minimizes the importance
of servers maintaining their same IP address and port number for long periods
of time. If the client hasn't heard about the tubid from the Introducer, it
could conceivably try to connect to the given FURL anyway. This would allow
servers which have left the grid (or merely been unable to connect to the
introducer recently) to still be used; however, it would also make it slightly
easier to cause mischief by publishing FURLs that point to port 25, etc.
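
A minimal sketch of the tubid matching, assuming FURLs of the form pb://TUBID@LOCATION/NAME and a hypothetical {{{connections}}} dict keyed by tubid (neither helper below exists in the current code):

```python
def tubid_from_furl(furl):
    '''Extract the tubid from a FURL assumed to look like pb://TUBID@LOCATION/NAME.'''
    prefix = 'pb://'
    if not furl.startswith(prefix) or '@' not in furl:
        raise ValueError('not a recognizable FURL: %r' % (furl,))
    return furl[len(prefix):furl.index('@')]


def resolve_hints(hint_furls, connections):
    '''Map hints onto existing connections by tubid.

    connections: dict of tubid -> connection object (hypothetical).
    Returns (usable_connections, unknown_furls).
    '''
    usable, unknown = [], []
    for furl in hint_furls:
        try:
            tubid = tubid_from_furl(furl)
        except ValueError:
            continue  # Malformed hints are simply ignored.
        if tubid in connections:
            # The location part of the hint is ignored; only the tubid matters.
            usable.append(connections[tubid])
        else:
            # Not announced by the Introducer; connecting anyway is a policy choice.
            unknown.append(furl)
    return usable, unknown
```

Keeping the unknown FURLs in a separate list leaves the decision about connecting to unannounced servers to the caller, so the riskier behaviour can be switched off.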

For maximum benefit, hints should be updated when the file is repaired, but
we must be careful to define the necessary authority correctly. For mutable
files, the authority to modify the share is sufficient: anyone who can
clobber the file is also allowed to clobber the hints. For immutable files,
the question is more difficult. One possibility is that the repairer submits
potential hints to the servers that hold old shares, and each server validates
the hints itself before committing them to the share's metadata. The server
would do this by connecting to the FURL in question and performing the same
{{{get_buckets}}} that a downloading client would do. The server could then
randomly check a few segments against their block hash trees, and make sure
the remote share hash fits into its local share hash tree. This verification
process is made marginally easier by the fact that the verifying server
already has a copy of the share hash tree.
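
The server-side check might look something like the sketch below. It is deliberately simplified: it uses plain SHA-256 and a basic binary Merkle tree rather than Tahoe's actual hash-tree format, it assumes the whole remote share has already been fetched into a dict, and the names {{{validate_hint}}}, {{{merkle_root}}}, and {{{expected_share_hash}}} are hypothetical.

```python
import hashlib
import random


def h(data):
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    '''Root of a simple binary Merkle tree; an odd node is paired with itself.'''
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def validate_hint(remote_share, expected_share_hash, samples=3, rng=random):
    '''Spot-check a remote share before accepting a hint that points at it.

    remote_share: dict with 'blocks' (list of bytes) and 'block_hashes'
    (list of digests), as fetched from the hinted server.
    expected_share_hash: the entry for that share in the local share hash tree.
    '''
    blocks = remote_share['blocks']
    block_hashes = remote_share['block_hashes']
    if not blocks or len(blocks) != len(block_hashes):
        return False
    # The remote block hash tree must fit into our local share hash tree.
    if merkle_root(block_hashes) != expected_share_hash:
        return False
    # Randomly check a few blocks against their block hashes.
    for i in rng.sample(range(len(blocks)), min(samples, len(blocks))):
        if h(blocks[i]) != block_hashes[i]:
            return False
    return True
```

A real implementation would presumably fetch only the sampled blocks over the wire instead of the whole share, since the point of sampling is to keep the check cheap.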

This last feature (servers checking up on each other) is reminiscent of the
original ""Tahoe 1"" design, in which each file had a cabal of servers holding
its shares, and the members of this cabal kept track of each other
(validating each other's shares, repairing the file when necessary, recruiting
new members if too many servers dropped out). We abandoned that design
because it required each server to keep track of too much information, but if
the location-of-other-shares list is treated merely as a hint, then perhaps
it wouldn't be too bad.
"	enhancement	new	major	undecided	code-storage	1.2.0		download		
