I didn't mean to hit any jugulars. I see network outages all the time at the edges, and<div>not many sites have the billion dollars required to maintain 99.999% uptime, which allows only about five minutes of downtime a year. Some years I'm lucky. Other years I'm not, and I easily see anywhere from a few days to two weeks of total disconnect time per site. A couple of years it has been even longer. Backhoes cut fiber optic lines and DSL lines go down, so outages are what keep dinner on the table. I don't consider it an unlikely scenario; it just happens too often. It's kinda like saying you don't need backups because failures rarely happen. I have this funny conversation way too often. Just like babies, a new one is born every minute.</div>
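<div>For scale, here is the uptime arithmetic as a minimal Python sketch. It assumes an average 365.25-day year, and the function name is just for illustration. Five nines works out to roughly 5.3 minutes of downtime per year, while two weeks offline is closer to 96% availability.</div>

```python
# Average year length in minutes, counting leap days (365.25 days).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Return the downtime budget, in minutes per year, for an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(downtime_minutes: float) -> float:
    """Inverse: the availability implied by a given amount of yearly downtime."""
    return 1.0 - downtime_minutes / MINUTES_PER_YEAR


for label, a in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(a):.1f} min/year")

two_weeks = 14 * 24 * 60  # minutes
print(f"two weeks down: {availability_from_downtime(two_weeks):.2%} availability")
```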


<div>     Me> Do you have backups?</div><div>     Boss> And why do we need backups?...   </div><div>     Me> Well, it's kinda like insurance for WHEN the hard drives fail...</div><div>     Boss> And this happens all the time?...</div>

<div>     Me> No, rarely, but it DOES eventually happen. MTBF blah blah blah... </div>

<div>     Boss> And how much is this going to cost?...</div><div>     Me> XYZ blah blah blah dinner on my plate blah blah blah equipment blah blah blah</div><div>     Boss> Blah blah blah... ROI... blah blah blah... we can do without backups.</div>
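<div>The "rarely" adds up quickly across a fleet of drives. Here is a rough Python sketch of that point. It assumes independent drives with a constant failure rate (the exponential model, which real drives only approximate), and the 1,000,000-hour MTBF is a hypothetical figure, not a measured one.</div>

```python
import math

HOURS_PER_YEAR = 365.25 * 24


def p_any_failure(num_drives: int, mtbf_hours: float, hours: float = HOURS_PER_YEAR) -> float:
    """Probability that at least one of `num_drives` fails within `hours`.

    Assumes independent drives, each with constant failure rate 1 / mtbf_hours,
    so P(no failure) = exp(-num_drives * hours / mtbf_hours).
    """
    if num_drives < 0 or mtbf_hours <= 0 or hours < 0:
        raise ValueError("invalid arguments")
    return 1.0 - math.exp(-num_drives * hours / mtbf_hours)


# Hypothetical example: 20 drives, each rated at a 1,000,000-hour MTBF.
print(f"{p_any_failure(20, 1_000_000):.1%} chance of at least one failure per year")
```

<div>With 20 such drives, that comes to roughly a one-in-six chance of losing at least one drive in any given year.</div>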


<div><br></div><div>To respond to an earlier post on this thread: I can see a few scenarios, but the one I keep thinking about is a source control repository (I don't even want to ask about database support yet). If the repository lived inside Tahoe and a merge left it inconsistent, I can't even start to figure out how much work it would take to recover 100% from that. (I'm sure how bad this gets depends on the specific source control tool.) I think this meets the criteria mentioned earlier. The repository typically runs on a server, and patches are written into it as a single user. However, check-ins happen constantly throughout the day, and whether it's two sites or home-to-corporate, both sides have to keep working. If the repository comes to a grinding halt, that's good, because nobody checks in until it comes back online. If people keep checking in changes without knowing those changes might be getting lost, that is very bad. Since most users discard their workspaces afterward, the fixed bugs magically reappear later, and QA really hates that. Without a preserved copy of the patches somewhere on a server, I think recovery would be too time-consuming. I expect management would decide to restore from backups or snapshots to a known good state and go from there. That would definitely cost Tahoe some brownie points.</div><div><br></div><div>I like the term "data integrity" because I'm concerned about three things: availability, integrity, and confidentiality. I believe I need all three. It looks like Tahoe has accounted for the C (with the LeastAuthority) and the A (with the distributed design and .happy). My concern is the I.</div>
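<div>The failure mode in Zooko's message quoted below comes down to counting shares per version. Below is a toy Python model of it. It assumes each surviving share is tagged with the version it belongs to, and it ignores how Tahoe actually locates, verifies, and chooses among versions. The function name and the share layout are hypothetical. In the example, no version reaches K = 3 distinct shares, so the call prints an empty list: the directory is unrecoverable.</div>

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def recoverable_versions(shares: Iterable[Tuple[str, int]], k: int) -> List[str]:
    """Return the versions that can be reconstructed from the surviving shares.

    `shares` holds one (version, share_number) pair per share found on a server.
    A version needs at least k *distinct* share numbers; duplicate copies of the
    same share number do not count twice.
    """
    distinct = defaultdict(set)
    for version, share_number in shares:
        distinct[version].add(share_number)
    return sorted(v for v, nums in distinct.items() if len(nums) >= k)


# Hypothetical outcome with k = 3: two uncoordinated writers each landed only two
# shares of their new version before disconnecting, and server failures left just
# two shares of the old version.
surviving = [("v2-alice", 0), ("v2-alice", 1),
             ("v2-bob", 2), ("v2-bob", 3),
             ("v1", 4), ("v1", 5)]
print(recoverable_versions(surviving, k=3))
```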


<div><br><div class="gmail_quote">On Wed, Aug 8, 2012 at 1:53 AM, Zooko Wilcox-O'Hearn <span dir="ltr">&lt;<a href="mailto:zooko@zooko.com" target="_blank">zooko@zooko.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>There's a tiny chance that a very unlucky sequence of failures or</div>

network partitions, combined with the uncoordinated use of the same<br>

write cap by multiple people, will result in the irretrievable<br>

destruction of your Incoming directory. (To see why, think how you<br>

need K different shares of that directory to reconstruct it, and each<br>

writer is simultaneously writing out shares of their own new version.<br>

In a very unlucky scenario, each writer would succeed at writing fewer<br>

than K of their own version to the servers, and then suddenly<br>

disconnect from the Net. The result would be that there are fewer than<br>

K shares of each of several different versions, meaning that no<br>

version is recoverable and the directory is lost forever.)<br>

<br>

On the other hand, should that unlucky chance not strike, I suspect<br>

that the "automatic merging of directory modifications" feature -- the<br>

one that I just mentioned I didn't like and want to remove<br>

-- is making sure that simultaneous uncoordinated adds and removes of<br>

children from that Incoming directory is reliable.<br>

<br>

(I still want to remove it, but now that I see people are relying on it,<br>

I now feel an obligation to replace it with something better when<br>

doing so!)<br>

<br>

If you want to be safer, you give each uploader their own separate<br>

"Incoming-John" directory, and the curators use a tool to view all of<br>

the separate Incomings. That would eliminate the risk outlined above.<br>

(A tool such as "find" if LAFS is mounted via FUSE, or a custom script<br>

that runs "tahoe ls" on each of Incoming, or a custom web app that<br>

queries the WAPI.)<br>

<div><div><br>

Regards,<br>

<br>

Zooko<br>

_______________________________________________<br>

tahoe-dev mailing list<br>

<a href="mailto:tahoe-dev@tahoe-lafs.org" target="_blank">tahoe-dev@tahoe-lafs.org</a><br>

<a href="https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev" target="_blank">https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev</a><br>

</div></div></blockquote></div><br></div>
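<div>Zooko suggests a custom script that runs "tahoe ls" on each per-uploader Incoming directory. Such a script might look roughly like the Python sketch below. It assumes the <code>tahoe</code> CLI is on the PATH and that <code>tahoe ls</code> prints one child name per line. The alias <code>tahoe:</code> and the Incoming-John/Incoming-Mary names are hypothetical. Because each directory has exactly one writer, the uncoordinated-writers risk described above does not apply. An unreachable directory is reported as unavailable instead of aborting the whole listing.</div>

```python
import subprocess
from typing import Callable, Dict, List, Optional


def tahoe_ls(path: str) -> List[str]:
    """Run `tahoe ls PATH` and return the non-empty output lines (child names)."""
    result = subprocess.run(["tahoe", "ls", path],
                            capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line.strip()]


def list_all_incoming(dirs: List[str],
                      ls: Callable[[str], List[str]] = tahoe_ls) -> Dict[str, Optional[List[str]]]:
    """Combine the listings of every per-uploader Incoming directory into one view.

    A directory that cannot be listed maps to None, so one unreachable
    directory does not hide the others.
    """
    view: Dict[str, Optional[List[str]]] = {}
    for d in dirs:
        try:
            view[d] = ls(d)
        except (subprocess.CalledProcessError, FileNotFoundError):
            view[d] = None
    return view


if __name__ == "__main__":
    # Hypothetical alias and directory names; adjust to your own grid.
    uploaders = ["tahoe:Incoming-John", "tahoe:Incoming-Mary"]
    for d, children in list_all_incoming(uploaders).items():
        if children is None:
            print(f"{d}: unavailable")
            continue
        for name in children:
            print(f"{d}/{name}")
```

<div>Passing the listing function in as a parameter also makes it easy to swap "tahoe ls" for a FUSE mount or a WAPI query later without touching the merging logic.</div>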