<div>Hi everyone, over the last few days I have been working on a proposal for GSoC to address share rebalancing and repair. I've copied the proposal below (with some of my personal contact information redacted :] ). If you see something wrong in my proposal, have any questions, or have any suggestions, please let me know.</div>
<div><br></div><div>Thanks!</div><div>Mark Berger</div><div><br></div><div><br></div><div><br></div><div><div>Organization: Tahoe-LAFS</div><div>=============</div><div><br></div><div>Student Info:</div><div>=============</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>Mark J. Berger</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>Time Zone: Pacific</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>Time Zone during GSoC: Eastern</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>IRC Handle: <a href="mailto:Mark_B@irc.freenode.net">Mark_B@irc.freenode.net</a></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>GitHub: markberger</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>Email: mjberger [at] <a href="http://stanford.edu">stanford.edu</a></div><div><br></div><div>University Info:</div><div>================</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>University: Stanford University</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>Major: Computer Science</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>Current Year: Freshman</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>Expected Graduation: June 2016</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>Degree: BS</div><div><br></div><div>About Me:</div><div>=========</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>I'm a freshman at Stanford University studying computer science. Right now</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>I am finishing up my core requirements and will be pursuing the artificial</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>intelligence track or the systems track within the major. My interests lie</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>in machine learning, large distributed systems, and web applications.</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>I began programming during an internship at Four Directions Productions in</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>2011, where I learned how to use Python in conjunction with Maya. The</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>majority of my college coursework has been in C or C++ on Linux with a</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>little Java. This has made me familiar with tools such as GCC, GDB and</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>Valgrind.</div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>While I have never contributed to an open source project before, I am</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>making an effort to learn about Tahoe-LAFS and become familiar with its</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>code base and community. Using a virtual machine, I've successfully</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>installed Tahoe on an Ubuntu server and connected to the Public Test Grid.</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>I've also subscribed to the mailing list, connected to the IRC channel, and</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>successfully pulled the code off of GitHub. While I know my lack of</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>experience in open source is a shortcoming, I am completely dedicated to</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>using GSoC's Community Bonding Period to overcome any obstacles before the</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>official coding period begins.</div><div><br></div><div><br></div><div>Project Title: Share Rebalancing and Repair in Tahoe-LAFS</div><div>=========================================================</div>
<div><br></div><div>Abstract:</div><div>=========</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>The "servers of happiness" algorithm has improved Tahoe's ability to</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>maximize redundancy by ensuring a given subset of all shares is placed on</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>distinct nodes. However, this process is not used to upload mutable</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>files, which still rely on the old "shares of happiness" algorithm and its</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>well-documented downsides. Additionally, file repair does not</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>necessarily redistribute files to new servers when nodes have been added.</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>This creates issues in terms of redundancy and long-term server health.</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>Implementing proper file rebalancing for all file types during file upload,</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>modification, and repair will enhance the reliability of the Tahoe system</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>and take full advantage of erasure encoding.</div><div><br></div><div><br></div><div>Deliverables:</div><div>=============</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>1. Mutable files automatically distribute over nodes according to the</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>"servers of happiness" algorithm whenever uploaded, modified, or repaired</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>(ticket #232). </div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>2. Repair will redistribute files according to the "servers of happiness" </div><div><span class="Apple-tab-span" style="white-space:pre"> </span>algorithm and only renew the appropriate leases (ticket #699).</div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>3. Documentation changed to correctly reflect the new feature set.</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>4. Create a test suite to be used on a network of virtual machines in order</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>to test file rebalancing.</div><div><br></div><div><br></div><div>Time Line:</div><div>==========</div><div><br></div><div>Note: I would like to have a code review session with my mentor on a weekly</div>
<div>basis at minimum, especially at the beginning of the program. Those sessions are</div><div>left off the time line to avoid redundancy.</div><div><br></div><div>May 27th - June 17th (Community Bonding):</div><div>-----------------------------------------</div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Remain available via IRC and email</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Closely follow the development email list</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>- Isolate and understand the classes which pertain to the current </div><div><span class="Apple-tab-span" style="white-space:pre"> </span> implementations of the servers of happiness algorithm to determine which </div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span> parts can be reused.</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Discuss with my mentor(s) and the community to determine whether code </div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span> should be refactored to apply to both immutable and mutable files or if </div><div><span class="Apple-tab-span" style="white-space:pre"> </span> the two need to remain distinct for design reasons.</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>- Discuss with my mentor(s) and the community the best way to go about testing </div><div><span class="Apple-tab-span" style="white-space:pre"> </span> file rebalancing.</div>
<div><br></div><div>Note: June 3rd through the 14th is my final exams period and I will be packing</div><div><span class="Apple-tab-span" style="white-space:pre"> </span> so that I can go home to Upstate NY. Since I will be very busy during this</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span> time, not all of the above may be accomplished in time to start coding.</div><div><span class="Apple-tab-span" style="white-space:pre"> </span> My classes do not resume until September 23rd, so I can push my</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span> time line back a week or two if need be.</div><div><br></div><div><br></div><div>Jun 17th - 28th</div><div>---------------</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Implement "servers of happiness" for mutable files during the initial</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span> file upload and file modification</div><div><br></div><div>Jul 1st - 12th</div><div>--------------</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Thoroughly document code</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>- Write test scripts for larger networks</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Test code using virtual machines or predetermined test scheme from CBP</div>
<div><br></div><div>Jul 15th - 19th</div><div>---------------</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Clean up test scripts</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Thoroughly document test scripts</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>- Fix minor bugs</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Continue to consider and test edge cases</div><div><br></div><div>
Note: "Servers of happiness" for mutable files should be in a mergeable state</div><div> with tests before the midway point on July 29th.</div><div><br></div><div>Jul 22nd - Aug 2nd</div><div>------------------</div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Modify repair code to use the "servers of happiness" algorithm for both</div><div><span class="Apple-tab-span" style="white-space:pre"> </span> immutable and mutable files. This should be accomplished by utilizing the</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span> existing code from the initial upload process.</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Edit the lease renewal mechanism to ensure that a minimal amount of lease</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span> renewal is done during rebalancing.</div><div><br></div><div>Aug 5th - 16th</div><div>--------------</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Thoroughly document code</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>- Extend tests for mutable files to encompass rebalancing during file repair</div><div><br></div><div>Aug 19th - 23rd</div><div>---------------</div><div><br>
</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Clean up test scripts</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Thoroughly document test scripts</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Fix minor bugs</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>- Continue to consider and test edge cases</div><div><br></div><div>Aug 26th - 30th</div><div>---------------</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>- Change documentation to reflect additional features</div>
<div><br></div><div><br></div><div>The weeks of September 1st and 8th are left blank for flexibility.</div><div><br></div><div><br></div><div>Possible projects if the above are accomplished ahead of schedule:</div><div>==================================================================</div>
<div><br></div><div> - Detect if disk(s) on a server are in a near-failure state. If the disk(s)</div><div> are close to failing, notify the administrator, and slowly begin</div><div> redistributing shares to the other storage nodes (tickets #481 and #864).</div>
<div><br></div><div> - Let the user specify a maximum storage capacity for a given storage node</div><div> based on folder size instead of free space left on the machine.</div><div><br></div><div> - Tahoe backend for Google Drive (ticket #1831).</div>
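<div><br></div><div>To illustrate the "servers of happiness" measure that the abstract and deliverables refer to, here is a minimal sketch. It assumes happiness is the size of a maximum matching between shares and the servers that hold them. The function name servers_of_happiness and the sharemap layout (share number mapped to a set of server ids) are hypothetical and are not taken from the Tahoe code base.</div><div><br></div>

```python
def servers_of_happiness(sharemap):
    """Return the size of a maximum matching between shares and servers.

    sharemap is a hypothetical layout for this sketch: it maps a share
    number to the set of server ids that hold a copy of that share.
    """
    match = {}  # server id -> share number currently matched to it

    def try_assign(share, visited):
        # Look for a server for this share, re-homing other shares if needed
        # (a simple augmenting-path search).
        for server in sorted(sharemap[share]):
            if server in visited:
                continue
            visited.add(server)
            # Take a free server, or move its current share elsewhere.
            if server not in match or try_assign(match[server], visited):
                match[server] = share
                return True
        return False

    return sum(1 for share in sharemap if try_assign(share, set()))


# Three shares all on one server: only one distinct server is covered.
crowded = {0: {"A"}, 1: {"A"}, 2: {"A"}}
# The same three shares spread over three servers.
balanced = {0: {"A", "B"}, 1: {"A"}, 2: {"C"}}

print(servers_of_happiness(crowded))
print(servers_of_happiness(balanced))
```

<div><br></div><div>Under these assumptions, rebalancing during upload, modification, or repair amounts to moving shares until this value reaches the configured happiness threshold. Counting shares alone would treat the crowded and balanced layouts above as equally healthy.</div>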
<div><br></div></div>