<div>I've revised my GSoC proposal to focus on implementing the "upload strategy of happiness", following the discussion in last week's dev chat. Once again, if<span style="background-color:rgb(255,255,255);color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px"> you see something wrong in my proposal or have any questions or suggestions, please let me know. All feedback is very much appreciated. </span></div>
<div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)"><br></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">
Thanks!</div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;background-color:rgb(255,255,255)">Mark Berger</div><div><br></div><div><br></div>Organization: Tahoe-LAFS<br>=============<br><br>Student Info:<br>
=============<br><br>Mark J. Berger<br><br>Time Zone: Pacific<br><br>Time Zone during GSoC: Eastern<br><br>IRC Handle: Mark_B@irc.freenode.net<br><br>GitHub: markberger<br><br>
Email: mjberger [at] <a href="http://stanford.edu">stanford.edu</a><br><br><br>University Info:<br>================<br><br>University: Stanford University<br><br>Major: Computer Science<br><br>Current Year: Freshman<br><br>
Expected Graduation: June 2016<br><br>Degree: BS<br><br><br>About Me:<br>=========<br><br>I'm a freshman at Stanford University studying computer science. Right now I am finishing up my core requirements and will be pursuing the artificial intelligence track or the systems track within the major. My interests lie in machine learning, large distributed systems, and web applications.<br>
<br>I began programming during an internship at Four Directions Productions in 2011, where I learned how to use Python in conjunction with Maya. The majority of my college coursework has been in C or C++ on Linux, with a little Java. This has made me familiar with tools such as GCC, GDB, and Valgrind.<br>
<br>While I have never contributed to an open source project before, I am making an effort to learn about Tahoe-LAFS and become familiar with its code base and community. Using a virtual machine, I've successfully installed Tahoe on an Ubuntu server and connected to the Public Test Grid. I've also subscribed to the mailing list, connected to the IRC channel, and successfully pulled the code from GitHub. While I know my lack of experience in open source is a shortcoming, I am completely dedicated to using GSoC's Community Bonding Period to overcome any obstacles before the official coding period begins.<br>
<br><br><br>Project Title: Upload Strategy of Happiness<br>===========================<br><br><br>Abstract:<br>=========<br><br>The "servers of happiness" algorithm has improved Tahoe's ability to maximize redundancy by ensuring that a given subset of all shares is placed on distinct nodes. However, the share placement algorithm was not designed to pass the servers of happiness test [1]. The current algorithm satisfies the majority of cases, but it fails in multiple instances where happiness can be achieved (see tickets #1124 and #1130). Furthermore, the algorithm fails to take advantage of existing shares, replacing them instead of renewing their respective leases. Implementing the upload strategy of happiness detailed in Kevan Carstensen's master's thesis would address these issues, as well as ease the development of share rebalancing and repair [2].<br>
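<br>To make the happiness measure concrete, here is a minimal sketch that computes it as the size of a maximum matching between servers and shares, using simple augmenting paths. The function name happiness and the placements dictionary layout are my own assumptions for illustration, not names from the Tahoe code base.<br>
<pre>
```python
def happiness(placements):
    """Return the size of a maximum matching between servers and shares.

    placements is a hypothetical layout: a dict mapping each server id to
    the set of share numbers that server holds (or could hold).
    """
    match = {}  # share number -> server currently matched to it

    def try_assign(server, seen):
        # Look for a share for this server, re-homing other servers if needed.
        for share in sorted(placements[server]):
            if share in seen:
                continue
            seen.add(share)
            if share not in match or try_assign(match[share], seen):
                match[share] = server
                return True
        return False

    # Each server that finds an augmenting path adds one to the matching.
    return sum(1 for server in sorted(placements) if try_assign(server, set()))


if __name__ == "__main__":
    # Servers B and C only hold share 0, so at most two distinct servers
    # can be paired with distinct shares.
    print(happiness({"A": {0, 1, 2}, "B": {0}, "C": {0}}))
```
</pre>
In this sketch, an upload would count as "happy" when the returned value is at least the configured happiness threshold.<br>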
<br><br>Deliverables:<br>=========<br><br>1. Static files are uploaded in accordance with the algorithm detailed in Kevan's master's thesis, which uses a bipartite graph of servers and shares to compute a maximum matching.<br><br>2. Scripts that test the new share placement algorithm on a network of virtual machines or a more suitable test environment.<br>
<br>3. A script to test whether the new placement algorithm meets Brian's performance desiderata (200 shares, 1000 servers, 1 second).<br><br>4. Change documentation to reflect the implementation of the new algorithm.<br>
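<br>One possible shape for the timing script in deliverable 3 is sketched below. It assumes the desiderata mean that placing 200 shares across 1000 servers should take under one second. The names make_grid, first_fit_placement, and time_placement, as well as the edges_per_server parameter, are hypothetical, and first_fit_placement is only a stand-in for the real placement code, not the algorithm from the thesis.<br>
<pre>
```python
import random
import time


def make_grid(num_servers=1000, num_shares=200, edges_per_server=20, seed=0):
    """Build a random server -> acceptable-shares map (hypothetical test input)."""
    rng = random.Random(seed)
    return {
        "server%d" % i: set(rng.sample(range(num_shares), edges_per_server))
        for i in range(num_servers)
    }


def first_fit_placement(grid):
    """Stand-in placement: each server takes the first unplaced share it accepts."""
    placed = {}  # share number -> server
    for server in sorted(grid):
        for share in sorted(grid[server]):
            if share not in placed:
                placed[share] = server
                break  # at most one share per server
    return placed


def time_placement(place, grid, budget_seconds=1.0):
    """Run place(grid) once and report (result, elapsed seconds, within budget)."""
    start = time.perf_counter()
    result = place(grid)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_seconds


if __name__ == "__main__":
    result, elapsed, ok = time_placement(first_fit_placement, make_grid())
    print("placed %d shares in %.4f s (within budget: %s)" % (len(result), elapsed, ok))
```
</pre>
Passing the placement function in as an argument would let the same harness time both the current algorithm and the new one on identical inputs.<br>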
<br> <br><br>Time Line:<br>==========<br><br>Note: I would like to have a code review session with my mentor on a weekly basis at minimum, especially at the beginning of the program. Those sessions are left off the time line to avoid redundancy.<br>
<br><br>May 27th - June 17th (Community Bonding):<br>---------------------------------------------------------------<br><br>- Remain available via IRC and email<br><br>- Closely follow the development email list<br><br>- Isolate and understand the classes which pertain to the current implementations of the servers of happiness algorithm to determine which parts can be reused.<br>
<br>- Gain a greater understanding of the algorithm detailed in Kevan's thesis, including the Edmonds-Karp algorithm used to find the maximum matching.<br><br>- Discuss with my mentor(s) and the community the best way to go about testing the new share placement algorithm.<br>
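<br>For reference while studying the thesis, here is a generic sketch of Edmonds-Karp (maximum flow using breadth-first augmenting paths) applied to a unit-capacity network built from servers and shares. It is my own illustration under assumed names (edmonds_karp, matching_size, and the SOURCE and SINK labels), not the code from the thesis or from Tahoe.<br>
<pre>
```python
from collections import deque


def edmonds_karp(capacity, source, sink):
    """Maximum flow via shortest (BFS) augmenting paths.

    capacity is a dict of dicts: capacity[u][v] is the capacity of edge u -> v.
    """
    # Residual graph: copy forward edges and add zero-capacity reverse edges.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path remains

        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def matching_size(placements):
    """Size of a maximum server/share matching, computed as a max flow."""
    capacity = {"SOURCE": {}}
    for server, shares in placements.items():
        capacity["SOURCE"][("server", server)] = 1
        capacity[("server", server)] = {("share", s): 1 for s in shares}
        for s in shares:
            capacity[("share", s)] = {"SINK": 1}
    return edmonds_karp(capacity, "SOURCE", "SINK")
```
</pre>
Because every edge has capacity one, the maximum flow from SOURCE to SINK equals the number of servers that can be paired with distinct shares.<br>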
<br><br>Note: June 3rd through the 14th is my final exams period and I will be packing so that I can go home to Upstate NY. Since I will be very busy during this time, not all of the above may be accomplished in time to start coding. My classes do not resume until September 23rd, so I can push my time line back a week or two if need be.<br>
<br><br>Jun 17th - 28th<br>---------------------<br><br>- Implement a rough version of the upload strategy of happiness<br><br>- Upload strategy should be a separate class if possible in order to make it easier to apply to mutable files<br>
<br><br>Jul 1st - 12th<br>------------------<br><br>- Revise upload strategy code<br><br>- Thoroughly document initial upload strategy code<br><br>- Begin work on test scripts used to test the new algorithm<br><br><br>Jul 15th - 19th<br>
--------------------<br><br>- Clean up test scripts<br><br>- Thoroughly document test scripts<br><br>- Fix minor bugs<br><br><br>Jul 22nd - Aug 2nd<br>-----------------------<br><br>- Begin testing the placement algorithm using the test scripts<br>
<br>- Tackle bugs as they arise<br><br>- Discuss possible edge cases with Tahoe-LAFS community<br><br><br>Aug 5th - 16th<br>--------------------<br><br>- Change documentation to reflect new placement algorithm<br><br>- Create test cases for possible edge cases<br>
<br><br>Aug 19th - 30th<br>----------------------<br><br>- Address any failing test cases that arise from further testing<br><br>- Clean up documentation changes<br><br>- Continue testing to ensure the new algorithm can be merged into the next major release<br>
<br><br>The weeks of September 1st and 8th are left blank for flexibility.<br><br><br><br>Possible projects if the above are accomplished ahead of schedule:<br>=================================================<br><br>
- Change mutable files to use the same upload algorithm<br><br> - Detect whether the disk(s) on a server are close to failing. If so, notify the administrator and slowly begin redistributing shares to the other storage nodes (tickets #481 and #864).<br>
<br> - Let the user specify a maximum storage capacity for a given storage node based on folder size instead of free space left on the machine.<br><br> - Tahoe backend for Google Drive (ticket #1831).<br><br><br>Link to Patch/Code Sample: <a href="https://github.com/tahoe-lafs/tahoe-lafs/pull/41">https://github.com/tahoe-lafs/tahoe-lafs/pull/41</a><br>
<br> <br><br>[1] "<a href="https://zooko.com/uri/URI%3ADIR2-RO%3Aoljrwy5i2t3dhcx5mzrksegehe%3Axtac4ubcnr5eqo6d7h4wyj5sm522olj4mthizz2i3lfw2b5nla6q/Latest/compsci/Carstensen-2011-Robust_Resource_Allocation_In_Distributed_Filesystem.pdf">https://zooko.com/uri/URI%3ADIR2-RO%3Aoljrwy5i2t3dhcx5mzrksegehe%3Axtac4ubcnr5eqo6d7h4wyj5sm522olj4mthizz2i3lfw2b5nla6q/Latest/compsci/Carstensen-2011-Robust_Resource_Allocation_In_Distributed_Filesystem.pdf</a>". Pages 32-33.<br>
<br>[2] <a href="https://tahoe-lafs.org/pipermail/tahoe-dev/2013-April/008216.html">https://tahoe-lafs.org/pipermail/tahoe-dev/2013-April/008216.html</a>