Changes between Version 97 and Version 98 of GSoCIdeas2010


Timestamp:
2010-03-29T01:21:16Z (14 years ago)
Author:
zooko
Comment:

add Kevan's MDMF write-up

  • GSoCIdeas2010

    v97 v98  
    1 Tahoe-LAFS Summer-of-Code Projects
     1= Tahoe-LAFS Summer-of-Code Projects =
    22
    33This page contains specific suggestions for projects we would like to see in the Summer of Code. Note that they vary a lot in required skills and difficulty. We hope to get applications with a broad spectrum.
     
    1313||''Project''||''Difficulty''||''Contact''||
    1414||[#RedundantArrayofIndependentClouds Redundant Array of Independent Clouds]||Medium||[mailto:zooko@zooko.com Zooko Wilcox-O'Hearn] or any mentor||
     15||[#RedundantArrayofIndependentClouds Redundant Array of Independent Clouds]||Medium||[mailto:zooko@zooko.com Zooko Wilcox-O'Hearn] or any mentor||
    1516||[#ShareMigration Share Migration]||Medium||[mailto:warner-tahoe@lothar.com Brian Warner] or any mentor||
    1617||[#SecureDecentralizedWiki Secure Decentralized Wiki]||Medium||[mailto:zooko@zooko.com Zooko Wilcox-O'Hearn] or any mentor||
     
    2425
    2526
    26 = Redundant Array of Independent Clouds =
     27== Medium-Sized Distributed Mutable Files (MDMF) ==
     28
     29Mutable files in Tahoe-LAFS have some significant limitations and
     30performance issues, as discussed in
     31[http://allmydata.org/trac/tahoe-lafs/browser/docs/performance.txt docs/performance.txt]. Users who aren't aware of these limitations are
     32surprised when they find out that mutable files can't scale to large
     33sizes without using unacceptable levels of memory, and that reading one
     34byte of the file costs as much as reading the entire file.
     35
     36Fixing this issue essentially amounts to fixing #393, which means:
     37
     38  * Developing mutable files that are segmented on upload, as with immutable files. Part of this would involve making sure that the way we currently ensure the integrity of the parts of mutable files stored on servers is adequate for your new design, and altering it if it isn't.
     39  * Implementing efficient reading and writing of arbitrary spans of those mutable files.
     40
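As a rough illustration of the second point, here is a minimal sketch of how a reader might work out which segments cover a requested span, assuming fixed-size segments. The names `segments_for_span`, `read_span`, and `fetch_segment` are hypothetical and do not come from the Tahoe-LAFS source; integrity checking and encryption are left out entirely.

```python
def segments_for_span(offset, length, segment_size):
    """Return the range of segment indices covering [offset, offset + length)."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return range(0)
    first = offset // segment_size
    last = (offset + length - 1) // segment_size
    return range(first, last + 1)


def read_span(fetch_segment, offset, length, segment_size):
    """Read a span by fetching only the segments that overlap it.

    fetch_segment is a hypothetical callable: segment index -> bytes.
    """
    needed = segments_for_span(offset, length, segment_size)
    if len(needed) == 0:
        return b""
    data = b"".join(fetch_segment(i) for i in needed)
    # Position of the requested offset inside the first fetched segment.
    start = offset - needed[0] * segment_size
    return data[start:start + length]
```

With this layout, reading one byte touches one segment rather than the whole file, which is the behaviour the project is aiming for.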
     41This would make Tahoe-LAFS less surprising to users, and allow mutable
     42files to be used in more ways than they currently are. If successful enough, this might allow Tahoe-LAFS to support range queries or "graph database"-style access, as in the "NoSQL" projects.
     43
     44To learn more about this issue, you should first read
     45[http://allmydata.org/trac/tahoe-lafs/browser/docs/performance.txt docs/performance.txt], so you're familiar with the performance problems
     46with mutable files as currently implemented. You should also look at the
     47[http://allmydata.org/trac/tahoe-lafs/browser/docs/specifications/file-encoding.txt file encoding specification], to understand how immutable files are
     48segmented (since you'll be doing something similar with this project). [http://allmydata.org/trac/tahoe-lafs/browser/docs/specifications/mutable.txt The mutable file specification] may be informative as well.
     49The mutable file upload and download code is in
     50[http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/mutable mutable],
     51and, for comparison, the immutable file upload and download code is in
     52[http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/immutable immutable].
     53
     54== Redundant Array of Independent Clouds ==
    2755
    2856Add backends to the storage servers so that they store their shares on a cloud storage system instead of on their local filesystem. This means that you can get all of the availability and scalability of services such as Amazon S3 or Rackspace !CloudFiles combined with the security properties of Tahoe-LAFS. See [http://allmydata.org/~zooko/RAIC.png the RAIC diagram]. For details, read ticket #999, which includes pointers to the relevant source code and instructions on how to begin writing the code.
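One way to picture the pluggable-backend idea is the sketch below. It is not the design from ticket #999: `ShareBackend`, `DiskBackend`, and `CloudBackend` are hypothetical names, and the "cloud" is simulated by any dict-like key/value store rather than a real S3 or !CloudFiles client.

```python
import os
from abc import ABC, abstractmethod


class ShareBackend(ABC):
    """Hypothetical interface: where a storage server keeps share bytes."""

    @abstractmethod
    def put_share(self, storage_index, shnum, data):
        """Store the bytes of share number shnum for storage_index."""

    @abstractmethod
    def get_share(self, storage_index, shnum):
        """Return the bytes previously stored for this share."""


class DiskBackend(ShareBackend):
    """Keeps shares as files under a root directory on the local filesystem."""

    def __init__(self, root):
        self.root = root

    def _path(self, storage_index, shnum):
        return os.path.join(self.root, storage_index, str(shnum))

    def put_share(self, storage_index, shnum, data):
        path = self._path(storage_index, shnum)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get_share(self, storage_index, shnum):
        with open(self._path(storage_index, shnum), "rb") as f:
            return f.read()


class CloudBackend(ShareBackend):
    """Keeps shares as objects in a key/value store.

    bucket is an assumption: any dict-like object standing in for a
    cloud storage client.
    """

    def __init__(self, bucket):
        self.bucket = bucket

    def _key(self, storage_index, shnum):
        return "shares/%s/%d" % (storage_index, shnum)

    def put_share(self, storage_index, shnum, data):
        self.bucket[self._key(storage_index, shnum)] = bytes(data)

    def get_share(self, storage_index, shnum):
        return self.bucket[self._key(storage_index, shnum)]
```

The point of the shared interface is that the rest of the storage server would not need to know which backend holds a given share.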
    2957
    30 = Share Migration =
     58== Share Migration ==
    3159
    3260When uploading a file to a grid, Tahoe-LAFS will make sure that the file is
     
    6593jumping-off point for health is #778.
    6694
    67 = Secure Decentralized Wiki =
     95== Secure Decentralized Wiki ==
    6896
    6997Write a wiki in Google's [http://code.google.com/p/google-caja/ "caja"] dialect of !JavaScript. This wiki will load and store data directly on a Tahoe-LAFS storage grid so that it is a full "Cloud App"—there is no server. All computation is done in the user's web browser in caja and all of the storage is done by the decentralized Tahoe-LAFS storage grid. This wiki should leverage Tahoe-LAFS's secure sharing features to offer fine-grained, dynamic, and easy transclusion or client-side mashups. This project is intended to be the successor to [http://allmydata.org/trac/tiddly_on_tahoe the TiddlyWiki-on-Tahoe-LAFS project], which is a wiki written in !JavaScript and hosted on Tahoe-LAFS, but one that has been "bolted on" to Tahoe-LAFS instead of designed for Tahoe-LAFS, and is currently incapable of good transclusions or mashups.
     
    7199To get started, play with [http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:7h7syiurogz5erc2au74tjwguu:h7bdxvjtvidlkcdbld3j2d5sbgyzsbqs7wdnu6yznqrejzssc5za/wiki.html the TiddlyWiki-on-Tahoe-LAFS quick start], read the source code of [http://allmydata.org/trac/tiddly_on_tahoe/browser/tahoe_tiddly/HTTPSavingPlugin.js the HTTPSavingPlugin] and [http://allmydata.org/trac/tiddly_on_tahoe/browser/tahoe_tiddly/TahoePlugin.js the TahoePlugin] for !TiddlyWiki, and experiment with [http://caja.appspot.com/ writing live caja applets].
    72100
    73 = Cloud Apps =
     101== Cloud Apps ==
    74102
    75103Difficulty: easy to hard, depending on project choice and how far you want to push it
     
    77105Invent your own Summer-of-Code project by building a new web app on top of Tahoe-LAFS. The [#SecureDecentralizedWiki Secure Decentralized Wiki] is one example of a Cloud App. See [wiki:GSoCIdeas/CloudApps] for other ideas.
    78106
    79 = WebDAV Support =
     107== WebDAV Support ==
    80108
    81109Difficulty: medium to hard, depending on how much of an existing WebDAV implementation you are able to reuse
     
    128156[http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~webdav Tickets labelled 'webdav']
    129157
    130 
    131 = Distributed Introduction =
     158== Distributed Introduction ==
    132159
    133160Implement a protocol for distributed introduction, thus removing the only remaining Single Point of Failure (SPoF) in the Tahoe-LAFS system. For details see [comment:11:ticket:68 ticket #68] which describes the distributed notification algorithm and points to the relevant source code.
    134161
    135 = DVCS Integration =
     162== DVCS Integration ==
    136163
    137164Write patches for the [http://git-scm.com/ git] or [http://darcs.net darcs] distributed revision control tool so that it reads and writes directly to a Tahoe-LAFS storage grid instead of its local filesystem. This creates a "revision control repository in the sky"—a repository that is distributed, fault-tolerant, and highly available. It also lends Tahoe-LAFS's unique security and access-control properties to your revision control system—you can share read-only access or read-write access with specific people through Tahoe-LAFS's capability access control system, and you can rely on the integrated digital signatures to verify that you are reading an authorized version of the repository.