wiki:GSoCIdeas/Notes

Version 10 (modified by zooko, at 2010-03-17T04:03:57Z)

remove Building Things On Top of Tahoe -- it lives on wiki:GSoCIdeas (or its descendants do)

Here are notes that should be added to wiki:GSoCIdeas in a format emulating this GSoC page from NetBSD. Leslie Hawthorn wrote: "Currently there's only a laundry list of suggested ideas but there is not any specificity on those ideas of how students could get them done, areas for them to get started, etc. Each suggestion needs to be categorized by difficulty, there needs to be pointers to where in the code base or documentation people can look for a better idea of how to proceed, etc." Later she said the wiki:GSoCIdeas page was a good improvement.

(See also last year's page: Ideas For Google Summer of Code of 2009.)

Deep Security Issues

Want to implement strong security features which advance the state of the art? It isn't easy! To tackle these you'll need to think carefully and to integrate security and usability, which are two sides of the same coin. But you'll have excellent mentors and the support of a wide community of interested security hackers.

  • Fix Same-Origin-Policy design issue. Web content from different authors can interact in unintended ways in the victim's browser, for example through JavaScript peeking at other frames or through referrer headers. Before this project is undertaken, the problem description and proposed solutions need careful design review and consideration! The solutions should be considered prototypes and should be backwards compatible with the Tahoe network. Main ticket: #615 (Can JavaScript loaded from Tahoe access all your content which is loaded from Tahoe?). See also the tickets labelled 'capleak'.
    • Domain Mangling approaches:
      • HTTP proxy approach
      • Special scheme handling in browser add-ons
    • Caja approach: Require all JavaScript to pass the Caja verifier in the Tahoe-LAFS web frontend, then create an interface to the Tahoe webapi which matches the intended capability semantics.
  • Tahoe-LAFS Cryptography:

Server Selection

Which servers are connected to your client, and which of them have which shares of your files?

Dynamically migrate shares to maintain file health.

Difficulty: medium to hard

When uploading a file to a grid, Tahoe-LAFS will make sure that the file is healthy (a good discussion of what healthy means is found in #778) before reporting that the file is uploaded successfully. Tools to effectively maintain file health (or to adapt to new definitions of health) aren't quite complete, however -- our users have had several use cases that aren't easily addressed with what we have. Students taking this project would be building tools to address those use cases.
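The discussion in #778 is the authority on what "healthy" should mean. Purely as an illustration, the sketch below assumes one possible reading: a file is healthy when at least `happy` distinct servers can each be paired with a distinct share. The function names, the `share_map` layout, and the `happy` threshold are assumptions made for this example, not Tahoe-LAFS APIs.

```python
def count_distinct_placements(share_map):
    """Return the size of a maximum matching between shares and servers.

    share_map maps a share number to the set of server ids holding a copy.
    (Hypothetical data layout, chosen for this sketch.)
    """
    match = {}  # server id -> share number currently paired with it

    def try_assign(share, seen):
        for server in sorted(share_map[share]):
            if server in seen:
                continue
            seen.add(server)
            # Take the server if it is free, or if the share it is
            # currently paired with can be paired with another server.
            if server not in match or try_assign(match[server], seen):
                match[server] = share
                return True
        return False

    return sum(1 for share in sorted(share_map) if try_assign(share, set()))


def is_healthy(share_map, happy):
    """Assumed health test: enough distinct share/server pairs exist."""
    return count_distinct_placements(share_map) >= happy
```

With three shares of which two exist only on server "A", for instance, at most two distinct pairs can be formed, so the file would fail a threshold of three even though three shares exist.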

A good starting point would be to become familiar with how files are placed on a grid. architecture.txt, file-encoding.txt, mutable.txt, the immutable file upload code, and the mutable file upload code are good places to do that. Also, you might want to look at the storage server code to understand that better. Some good tickets to start looking at are #699, #543, and #232; you'll find that those link to other tickets.

There are many ways to help address these issues. Some ideas:

  • Alter the CLI and the WUI to give users the ability to rebalance files that they've uploaded already. (#699)
  • Build tools that allow node administrators to move shares around a grid. (#543, #864)
  • Alter Tahoe-LAFS to rebalance mutable files when uploading a new version of them. (#232)

(It is doubtful that any one of these projects is enough to fill a summer, but combined they would be a big usability improvement for Tahoe-LAFS.)
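To make the rebalancing ideas above more concrete, here is a minimal planning sketch. It assumes a simple goal (no server holds more than one share more than any other) and a hypothetical `placement` dictionary; it only produces a list of proposed moves and does not talk to any storage server. A real tool would also have to weigh health, accounting, and server reliability.

```python
def plan_rebalance(placement, servers):
    """Return a list of (share, source, destination) moves.

    placement maps a server id to the set of share numbers it holds;
    servers lists every server id that may receive shares.
    Both are hypothetical inputs invented for this sketch.
    """
    holdings = {server: set() for server in servers}
    for server, shares in placement.items():
        holdings.setdefault(server, set()).update(shares)
    if not holdings:
        return []

    moves = []
    while True:
        ordered = sorted(holdings)
        source = max(ordered, key=lambda s: len(holdings[s]))
        dest = min(ordered, key=lambda s: len(holdings[s]))
        # Stop once the fullest and emptiest servers differ by at most one.
        if len(holdings[source]) - len(holdings[dest]) <= 1:
            return moves
        # Move the lowest-numbered share the destination does not hold yet.
        share = min(holdings[source] - holdings[dest])
        holdings[source].remove(share)
        holdings[dest].add(share)
        moves.append((share, source, dest))
```

For example, if server "A" holds shares 0 to 3 and servers "B" and "C" hold nothing, the plan moves one share to each of "B" and "C" and leaves two on "A".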

Depending on how you approach it, this work is closely tied to file health and accounting, so prospective students would do well to explore the open issues in those areas, too. A good jumping-off point for accounting is #666, and a good jumping-off point for health is #778.

  • Use Zeroconf or similar so nodes can find each other on a local network to enable quick local share migration.
  • Deal with unreliable nodes and connections in general, getting away from allmydata.com's assumption that the grid is a big collection of reliable machines in a colo under a single administrative jurisdiction. See the tickets labelled 'availability'.
  • Abstract out the server selection part of Tahoe-LAFS so that the projects in this category of "grid membership and server selection" can be mostly independent of the rest of Tahoe-LAFS. See also this note about standardization of LAFS.
  • Write a GUI to visualize and manipulate the set of servers connected and the set holding shares of files.
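As one way to picture the "abstract out server selection" idea in the list above, the sketch below defines a small pluggable interface and a single example policy. The class names, the `select` signature, and the hash-based ranking are assumptions made for illustration; they are not taken from the Tahoe-LAFS code base.

```python
import hashlib
from abc import ABC, abstractmethod


class ServerSelector(ABC):
    """Hypothetical interface: decide which servers should hold a file."""

    @abstractmethod
    def select(self, storage_index, servers, count):
        """Return up to `count` server ids chosen from `servers`."""


class HashRankedSelector(ServerSelector):
    """Example policy: rank servers by a hash of file id plus server id.

    Each file gets its own stable ordering of the servers, so load is
    spread without any coordination between clients.
    """

    def select(self, storage_index, servers, count):
        def rank(server_id):
            return hashlib.sha256(storage_index + server_id).digest()

        return sorted(servers, key=rank)[:count]
```

Other policies (for example, preferring servers discovered on the local network) could implement the same interface, which is what would let the projects in this category evolve independently of the upload and download code.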

Networking Improvements

Free The Windows Client

  • Make the Windows client use only free open-source software. (Implementing WebDAV as described earlier is an alternative that would achieve a similar effect.)

Connecting Tahoe-LAFS To Other Things