Version 50 (modified by zooko, at 2009-03-19T03:45:16Z)

Google Summer of Code

UPDATE: Tahoe was not selected to be sponsored by GSoC this summer. (They had something on the order of 400 applications and accepted only 116, so this isn't too surprising.) However, this isn't the end of our nefarious schemes for world domination! We have the Mentors (and some excellent Mentors too, whom any student would be lucky to study under), we have the Ideas, we even have the interest of a few Students, at least one of whom has said he wants to hack on Tahoe this summer even if Google won't pay him to do it. So the next step is for me to see if I can arrange for some Tahoe Summer of Code hacking even without Google's commercial sponsorship. Stay tuned! --Zooko 2009-03-18

Ideas

What could a smart student do in one summer, if they didn't need to worry about getting a summer job to pay the bills?

Server Selection

Which servers are connected to your client, and which of them have which shares of your files?

  • Dynamically migrate shares to maintain file health.
  • Use Zeroconf or similar so nodes can find each other on a local network to enable quick local share migration.
  • Deal with unreliable nodes and connections in general, getting away from allmydata's assumption that the grid is a big collection of reliable machines in a colo under a single administrative jurisdiction.
  • Abstract out the server selection part of Tahoe so that the projects in this category of "grid membership and server selection" can be mostly independent of the rest of Tahoe. See also this note about standardization of LAFS.
  • Write a GUI to visualize and manipulate the set of servers connected and the set holding shares of files.

Networking Improvements

  • Dealing with NAT, ideally making it as easy to ignore as possible (taking advantage of UPnP IGD and Zeroconf NAT-PMP).
  • 'tahoe sync'. The proposed #601 bidirectional sync option would be great for using Tahoe the way we would use Dropbox (http://www.getdropbox.com/). As with Dropbox, the user could run a daemon that keeps things in sync by polling every one or two seconds (maybe using inotify to detect files that need uploading). In practical terms, a user could have many machines pointing to the same tahoe:dir, each machine mapping this resource to a local directory, and all these machines could then keep their local copies in sync via tahoe:dir. I think this is good when someone has many machines and alternates between them, for instance a notebook, a home desktop and an office desktop.
  • Optimize upload/download transfer speed.
  • Implement storage server protocol over HTTP.
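
As a rough illustration of the polling half of the 'tahoe sync' idea above, here is a minimal sketch of how a daemon might notice local changes between two polls. It is not part of Tahoe and does not talk to a grid at all: the function names snapshot and diff_snapshots are hypothetical, and comparing modification time plus size is an assumed (and fairly naive) change test. A real implementation would also have to handle the download direction and conflicting edits.

```python
import os


def snapshot(root):
    """Map each file path (relative to root) to its (mtime_ns, size).

    Hypothetical helper for illustration; not a Tahoe API.
    """
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old, new):
    """Compare two snapshots taken at successive polls.

    Returns (to_upload, to_delete): sorted lists of relative paths that
    are new or changed locally, and paths that have disappeared locally.
    """
    to_upload = sorted(p for p, meta in new.items() if old.get(p) != meta)
    to_delete = sorted(p for p in old if p not in new)
    return to_upload, to_delete
```

A daemon could call snapshot once per polling interval, pass the previous and current results to diff_snapshots, and hand the resulting paths to whatever upload and delete mechanism the sync feature ends up using. On Linux, inotify could replace the polling step entirely.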

Free The Windows Client

Deep Security Issues

Want to implement strong security features which advance the state of the art? It isn't easy! To tackle these you'll need to think carefully and to integrate security and usability, which are two sides of the same coin. But you'll have excellent mentors and the support of a wide community of interested security hackers.

  • Fix Same-Origin-Policy design issue. Web content from different authors can interact in unintended ways in the victim's browser, such as JavaScript iterating over open windows, or peeking at a referrer header. Before this project is undertaken, the problem description and proposed solutions need careful design review and consideration! The solutions should be considered prototypes and should be backwards compatible with the Tahoe network. tickets: #615 (Can JavaScript loaded from Tahoe access all your content which is loaded from Tahoe?)
    • Domain Mangling approaches:
      • HTTP proxy approach
      • Special scheme handling in browser add-ons
    • Caja approach: Require all JavaScript to pass the Caja verifier in the Tahoe web frontend, then create an interface to the Tahoe webapi which matches the intended capability semantics.
  • Tahoe Cryptography:
    • Help us author a paper proving the security of the "Semi-Private Keys" construction from http://allmydata.org/~zooko/lafs.pdf . Implement small, secure Tahoe capabilities based on Semi-Private Keys.

Building Things On Top Of Tahoe

  • an interactive tree browser web frontend in JavaScript (Nathan has written most of one -- what can it grow into?)
  • a blog-like web app (perhaps addressing tiddly wishlist items)
  • Extend and improve the tiddly_on_tahoe implementation.
  • Retarget TiddlyWeb to use Tahoe as its backend storage.
  • Port another light-weight open source web app to Tahoe+JavaScript (calendar, photo album, Bespin).

Connecting Tahoe To Other Things

  • Help with the C client library libtahoeclient_webapi.
  • Explore running a Tahoe grid over Tor or I2P to provide anonymity to servers and/or clients.
  • Integrate Tahoe with the operating system kernel through FUSE. See the source code, the mailing list thread, and tickets: #36 (FUSE integration), #621 (Make automated fuse tests run against blackmatch.)
  • Integrate a distributed revision control tool such as darcs, git, bzr, mercurial or monotone with Tahoe so that there is a single distributed, secure revision control repository stored on a Tahoe grid. ticket: #663

Mentors

Who is willing to spend about five hours a week (according to Google) helping a student figure out how to do it right?

Tasks Too Small To Be A Whole Project Unto Themselves

But they could perhaps be the starting point of a summer project -- i.e. get into the code by fixing this bug and then build a solid addition to this part of the system.

  • Get sshfs working properly on Linux boxes. Yeah, my Fedora 9 isn't OK with the trunk revision: it keeps showing me the same first-level directories at every level :)
  • Shell-friendly errors. When the CLI (the shell command tool) fails, it would be good for shell users to get nicer output in plain text, not HTML/CSS. The latter could be kept for web GUI errors only. ticket: #646 (CLI should report webapi errors better)
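
To make the shell-friendly errors idea concrete, here is a minimal sketch of reducing an HTML error page to a single line of text using only the Python standard library. It is an illustration, not the Tahoe CLI's code: the function name html_error_to_text is hypothetical, and the example assumes the webapi error arrives as an HTTP status code plus an HTML body, which may not match what the webapi actually sends.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring <style> and <script> contents."""

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace to single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_error_to_text(status, body):
    """Turn an HTML error body into one plain-text line for a terminal.

    Hypothetical helper; the (status, body) input shape is an assumption.
    """
    parser = _TextExtractor()
    parser.feed(body)
    parser.close()
    message = parser.text() or "(no error details)"
    return "Error %d: %s" % (status, message)
```

A CLI command could print the returned line to stderr and exit with a non-zero status, leaving the full HTML rendering to the web GUI.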