Here are notes that should be added to wikiGSoCIdeas in a format emulating [http://www.netbsd.org/contrib/soc-projects.html this GSoC page from NetBSD]. Leslie Hawthorn writes: "Currently there's only a laundry list of suggested ideas but there is not any specificity on those ideas of how students could get them done, areas for them to get started, etc. Each suggestion needs to be categorized by difficulty, there needs to be pointers to where in the code base or documentation people can look for a better idea of how to proceed, etc."

(See also last year's page: [wiki:GSoCIdeas2009 Ideas For Google Summer of Code of 2009].)

[http://code.google.com/soc Google Summer of Code] Students: you don't have to use one of the following Ideas. You can come up with your own Ideas, either inspired by these or your own Blue Sky idea. The most important thing is to e-mail the Mentor team (listed at the bottom of this page) saying that you are interested.

= Ideas =

''What could a smart student do in one summer, if they didn't need to worry about getting a summer job to pay the bills?''

[http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~gsoc Trac tickets labelled 'gsoc'] (please add this label to any tickets that might make a good GSoC project).

== Deep Security Issues ==

''Want to implement strong security features which advance the state of the art? It isn't easy! To tackle these you'll need to think carefully and to integrate security and usability, which are two halves of the same coin. But you'll have excellent mentors and the support of a wide community of interested security hackers.''

 * Fix Same-Origin-Policy design issue. Web content from different authors can interact in unintended ways in the victim's browser, such as !JavaScript peeking at other frames or referrer headers. Before this project is undertaken, the problem description and proposed solutions need careful design review and consideration!
   The solutions should be considered prototypes and should be backwards compatible with the Tahoe network. Main ticket: #615 (Can !JavaScript loaded from Tahoe access all your content which is loaded from Tahoe?) [http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~capleak Tickets labelled 'capleak']
   * Domain Mangling approaches:
     * HTTP proxy approach
     * Special scheme handling in browser add-ons
   * [http://code.google.com/p/google-caja Caja] approach: Require all !JavaScript to pass the Caja verifier in the Tahoe-LAFS web frontend, then create an interface to the tahoe webapi which matches the intended capability semantics.
 * Tahoe-LAFS Cryptography:
   * Help us author a paper proving the security of the crypto that will be used to implement new shorter caps (such as the [NewCaps/WhatCouldGoWrong Elk Point protocol] or the "Semi-Private Key" construction from http://allmydata.org/~zooko/lafs.pdf ). [http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~newcaps Tickets labelled 'newcaps']

== WebDAV support ==

Difficulty: medium - hard

[http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~webdav Tickets labelled 'webdav']

WebDAV is a set of extensions to HTTP, specified in [http://tools.ietf.org/html/rfc2518.html RFC 2518] and [http://www.ics.uci.edu/~ejw/authoring/ a few other documents], that allow it to be used as a filesystem access protocol. Supporting WebDAV in Tahoe would mean extending the [source:/src/frontends/webapi.txt webapi frontend] to implement this protocol.

The main attraction of implementing a WebDAV interface is that several operating systems have bundled and somewhat integrated support for it, including Mac OS X, Windows, and some Linux distributions. In fact WebDAV may turn out to be an easier alternative to [http://en.wikipedia.org/wiki/Server_Message_Block SMB/CIFS] for allowing filesystem access from Windows.
However, there is currently no working WebDAV implementation in Twisted Python. There used to be one (the {{{web2.dav}}} package), [http://twistedmatrix.com/trac/ticket/3081 but it bitrotted]. You'll have to decide whether to help fix that implementation, use a non-Twisted implementation such as [http://code.google.com/p/wsgidav/ WsgiDAV] that might be more difficult to integrate with the existing Tahoe code, or write your own.

In any case, WebDAV is a complicated protocol and you will need to decide what subset of it gives the most "bang for the buck" and is practical to support in the time available. For example, locking is optional in the WebDAV spec; is it needed to interoperate with commonly used WebDAV clients?

Unlike most filesystems, which are constrained to be trees, the structure of a Tahoe filesystem is in general a directed graph that may contain cycles. [http://tools.ietf.org/html/draft-ietf-webdav-bind draft-ietf-webdav-bind] is an Internet Draft that clarifies how WebDAV servers should handle cycles.

[http://savannah.nongnu.org/projects/davfs2 davfs2] is a FUSE-based WebDAV filesystem client for Linux. To ensure that this runs correctly over your implementation of WebDAV, you'll probably need to adapt the tests for the existing Tahoe [source:contrib/fuse/impl_c/blackmatch.py "blackmatch"] FUSE interface (this would not be redundant since the blackmatch implementation has limitations that davfs2 would not have).

The [http://en.wikipedia.org/wiki/WebDAV#Microsoft_Windows WebDAV mini-redirector] is the component of Windows providing its WebDAV filesystem support. It is actually the less buggy of [http://www.zorched.net/2006/03/01/more-webdav-tips-tricks-and-bugs/ two implementations], but it still has had [http://greenbytes.de/tech/webdav/webdav-redirector-list.html bugs] and [http://www.microsoft.com/technet/security/bulletin/MS08-007.mspx security vulnerabilities] that you may need to take into account.
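
To make the point about cycles concrete, here is a minimal Python sketch of the kind of cycle-safe traversal that a recursive (depth-infinity) WebDAV listing would need. Everything in it is an assumption made for illustration: the {{{graph}}} dictionary is a hypothetical in-memory stand-in for Tahoe directories, and {{{walk_depth_infinity}}} is not part of Tahoe or of any WebDAV library. Each node is reported once; a node reached again through another path is flagged instead of being descended into.

```python
from collections import deque


def walk_depth_infinity(graph, root_id, root_path="/"):
    """Yield (path, node_id, already_reported) in breadth-first order.

    graph is a hypothetical stand-in for Tahoe directories: a dict that
    maps a directory id to {child_name: child_id}. Ids that have no entry
    in graph are treated as files (they have no children).
    """
    seen = set()
    queue = deque([(root_path, root_id)])
    while queue:
        path, node_id = queue.popleft()
        if node_id in seen:
            # Reached by a second path (possibly via a cycle):
            # report it, but do not descend again.
            yield path, node_id, True
            continue
        seen.add(node_id)
        yield path, node_id, False
        children = graph.get(node_id, {})
        for name in sorted(children):
            queue.append((path.rstrip("/") + "/" + name, children[name]))


if __name__ == "__main__":
    # Directory A contains B and a file; B links back to A, forming a cycle.
    example = {"A": {"b": "B", "f": "F1"}, "B": {"loop": "A"}}
    for entry in walk_depth_infinity(example, "A"):
        print(entry)
```

How a real server should report the flagged entries to a client is exactly the question the BIND draft addresses, so a design for this project would start from that document rather than from this sketch.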
== Server Selection ==

''Which servers are connected to your client, and which of them have which shares of your files?''

=== Dynamically migrate shares to maintain file health. ===

Difficulty: medium - hard

When uploading a file to a grid, Tahoe-LAFS will make sure that the file is healthy (a good discussion of what healthy means is found in #778) before reporting that the file is uploaded successfully. Tools to effectively maintain file health (or to adapt to new definitions of health) aren't quite complete, however -- our users have had several use cases that aren't easily addressed with what we have. Students taking this project would be building tools to address those use cases.

A good starting point would be to become familiar with how files are placed on a grid. [http://allmydata.org/trac/tahoe-lafs/browser/docs/architecture.txt architecture.txt], [http://allmydata.org/trac/tahoe-lafs/browser/docs/specifications/file-encoding.txt file-encoding.txt], [http://allmydata.org/trac/tahoe-lafs/browser/docs/specifications/mutable.txt mutable.txt], [http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/immutable/upload.py the immutable file upload code], and [http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/mutable/publish.py the mutable file upload code] are good places to do that. Also, you might want to look at the [http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/storage/server.py storage server code] to understand the server side of share placement better.

Some good tickets to start looking at are #699, #543, and #232; you'll find that those link to other tickets.

There are many ways to help address these issues. Some ideas:

 * Alter the CLI and the WUI to give users the ability to rebalance files that they've uploaded already. (#699)
 * Build tools that allow node administrators to move shares around a grid (#543, #864)
 * Alter Tahoe-LAFS to rebalance mutable files when uploading a new version of them.
   (#232)

(It is doubtful that any one of these projects is enough to fill a summer, but, combined, they would be a big usability improvement for Tahoe-LAFS.)

Depending on how you approach it, this work is tightly coupled to the open questions around file health and accounting, so prospective students would do well to explore those open issues, too. A good accounting jumping-off point is #666. A good jumping-off point for health is #778.

 * Use Zeroconf or similar so nodes can find each other on a local network to enable quick local share migration.
 * Deal with unreliable nodes and connections in general, getting away from allmydata.com's assumption that the grid is a big collection of reliable machines in a colo under a single administrative jurisdiction. [http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~availability Tickets labelled 'availability']
 * Abstract out the server selection part of Tahoe-LAFS so that the projects in this category of "grid membership and server selection" can be mostly independent of the rest of Tahoe-LAFS. See also [http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html#2009-02-06 this note about standardization of LAFS].
 * Write a GUI to visualize and manipulate the set of servers connected and the set holding shares of files.

== Networking Improvements ==

 * Deal with NAT, ideally making it as easy to ignore as possible (taking advantage of UPnP IGD and Zeroconf NAT-PMP). [http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~firewall Tickets labelled 'firewall']
 * 'tahoe sync'. Like Dropbox (http://www.getdropbox.com/), the user could have a daemon which keeps the grid in sync with the local filesystem (maybe using inotify for uploads).
 * Optimize upload/download transfer speed.
   [http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~performance Tickets labelled 'performance']
 * Implement storage server protocol over HTTP. #510

== Free The Windows Client ==

 * Make the [http://allmydata.org/trac/tahoe-w32-client Windows client] use only free open-source software. (Implementing WebDAV as described earlier is an alternative that would achieve a similar effect.)

== Connecting Tahoe-LAFS To Other Things ==

 * Filesystem access:
   * Improve the FUSE frontend ([source:contrib/fuse source code]). [http://allmydata.org/trac/tahoe-lafs/query?status=!closed&order=priority&keywords=~fuse Tickets labelled 'fuse']
   * Integrate Tahoe-LAFS with GVFS, the GNOME virtual filesystem.
 * Explore running a Tahoe-LAFS grid over [https://torproject.org Tor] or [https://i2p2.de I2P] to provide anonymity to servers and/or clients.
 * Rescue the neglected C client library [http://allmydata.org/trac/libtahoeclient_webapi libtahoeclient_webapi].

== Building Things On Top Of Tahoe ==

Difficulty: easy to hard, depending on project choice and how far you want to push it

There are a lot of applications that could potentially make good use of Tahoe in place of the typical centralized storage in flat files or SQL databases. Currently supported projects include [http://www.tiddlywiki.com/ TiddlyWiki] (one of the Tahoe developers hosts his blog using [http://allmydata.org/trac/tiddly_on_tahoe TiddlyWiki stored in Tahoe]), [http://hadoop.apache.org/ Hadoop], and [RelatedProjects a number of others].

There are still many useful and interesting things that have yet to be built using Tahoe. Perhaps the most promising area is web applications; what applications can you think of that could make use of a highly reliable filesystem accessible from both desktops and [http://github.com/ctrlaltdel/TahoeLAFS-android handheld devices]?
Keep in mind that Tahoe's architecture allows sharing and delegation opportunities that are difficult or impossible to implement using other backends. Some ideas people have suggested include a calendar or photo album, or porting Mozilla's [https://bespin.mozilla.com Bespin] editor. Nathan Wilcox wrote most of an interactive tree browser frontend in !JavaScript; in what interesting ways might this be extended?

This is in some ways the most interesting area for development, as it combines security and distributed systems problems with providing a user interface that lets a person who isn't particularly security minded operate safely by default. This is a hard problem, but it offers great rewards in terms of learning, and even the ability to break new ground in safe-by-default interface design.

Required skills: HTML and !JavaScript for web applications. For other tie-ins, the required skills will depend on the base project (for instance, porting the git DVCS to run on Tahoe would require good C-fu, with git experience helpful).

= Mentors =

''Who is willing to spend about five hours a week (according to Google) helping a student figure out how to do it right?''
[[br]]
 * [http://testgrid.allmydata.org:3567/uri/URI:DIR2-RO:j74uhg25nwdpjpacl6rkat2yhm:kav7ijeft5h7r7rxdp5bgtlt3viv32yabqajkrdykozia5544jqa/wiki.html Zooko O'Whielacronx] (core coding, Python/C/C++/JavaScript, cryptography)
 * [http://www.randombit.net Jack Lloyd] (C/C++/Python, cryptography)
 * David-Sarah Hopwood (david-sarah at jacaranda.org) (Python/C/JavaScript, SFTP frontend, security+cryptography)