See also OneHundredYearCryptography.

= Goals for new filecaps =

This is a place to record desiderata for the next version of our mutable/immutable filecaps. Many of the design requirements are spread out across separate tickets: this page is here to consolidate them. We should not release a new filecap format without checking it against everything on this list.

There will be a related pair of new encoding designs/protocols (or equivalent; it has also been suggested that a single protocol could support both mutable and immutable files). The NewImmutableEncodingDesign and NewMutableEncodingDesign pages will hold those design discussions.

Ticket #432 was the starting point: it contained a list of features.

[query:keywords~=newcaps|newurls All tickets tagged 'newcaps' or 'newurls'] ([[TicketQuery(keywords~=newcaps|newurls, count)]])

== make them real URIs ==

Kevin Reid points out that what Tahoe calls URIs are not actually URIs (in the established sense). To make them real, we need to:

 * make them start with {{{x-tahoe:}}} or {{{tahoe:}}} (or {{{lafs:}}}), register {{{tahoe:}}} with IANA (#418) (#683)
 * understand how URI/URL/URNs are built, decide about hierarchical segments vs non-hierarchical segments. What's magical about a leading double-slash? Do we need one?
{{{#!comment
Edited chat excerpt from #tahoe-lafs on irc.freenode.org regarding above point:
[7:52pm] zooko: havent heard of new-caps, what are those or will those be?
[7:54pm] Zarutian: http://tahoe-lafs.org/trac/tahoe-lafs/wiki/NewCapDesign
[7:58pm] zooko: I can answer the question in second point in the section "make them real URIs": the leading // arent required in many uri schemes, it was used to indicate if the uri is hierchical or not.
[7:58pm] right, but we want the URIs to be hierarchical
[7:58pm] davidsarah: really?
[7:58pm] that's necessary in order for relative URI resolution to work correctly
[7:59pm] I thought // meant that the thing that comes after // is the "authority" for the rest of it which comes after the next /.
[7:59pm] e.g. you have a web page with a lafs:// uri as its base, and you want relative links in that page to work
[7:59pm] Would you say that there is an "authority"? Like if that part is a gateway address+port num, or a grid id?
[7:59pm] What about if the cap is the "authority" and then the path from that cap is the "rest of it". :-)
[8:00pm] That matches my model of the world fairly nicely. :-)
[8:00pm] whether there is an authority is less important than the resolution algorithm
[8:00pm] davidsarah: really? I thought tahoe URI were like paths through linked namespaces in Gnosis/KeyKos/Eros/Capros
[8:00pm] URI resolution is purely syntactic
[8:00pm] » davidsarah finds the relevant spec
[8:01pm] http://tools.ietf.org/html/rfc3986#section-5
[8:01pm] zooko, hmm, perhaps you should talk with Jonathan Rees about that. He knows a lot about URIs, their intent, and their interpretation.
[8:04pm] "As relative references can only be used within the context of a hierarchical URI, designers of new URI schemes should use a syntax consistent with the generic syntax's hierarchical components unless there are compelling reasons to forbid relative referencing within that scheme."
[8:04pm] in http://tools.ietf.org/html/rfc3986#section-1.2.3
[8:04pm] You don't need an authority in order for a URI to have a hierarchical path component.
[8:04pm] you do need // though, iirc
[8:05pm] No, I don't think so.
[8:05pm] "If a URI does not contain an authority component, then the path cannot begin with two slash characters ("//")."
[8:05pm] in section 3.3
[8:06pm] URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]; hier-part = "//" authority path-abempty / path-absolute / path-rootless / path-empty
[8:06pm] Oh, excuse me.
When you said `you do need //', did you mean `you do need // for an authority', or `you do need // for a hierarchical path component'?
[8:06pm] » zooko reads http://labs.apache.org/webarch/uri/rfc/rfc3986.html#authority
[8:06pm] what is an authority component in this context? user:pass@ before the hostname?
[8:06pm] Zarutian, the user info, hostname, and port number.
[8:07pm] Everything between the // and the first / of the path, if any.
[8:07pm] right, usually (although it's strictly speaking scheme-specific)
[8:08pm] So, this makes it sound like you can have paths without an authority: http://labs.apache.org/webarch/uri/rfc/rfc3986.html#path
[8:08pm] Although, FWIW, I still kind of favor letting the cap itself occupy the "authority" slot...
[8:08pm] zooko: yurl style?
[8:08pm] zooko, see Section 3 `Syntax Components', from which I just quoted a grammar excerpt.
[8:08pm] » davidsarah was sure that hierarchical uris required // (and therefore an authority), but is having difficulty finding where it says that
[8:08pm] Zarutian: yeah, now that you mention it, a bit like!
[8:09pm] Interesting. :-)
[8:09pm] Riastradh: yes, I was just looking at Section 3 -- what about it?
[8:09pm] zooko, the hier-part production rule includes options with paths and no authority.
[8:10pm] hmm, maybe this changed since RFC 2396
[8:10pm] » davidsarah checks that
[8:10pm] yes, it did
[8:10pm] http://tools.ietf.org/html/rfc2396#appendix-A
[8:11pm] an RFC 2396 hierarchical URI requires an authority; an RFC 3986 one does not
[8:12pm] That's not what I see in RFC 2396, davidsarah. absoluteURI = scheme ":" ( hier_part | opaque_part ); hier_part = ( net_path | abs_path ) [ "?" query ]; abs_path = "/" path_segments.
[8:13pm] oh, right, I misread it
[8:13pm] you're correct
[8:16pm] OTOH, if the URI includes a grid id, that would effectively be an authority
[8:17pm] Funny thing, "authority". :-) I would sort of make gateway and cap the authority.
[8:17pm] That sounds reasonable to me, provided that the `grid id' is in a sufficiently global namespace such as the DNS.
[8:17pm] But not grid id.
[8:17pm] grid id determines if your attempt to access it will succeed, but does not determine what contents you'll find on a read or whether you can write.
[8:17pm] That's all determined by the cap.
[8:18pm] this should be discussed on an archived medium!
[8:19pm] Yes...
[8:21pm] Namely #683, I think.
[8:21pm] #683 (handle arbitrary URIs in directories)
[8:21pm] http://tahoe-lafs.org/trac/tahoe-lafs/ticket/683
[8:21pm] Zarutian: would you be willing to update #683 to reflect the conversation from this channel?
[8:33pm] should be #432 I think
[8:33pm] #432 (writing down filecaps: revise URI scheme)
[8:33pm] http://tahoe-lafs.org/trac/tahoe-lafs/ticket/432
[8:33pm] see http://tahoe-lafs.org/trac/tahoe-lafs/query?status=!closed&keywords=~newurls for all tickets related to new URLs
}}}
 * according to #683, a URI '''identifies''' a resource, but does not necessarily provide enough information to actually access it (i.e. if you have a URI and somebody pointed you at a file, you could confidently tell them whether or not it was the right file, but if you only have the URI, then you might not be able to find the file without additional information). If the cap has both identifying and location information, it's called a URL.
 * Tahoe filecaps are meant to be URLs (they are intended to provide location information), but to really make that work, you also need to define which grid you're talking about. So far this has always been implicit, but that has caused us problems. #403 talks about making an explicit "gridid" and would provide a procedure to get from a gridid string to a set of storage servers. The existing tahoe codebase could use the introducer FURL as a gridid, if there were a good place to put it in the filecap (#683 touches on this).
 * from the point of view of a web browser, you also need a gateway service (the Tahoe client node with a webapi frontend). The tahoe URLs that we've been passing around so far always reference one of these, either by assuming that {{{http://localhost:8123}}} is a suitable gateway or by explicitly referencing an external gateway like testgrid.allmydata.org (with deleterious effects on security and availability). I hope that our new filecaps are defined independently of the webapi gateway used to access them, and that we have a clear procedure for starting with a filecap and a gateway HTTP URL, and ending with the contents of the file.

== Make them shorter, prettier, and easier to use ==

 * Short and not so ugly. This is important to enable cut-and-paste (see below), but also just because people are suspicious of and averse to long and ugly URLs. See #882 for notes in which dozens of people have spontaneously complained about the current URLs. By contrast, short URLs from services such as tinyurl.com, bit.ly, etc. are ubiquitous nowadays; users have no problem with those -- see Twitter.
   * I (warner) am curious about where the suspicion comes from. Do long URLs make people think they're being attacked, some sort of browser buffer overrun thing? Or that they're being phished, with a URL that a human would evaluate differently than their browser? I agree that people (including me) don't like long URLs, but I've never pushed anyone to explain the "suspicion" aspect. One comment in #217 says "smells a bit spammy", and a later one says "Spooks me every time".
     * It's likely because it's difficult for a human to verify there isn't hidden information in there, or a hidden URL, that they're sending out or visiting that they therefore can't anticipate or intelligently control. When people see a long hex string, perhaps it represents information that the person crafting it wants to hide from the person using it.
       I totally understand the skepticism; however, in this case there's nothing to be done, I think. -midnightmagic
 * Enable convenient cut-and-paste. If caps are too long they'll wrap in email. If they contain lots of word-breaking characters then you have to drag after you've double-clicked (this is probably ok). If the word-broken sections are small and at the beginning or end then you have to be very precise about that drag. The best design would be a single short non-word-breaking string. The next best would be to have a large non-word-breaking string at the start and end, with smaller segments (if necessary) in the middle. Note that {{{tahoe:}}} is an easy target, but {{{x-tahoe:}}} is not (you'd have to double-click on the "x").
 * Usable in a browser. Specifically, it should be easy to actually use a filecap that you get in email or IM, and many email/IM clients will look for http URLs and make them clickable. If tahoe filecaps start with {{{http:}}}, then they'll be made clickable. This is at odds with the IANA-friendly {{{tahoe:}}} prefix. Clients may make {{{tahoe:}}} URIs clickable too (I've seen them make other letters-then-colon strings clickable, even when the letters are not "http"), so perhaps a reasonable solution is to provide an OS-level URI handler for the {{{tahoe:}}} scheme, which could embed the filecap in an http URL and submit it to a web browser (i.e. when you click on {{{tahoe:foo}}}, a helper program is launched with {{{tahoe:foo}}}, and that in turn launches your web browser with {{{http://localhost:8123/foo}}}). (#52)

== make them long enough to be secure ==

We want filecaps to be as short as possible, but no shorter. There are several lower bounds on the length:

 * confidentiality: A large computing effort should not be able to obtain the plaintext of a tahoe file without knowing the readcap. We require a reasonable margin against improvements in hardware speed and organizational efficiency/motivation of distributed efforts (e.g.
   could a million PS3 owners break a filecap?). This currently implies a 128-bit confidentiality field.
 * integrity: a large computing effort should not be able to produce shares which will be accepted by the readcap holder but which do not result in the same file as the one created by the original uploader (and retrieved by other downloaders). We desire all three of the standard hash properties (collision resistance, first-pre-image resistance, second-pre-image resistance) to also apply to tahoe immutable files and their filecaps. This currently implies a 128-bit (or 256-bit?) integrity field.
   * variable-length integrity field (#102, comment 16+17), allowing users to decide between short caps and strong integrity guarantees
 * storage collision resistance (#753): a Tahoe grid should be able to store trillions of files and still have a vanishingly small chance of two files using the same storage-index (and thus confusing each other's shares). The storage-index is generally compressed out of the filecap, by deriving it with various hashing stages on the other filecap parameters. The shortest value in this derivation chain must be at least 128 bits long, and preferably about 192 bits long.

== other features ==

 * Self-identifying. It should be visually clear what sort of filecap the string represents: read-write or read-only, mutable-or-immutable, file-or-directory. This is especially important when sharing tahoe objects over out-of-band channels like IM and email: it should be easy for the user to tell whether they're giving away read-only access or read-write access. We've considered prefixes like {{{DWM..}}} for "Directory Writeable Mutable" and {{{FRI..}}} for "File Readonly Immutable" (#102 comment 12). If these are jammed against the (base62) crypto bits it may be difficult to tell where the prefix ends and the crypto bits begin, especially because the crypto bits will be using the same character set ({{{FRIDWM...}}}).
   It might be a good idea to separate the type prefix from the cryptobits: {{{FRI-cryptobits}}} or {{{FRI/cryptobits}}}.
 * in addition, tahoe URIs should be distinguishable from local filenames by a CLI tool, so that {{{tahoe cp $CAP local/foo.txt}}} is unambiguous. (unfortunately, the current practice of using "tahoe:" as a default alias name collides with this badly, but perhaps if the new URIs include the double-slash, this won't be a problem: {{{tahoe cp tahoe://CAP local/foo.txt}}} copies from a specific URI, while {{{tahoe cp tahoe:blah local/foo.txt}}} copies from a child of the "tahoe:" alias).
 * I'd like to make it easy to layer uses on top of one another: since directories are just a specific way of interpreting the contents of a (mutable) file, let's make the directory cap be closely related to the underlying filecap. For example, if we end up using {{{tahoe://MR/cryptobits}}} to describe a read-only mutable file referenced by "cryptobits", then we could use {{{tahoe://D/MR/cryptobits}}} for the directory that uses it as a backing store. The rule would be that {{{tahoe://D/$A}}} would be handled by fetching {{{tahoe://$A}}} and then interpreting its contents as a directory structure. Then reading immutable-dirnodes (#607) would be trivial. Another way to think about this is that if our filecaps were verbose s-expressions, these caps could be expressed as "(readonly (mutable cryptobits))" and "(directory (readonly (mutable cryptobits)))".
 * provide for verifycaps, repaircaps, and traversalcaps (#308, #217). Repaircaps in particular may require a grant of storage authority, which might entail a cap format that can accept arbitrary extra non-hierarchical fields. Appendcaps or "drop-box" writecaps might fall into this same space.
   But remember that URIs should identify objects, not the actions that you want to perform on them: a webapi scheme may use a POST/PUT/DELETE method, or append a t=json adverb, or alternatively encode the verb/adverb into the HTTP url (think {{{GET .../filecap/json}}} or {{{PUT unlinked/ciphertext}}}), but these are independent of the underlying filecap. Exotic cap types do not need to be short.
 * provide ciphertext access. Reading from a verifycap should give you ciphertext. It should be possible to upload ciphertext directly.
 * provide for a grid-identifier, possibly on the MSB (leftmost) end, e.g. {{{tahoe://grid1234/IR/cryptobits}}}. Perhaps let some contexts define a "default grid id", such that {{{tahoe://IR/cryptobits}}} is expanded to mean {{{tahoe://grid1234/IR/cryptobits}}}. Something like {{{tahoe://grid1234/D/MR/cryptobits}}} should reference {{{tahoe://grid1234/MR/cryptobits}}}. (#403)
 * permit multiple encodings of the same file (same k, different N) to use each other's shares (#678, #711)
 * be derived from a hash of the plaintext to detect bugs in the decryption or erasure-decoding (#453) and possibly also to be able to easily give users a hash of the plaintext if they want it (#280)

== Can caps have Forward Secrecy? ==

What would forward secrecy mean here?
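
== Example: layering directory caps on filecaps ==

The layering rule under "other features" ({{{tahoe://D/$A}}} is handled by fetching {{{tahoe://$A}}} and interpreting its contents as a directory) can be illustrated with a minimal Python sketch. Nothing here is a settled format: the two-letter type codes, their meanings (M/I for mutable/immutable, R/W for readonly/writeable) and the function names are assumptions extrapolated from the examples on this page, and grid ids are not handled.

```python
SCHEME = "tahoe://"

# Hypothetical type codes, guessed from examples such as MR and IR on this
# page; the real new-cap format is still undecided.
KIND = {"M": "mutable", "I": "immutable"}
ACCESS = {"R": "readonly", "W": "writeable"}


def strip_directory_layer(cap):
    """Apply the rule tahoe://D/$A -> tahoe://$A.

    Returns the underlying filecap, or None if cap is not a directory cap.
    """
    prefix = SCHEME + "D/"
    if cap.startswith(prefix):
        return SCHEME + cap[len(prefix):]
    return None


def to_sexpr(cap):
    """Render a cap as the verbose s-expression form described above."""
    if not cap.startswith(SCHEME):
        raise ValueError("not a tahoe:// cap: %r" % (cap,))
    inner = strip_directory_layer(cap)
    if inner is not None:
        # A directory is just an interpretation of the underlying file.
        return "(directory %s)" % to_sexpr(inner)
    code, sep, cryptobits = cap[len(SCHEME):].partition("/")
    well_formed = (
        sep == "/"
        and len(code) == 2
        and code[0] in KIND
        and code[1] in ACCESS
        and cryptobits != ""
    )
    if not well_formed:
        raise ValueError("unrecognized cap: %r" % (cap,))
    return "(%s (%s %s))" % (ACCESS[code[1]], KIND[code[0]], cryptobits)


if __name__ == "__main__":
    print(to_sexpr("tahoe://MR/cryptobits"))
    print(to_sexpr("tahoe://D/MR/cryptobits"))
```

Under these assumptions the two calls reproduce the s-expressions quoted above, "(readonly (mutable cryptobits))" and "(directory (readonly (mutable cryptobits)))". Because the directory layer is peeled off before the filecap is parsed, any new filecap type would get a directory form without extra parsing code.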