[tahoe-dev] Perforce backend served by tahoe.

Peter Secor secorp at allmydata.com
Mon Jun 29 13:24:20 PDT 2009


Very cool. I haven't used Perforce before - what size project are you 
using it for? It would be interesting to know how Tahoe will handle the 
"lots of small files" situation over time. The Allmydata production grid 
seems to be doing fine, but a source code control application will 
probably give it (Tahoe) a more specific workout in that area.

Ps

Marc Tooley wrote:
> One of the upcoming features of Perforce which is available in the beta 
> version (2009.1 beta) right now is the ability to override the backend 
> storage of the versioned files with.. well pretty much anything you 
> want, really. All you have to do is write a script which takes three 
> arguments in the form of:
> 
> %op% %file% %rev%
> 
> ... and then reads the actual file data from stdin (or prints it out on
> stdout).
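
As a rough illustration of that interface, here is a minimal Python sketch of the dispatch such a script has to do. It is not the Perl script linked in the post. The operation names "read", "write" and "delete" are my assumption (the post only says there is an %op% argument), and the dict named `store` is a hypothetical stand-in for the real backend.

```python
import io

# Assumed operation names; the post only says the script receives an %op%.
OPS = ("read", "write", "delete")

def handle(op, filename, revision, store, stdin, stdout):
    """Serve one trigger call. `store` is a dict standing in for the backend."""
    if op not in OPS:
        raise ValueError("unexpected op: %r" % (op,))
    key = (filename, revision)
    if op == "write":
        store[key] = stdin.read()    # file content arrives on stdin
    elif op == "read":
        stdout.write(store[key])     # file content is returned on stdout
    else:
        store.pop(key, None)         # deleting a missing revision is a no-op

# Usage with in-memory streams in place of the real stdin/stdout:
store = {}
handle("write", "depot/a.txt", "1", store, io.BytesIO(b"hello\n"), io.BytesIO())
out = io.BytesIO()
handle("read", "depot/a.txt", "1", store, io.BytesIO(), out)
```

A real script would pass the process's binary stdin and stdout and talk to tahoe instead of a dict.
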
> 
> Then use Perforce's trigger mechanism to tell it that the files you
> store should be fed through this script.
> 
> I've written such a script, available here:
> 
> http://public.perforce.com:8080/guest/marc_tooley/trig/archive/tahoe_backend.pl
> 
> It seems to work very nicely, primarily because tahoe allows you to 
> store arbitrary filenames with arbitrary paths without creating the 
> directories first to store them. (It does that itself.) I can sync 
> files, submit them.. everything I tested.
> 
> Note that this is not for the database/metadata files. Those are 
> read/written in 8K pages and tahoe is not a suitable backend for that. 
> But that doesn't really matter, because you can store the flat-text 
> checkpoint files (metadata backup files) in tahoe and use it as the 
> backup mechanism for those, too, so you'll never be behind more than 
> your last checkpoint/backup cycle.
> 
> The script itself has no instructions, so here they are:
> 
> 1. Set up your tahoe grid. Make sure the user you will be running 
> Perforce as can run "tahoe ls" without difficulty and you have a root 
> node/alias with a full-permissions directory cap.
> 
> 2. Create a Perforce root directory. This is basically as simple as:
> 
> i) Download p4d and p4 for your platform.
> ii) Run: "p4d -r. -d"
> 
> 3. Set up your archive trigger with:
> 
> i) p4 triggers
> ii) Underneath the word "Triggers:" add the following trigger:
>     happyday archive //... "/path/to/tahoe_backend.pl %op% %file% %rev%"
>     (Note there needs to be a tab at the beginning.)
> iii) Save the form and exit your editor.
> 
> 4. Set up Perforce typemap:
> 
> i) p4 typemap
> ii) Under the line "TypeMap:" put:
>     +X //...
>     (Note again the leading tab.)
> iii) Save your form, and exit.
> 
> 5. (OPTIONAL) Inside the script, if you want it to put it somewhere 
> other than in your default "tahoe:" alias then change $tahoetop to your 
> other alias. Don't forget the trailing colon.
> 
> Voila! Virtually an instant Perforce server running with a tahoe 
> backend.
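
Since the leading tab in steps 3 and 4 is easy to lose when pasting, a small Python sketch that builds the two form lines may help. The function names are hypothetical, and the trigger name and script path are just the values from the steps above.

```python
def trigger_line(name, script_path, depot_path="//..."):
    """Build the entry for the "Triggers:" field, including the leading tab."""
    return '\t{} archive {} "{} %op% %file% %rev%"'.format(
        name, depot_path, script_path)

def typemap_line(depot_path="//..."):
    """Build the entry for the "TypeMap:" field, including the leading tab."""
    return "\t+X {}".format(depot_path)

print(trigger_line("happyday", "/path/to/tahoe_backend.pl"))
print(typemap_line())
```

The printed lines can be pasted under "Triggers:" and "TypeMap:" in the forms opened by "p4 triggers" and "p4 typemap".
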
> 
> The script may be called on to store multiple revisions of the same
> file. Thus the tahoe backend store holds not the files themselves, but
> directories named after the filenames, containing files that are named
> after the revisions. For example, when Perforce asks us to store:
> 
> //depot/path/to/filename#10
> 
> ... we are actually dumping it into:
> 
> tahoe:/depot/path/to/filename/10
> 
> ... and it turns out this works nicely.
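
The mapping in that example can be sketched in a few lines of Python. This is only an illustration of the naming scheme, not the code in the linked script. The function name is hypothetical, and it assumes the revision is given in the "//path#rev" form shown above.

```python
def spec_to_tahoe_path(spec, tahoetop="tahoe:"):
    """Map "//depot/path/to/filename#10" to "tahoe:/depot/path/to/filename/10"."""
    depot_file, sep, revision = spec.rpartition("#")
    if not sep or not depot_file.startswith("//") or not revision:
        raise ValueError("expected //path#rev, got %r" % (spec,))
    # Drop one of the two leading slashes; the file name becomes a directory
    # and the revision becomes the file inside it.
    return "{}{}/{}".format(tahoetop, depot_file[1:], revision)

print(spec_to_tahoe_path("//depot/path/to/filename#10"))
```

Passing a different alias as `tahoetop` (with its trailing colon) corresponds to the optional step 5.
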
> 
> Also nice (but likely unnecessary) is the additional double-check 
> Perforce can do when verifying files. Perforce stores an MD5 sum 
> per-file which can then be double-checked via the "p4 verify" command 
> to ensure that the files are retrievable and untouched.
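
For anyone who wants to repeat that check by hand on data pulled back out of tahoe, a minimal Python sketch of the digest comparison follows. It assumes the recorded digest is an MD5 of exactly the bytes being read; Perforce may normalise some file types before hashing, so treat this as an illustration rather than a replacement for "p4 verify". The function names are hypothetical.

```python
import hashlib
import io

def md5_of_stream(stream, chunk_size=65536):
    """Return the lowercase hex MD5 of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def matches_recorded(stream, recorded_hex):
    """Compare a stream's MD5 with a recorded digest, ignoring letter case."""
    return md5_of_stream(stream) == recorded_hex.strip().lower()

ok = matches_recorded(io.BytesIO(b"hello"), "5D41402ABC4B2A76B9719D911017C592")
```
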
> 
> Further niceties include the fact that not only does Perforce make use 
> of lazy copies, but tahoe does so, too, automatically on a per-file 
> basis!
> 
> That is, when you branch in Perforce:
> 
> p4 integ a b
> p4 submit -d "new file b"
> 
> ... b doesn't exist as a separate file on the backend. Perforce knows
> that it's the same file as "a" and automatically fetches "b" from the
> real backend file "a".
> 
> Similarly, tahoe clients using convergent encryption encrypt identical
> files identically, which means that if ten developers all check in the
> same file, only one copy actually exists on the backend!
> Normally in Perforce they'd all exist as separate files. (You don't
> even have to share around tahoe's convergence secret for this, because
> the same client is doing all the encryption.)
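
To make the deduplication argument concrete, here is a toy Python sketch of the idea behind convergent encryption: the key is derived from the content plus a per-client secret, so the same client always derives the same key for the same bytes. This is NOT Tahoe's actual key derivation, which uses its own tagged hashes. The HMAC construction and all names here are assumptions chosen for illustration.

```python
import hashlib
import hmac

def convergent_key(secret, content):
    """Toy key derivation: same secret and same content give the same key."""
    return hmac.new(secret, content, hashlib.sha256).digest()

client_secret = b"one-client-secret"      # hypothetical per-client secret
first = convergent_key(client_secret, b"int main() { return 0; }\n")
second = convergent_key(client_secret, b"int main() { return 0; }\n")
other = convergent_key(b"another-secret", b"int main() { return 0; }\n")

# first == second: ten check-ins of the same file through one client
# collapse to one stored copy. other differs: a client with a different
# secret would not deduplicate against it.
```
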
> 
> WARNINGS: +X files are NOT proxied by the Perforce Proxy (P4P)! Also,
> this backend is quite slow anyway even without the proxy, just as tahoe 
> is, for many small files. But really, who cares?! :-) Tahoe is awesome.
> 
> _______________________________________________
> tahoe-dev mailing list
> tahoe-dev at allmydata.org
> http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev

