[tahoe-dev] Perforce backend served by tahoe.

Marc Tooley tahoe-devPOST at quake.ca
Mon Jun 29 12:20:01 PDT 2009


One of the upcoming features of Perforce, available now in the beta 
version (2009.1 beta), is the ability to override the backend storage 
of the versioned files with... well, pretty much anything you want, 
really. All you have to do is write a script that takes three 
arguments in the form of:

%op% %file% %rev%

... and then reads the file data from stdin (when storing) or prints it 
to stdout (when retrieving).

Then, tell Perforce that the files you store should be fed through this 
script with its trigger mechanism.
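The actual script is Perl; as a rough illustration of the same contract, here is a Python sketch of how one archive-trigger call could be turned into a tahoe command. It assumes the op arrives as "write", "read" or "delete", and that `tahoe put -` reads stdin while `tahoe get` with no local file writes to stdout (check `tahoe put --help` on your version; newer releases also call `rm` `unlink`). `TAHOE_TOP`, `tahoe_command` and `main` are hypothetical names.

```python
import subprocess
import sys

# Assumption: mirrors the default "tahoe:" alias ($tahoetop in the Perl script).
TAHOE_TOP = "tahoe:"


def tahoe_command(op: str, filename: str, revision: str) -> list[str]:
    """Translate one archive-trigger call into a tahoe CLI command.

    Each revision is stored as <alias><filename>/<revision>, i.e. one
    tahoe directory per file and one tahoe file per revision.
    """
    remote = f"{TAHOE_TOP}{filename}/{revision}"
    if op == "write":
        # "-" asks tahoe put to read the file data from stdin.
        return ["tahoe", "put", "-", remote]
    if op == "read":
        # With no local file argument, tahoe get writes to stdout.
        return ["tahoe", "get", remote]
    if op == "delete":
        return ["tahoe", "rm", remote]
    raise ValueError(f"unsupported archive op: {op!r}")


def main(argv: list[str]) -> int:
    """Entry point: trigger.py OP FILE REV."""
    if len(argv) != 4:
        print("usage: trigger.py OP FILE REV", file=sys.stderr)
        return 2
    op, filename, revision = argv[1:]
    # The child process inherits our stdin/stdout, so file data flows
    # straight between Perforce and tahoe without being buffered here.
    return subprocess.call(tahoe_command(op, filename, revision))
```

A real trigger script would finish with `sys.exit(main(sys.argv))`; it is left out here so the functions can be imported and tested on their own.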

I've written such a script, available here:

http://public.perforce.com:8080/guest/marc_tooley/trig/archive/tahoe_backend.pl

It seems to work very nicely, primarily because tahoe lets you store 
files under arbitrary paths without creating the parent directories 
first. (It does that itself.) I can sync files, submit them... 
everything I tested works.

Note that this is not for the database/metadata files. Those are 
read and written in 8K pages, and tahoe is not a suitable backend for 
that. But that doesn't really matter, because you can store the 
flat-text checkpoint files (metadata backups) in tahoe and use it as 
the backup mechanism for those, too, so you'll never be more than one 
checkpoint/backup cycle behind.
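A minimal Python sketch of that backup step, assuming checkpoints are taken with `p4d -r <root> -jc`, which writes numbered `checkpoint.N` files into the server root. It picks the newest one (comparing N numerically, so `checkpoint.10` beats `checkpoint.9`) and builds a `tahoe put` command for it. The destination `tahoe:checkpoints/` and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Checkpoints written by "p4d -jc" are named checkpoint.N in the server root.
CHECKPOINT_RE = re.compile(r"checkpoint\.(\d+)")


def latest_checkpoint_name(names: Iterable[str]) -> Optional[str]:
    """Return the highest-numbered checkpoint.N name, or None if there is none."""
    best: Optional[tuple[int, str]] = None
    for name in names:
        m = CHECKPOINT_RE.fullmatch(name)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), name)
    return None if best is None else best[1]


def backup_command(root: Path, dest: str = "tahoe:checkpoints/") -> list[str]:
    """Build a tahoe put command copying the newest checkpoint into dest."""
    name = latest_checkpoint_name(p.name for p in root.iterdir())
    if name is None:
        raise FileNotFoundError(f"no checkpoint.N files in {root}")
    return ["tahoe", "put", str(root / name), dest + name]
```

Compressed checkpoints (for example `checkpoint.3.gz`) are deliberately skipped by the pattern; widen it if you take checkpoints with compression.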

The script itself has no instructions, so here they are:

1. Set up your tahoe grid. Make sure the user you will be running 
Perforce as can run "tahoe ls" without difficulty and has a root 
node/alias with a full-permissions (read-write) directory cap.

2. Create a Perforce root directory and start the server in it. This 
is basically as simple as:

i) Download p4d and p4 for your platform.
ii) From inside that directory, run: "p4d -r. -d"

3. Set up your archive trigger with:

i) p4 triggers
ii) Underneath the word "Triggers:" add the following trigger:
    happyday archive //... "/path/to/tahoe_backend.pl %op% %file% %rev%"
    (Note there needs to be a tab at the beginning.)
iii) Save the form and exit your editor.

4. Set up the Perforce typemap:

i) p4 typemap
ii) Under the line "TypeMap:" put:
    +X //...
    (Note again the leading tab.)
iii) Save the form and exit your editor.

5. (OPTIONAL) If you want the script to store files somewhere other 
than your default "tahoe:" alias, change $tahoetop inside the script to 
your other alias. Don't forget the trailing colon.

Voila! Virtually an instant Perforce server running with a tahoe 
backend.

The script may be asked to store many revisions of the same file. So 
the tahoe backend does not hold the files under their own names; 
instead, each file name becomes a directory containing one file per 
revision, named after the revision number. For example, when Perforce 
asks us to store:

//depot/path/to/filename#10

... we are actually dumping it into:

tahoe:/depot/path/to/filename/10

... and it turns out this works nicely.
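The mapping in that example can be written as a small function. This Python sketch assumes the script receives a depot path and revision number in the form shown above; `tahoe_path` and its `alias` parameter (standing in for $tahoetop) are hypothetical names.

```python
def tahoe_path(depot_spec: str, alias: str = "tahoe:") -> str:
    """Map a revision spec like //depot/f#10 to <alias>/depot/f/10.

    Each depot file becomes a tahoe directory, and each revision a file
    inside that directory named after the revision number.
    """
    path, sep, rev = depot_spec.rpartition("#")
    if not sep or not rev.isdigit():
        raise ValueError(f"expected //path#rev, got {depot_spec!r}")
    if not path.startswith("//"):
        raise ValueError(f"expected a depot path starting with //, got {path!r}")
    # Drop one of the two leading slashes: //depot/... -> /depot/...
    return f"{alias}{path[1:]}/{rev}"
```

For example, `tahoe_path("//depot/path/to/filename#10")` gives `tahoe:/depot/path/to/filename/10`, and passing `alias="backup:"` redirects everything to another alias, which is what changing $tahoetop does.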

Also nice (but likely unnecessary) is the additional double-check 
Perforce can do when verifying files. Perforce stores an MD5 digest 
for each file revision, which can then be checked with the "p4 verify" 
command to ensure that the files are retrievable and unmodified.
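"p4 verify" does this check inside the server; this Python sketch only illustrates the idea: recompute an MD5 over content fetched back from the backend (for instance with `tahoe get`) and compare it with the recorded digest. It is a simplification, since Perforce may normalize text content (e.g. line endings) before hashing; the function names are hypothetical.

```python
import hashlib


def md5_digest(data: bytes) -> str:
    """MD5 of a revision's content as 32 uppercase hex characters."""
    return hashlib.md5(data).hexdigest().upper()


def verify_revision(data: bytes, recorded_digest: str) -> bool:
    """True if content fetched from the backend still matches its digest."""
    return md5_digest(data) == recorded_digest.strip().upper()
```

A file that tahoe can no longer retrieve fails before this point (the fetch itself errors), while silently altered content fails the comparison.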

Further niceties include the fact that not only does Perforce make use 
of lazy copies, but tahoe does so, too, automatically on a per-file 
basis!

That is, when you branch in Perforce:

p4 integ a b
p4 submit -d "new file b"

... no new archive file for b is created. Perforce records that b has 
the same content as "a" and, whenever b is requested, fetches it from 
the real backend file for "a" automatically.
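A toy Python model of such a lazy copy, not Perforce's real schema: each revision's metadata points at an archive entry, and an unchanged integration just reuses the source's pointer instead of writing new backend data. (Perforce's db.rev table records something similar in its lbrFile/lbrRev fields, but every name below is hypothetical.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archive:
    """Where a revision's content actually lives in the backend."""
    file: str
    rev: int


class ToyDepot:
    """Toy model of lazy copies: metadata points at shared archive entries."""

    def __init__(self) -> None:
        self.metadata: dict[tuple[str, int], Archive] = {}
        self.backend: dict[Archive, bytes] = {}

    def submit(self, path: str, rev: int, content: bytes) -> None:
        # A normal submit writes new content to the backend.
        archive = Archive(path, rev)
        self.backend[archive] = content
        self.metadata[(path, rev)] = archive

    def integrate(self, src: str, src_rev: int, dst: str, dst_rev: int) -> None:
        # A lazy copy only records a pointer; nothing is written to the backend.
        self.metadata[(dst, dst_rev)] = self.metadata[(src, src_rev)]

    def read(self, path: str, rev: int) -> bytes:
        return self.backend[self.metadata[(path, rev)]]
```

After submitting `//depot/a#1` and integrating it to `//depot/b#1`, reading b returns a's content while the backend still holds a single file.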

Similarly, tahoe clients use convergent encryption, which encrypts 
identical files identically, so if ten developers all check in the 
same file, only one copy actually exists on the backend! Normally in 
Perforce they'd all exist as separate files. (You don't even have to 
share tahoe's convergence secret around for this, because the same 
client is doing all the encryption.)
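To see why identical check-ins collapse to one stored file, here is a simplified Python sketch of convergent encryption's key step. Tahoe's real derivation uses tagged SHA-256d hashes that also cover its encoding parameters; HMAC-SHA256 keyed by the convergence secret is a stand-in here, the actual encryption is omitted, and all names are hypothetical.

```python
import hashlib
import hmac


def convergent_key(content: bytes, convergence_secret: bytes) -> bytes:
    """Derive the encryption key from the content itself plus a client secret.

    Same content + same secret always yields the same key (stand-in for
    Tahoe's actual key derivation).
    """
    return hmac.new(convergence_secret, content, hashlib.sha256).digest()


def storage_index(key: bytes) -> str:
    """Name under which the encrypted file would be stored on the grid."""
    return hashlib.sha256(key).hexdigest()


class ToyGrid:
    """Tracks distinct stored files; the encryption itself is left out."""

    def __init__(self) -> None:
        self.stored: set[str] = set()

    def upload(self, content: bytes, convergence_secret: bytes) -> str:
        si = storage_index(convergent_key(content, convergence_secret))
        self.stored.add(si)  # an identical upload maps to the same entry
        return si
```

Ten uploads of the same bytes through one client (one secret) produce one storage index, while a client with a different secret would store its own copy, which is why a single Perforce server doing all the uploads gets the full benefit.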

WARNINGS: +X files are NOT proxied by the Perforce proxy (P4P)! Also, 
even without the proxy, this backend is quite slow for many small 
files, just as tahoe itself is. But really, who cares?! :-) Tahoe is 
awesome.