[tahoe-dev] Modifying the robots.txt file on allmydata.org

Zooko Wilcox-O'Hearn zooko at zooko.com
Wed Feb 24 10:38:18 PST 2010


On Wednesday, 2010-02-24, at 1:50 , David-Sarah Hopwood wrote:

> Allowing crawlers to index some of the dynamically generated pages  
> under /trac could cause horrible breakage, given darcs+trac's  
> performance problems. You'd have to look at what subsets of that  
> are sufficiently static.

The main thing to avoid is URLs that have "rev=XYZ" in them, like these:

http://allmydata.org/trac/tahoe-lafs/browser/setup.cfg?rev=3996
http://allmydata.org/trac/tahoe-lafs/browser/setup.cfg?annotate=blame&rev=3996

Those are asking darcs to reconstruct what a particular file or  
directory looked like at some point in the past, which is relatively  
expensive.
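
For the robots.txt change itself, something along these lines ought to
keep well-behaved crawlers away from those URLs. This is only a sketch:
the "*" wildcard is an extension honored by the major search engines
rather than part of the original robots.txt standard, and the paths may
need adjusting to match our Trac layout.

    User-agent: *
    # Keep crawlers off revision-specific browser pages, which force
    # darcs to reconstruct old versions of files and directories.
    Disallow: /trac/tahoe-lafs/browser/*rev=
    # Blame/annotate views hit the same slow code path.
    Disallow: /trac/tahoe-lafs/browser/*annotate=

Crawlers that don't understand the wildcard syntax will treat the "*"
literally and ignore those rules, so for them the fallback would be to
disallow the whole /trac/tahoe-lafs/browser/ tree.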

On the other hand, the trac-darcs plugin caches the results of those  
requests in its SQLite db, so perhaps letting a spider laboriously  
crawl the whole thing is a way to fix the performance problems. :-)

Regards,

Zooko

