#1719 new defect

Improve Google search results for phrases like "tahoe file storage"

Reported by: amiller
Owned by:
Priority: normal
Milestone: undecided
Component: website
Version: n/a
Keywords: transparency usability
Cc:
Launchpad Bug:

Description

Tahoe-LAFS could benefit from some SEO.

If you search for "tahoe lafs", the first result is tahoe-lafs.org, which takes you straight to where you'd expect. However, if you search for "tahoe secure file storage", "tahoe secure", or other reasonable phrases (omitting "lafs"), the results are much less useful. The PyCon talk notes tend to show up as the first result; at least they're filled with allmydata.org links that correctly redirect to https://tahoe-lafs.org.

<zooko> I think we may be telling google not to index any of https://tahoe-lafs.org 
        with our robots.txt, which would be the first thing to change for that.
<zooko> There might be a ticket about the terrible anti-SEO.

Beyond that, perhaps by helping web crawlers access the site, we can benefit from external search engines when searching for tickets, code, etc. (See #1691 for Trac search delays.)

Change History (4)

comment:1 Changed at 2012-04-13T15:09:10Z by zooko

I was wrong about robots.txt. https://tahoe-lafs.org/robots.txt currently says:

User-agent: *
Disallow: /trac/
Allow: /trac/tahoe-lafs/wiki/
Disallow: /source/
Disallow: /darcs.cgi/
Disallow: /buildbot
Crawl-Delay: 30

Which I think ought to allow search engines to index the wiki. I don't know what else is needed to get search engines to give useful results to people making those sorts of searches.
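
As a side note, whether that Allow line actually takes precedence depends on the crawler: Googlebot is documented as preferring the most specific (longest) matching path, while simpler parsers such as Python's urllib.robotparser just take the first rule whose prefix matches, so they read "Disallow: /trac/" as blocking the wiki too. Here is a small sketch (the URLs are just examples) for checking a set of rules offline:

from urllib.robotparser import RobotFileParser

# The rules quoted above, inlined so no network access is needed.
rules = """\
User-agent: *
Disallow: /trac/
Allow: /trac/tahoe-lafs/wiki/
Disallow: /source/
Disallow: /darcs.cgi/
Disallow: /buildbot
Crawl-Delay: 30
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# urllib.robotparser stops at the first rule whose path prefix matches, so
# "Disallow: /trac/" shadows the later "Allow: /trac/tahoe-lafs/wiki/" and the
# wiki is reported as blocked; Googlebot's longest-path-wins behavior would
# allow it instead.
for url in ["https://tahoe-lafs.org/trac/tahoe-lafs/wiki/",
            "https://tahoe-lafs.org/trac/tahoe-lafs/browser/docs/about.rst",
            "https://tahoe-lafs.org/source/"]:
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")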

comment:2 Changed at 2012-05-09T19:08:36Z by zooko

Some of our content, such as https://tahoe-lafs.org/trac/tahoe-lafs/browser/docs/about.rst, is served up directly from the trac source browser. To make that content indexable, at Tony Arcieri's suggestion, I removed the exclusion of /trac/ from robots.txt. It now looks like this:

User-agent: *
Disallow: /source/
Disallow: /buildbot-tahoe-lafs
Disallow: /buildbot-zfec
Disallow: /buildbot-pycryptopp
Crawl-Delay: 60

This might impose too much CPU and disk I/O load on our server. We'll see.

comment:3 Changed at 2012-05-09T19:15:36Z by zooko

Brian pointed out that this might also clobber the trac.db, which contains cached information from darcs. Specifically, it caches the "annotate" results (a.k.a. "blame") from darcs. I don't know if it caches anything else.

It currently looks like this:

-rw-rw-r--  1 trac source 408165376 2012-05-09 19:13 trac.db

But "annotate"/"blame" has been broken ever since I upgraded the darcs executable from v2.5 to v2.8, so maybe nothing will get cached.

comment:4 Changed at 2012-05-31T21:55:22Z by warner

Looking at the HTTP logs, I'm seeing hits with the Googlebot UA happening a lot faster than every 60 seconds, e.g. 18 hits in a 4-minute period. The crawl rate doesn't seem to have responded to the "Crawl-Delay" change, so I'm wondering if maybe that's the wrong field name.
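
Something like the following sketch can tally Googlebot hits per minute from an access log (the log path and the combined log format are assumptions about the server setup):

import re
from collections import Counter

# Log path and combined log format are assumptions; adjust for the real setup.
LOGFILE = "/var/log/apache2/access.log"

# Grab the timestamp down to the minute from entries like
#   66.249.x.x - - [31/May/2012:21:40:02 +0000] "GET /trac/... HTTP/1.1" 200 ...
timestamp = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")

hits_per_minute = Counter()
with open(LOGFILE) as logfile:
    for line in logfile:
        if "Googlebot" in line:
            match = timestamp.search(line)
            if match:
                hits_per_minute[match.group(1)] += 1

# More than one hit in any minute means Googlebot is coming in much faster
# than a Crawl-Delay of 60 seconds would suggest.
for minute, hits in sorted(hits_per_minute.items()):
    print(minute, hits)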

The site feels slower than it did a few months ago, but I don't have any measurements to support that.

The trac.db file is at 567MB today (2012-05-31), up from 408MB three weeks ago.
