[tahoe-dev] Log files

Brian Warner warner at allmydata.com
Mon Dec 10 12:01:27 PST 2007


> I've prepared some scripts to perform an installation of a grid on the
> university LAN; a friend of mine has been working on some php scripts
> to analyze the log file of each client and store the information on a
> database

Neat!

> We were wondering how to include in the log who's the receiver of a
> particular share, because we need to track every shareholder for every
> upload operation

There are two approaches. The first is to put this information into the logs
(and then use your scripts to parse it out of the logs afterwards). You'll
want to add code to emit the list of shareholders (also known as "landlords")
who accepted shares during the upload process. The place to do this is in
src/allmydata/encode.py in the Encoder.done() method, almost at the end of
the file. The "self.landlords" attribute is a dictionary that maps each share
number (a small integer, usually 0 through 9) to an upload.PeerTracker
instance, which has a .peerid attribute (a binary peerid). So the code you
want is probably something like:

  for (shareid, tracker) in self.landlords.items():
    log.msg("uploaded %s:%d to peer %s" % (idlib.b2a(self._storage_index),
                                           shareid,
                                           idlib.b2a(tracker.peerid)))
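
On the analysis side, pulling those records back out of a log could look
roughly like the sketch below. It assumes the message text appears in the log
exactly as formatted by the log.msg() call above, and that the encoded storage
index and peerid contain no whitespace. The names UPLOAD_RE and
parse_upload_lines are made up for this illustration; they are not part of
Tahoe.

```python
import re

# Matches the message emitted by the log.msg() call above. Any prefix the
# logger adds (timestamp, etc.) is ignored because we use search().
UPLOAD_RE = re.compile(r"uploaded (\S+):(\d+) to peer (\S+)")

def parse_upload_lines(lines):
    """Return a list of (storage_index, share_number, peerid) tuples.

    Hypothetical helper: lines that do not contain an upload record
    are silently skipped.
    """
    records = []
    for line in lines:
        m = UPLOAD_RE.search(line)
        if m:
            records.append((m.group(1), int(m.group(2)), m.group(3)))
    return records
```

Passing an open log file to parse_upload_lines() works too, since iterating
over a file yields its lines.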

The second approach is to do this from the server side. The idea would be to
periodically walk the storage servers' filesystems and build up the sharemap
from the directory and filenames there. Each storage server has a directory
named storage/shares/, and there is a directory in that for each Storage
Index (one per file that's been uploaded, for both CHK files and the new
mutable files). The directory name is just the base32-representation of the
storage index, so it's possible to take the CHK URI and figure out what the
corresponding storage index is. Inside those directories, there is a file for
each share that is located on this server, named with a simple integer. For
example:

  162:warner@fluxx% ls storage/shares/
  15gxw3dt6yjxhwdpi9b3iq33rh/  iwdgtgyt1ju3p88kdck4bg4pma/
  64w1f1p1itxyajgtew4jtqcmdc/  nt6gnbh8fo9enxjkiyaws6grpy/
  7d11naa89wrktpcid7abq438hh/  qk1yf4rn46hdbpisiwfuxzcbnh/
  heuib3b4mudg1aszztd1cf7nma/  tr9zwmiyix1jyc7s15rgnydomc/
  incoming/                    w7q363dapjzf65k9z3t3xrqr1y/
  iugmwuefjnqqjpxy9jxqe9orso/  zt9bmwtf5pnaxadj93errcb4ry/

  163:warner@fluxx% ls storage/shares/15gxw3dt6yjxhwdpi9b3iq33rh/
  0  5

Ignore the incoming/ directory: that's used as a tempdir while shares are
being uploaded. Also, if you're using current HEAD or the release that's due
out this week, you might wish to ignore any mutable files (which contain
directories): to do that, you'll need to write a script that examines the
share files and ignores any that start with the words "Tahoe mutable
container v1". Mutable files contain that string as a magic number, whereas
regular CHK files do not (they begin with a single four-byte version number,
"\x00\x00\x00\x01").

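A minimal sketch of such a walker, run on one storage server, might look like
the following. It relies only on the layout described above (one directory
per storage index, one integer-named file per share, an incoming/ tempdir,
and the mutable-container magic string). The names build_sharemap and
is_mutable_share are invented for this example and are not Tahoe APIs.

```python
import os

# Magic string found at the start of mutable share files (see above).
MUTABLE_MAGIC = b"Tahoe mutable container v1"

def is_mutable_share(path):
    """Return True if the share file begins with the mutable magic string."""
    with open(path, "rb") as f:
        return f.read(len(MUTABLE_MAGIC)) == MUTABLE_MAGIC

def build_sharemap(shares_dir, skip_mutable=True):
    """Map each storage-index directory name to a sorted list of share numbers.

    Hypothetical helper: skips the incoming/ tempdir and, optionally,
    any mutable shares.
    """
    sharemap = {}
    for si in sorted(os.listdir(shares_dir)):
        si_dir = os.path.join(shares_dir, si)
        if si == "incoming" or not os.path.isdir(si_dir):
            continue
        shnums = []
        for name in os.listdir(si_dir):
            if not name.isdigit():
                continue  # share files are named with a simple integer
            if skip_mutable and is_mutable_share(os.path.join(si_dir, name)):
                continue
            shnums.append(int(name))
        if shnums:
            sharemap[si] = sorted(shnums)
    return sharemap
```

Calling build_sharemap("storage/shares") on each server and merging the
results (keyed by that server's peerid) would give the full picture of who
holds which share.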

You might find the following interesting: the new release this week will
contain code to make it easy to have your nodes all send their logs to a
central gatherer, which will then save them all to disk. This might make it a
bit easier to do this sort of centralized analysis.

hope that helps,
 -Brian

