#1161 closed defect (fixed)

improve new-downloader logging

Reported by: zooko Owned by:
Priority: normal Milestone: 1.9.2
Component: code-network Version: 1.8β
Keywords: immutable download error logging Cc: warner
Launchpad Bug:

Description

Updating my blog, using tahoe-lafs v1.8.0b2, I got this incident report file, attached, which includes this line:

13:59:48.590 [1513]: WEIRD Error in Segmentation FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: allmydata.interfaces.NoSharesError: no shares (need 3). Last failure: None
]
[INCIDENT-TRIGGER]

Attachments (2)

incident-2010-08-09-135948-6t3jtly.flog.bz2 (18.9 KB) - added by zooko at 2010-08-09T21:22:06Z.
logging.diff (10.2 KB) - added by warner at 2010-08-10T06:45:36Z.
fix maybe half the logging issues in this ticket

Download all attachments as: .zip

Change History (11)

comment:1 Changed at 2010-08-09T22:17:30Z by davidsarah

  • Keywords regression added

comment:2 Changed at 2010-08-09T22:52:35Z by warner

  • Keywords regression removed
  • Milestone changed from 1.8.0 to 1.9.0
  • Priority changed from critical to minor
  • Summary changed from WEIRD Error in Segmentation to improve new-downloader logging

Talking with Zooko, it sounds like this was a filecap from one grid that was linked into a directory from a different grid. The logs show an ordinary NoSharesError exception. So there's nothing critical to fix here.

The log events could be better, though.. here are some notes for post-1.8.0 about how they should be improved:

  • the ShareFinder loop messages show a build-up of received responses, followed by 750ms of successive removals from the pending= list. I think that pending= should be removed as the responses are received, and shouldn't take so long
  • The umid="wAlnHQ" in node.py:DownloadNode._start_new_segment needs a logparent
  • the umid="EYlXBg" in segmentation.py:Segmentation._error doesn't need to be log.WEIRD when it's due to NoSharesError: i.e. we don't need to make an Incident out of NoSharesError, using log.UNUSUAL is probably more appropriate
  • it'd be nice if the Node.read/_fetch_next/get_segment() logparent tree that included the Segmentation._error message were part of the same log tree as the main _Node/ShareFinder tree. Ideally the top-level call to FileNode.read would create a single log tree.

We should check that the NoSharesError made it out to the WUI or CLI, displayed properly (as something human-readable).

comment:3 Changed at 2010-08-10T00:01:01Z by davidsarah

  • Keywords error logging added

Changed at 2010-08-10T06:45:36Z by warner

fix maybe half the logging issues in this ticket

comment:4 Changed at 2010-08-10T06:48:55Z by warner

that patch adds logparents and cleans up some missing umids. It does not yet fix the log.WEIRD on NoSharesError. Also, the ShareFinder loop messages I mentioned are actually ok: I was misreading the events (if I'd looked at the dump in chronological order, instead of nested, it would have been obvious that the pending= list shrinks upon each receipt of a DYHB response).

So the log.WEIRD issue is the main one left. My patch rearranges some code, so it will definitely wait for after the 1.8.0 release before going in.

comment:5 Changed at 2011-08-29T06:42:48Z by warner

  • Milestone changed from 1.9.0 to 1.10.0

comment:6 Changed at 2012-04-01T05:15:32Z by davidsarah

  • Priority changed from minor to normal

comment:7 Changed at 2012-04-01T05:16:14Z by davidsarah

  • Keywords review-needed added
  • Milestone changed from 1.11.0 to 1.10.0

comment:8 Changed at 2012-05-13T04:44:40Z by warner

  • Keywords review-needed removed
  • Resolution set to fixed
  • Status changed from new to closed

The logging changes actually landed, with some other improvements, in 00e9e4e. Closing this out.

comment:9 Changed at 2012-05-14T04:58:34Z by davidsarah

  • Milestone changed from 1.10.0 to 1.9.2
Note: See TracTickets for help on using tickets.