#1106 assigned defect

review #1037 (SFTP)

Reported by: zooko Owned by: zooko
Priority: major Milestone: soon
Component: code Version: 1.7.0
Keywords: sftp unfinished-business Cc:
Launchpad Bug:

Description

Per comment:23:ticket:1037, the patches that fixed #1037 need to be reviewed. I've reviewed about half of them. I'll start posting my notes.

Attachments (2)

sftp-comments.dpatch (9.8 KB) - added by davidsarah at 2010-07-12T03:16:33Z.
SFTP: address some of the comments in zooko's review (#1106).
sftp-no-trunc-files-opened-with-append.dpatch (3.9 KB) - added by davidsarah at 2010-07-12T03:17:09Z.
SFTP: refuse to truncate files opened with FXF_APPEND (see ticket:1037#comment:20).

Download all attachments as: .zip

Change History (10)

comment:1 Changed at 2010-06-29T02:27:38Z by zooko

  • Status changed from new to assigned

comment:2 follow-up: Changed at 2010-07-08T14:14:24Z by zooko

Review notes (the review is not complete)

David-Sarah: overall this is excellent work! It is readable and correct, with a few but not many parts that are un-idiomatic.

Things that need to be responded to:

  • Hey look sftpd.noisy is True. Is that intentional?
  • Maybe document this with "assert" (and check it, too, when asserts are compiled in): # invariant: self.download_size <= self.current_size
  • Is it our job to unregister any extant producer before self.producer=p in registerProducer()?
  • !! Is that a busy eventually loop in _iterate() ? Bad!
  • add doc for self.async. I guess it means download is finished.
  • add doc for _convert_error()
  • data[offset:min(offset+length, len(data))] is the same as data[offset:offset+length]
  • Why do we need this catch-all-exceptions? Because err.value is not always an exception? This is noisy to the reader of source code.
        try:
            if noisy: logmsg(traceback.format_exc(err.value), level=NOISY)
        except:  # pragma: no cover
            pass
    
  • "The message argument to SFTPError must not reveal information that might compromise anonymity." Huh? Who's anonymity might be revealed to whom?
  • The following TODO can be removed based on testing in the 1.7.0 development cycle (and feedback from 1.7.0 release) TODO: check that clients are okay with this being a "?". (They should be because the longname is intended for human consumption.)

Ideas that don't necessarily need to be responded to:

  • class OverwriteableFileConsumer could use unit tests
  • I think ".finish()" means the download is finished but local writes can proceed apace; docstring needed; Oh! I guess (based on the name) that it is defined by IFinishableConsumer. Maybe docstring that fact.
  • # TODO: use download interface described in #993 when implemented. I don't like "TODO" in comments in source code. They aren't usually noticed by the people who need to be reminded, and they often become stale (the thing no longer needs to be done, but nobody thinks to find the TODO and remove it.) I prefer issue tickets. This "TODO" should probably be deleted or moved to an issue ticket or maybe do the thing that it says to do. :-)
  • I find "eventually_callback(d)(expr)" harder to understand than "eventually(d.callback, expr)". (They do mean the same thing, right?) On the other hand I find self.async.addCallbacks(..., eventually_errback(d)) to be mor easily read than any alternative.
  • idea: encourage users to experiment with hacking their SIZE_THRESHOLD, or perhaps even better do some more systematic measurements of how long it takes to download an entire file vs. to read parts out of a larger file in order to choose a more optimized SIZE_THRESHOLD.

comment:3 in reply to: ↑ 2 Changed at 2010-07-09T20:49:59Z by davidsarah

Replying to zooko:

  • Hey look sftpd.noisy is True. Is that intentional?

Yes, at this point the noisy logging is still very useful. It doesn't have much of a performance impact I think, so I might just s/if noisy: //g.

  • Maybe document this with "assert" (and check it, too, when asserts are compiled in): # invariant: self.download_size <= self.current_size

The invariant is documenting that this condition is always true. That it's true at that point (actually, made true by the next two lines) is fairly obvious -- so I'm not sure an assert would really help, but I don't object to adding one.

  • Is it our job to unregister any extant producer before self.producer=p in registerProducer()?

No. According to the twisted API docs, a RuntimeError should be raised if a producer is already registered. I'll fix that.

  • !! Is that a busy eventually loop in _iterate() ? Bad!

It is. Why is this bad? The synchronous alternative would be:

while not self.is_done:
    p.resumeProducing()

which is worse wrt potentially starving concurrent activities (but that is what Twisted does in some of its consumers, and we do it in source:src/allmydata/util/consumer.py).

  • add doc for self.async. I guess it means download is finished.

No, see the doc comment for GeneralSFTPFile:

    I wrap an instance of OverwriteableFileConsumer, which is responsible for
    storing the file contents. In order to allow write requests to be satisfied
    immediately, there is effectively a FIFO queue between requests made to this
    file handle, and requests to my OverwriteableFileConsumer. This queue is
    implemented by the callback chain of self.async.

(There should be a similar comment for ShortReadOnlySFTPFile.) Is this unclear?

  • add doc for _convert_error()

Will do.

  • data[offset:min(offset+length, len(data))] is the same as data[offset:offset+length]

Oh, so it is. I was not aware of that aspect of Python slicing semantics (maybe because we would have made it raise an error ;-)

  • Why do we need this catch-all-exceptions? Because err.value is not always an exception? This is noisy to the reader of source code.
        try:
            if noisy: logmsg(traceback.format_exc(err.value), level=NOISY)
        except:  # pragma: no cover
            pass
    

traceback.format_exc was in practice raising exceptions in obscure cases (I think related to implicit unicode<->str conversion, but I forget the details). Rather than trying to make sure that it never raises an exception, I decided to suppress them. The traceback is very useful for debugging when it does get logged, so I'm loath to remove this code.

  • "The message argument to SFTPError must not reveal information that might compromise anonymity." Huh? Who's anonymity might be revealed to whom?

In case we are running over Tor or I2P, we shouldn't reveal information that would identify the node, as discussed in #1008.

  • The following TODO can be removed based on testing in the 1.7.0 development cycle (and feedback from 1.7.0 release) TODO: check that clients are okay with this being a "?". (They should be because the longname is intended for human consumption.)

Agreed.

Ideas that don't necessarily need to be responded to:

  • class OverwriteableFileConsumer could use unit tests

Yes, absolutely. It's difficult to test because it depends on the ordering of reads and overwrites relative to writes coming from the producer, but that could be made deterministic by using a custom test producer.

  • I think ".finish()" means the download is finished but local writes can proceed apace; docstring needed; Oh! I guess (based on the name) that it is defined by IFinishableConsumer. Maybe docstring that fact.

It is indeed defined by IFinishableConsumer. I'll add a docstring.

  • # TODO: use download interface described in #993 when implemented. I don't like "TODO" in comments in source code. They aren't usually noticed by the people who need to be reminded, and they often become stale (the thing no longer needs to be done, but nobody thinks to find the TODO and remove it.) I prefer issue tickets. This "TODO" should probably be deleted or moved to an issue ticket or maybe do the thing that it says to do. :-)

It is documented in ticket:993#comment:5, so I'll remove the TODO.

  • I find "eventually_callback(d)(expr)" harder to understand than "eventually(d.callback, expr)". (They do mean the same thing, right?)

They do. (They didn't in a previous version, where I had extra logging in eventually_callback, but that's not needed any more.) I will simplify the Schönfinkel'd calls.

On the other hand I find self.async.addCallbacks(..., eventually_errback(d)) to be more easily read than any alternative.

  • idea: encourage users to experiment with hacking their SIZE_THRESHOLD, or perhaps even better do some more systematic measurements of how long it takes to download an entire file vs. to read parts out of a larger file in order to choose a more optimized SIZE_THRESHOLD.

Will do (but first the tests will need to be changed to not be dependent on SIZE_THRESHOLD being 1000).

Changed at 2010-07-12T03:16:33Z by davidsarah

SFTP: address some of the comments in zooko's review (#1106).

Changed at 2010-07-12T03:17:09Z by davidsarah

SFTP: refuse to truncate files opened with FXF_APPEND (see ticket:1037#comment:20).

comment:4 follow-up: Changed at 2010-07-12T03:51:11Z by zooko

sftp-comments.dpatch looks good.

re: sftp-no-trunc-files-opened-with-append.dpatch : should we add a unit test?

comment:5 in reply to: ↑ 4 Changed at 2010-07-15T20:11:23Z by davidsarah

  • Keywords sftp added

Replying to zooko:

re: sftp-no-trunc-files-opened-with-append.dpatch : should we add a unit test?

This patch caused a regression in test_openFile_write. It needs more work.

comment:6 Changed at 2010-07-17T05:31:34Z by davidsarah

  • Keywords unfinished-business added

comment:7 Changed at 2010-07-18T02:22:38Z by davidsarah

  • Milestone changed from 1.7.1 to soon

sftp-comments.dpatch was applied in 15ddab08edebd7fe.

comment:8 Changed at 2011-12-31T14:05:09Z by zooko

  • Keywords review-needed removed
Note: See TracTickets for help on using tickets.