[tahoe-dev] [tahoe-lafs] #83: Extend external interfaces for operation monitoring.

Sat Dec 12 19:59:38 PST 2009

#83: Extend external interfaces for operation monitoring.
-------------------------+--------------------------------------------------
 Reporter:  nejucomo     |           Owner:  nejucomo 
     Type:  enhancement  |          Status:  new      
 Priority:  minor        |       Milestone:  undecided
Component:  code         |         Version:  0.6.1    
 Keywords:               |   Launchpad_bug:           
-------------------------+--------------------------------------------------

Comment(by zooko):

 So I definitely would have preferred the simplicity of using in-band
 progress indicators and cancellation as described in comment:1, but Brian
 persuaded me that this just wasn't good enough.  The part of his argument
 that I remember being unable to counter was that we have some operations
 that take longer than an HTTP connection can reliably last.  For example,
 a deep-verify-and-repair walks a large directory structure, downloads
 every bit of every share of every file and, if necessary, uploads
 replacement shares.  This could take days or
 weeks or months, and if your control of the process is a single HTTP
 connection then you're quite likely to suffer a network glitch which
 closes your TCP connection or encounter some kind of stupid timeout in an
 HTTP proxy or something.

 (The way I like to think of this is that the comms abstraction of TCP is
 insufficiently robust -- there isn't a widely understood and implemented
 way to force your HTTP transaction to outlive temporary disconnections of
 the underlying TCP connection.  That means that HTTP, while a wonderful
 lingua franca for some protocols, can't be used for long-running
 operations or operations which cannot be safely retried when the
 first try might or might not have failed to get through.)

 So, Brian went ahead and invented "operation handles", documented here:
 [source:docs/frontends/webapi.txt@4112#L203].
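
 A minimal sketch of why polling a handle tolerates network glitches: each
 poll is its own short request, so a dropped connection costs one poll
 rather than the whole operation.  Everything named here is an assumption
 made for illustration: `fetch_status` stands in for one HTTP GET of the
 handle's status, the "finished" key is assumed to be how the JSON status
 reports completion, and "objects_checked" is a made-up result field.

```python
import json
import time
from typing import Callable


def poll_until_finished(fetch_status: Callable[[], str],
                        interval: float = 5.0,
                        max_polls: int = 1000) -> dict:
    """Poll an operation handle until its status reports completion.

    fetch_status is a hypothetical callable that makes ONE short request
    and returns the JSON status body as text.
    """
    for _ in range(max_polls):
        try:
            status = json.loads(fetch_status())
        except OSError:
            # Transient network failure (ConnectionError and friends are
            # OSError subclasses). The operation is assumed to keep running
            # on the node, so retry on a fresh connection.
            time.sleep(interval)
            continue
        if status.get("finished"):  # assumed completion flag
            return status
        time.sleep(interval)
    raise TimeoutError("operation still running after max_polls polls")


# Stand-in for the network: one dropped connection, one "still running"
# reply, then the final status.
_replies = iter([
    ConnectionResetError("simulated network glitch"),
    '{"finished": false}',
    '{"finished": true, "objects_checked": 3}',  # hypothetical field
])


def fake_fetch() -> str:
    reply = next(_replies)
    if isinstance(reply, Exception):
        raise reply
    return reply


result = poll_until_finished(fake_fetch, interval=0)
```

 The cost of this design is that each poll returns the accumulated status
 as one JSON document, which has to be built, sent, and parsed in full.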

 Hm, reading those docs again, I see this new text:

 {{{
 Many "slow" operations can begin to use unacceptable amounts of memory when
 operating on large directory structures. The memory usage increases when the
 ophandle is polled, as the results must be copied into a JSON string, sent
 over the wire, then parsed by a client. So, as an alternative, many "slow"
 operations have streaming equivalents. These equivalents do not use operation
 handles. Instead, they emit line-oriented status results immediately. Client
 code can cancel the operation by simply closing the HTTP connection.
 }}}
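
 A rough sketch of how a client might consume such a stream in constant
 memory and cancel by closing the connection.  The assumptions are loud
 ones: each line is taken to be a standalone JSON object, the "path" field
 is invented for the demo, and `io.BytesIO` stands in for the HTTP response
 body.  `process_stream` and `on_unit` are hypothetical names, not part of
 any Tahoe API.

```python
import io
import json
from typing import Callable, IO


def process_stream(stream: IO[bytes],
                   on_unit: Callable[[dict], bool]) -> int:
    """Handle one status line at a time; return how many were handled.

    on_unit returns True to request cancellation. Only the current line is
    held in memory, unlike a polled result that arrives as one document.
    """
    handled = 0
    try:
        for raw in stream:          # file-like objects iterate line by line
            line = raw.strip()
            if not line:
                continue            # skip blank lines
            handled += 1
            if on_unit(json.loads(line)):
                break               # caller asked to cancel
    finally:
        # Closing the connection is the cancellation signal described in
        # the quoted text; it also runs on normal completion and on errors.
        stream.close()
    return handled


# Demo with an in-memory stand-in for the response body.
fake_body = io.BytesIO(
    b'{"path": ["a"]}\n'
    b'{"path": ["a", "b"]}\n'
    b'{"path": ["c"]}\n'
)
seen = []


def stop_at_b(unit: dict) -> bool:
    seen.append(unit["path"])
    return unit["path"] == ["a", "b"]   # cancel once this entry shows up


count = process_stream(fake_body, stop_at_b)
```

 Note that cancelling this way and losing the connection to a glitch look
 identical from the other end, which is the weakness of the single
 connection approach for very long operations.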

 Oh dear, so it appears that neither the operation-handles nor the single
 HTTP connection is really good enough in all dimensions.  Hm.

 So what shall we do with this ticket?  I guess we'll close it as "fixed",
 and then maybe open a new ticket saying "Make operation-handle-querying
 use only a little memory" and maybe open a new ticket saying "Invent
 robust HTTP so that streaming operations can be used on operations
 that last longer than a TCP connection lasts".

 I'm not actually going to open either of those two tickets right now.  I
 just took painkillers for my knee (recuperating from surgery).

 If Brian, Nathan, or David-Sarah (or anyone) has any ideas on how to
 follow up on this, by all means post to the list or comment on this or some
 other ticket.

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/83#comment:4>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid