[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2885: Magic-folder status API often fails to display queued upload operations

Tahoe-LAFS trac at tahoe-lafs.org
Thu Jun 22 05:40:20 UTC 2017


#2885: Magic-folder status API often fails to display queued upload operations
--------------------------+---------------------------
 Reporter:  cypher        |          Owner:
     Type:  defect        |         Status:  new
 Priority:  normal        |      Milestone:  undecided
Component:  unknown       |        Version:  1.12.1
 Keywords:  magic-folder  |  Launchpad Bug:
--------------------------+---------------------------
 While working on an application that uses the magic-folder status API (in
 this case, a monitor module for [http://gridsync.io Gridsync] that
 displays and updates the total progress and time remaining of an in-
 progress magic-folder sync operation), I've noticed that the status API
 will often fail to display entries for queued operations and/or drop
 previously-displayed entries from its output.

 To demonstrate what I mean by this, I have a magic-folder already
 configured and a simple python script that `pprint()`s the JSON output of
 the magic-folder status API..

 To start, I create five 100M junk files inside the local magic-folder
 directory (by `dd`ing `/dev/urandom` into `file_1`, `file_2`, and so on),
 simulating the behavior of a user that might dump a handful of large-ish
 files into a magic-folder.

 Later, I send an authenticated POST request to magic-folder status API and
 get back the following output:

 {{{
 [{'kind': 'upload',
   'path': 'file_2',
   'percent_done': 0,
   'queued_at': 1498095903.754096,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_3',
   'percent_done': 0,
   'queued_at': 1498095913.339961,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_4',
   'percent_done': 0,
   'queued_at': 1498095922.907665,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_5',
   'percent_done': 0,
   'queued_at': 1498095932.553877,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_1',
   'percent_done': 54.37499999999999,
   'queued_at': 1498095893.985629,
   'started_at': 1498095894.420489,
   'status': 'started'}]
 }}}

 This is exactly the output that I'd expect to see: `file_1` is just over
 half-way uploaded and `file_2` through `file_5` are sitting in the
 `queued` state (i.e., pending upload). If I know the filesize of each file
 (and, in this case, I do!) I can easily calculate and display the total
 progress of the overall sync operation (10.9%!) and make rough estimates
 as to how long the process as a whole will take based on the
 speed/duration of the upload as I continue to monitor it. Perfect!

 Moments later, however, I notice something strange: After `file_1`
 finishes and `file_2` begins uploading, `file_3` through `file_5`
 disappear from the status output altogether; I see only this:

 {{{
 [{'kind': 'upload',
   'path': 'file_2',
   'percent_done': 2.875,
   'queued_at': 1498095903.754096,
   'started_at': 1498096006.468258,
   'status': 'started'},
  {'kind': 'upload',
   'path': 'file_1',
   'percent_done': 100.0,
   'queued_at': 1498095893.985629,
   'started_at': 1498095894.420489,
   'status': 'success',
   'success_at': 1498096006.467593}]
 }}}

 If I go by what the status API tells me alone, I might think that those
 files have disappeared (and if I hadn't seen the previous output, I
 wouldn't even know that `file_3` through `file_5` exist!), however, a
 quick look into my local magic-folder directory reveals that they're still
 sitting there:

 {{{
 dev ~ % ls -al Magic-Folder
 total 501M
 drwxr-xr-x  2 user user 4.0K Jun 21 21:45 .
 drwxr-xr-x 64 user user 4.0K Jun 21 22:39 ..
 -rw-r--r--  1 user user 100M Jun 21 21:44 file_1
 -rw-r--r--  1 user user 100M Jun 21 21:45 file_2
 -rw-r--r--  1 user user 100M Jun 21 21:45 file_3
 -rw-r--r--  1 user user 100M Jun 21 21:45 file_4
 -rw-r--r--  1 user user 100M Jun 21 21:45 file_5
 }}}


 Shortly after, I check again and see that `file_3` is now uploading.
 `file_4` and `file_5`, however, are still absent:

 {{{
 [{'kind': 'upload',
   'path': 'file_3',
   'percent_done': 16.625,
   'queued_at': 1498095913.339961,
   'started_at': 1498096121.977303,
   'status': 'started'},
  {'kind': 'upload',
   'path': 'file_2',
   'percent_done': 100.0,
   'queued_at': 1498095903.754096,
   'started_at': 1498096006.468258,
   'status': 'success',
   'success_at': 1498096121.977229},
  {'kind': 'upload',
   'path': 'file_1',
   'percent_done': 100.0,
   'queued_at': 1498095893.985629,
   'started_at': 1498095894.420489,
   'status': 'success',
   'success_at': 1498096006.467593}]
 }}}

 I wait a few moments again and then I see this:

 {{{
 {'kind': 'upload',
   'path': 'file_2',
   'percent_done': 0,
   'queued_at': 1498096313.364319,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_1',
   'percent_done': 0,
   'queued_at': 1498096313.364558,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_4',
   'percent_done': 0,
   'queued_at': 1498096313.364875,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_3',
   'percent_done': 0,
   'queued_at': 1498096313.365039,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_4',
   'percent_done': 88.75,
   'queued_at': 1498095922.907665,
   'started_at': 1498096236.681625,
   'status': 'started'},
  {'kind': 'upload',
   'path': 'file_3',
   'percent_done': 100.0,
   'queued_at': 1498095913.339961,
   'started_at': 1498096121.977303,
   'status': 'success',
   'success_at': 1498096236.681511},
  {'kind': 'upload',
   'path': 'file_2',
   'percent_done': 100.0,
   'queued_at': 1498095903.754096,
   'started_at': 1498096006.468258,
   'status': 'success',
   'success_at': 1498096121.977229},
  {'kind': 'upload',
   'path': 'file_1',
   'percent_done': 100.0,
   'queued_at': 1498095893.985629,
   'started_at': 1498095894.420489,
   'status': 'success',
   'success_at': 1498096006.467593}]
 }}}

 For some reason, `file_1` through `file_4` now have ''two'' entries each.
 I haven't modified these files since their creation so I'm not sure why
 magic-folder suddenly wants to upload them again (''if'' that's what it's
 planning on doing) but I assume, for now, that they'll go away later --
 maybe after `file_5` re-appears and completes its upload..

 Checking later, I see that `file_5` is now back -- and the total operation
 progress can be calculated with confidence at 85.1%:

 {{{
 [{'kind': 'upload',
   'path': 'file_2',
   'percent_done': 0,
   'queued_at': 1498096313.364319,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_1',
   'percent_done': 0,
   'queued_at': 1498096313.364558,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_4',
   'percent_done': 0,
   'queued_at': 1498096313.364875,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_3',
   'percent_done': 0,
   'queued_at': 1498096313.365039,
   'status': 'queued'},
  {'kind': 'upload',
   'path': 'file_5',
   'percent_done': 25.5,
   'queued_at': 1498095932.553877,
   'started_at': 1498096351.210107,
   'status': 'started'},
  {'kind': 'upload',
   'path': 'file_4',
   'percent_done': 100.0,
   'queued_at': 1498095922.907665,
   'started_at': 1498096236.681625,
   'status': 'success',
   'success_at': 1498096351.209893},
  {'kind': 'upload',
   'path': 'file_3',
   'percent_done': 100.0,
   'queued_at': 1498095913.339961,
   'started_at': 1498096121.977303,
   'status': 'success',
   'success_at': 1498096236.681511},
  {'kind': 'upload',
   'path': 'file_2',
   'percent_done': 100.0,
   'queued_at': 1498095903.754096,
   'started_at': 1498096006.468258,
   'status': 'success',
   'success_at': 1498096121.977229},
  {'kind': 'upload',
   'path': 'file_1',
   'percent_done': 100.0,
   'queued_at': 1498095893.985629,
   'started_at': 1498095894.420489,
   'status': 'success',
   'success_at': 1498096006.467593}]
 }}}

 For whatever it's worth, I see that the `queued_at` time for my `file_5`
 upload is earlier than that of the second `file_1` upload and yet, despite
 being temporally prior, it failed to remain present when the second
 `file_1` operation appeared. More interestingly, the `queued_at` value for
 `file_5` is the same as what it was when I originally saw it (i.e., just
 before it disappeared from the JSON output) leading me to believe that
 magic-folder "knew" about the `file_5` operation all along but, for
 whatever reason, stopped displaying it in the JSON output along the way.

 With all that being said, given this behavior, the magic-folder status API
 seemingly cannot be relied upon to provide accurate information regarding
 future/pending upload operations (while my goal of providing a single
 status indicator for a set of upload/download operations seemingly can't
 be achieved without resorting to extremely wasteful measures); without
 catching that brief early moment when all of the queued files are
 faithfully present (i.e., before they disappear from the JSON output),
 there appears to be no simple or direct way of knowing which files magic-
 folder is planning to upload until that upload actually begins.

 Ideally, this would work in a more reliable fashion: when magic-folder's
 filesystem monitor detects a local file change, the subsequent upload
 operation should be added to whatever queue the status API gathers its
 information from and remain there indefinitely. Having entries disappear
 and reappear requires that any applications relying on this API perform
 additional and ongoing checks to assess the accuracy of the information
 being provided (for example, having to check and see whether `file_5`
 ''actually'' disappeared from the disk).

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2885>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage


More information about the tahoe-lafs-trac-stream mailing list