[tahoe-lafs-trac-stream] [Tahoe-LAFS] #2431: Magic Folder: implement "stability delay" (was: drop-upload: implement "stability delay")

Thu Oct 29 02:25:33 UTC 2015

#2431: Magic Folder: implement "stability delay"
-------------------------------------+-------------------------------------
     Reporter:  daira                |      Owner:  daira
         Type:  enhancement          |     Status:  new
     Priority:  normal               |  Milestone:  undecided
    Component:  code-frontend-       |    Version:  1.10.0
  magic-folder                       |   Keywords:  stability reliability
   Resolution:                       |  integrity performance magic-folder
Launchpad Bug:                       |
-------------------------------------+-------------------------------------
Changes (by daira):

 * keywords:
     drop-upload, stability, reliability, integrity, performance, magic-
     folder
     => stability reliability integrity performance magic-folder

New description:

 This ticket concerns a possible improvement to the Magic Folder design to
 reduce the risk of inconsistency when reading a changed file in order to
 upload it.

 Quoting from [source:docs/proposals/magic-folder/remote-to-local-
 sync.rst]:

 Short of filesystem-specific features on Unix or the
 [https://technet.microsoft.com/en-us/library/ee923636%28v=ws.10%29.aspx
 shadow copy service] on Windows (which is per-volume and therefore
 difficult to use in this context), there is no way to *read* the whole
 contents of a file atomically. Therefore, when we read a file in order to
 upload it, we may read an inconsistent version if it was also being
 written locally.

 A well-behaved application can avoid this problem for its writes:

 * On Unix, if another process modifies a file by renaming a temporary file
 onto it, then we will consistently read either the old contents or the new
 contents.
 * On Windows, if the other process uses sharing flags to deny reads while
 it is writing a file, then we will consistently read either the old
 contents or the new contents, unless a sharing error occurs. In the case
 of a sharing error we should retry later, up to a maximum number of
 retries.
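
 The two well-behaved patterns above could be sketched as follows. This is
 a minimal illustration, not Tahoe-LAFS code: the names `atomic_write` and
 `read_with_retries`, the retry count and the retry interval are all
 hypothetical, and treating `PermissionError` as the sign of a Windows
 sharing error is an assumption.

```python
import os
import tempfile
import time


def atomic_write(path, data):
    """Writer side: replace `path` by renaming a temporary file onto it."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file is created in the target's directory so that the
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either the old file or the new one, never a mixture.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def read_with_retries(path, max_retries=3, retry_delay=0.1, sleep=time.sleep):
    """Reader side: retry a bounded number of times if the open is denied.

    Assumption: a sharing error surfaces as PermissionError.
    """
    for attempt in range(max_retries + 1):
        try:
            with open(path, "rb") as f:
                return f.read()
        except PermissionError:
            if attempt == max_retries:
                raise
            sleep(retry_delay)
```

 A real uploader would presumably requeue the file instead of sleeping in
 place; the blocking loop only keeps the sketch short.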

 In the case of a not-so-well-behaved application writing to a file at the
 same time we read from it, the magic folder will still be eventually
 consistent, but inconsistent versions may be visible to other users'
 clients. This may also interfere with conflict/overwrite detection for
 those users [TODO EXPLAIN].

 In #1440 we implemented a delay, called the *pending delay*, after the
 notification of a filesystem change and before the file is read in order
 to upload it. If another change notification occurs within the pending
 delay time, the delay is restarted. This helps to some extent: if each
 write to a file takes less time than the pending delay, and writes to
 that file occur less often than once per pending delay, we shouldn't
 encounter this inconsistency.
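
 The restart behaviour of the pending delay can be modelled as a per-path
 deadline. The sketch below is only an illustration of that idea: the class
 `PendingQueue` and its methods are hypothetical names, not the #1440
 implementation, and time is passed in explicitly so the behaviour is easy
 to follow.

```python
class PendingQueue:
    """Release each changed path only after a quiet period has passed."""

    def __init__(self, pending_delay):
        self.pending_delay = pending_delay
        self._deadlines = {}  # path -> earliest time the file may be read

    def notify(self, path, now):
        # Every new notification for a path restarts its pending delay.
        self._deadlines[path] = now + self.pending_delay

    def ready(self, now):
        """Remove and return the paths whose pending delay has expired."""
        due = sorted(p for p, t in self._deadlines.items() if t <= now)
        for p in due:
            del self._deadlines[p]
        return due


q = PendingQueue(pending_delay=1.0)
q.notify("a.txt", now=0.0)
q.notify("a.txt", now=0.8)        # second change restarts the delay
early = q.ready(now=1.0)          # nothing is due yet
late = q.ready(now=1.8)           # "a.txt" is now due for reading
```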

 The likelihood of inconsistency could be further reduced, even for writes
 by not-so-well-behaved applications, by delaying the actual upload for a
 further period —called the *stability delay*— after the file has finished
 being read. If a notification occurs between the end of the pending delay
 and the end of the stability delay, then the read would be aborted and the
 notification requeued.

 This would have the effect of ensuring that no write notifications have
 been received for the file during a time window that brackets the period
 when it was being read, with margin before and after this period defined
 by the pending and stability delays. The delays are intended to account
 for asynchronous notification of events, and caching in the filesystem.
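
 One way to picture the proposed check is sketched below. It is a minimal
 sketch under stated assumptions, not a design for this ticket: the class
 `StabilityChecker`, its methods and the `read_file` callback are
 hypothetical, and the window is taken to start when the read begins (that
 is, when the pending delay has just ended).

```python
import time


class StabilityChecker:
    """Hypothetical helper: discard a read if the file changed around it."""

    def __init__(self, stability_delay, clock=time.monotonic, sleep=time.sleep):
        self.stability_delay = stability_delay
        self.clock = clock
        self.sleep = sleep
        self._last_notified = {}  # path -> time of latest change notification

    def notify(self, path):
        # Called by the filesystem watcher whenever `path` changes.
        self._last_notified[path] = self.clock()

    def read_for_upload(self, path, read_file):
        """Return the file's bytes, or None if the path must be requeued."""
        window_start = self.clock()       # the pending delay has just ended
        data = read_file(path)
        self.sleep(self.stability_delay)  # wait before committing to upload
        last = self._last_notified.get(path)
        if last is not None and last >= window_start:
            return None                   # changed during the window: requeue
        return data
```

 A caller that gets `None` back would put the path on the queue again, so
 the pending delay starts over for that file.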

 Note however that we cannot guarantee that the delays will be long enough
 to prevent inconsistency in any particular case. Also, the stability delay
 would potentially affect performance significantly because (unlike the
 pending delay) it is not overlapped when there are multiple files on the
 upload queue. This performance impact could be mitigated by uploading
 files in parallel where possible (#1459).
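
 A rough back-of-the-envelope model of that cost is sketched below. The
 function name and every number in it are made up for illustration; it
 assumes each file costs a fixed read-and-upload time plus the stability
 delay, and that files are processed in equal-sized parallel batches.

```python
import math


def queue_drain_time(n_files, per_file_time, stability_delay, parallelism=1):
    """Estimated seconds to empty the upload queue in this toy model."""
    batches = math.ceil(n_files / parallelism)
    # Within a batch the stability delays overlap; across batches they add up.
    return batches * (per_file_time + stability_delay)


# Hypothetical figures: 100 queued files, 0.5 s of work each, 2 s delay.
sequential = queue_drain_time(100, 0.5, 2.0)                  # 250.0 seconds
parallel = queue_drain_time(100, 0.5, 2.0, parallelism=10)    # 25.0 seconds
```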

--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2431#comment:3>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage