Opened at 2012-12-05T03:32:23Z
Last modified at 2020-10-30T12:35:44Z
#1885 closed defect
cloud backend: redundant reads of chunks from cloud when downloading large files — at Version 5
Reported by: | davidsarah | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 1.15.0 |
Component: | code-storage | Version: | 1.9.2 |
Keywords: | cloud-backend cache download performance | Cc: | |
Launchpad Bug: | | |
Description (last modified by daira)
I uploaded a 7.7 MiB video as an MDMF file using the cloud backend on S3 (as of 1819-cloud-merge/022796fb), and then downloaded it. From tailing the storage server with flogtool, I saw that the same chunks were being read multiple times during the download. This suggests that the chunk cache is not working effectively.
The file was being downloaded by playing it as a video in Chromium; I don't think that makes a difference.
Update: this also applies to immutable files if they are large enough.
Change History (5)
comment:1 Changed at 2012-12-05T03:37:28Z by davidsarah
comment:2 Changed at 2012-12-05T03:43:09Z by davidsarah
The same behaviour occurs for a straight download rather than playing the video. Each chunk appears to be read 5 times, and the first chunk (containing the header) many more times.
comment:3 Changed at 2013-05-24T22:12:10Z by daira
- Description modified (diff)
- Summary changed from cloud backend: redundant reads of chunks from S3 when downloading large MDMF file to cloud backend: redundant reads of chunks from cloud when downloading large MDMF file
comment:4 Changed at 2013-05-24T22:13:15Z by daira
- Description modified (diff)
comment:5 Changed at 2013-05-28T16:01:45Z by daira
- Description modified (diff)
- Keywords mdmf removed
- Summary changed from cloud backend: redundant reads of chunks from cloud when downloading large MDMF file to cloud backend: redundant reads of chunks from cloud when downloading large files
I changed ChunkCache to use a true LRU replacement policy, and that seems to have fixed this problem. (LRU is not often used because keeping track of ages can be inefficient for a large cache, but here we only need a cache of a few elements. In practice 5 chunks seems to be sufficient for the sizes of files I've tested; I will investigate later whether that is enough for larger files.)
During the upload and download, the server memory usage didn't go above 50 MiB according to the statmover graph.
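For illustration, here is a minimal sketch of a small true-LRU chunk cache along the lines described above. The class name `LRUChunkCache`, the key type, and the default capacity of 5 are assumptions for the example; this is not the actual ChunkCache code from the cloud backend.

```python
from collections import OrderedDict

class LRUChunkCache(object):
    """Illustrative true-LRU cache of a few chunks (hypothetical, not ChunkCache)."""

    def __init__(self, capacity=5):
        # 5 chunks matches the cache size mentioned above; adjust as needed.
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        """Return the cached chunk for key, or None if it is not cached."""
        if key not in self._entries:
            return None
        # Re-inserting the entry moves it to the most-recently-used position.
        chunk = self._entries.pop(key)
        self._entries[key] = chunk
        return chunk

    def put(self, key, chunk):
        """Insert or refresh a chunk, evicting the least recently used one if full."""
        if key in self._entries:
            self._entries.pop(key)
        self._entries[key] = chunk
        if len(self._entries) > self._capacity:
            # popitem(last=False) removes the least-recently-used entry.
            self._entries.popitem(last=False)
```

Because the cache only ever holds a handful of entries, the bookkeeping cost of maintaining recency order (a pop and re-insert per access here) is negligible, which is why a true LRU policy is practical in this case.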