[tahoe-lafs-trac-stream] [tahoe-lafs] #1456: High latency for 'tahoe get' if 'tahoe put' in parallel
tahoe-lafs
trac at tahoe-lafs.org
Sun Jul 31 12:33:32 PDT 2011
#1456: High latency for 'tahoe get' if 'tahoe put' in parallel
-------------------------+-------------------------------------------------
     Reporter:  T_X      |      Owner:  T_X
         Type:  defect   |     Status:  new
     Priority:  critical |  Milestone:  undecided
    Component:  code     |    Version:  1.8.2
   Resolution:           |   Keywords:  download upload latency performance
Launchpad Bug:           |              gateway vm kvm vpn trickle
-------------------------+-------------------------------------------------
Changes (by zooko):
* keywords: latency performance gateway vm kvm vpn trickle => download
upload latency performance gateway vm kvm vpn trickle
* owner: somebody => T_X
* priority: major => critical
Comment:
T_X: thank you for the bug report. It sounds like it might be a serious
problem in Tahoe-LAFS. I'm glad you've taken the effort to record detailed
measurements and include notes about how you tried to make a minimal,
reproducible case. I especially appreciate that you included your test
script—very good!
However, I'm still confused and need more help from you to understand
what's going on. Could you summarize in one paragraph of English (no
more than 3 or 4 sentences) what is wrong and how you know it is
happening?
You're observing dramatically high latency on {{{tahoe get}}} in some
cases. In fact, in 10 runs of {{{tahoe get}}}
([attachment:tahoe-stats-2.log]), it took this many seconds:
{{{
1 15.38
2 213.31
3 564.83
4 11.87
5 11.99
6 11.99
7 12.56
8 12.11
9 12.50
10 12.83
}}}
The fact that it took 560 seconds to do a {{{tahoe get}}} (after which it
completed successfully instead of erroring out?) is definitely an
indication of something very wrong. I'm still hoping it turns out to be
something wrong in your test rig or scripts rather than in Tahoe-LAFS, but
we'll see. :-)
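To make "dramatically high" concrete, here is a rough sketch that compares
each of the ten timings quoted above against the median run. The function
name {{{slow_runs}}} and the 10x threshold are assumptions chosen for
illustration; they are not part of the reporter's script.

```python
from statistics import median

# Wall-clock seconds for the 10 `tahoe get` runs quoted above.
timings = [15.38, 213.31, 564.83, 11.87, 11.99,
           11.99, 12.56, 12.11, 12.50, 12.83]


def slow_runs(seconds, factor=10.0):
    """Return (run_number, seconds) for runs at least `factor` x the median.

    `factor` is an assumed threshold, not a value taken from the ticket.
    """
    typical = median(seconds)
    return [(i, s) for i, s in enumerate(seconds, start=1)
            if s >= factor * typical]


typical = median(timings)
outliers = slow_runs(timings)

for run, secs in outliers:
    print(f"run {run}: {secs:.2f}s, {secs / typical:.1f}x the median")
```

With this threshold only runs 2 and 3 are flagged; the other eight runs sit
close to the median.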
So, that's a question. How do we know that the runs that took an order of
magnitude longer completed successfully? As far as I can tell from a quick
scan of [attachment:test-run.sh#L16 your script], it isn't checking the
return value or inspecting the resulting downloaded file to be sure it
worked.
(Note that this would be just as major a problem in Tahoe-LAFS if it
waited 560 seconds and failed as if it waited 560 seconds and succeeded,
but it would help to understand which is happening.)
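One way to tell the two cases apart is to record the exit status and verify
the downloaded file on every timed run. The following is only a sketch, not
the reporter's script: {{{timed_checked_run}}} and {{{sha256_of}}} are
hypothetical helper names, and it assumes the SHA-256 of the uploaded file
is known in advance and that the command writes its output to
{{{out_path}}}.

```python
import hashlib
import subprocess
import time
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def timed_checked_run(argv, out_path, expected_sha256):
    """Run argv; report elapsed seconds, exit code, and output integrity."""
    out = Path(out_path)
    if out.exists():
        out.unlink()  # a stale file from an earlier run must not count as success
    start = time.monotonic()
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.PIPE)
    elapsed = time.monotonic() - start
    # Success requires a zero exit status AND a file with the expected hash.
    intact = (proc.returncode == 0
              and out.is_file()
              and sha256_of(out) == expected_sha256)
    return {"seconds": elapsed, "returncode": proc.returncode, "intact": intact}
```

Here {{{argv}}} would be whatever {{{tahoe get}}} invocation the test script
already uses, passed as a list of arguments. Logging {{{returncode}}} and
{{{intact}}} next to each timing would show whether the 213 and 564 second
runs actually produced a correct file.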
Thanks!
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1456#comment:1>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage