[tahoe-lafs-trac-stream] [tahoe-lafs] #1392: if you have fewer than 1000 measurements, return None (meaning "I don't know") when asked for the 99.9% percentile.
tahoe-lafs
trac at tahoe-lafs.org
Sat Apr 23 10:45:58 PDT 2011
-------------------------------+----------------------------------
Reporter: arch_o_median | Owner: arch_o_median
Type: defect | Status: new
Priority: minor | Milestone: undecided
Component: code-storage | Version: 1.8.2
Resolution: | Keywords: design-review-needed
Launchpad Bug: |
-------------------------------+----------------------------------
Comment (by arch_o_median):
Per Zooko's request, the latest version of test_latencies/get_latencies
has behavior that depends on the latency sample size.
Recapitulation:
(The Problem)
The notion of a percentile becomes ambiguous when the precision of the
reported `percentile' is too fine for the quantity of data provided. For
example, if the sample contains fewer than 10 elements, then the 1st
percentile and the 10th percentile refer to the same index (the first) in
the sorted list of samples. This is consistent with the definition of a
percentile, since no more than 1 percent, and no more than 10 percent, of
the data lies below the first element, but it can be misleading in
interpretation. A consumer who believes that the 1st and 10th percentiles
refer to different indices in the list will be mistaken.
The intuition that different percentiles are references to different
indices is reasonable and should be supported. The degree to which the
percentiles _are_ distinct is a function of their precision and the size
of the sample. Larger samples permit more precise percentiles to be
meaningful. I use the word `resolution' in my head when I think of this
concept. Larger sample sizes permit higher `resolution'.
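A small illustration of the collision described above. This is only a
sketch: it assumes the index is computed as int(fraction * count), and
the helper name percentile_index is hypothetical, not taken from the
patch.

```python
def percentile_index(fraction, sample_count):
    """Index into a sorted sample list.

    Assumption: the index is int(fraction * sample_count), i.e. the
    product rounded down. The real get_latencies may differ.
    """
    return int(fraction * sample_count)


# With 9 samples, the 1st and 10th percentiles land on the same index.
small = (percentile_index(0.01, 9), percentile_index(0.10, 9))

# With 250 samples there is enough `resolution' to tell them apart.
large = (percentile_index(0.01, 250), percentile_index(0.10, 250))

print(small)  # (0, 0)
print(large)  # (2, 25)
```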
(The Solution)
Indistinct percentiles indicate insufficient resolution for the
specified percentile. `Indistinct' can be simply defined as multiple
references by different percentiles to the same index. The fix is quite
simple: if percentiles are indistinct, they should return/report None
instead of an index.
Caveat: It is, of course, possible to render all percentiles indistinct
by specifying over-precise adjacent percentiles. This hack was created
with the given percentile list in mind, that is, I am operating on the
assumption that the consumer believes .99 and .999 to be different things
but does not need to know whether .999 and .9999 are different quantities.
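To make the caveat concrete, here is a sketch of how the choice of
percentile list affects which entries collide. The helper group_by_index
is hypothetical, and the index is again assumed to be
int(fraction * count).

```python
from collections import defaultdict


def group_by_index(fractions, sample_count):
    """Group requested fractions by the index they resolve to.

    Assumption: index = int(fraction * sample_count).
    """
    groups = defaultdict(list)
    for fraction in fractions:
        groups[int(fraction * sample_count)].append(fraction)
    return dict(groups)


# With 500 samples, .99 has its own index, but adding the over-precise
# .9999 next to .999 makes those two indistinct from each other.
groups = group_by_index([0.99, 0.999, 0.9999], 500)
for index, members in sorted(groups.items()):
    print(index, members)
```

Under the rule sketched earlier, .999 would then report None only
because .9999 was also requested, which is the hazard the caveat
describes.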
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1392#comment:12>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage