[tahoe-lafs-trac-stream] [tahoe-lafs] #1418: "cannot convert float NaN to integer" in next_power_of_k, during upload via helper
tahoe-lafs
trac at tahoe-lafs.org
Thu Jun 23 06:11:00 PDT 2011
#1418: "cannot convert float NaN to integer" in next_power_of_k, during upload via
helper
------------------------+---------------------------
     Reporter:  rycee   |      Owner:  rycee
         Type:  defect  |     Status:  new
     Priority:  major   |  Milestone:  undecided
    Component:  code    |    Version:  1.8.2
   Resolution:          |   Keywords:  helper upload
Launchpad Bug:          |
------------------------+---------------------------
Comment (by zooko):
Yes, this is getting weird. It *looks* like {{{next_power_of_k_math}}}
raised this exception when its inputs were {{{n=30, k=2}}}, but when you
tried it yourself in the REPL, the same calls to {{{math.log()}}} on the
**same** computer worked. It could be a failure in your CPU's
floating-point support, but I would have expected such a failure to be
sporadic or permanent, rather than occurring every time under
{{{tahoe}}} and never under the REPL! I wonder if executing {{{tahoe}}}
is somehow changing the floating-point mode of your CPU...
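For context, a log-based implementation along these lines (a sketch, not
necessarily the exact code in {{{mathutil.py}}}) could only hit that
error if {{{math.log()}}} handed back a {{{NaN}}}:
{{{
import math

def next_power_of_k_math(n, k):
    # Sketch (assumed, not verbatim from allmydata.util.mathutil):
    # smallest power of k that is >= n, computed via logarithms.
    if n == 0:
        x = 0
    else:
        # int() raises "ValueError: cannot convert float NaN to integer"
        # if this division ever yields NaN -- which should be impossible
        # for n=30, k=2 on healthy floating-point hardware.
        x = int(math.log(n) / math.log(k))
    if k**x < n:
        return k**(x+1)
    return k**x
}}}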
Maybe there's a bug in {{{next_power_of_k_math()}}}. Could you please try
something like:
{{{
HACK zompu:~/playground/tahoe-lafs/trunk$ PYTHONPATH=src python
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from allmydata.util import mathutil
>>> mathutil.next_power_of_k(30, 2)
32
>>> mathutil.next_power_of_k_math(30, 2)
32
>>> mathutil.next_power_of_k_alt(30, 2)
32
}}}
A next step for me is to try to understand why the upload succeeds but
the verify then fails. I guess this error is happening during
''write''. If the error causes it to get an incorrect value for
{{{effective_segments}}}:
{{{
File "/home/rycee/allmydata-
tahoe-1.8.2/src/allmydata/immutable/layout.py", line 108, in __init__
effective_segments = mathutil.next_power_of_k(num_segments,2)
}}}
Then it might write the data out incorrectly. If the file you are
uploading was previously uploaded, then deduplication will prevent your
gateway from uploading a new copy. That would explain why changing the
code to use {{{next_power_of_k_alt()}}} and then uploading and verifying
previously written files hit the same failure to verify. Oh wait -- did
you upload a new random file, instead of a previously uploaded file,
when you did the experiment that showed {{{next_power_of_k_alt}}} had
the same problem? Gah! If you already tried that, then perhaps there are
''two'' bugs here -- the {{{NaN}}} exception and a different bug that is
corrupting files on write.
Anyway, it wouldn't make sense for this {{{NaN}}} exception to result in
an incorrect value of {{{effective_segments}}}, when we can clearly see
that what it results in is an uncaught exception!
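For what it's worth, converting a {{{NaN}}} to an integer in a stock
Python 2.7 REPL raises exactly this error rather than silently producing
a bogus number:
{{{
>>> int(float("nan"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot convert float NaN to integer
}}}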
Here are a couple of ideas you could try:
1. Run {{{next_power_of_k()}}} (the version that tries {{{_math}}} first
and falls back to {{{_alt}}} if {{{_math}}} raised an exception) in a
tight loop, possibly with multiple processes doing it, and leave those
running and reporting whether they get any exceptions (see the sketch
after this list).
2. Edit the code to use {{{next_power_of_k_alt}}} exclusively, {{{mv}}}
your entire {{{storage}}} directory aside (or create an entirely
separate storage server and introducer for testing), and then upload and
verify a random file. (If you haven't already done this.)
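For idea 1, a script along these lines would do -- a sketch, with a
made-up file name {{{stress_npok.py}}} and made-up loop bounds -- run as
{{{PYTHONPATH=src python stress_npok.py}}} from the source tree, ideally
with several copies running at once:
{{{
# stress_npok.py -- sketch of a stress loop for idea 1: hammer
# next_power_of_k() and report the first exception it ever raises.
import sys, traceback
from allmydata.util import mathutil

rounds = 0
while True:
    try:
        for n in range(1, 1000):
            # the smallest power of 2 not smaller than n must be >= n
            assert mathutil.next_power_of_k(n, 2) >= n
    except Exception:
        traceback.print_exc()
        sys.exit(1)
    rounds += 1
    if rounds % 10000 == 0:
        print "no exceptions after %d rounds" % rounds
}}}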
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1418#comment:11>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage