[tahoe-lafs-trac-stream] [tahoe-lafs] #1418: "cannot convert float NaN to integer" in next_power_of_k, during upload via helper

tahoe-lafs trac at tahoe-lafs.org
Thu Jun 23 06:11:00 PDT 2011


#1418: "cannot convert float NaN to integer" in next_power_of_k, during upload via
helper
------------------------+---------------------------
     Reporter:  rycee   |      Owner:  rycee
         Type:  defect  |     Status:  new
     Priority:  major   |  Milestone:  undecided
    Component:  code    |    Version:  1.8.2
   Resolution:          |   Keywords:  helper upload
Launchpad Bug:          |
------------------------+---------------------------

Comment (by zooko):

 Yes, this is getting weird. It *looks* like {{{next_power_of_k_math}}}
 raised this exception when its inputs were {{{n=30, k=2}}}, but when you
 tried it yourself in the REPL, the same calls to {{{math.log()}}} on the
 **same** computer worked. It could be a failure in your CPU's floating
 point support, but I would have expected that to be sporadic or permanent,
 rather than to fail every time under {{{tahoe}}} and work every time under
 the REPL! I wonder if executing {{{tahoe}}} is somehow changing the
 floating point mode of your CPU...

 Maybe there's a bug in {{{next_power_of_k_math()}}}. Could you please try
 something like:
 {{{
 HACK zompu:~/playground/tahoe-lafs/trunk$ PYTHONPATH=src python
 Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
 [GCC 4.5.2] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from allmydata.util import mathutil
 >>> mathutil.next_power_of_k(30, 2)
 32
 >>> mathutil.next_power_of_k_math(30, 2)
 32
 >>> mathutil.next_power_of_k_alt(30, 2)
 32
 }}}
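 For reference, here is a rough sketch of what the two strategies could
 look like (my guess at the shape of the code, not the actual
 {{{mathutil}}} source -- the function bodies below are assumptions):

```python
import math

def next_power_of_k_math(n, k):
    # Float-based version (sketch): round log_k(n) to the nearest
    # integer and adjust upward if the result is still below n.  This
    # is the path that could be hit by a NaN if math.log() misbehaves.
    if n == 0:
        x = 0
    else:
        x = int(math.log(n, k) + 0.5)
    if k ** x < n:
        return k ** (x + 1)
    return k ** x

def next_power_of_k_alt(n, k):
    # Integer-only alternative (sketch): repeatedly multiply by k.
    # No floating point at all, so a broken FPU mode cannot affect it.
    p = 1
    while p < n:
        p *= k
    return p
```

 If these two disagree, or if the {{{_math}}} variant raises, that
 points at the floating-point path rather than the integer one.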

 A next step for me is to understand why the upload succeeds but
 verification then fails. I guess this error is happening during
 ''write''. If the error is causing an incorrect value of
 {{{effective_segments}}}:
 {{{
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 108, in __init__
    effective_segments = mathutil.next_power_of_k(num_segments,2)
 }}}
 Then it might write the data out incorrectly. If the file you are
 uploading was previously uploaded, then deduplication will prevent your
 gateway from uploading a new copy. That would explain why switching to
 {{{next_power_of_k_alt()}}} and then uploading and verifying previously
 written files produced the same failure-to-verify. Oh wait, when you did
 the experiment that showed {{{next_power_of_k_alt}}} had the same
 problem, did you upload a new random file rather than a previously
 uploaded one? Gah! If you did, then perhaps there are ''two'' bugs here
 -- the {{{NaN}}} exception and a separate bug that is corrupting files
 on write.

 Anyway, it wouldn't make sense for this {{{NaN}}} exception to result in
 an incorrect value of {{{effective_segments}}}, since what actually
 results is an uncaught exception!

 Here are a couple of ideas you could try:

 1. Run {{{next_power_of_k()}}} (the version that tries {{{_math}}} and
 falls back to {{{_alt}}} if {{{_math}}} raised an exception) in a tight
 loop, possibly in multiple processes at once, and leave those running,
 reporting any exceptions they hit.

 2. Edit the code to use {{{next_power_of_k_alt}}} exclusively and {{{mv}}}
 your entire {{{storage}}} directory aside, or create an entirely separate
 storage server and introducer for testing, and upload and then verify a
 random file. (If you haven't already done this.)
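 Idea 1 could be sketched like this. The {{{next_power_of_k}}} below is
 a self-contained stand-in mirroring the described try-{{{_math}}},
 fall-back-to-{{{_alt}}} behavior, since I'm not quoting the real
 {{{mathutil}}} code here:

```python
import math

def next_power_of_k(n, k):
    # Stand-in (assumption, not the real mathutil code): try the
    # float/log path first, fall back to integer arithmetic on error.
    try:
        x = 0 if n == 0 else int(math.log(n, k) + 0.5)
        return k ** (x + 1) if k ** x < n else k ** x
    except (ValueError, OverflowError):
        p = 1
        while p < n:
            p *= k
        return p

def stress(iterations=100000):
    # Tight loop over varying inputs; print and count any exception so
    # a long-running process reports failures as they happen.
    errors = 0
    for i in range(iterations):
        try:
            next_power_of_k((i % 1000) + 1, 2)
        except Exception as e:
            errors += 1
            print("exception at iteration %d: %r" % (i, e))
    return errors
```

 Running several copies of this at once (say, one per core) and leaving
 them going for a while would show whether the {{{NaN}}} failure is
 reproducible outside of {{{tahoe}}}.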

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1418#comment:11>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

