[tahoe-lafs-trac-stream] [tahoe-lafs] #1418: "cannot convert float NaN to integer" in next_power_of_k, during upload via helper

tahoe-lafs trac at tahoe-lafs.org
Thu Jun 23 01:56:45 PDT 2011


#1418: "cannot convert float NaN to integer" in next_power_of_k, during upload via
helper
------------------------+---------------------------
     Reporter:  rycee   |      Owner:  rycee
         Type:  defect  |     Status:  new
     Priority:  major   |  Milestone:  undecided
    Component:  code    |    Version:  1.8.2
   Resolution:          |   Keywords:  helper upload
Launchpad Bug:          |
------------------------+---------------------------

Comment (by rycee):

 Yes, I applied the patches you gave me on the helper node, not the
 client node.  With the new patch I did indeed get some output but, being a
 Python novice, I feel more confused, not less.  The stack trace says:

 {{{
   File "/home/rycee/allmydata-tahoe-1.8.2-bug1418/support/lib/python2.6/site-packages/Twisted-10.2.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 542, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 926, in locate_all_shareholders
     num_segments, n, k, desired)
   File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 225, in get_shareholders
     None)
   File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 88, in make_write_bucket_proxy
     num_share_hashes, uri_extension_size_max, nodeid)
   File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 108, in __init__
     effective_segments = mathutil.next_power_of_k(num_segments,2)
   File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/util/mathutil.py", line 49, in next_power_of_k
     return next_power_of_k_math(n, k)
   File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/util/mathutil.py", line 35, in next_power_of_k_math
     x = int(math.log(n, k) + 0.5)
 exceptions.ValueError: ('cannot convert float NaN to integer', 30L, 2, 32)
 ]'>
 }}}

 and in the node's twistd.log I found

 {{{
 2011-06-23 08:19:10+0200 [Negotiation,1,46.10.48.88] XXX n: 30 :: <type 'long'>, k: 2 :: <type 'int'>, next_power_of_k_alt: 32
 }}}

 In the Python REPL on the '''same''' computer I get

 {{{
 $ python
 Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
 [GCC 4.4.5] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import math
 >>> math.log(30L, 2)
 4.9068905956085187
 >>> int(math.log(30L, 2) + 0.5)
 5
 >>> 2**5
 32
 }}}

 This, together with the `Success: files copied` message, leaves me quite
 confused.  It feels like the NaN error is a decoy put under our noses while
 the real problem slips quietly into the night.

 I also tried creating a completely pristine 1.8.2 build on my helper node,
 and it fails in the same way as the cp did: the client claims success, and
 the CHK caps reported by a verbose backup pass `check` but fail
 `check --verify`.  For example, backup says
 {{{
/home/rycee/photos.new/2011/05/22/IMGP4679.JPG -> URI:CHK:k3fpasihz7g7ogsrmbywdfgdy4:jlcyer5z43nuuvdm72qyqlw2eq5uyjubpxey25gfdizidmcdlnrq:1:3:3877153
 }}}

 and checking gives

 {{{
 $ tahoe check URI:CHK:k3fpasihz7g7ogsrmbywdfgdy4:jlcyer5z43nuuvdm72qyqlw2eq5uyjubpxey25gfdizidmcdlnrq:1:3:3877153
 Summary: Healthy
  storage index: 7sxgsu3edgy43at77tuskouuay
  good-shares: 3 (encoding is 1-of-3)
  wrong-shares: 0
 $ tahoe check --verify URI:CHK:k3fpasihz7g7ogsrmbywdfgdy4:jlcyer5z43nuuvdm72qyqlw2eq5uyjubpxey25gfdizidmcdlnrq:1:3:3877153
 Summary: Not Healthy: 0 shares (enc 1-of-3)
  storage index: 7sxgsu3edgy43at77tuskouuay
  good-shares: 0 (encoding is 1-of-3)
  wrong-shares: 0
  corrupt shares:
   server bzyf23mghgxycnr34pdkqdmybnevf4ks, SI 7sxgsu3edgy43at77tuskouuay, shnum 2
   server 44g5kkgwulzrrrntdzci7jtt5rgt6nuo, SI 7sxgsu3edgy43at77tuskouuay, shnum 0
   server 5yea4my3w3frgp524lgthrb7rdd6frtr, SI 7sxgsu3edgy43at77tuskouuay, shnum 1
 }}}
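
 To spot this kind of disagreement across many caps, the two outputs can be
 compared mechanically.  The following is only a sketch: the function names
 `parse_check_output` and `verify_disagrees` are made up for illustration,
 and it assumes the text format printed above, which may differ between
 Tahoe-LAFS versions.

```python
import re


def parse_check_output(text):
    """Pull the summary, good-share count and corrupt-share count out of
    `tahoe check` output (format assumed to match the ticket's example)."""
    summary = None
    good_shares = None
    corrupt_shares = 0
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Summary:"):
            summary = line[len("Summary:"):].strip()
        match = re.match(r"good-shares:\s*(\d+)", line)
        if match:
            good_shares = int(match.group(1))
        # Each corrupt share is listed on a line beginning with "server ".
        if line.startswith("server "):
            corrupt_shares += 1
    return {
        "summary": summary,
        "good_shares": good_shares,
        "corrupt_shares": corrupt_shares,
    }


def verify_disagrees(check_text, verify_text):
    """True when plain `check` reports Healthy but `check --verify` does not."""
    plain = parse_check_output(check_text)
    verified = parse_check_output(verify_text)
    return plain["summary"] == "Healthy" and verified["summary"] != "Healthy"
```

 Feeding it the two outputs above would flag this cap, since the plain check
 says `Healthy` while the verify run reports three corrupt shares.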

 I will attach the result of a `diff -ur` between the version giving the
 exception (including zooko's patches from this bug) and the pristine
 version.  Note that the version giving the exception contains edits of
 mine that change minimum_cycle_time in `crawler.py` and `expirer.py`, but
 I have run with those changes without trouble for a long time (many
 versions).

 Oh, and one last thing.  Since `next_power_of_k_alt` returns a sensible
 result, I tried making `next_power_of_k` return that value in my build that
 produces exceptions.  Running a backup then proceeds in the same way as
 with the pristine version, i.e., the client reports success but
 `check --verify` fails.
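
 For reference, the same value can be computed without any floating-point
 math at all.  The sketch below is not the code from `mathutil.py` or from
 zooko's patch: `next_power_of_k_int` is a hypothetical name, and
 `next_power_of_k_float` is only modelled on the
 `x = int(math.log(n, k) + 0.5)` line in the traceback, with the correction
 step being my assumption.

```python
import math


def next_power_of_k_int(n, k):
    """Smallest power of k that is >= n, using integer arithmetic only.

    Hypothetical helper for illustration; not taken from mathutil.py.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    power = 1
    while power < n:
        power *= k
    return power


def next_power_of_k_float(n, k):
    """Float-based variant built around the line from the traceback.

    The rounding correction after the log call is an assumption.
    """
    if n == 0:
        x = 0
    else:
        x = int(math.log(n, k) + 0.5)
    if k ** x < n:
        return k ** (x + 1)
    return k ** x


if __name__ == "__main__":
    # Print both results side by side for a few segment counts.
    for n in (1, 2, 30, 32, 33):
        print(n, next_power_of_k_int(n, 2), next_power_of_k_float(n, 2))
```

 The integer loop cannot produce a NaN, so if the two functions ever
 disagree inside the node, that would point at the floating-point
 environment rather than at the inputs.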

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1418#comment:10>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

