[tahoe-lafs-trac-stream] [tahoe-lafs] #1418: "cannot convert float NaN to integer" in next_power_of_k, during upload via helper
tahoe-lafs
trac at tahoe-lafs.org
Thu Jun 23 01:56:45 PDT 2011
#1418: "cannot convert float NaN to integer" in next_power_of_k, during upload via helper
------------------------+---------------------------
Reporter: rycee | Owner: rycee
Type: defect | Status: new
Priority: major | Milestone: undecided
Component: code | Version: 1.8.2
Resolution: | Keywords: helper upload
Launchpad Bug: |
------------------------+---------------------------
Comment (by rycee):
Yes, I have applied the patches you've given on the helper node, not the
client node. With the new patch I did indeed get some output, but being a
Python novice I feel more confused, not less. The stack trace says:
{{{
  File "/home/rycee/allmydata-tahoe-1.8.2-bug1418/support/lib/python2.6/site-packages/Twisted-10.2.0-py2.6-linux-i686.egg/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 926, in locate_all_shareholders
    num_segments, n, k, desired)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/upload.py", line 225, in get_shareholders
    None)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 88, in make_write_bucket_proxy
    num_share_hashes, uri_extension_size_max, nodeid)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/immutable/layout.py", line 108, in __init__
    effective_segments = mathutil.next_power_of_k(num_segments,2)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/util/mathutil.py", line 49, in next_power_of_k
    return next_power_of_k_math(n, k)
  File "/home/rycee/allmydata-tahoe-1.8.2/src/allmydata/util/mathutil.py", line 35, in next_power_of_k_math
    x = int(math.log(n, k) + 0.5)
exceptions.ValueError: ('cannot convert float NaN to integer', 30L, 2, 32)
]'>
}}}
and in the node's twistd.log I found
{{{
2011-06-23 08:19:10+0200 [Negotiation,1,46.10.48.88] XXX n: 30 :: <type 'long'>, k: 2 :: <type 'int'>, next_power_of_k_alt: 32
}}}
In the Python REPL on the '''same''' computer I get
{{{
$ python
Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> math.log(30L, 2)
4.9068905956085187
>>> int(math.log(30L, 2) + 0.5)
5
>>> 2**5
32
}}}
This, together with the `Success: files copied` message, leaves me quite
confused. It feels like the NaN error is a decoy put under our noses while
the real problem slips quietly into the night.
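To make the failing step easier to inspect, here is a minimal sketch of the same rounding computation with an explicit check on the intermediate float. The function name `rounded_log` is hypothetical and is not part of `mathutil.py`; the sketch only reports the bad value with its inputs and does not explain why the helper process would see NaN when the REPL does not.

```python
import math

def rounded_log(n, k):
    """Return int(log_k(n) + 0.5), raising with context if the float is not finite."""
    value = math.log(n, k)
    # int() refuses NaN and infinity, so check first and report the inputs.
    if math.isnan(value) or math.isinf(value):
        raise ValueError("math.log(%r, %r) returned %r" % (n, k, value))
    return int(value + 0.5)

exponent = rounded_log(30, 2)
print(exponent, 2 ** exponent)
```

In a healthy interpreter this agrees with the REPL session above: the exponent is 5 and 2**5 is 32.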
I also tried creating a completely pristine 1.8.2 build on my helper node,
and it now fails in the same way as the cp did: it claims success, and the
CHK-caps reported by a verbose backup pass `check` but fail
`check --verify`. For example, backup says
{{{
/home/rycee/photos.new/2011/05/22/IMGP4679.JPG -> URI:CHK:k3fpasihz7g7ogsrmbywdfgdy4:jlcyer5z43nuuvdm72qyqlw2eq5uyjubpxey25gfdizidmcdlnrq:1:3:3877153
}}}
and checking gives
{{{
$ tahoe check URI:CHK:k3fpasihz7g7ogsrmbywdfgdy4:jlcyer5z43nuuvdm72qyqlw2eq5uyjubpxey25gfdizidmcdlnrq:1:3:3877153
Summary: Healthy
storage index: 7sxgsu3edgy43at77tuskouuay
good-shares: 3 (encoding is 1-of-3)
wrong-shares: 0
$ tahoe check --verify URI:CHK:k3fpasihz7g7ogsrmbywdfgdy4:jlcyer5z43nuuvdm72qyqlw2eq5uyjubpxey25gfdizidmcdlnrq:1:3:3877153
Summary: Not Healthy: 0 shares (enc 1-of-3)
storage index: 7sxgsu3edgy43at77tuskouuay
good-shares: 0 (encoding is 1-of-3)
wrong-shares: 0
corrupt shares:
  server bzyf23mghgxycnr34pdkqdmybnevf4ks, SI 7sxgsu3edgy43at77tuskouuay, shnum 2
  server 44g5kkgwulzrrrntdzci7jtt5rgt6nuo, SI 7sxgsu3edgy43at77tuskouuay, shnum 0
  server 5yea4my3w3frgp524lgthrb7rdd6frtr, SI 7sxgsu3edgy43at77tuskouuay, shnum 1
}}}
I will attach the result of a `diff -ur` between the version giving the
exception (including zooko's patches from this bug) and the pristine
version. Note that the version giving the exception contains edits of mine
that change minimum_cycle_time in `crawler.py` and `expirer.py`, but I
have run with those changes without trouble for a long time (many
versions).
Oh, and finally: since `next_power_of_k_alt` returns a sensible result, I
tried making `next_power_of_k` return that value in the build that produces
the exceptions. Running a backup then proceeds in the same way as with the
pristine version, i.e., the client reports success but `check --verify`
fails.
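For reference, a power-of-k computation that avoids floating point entirely could look roughly like the sketch below. The actual `next_power_of_k_alt` from the patch is not shown in this comment, so the function name `next_power_of_k_int` and its behaviour for small inputs are assumptions made for illustration only.

```python
def next_power_of_k_int(n, k):
    """Return the smallest power of k that is >= n, using only integer arithmetic."""
    if k < 2:
        raise ValueError("k must be at least 2, got %r" % (k,))
    power = 1
    # Multiply up until we reach or pass n; no math.log, so no NaN is possible.
    while power < n:
        power *= k
    return power

print(next_power_of_k_int(30, 2))
```

For n = 30 and k = 2 this yields 32, matching the `next_power_of_k_alt: 32` value in the twistd.log line above.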
--
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1418#comment:10>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage