Opened at 2009-05-12T19:06:51Z
Closed at 2011-12-29T06:43:08Z
#704 closed defect (wontfix)
utf-8 decoding fails when certain pyOpenSSL library is used
Reported by: | bewst | Owned by: | bewst |
---|---|---|---|
Priority: | major | Milestone: | undecided |
Component: | packaging | Version: | 1.4.1 |
Keywords: | utf-8 unicode openssl | Cc: | midnightmagic |
Launchpad Bug: | 434411 |
Description
Please see attached test log
Attachments (2)
Change History (29)
Changed at 2009-05-12T19:07:15Z by bewst
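comment:1 follow-up: ↓ 4 Changed at 2009-05-12T19:33:59Z by warner
Interesting.. I see a lot of unicode decode exceptions while trying to parse the "subject" of an X.509 certificate (must be some underlying SSL thing, since foolscap doesn't care about fields like that).
Does your system perhaps have a non-ascii hostname?
Could you run the Foolscap unit tests (see http://foolscap.lothar.com/trac to download a tarball directly) and see if they complain about the same sort of thing?
What version of Twisted are these tests using? ("twistd --version" is probably the easiest way to get it, although "./bin/tahoe --version" from your Tahoe tree will give even more information if it works)
Also, please check to see what Python's default encodings are.. here's how I look at them on my system: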
comment:2 follow-up: ↓ 5 Changed at 2009-05-12T20:01:07Z by warner
Also, could you run the following steps to generate a new certificate and then examine it to see what the "Subject" names are?
    % python
    >>> from foolscap import Tub
    >>> t = Tub(certFile="dummy.pem")
    >>> (Control-D)
    % ls dummy.pem
    dummy.pem
    % openssl x509 -in dummy.pem -text
On my OS-X system, I see "Subject: CN=newpb_thingy". Do you get the same? It might also help us if you could attach that dummy.pem file to this ticket (but of course don't use it for anything else).
My current hunch is that the Foolscap-generated x509 certificates are either being created with corrupt (i.e. non-UTF-8) subject-name strings, or they're somehow being corrupted afterwards.
comment:3 Changed at 2009-05-14T20:30:03Z by zooko
- Component changed from unknown to code-network
- Owner changed from nobody to bewst
We're waiting for more information from the original bug reporter, bewst.
comment:4 in reply to: ↑ 1 Changed at 2009-05-28T04:00:22Z by bewst
Replying to warner:
> Interesting.. I see a lot of unicode decode exceptions while trying to parse the "subject" of an X.509 certificate (must be some underlying SSL thing, since foolscap doesn't care about fields like that).
> Does your system perhaps have a non-ascii hostname?
Nope. The hostname command yields: “zreba.local”
> Could you run the Foolscap unit tests (see http://foolscap.lothar.com/trac to download a tarball directly) and see if they complain about the same sort of thing?
Looks like it does. See attached foolscap.log.
> What version of Twisted are these tests using? ("twistd --version" is probably the easiest way to get it, although "./bin/tahoe --version" from your Tahoe tree will give even more information if it works)
Hmm,
    $ twistd --version
    twistd (the Twisted daemon) 2.5.0
    Copyright (c) 2001-2006 Twisted Matrix Laboratories.
    See LICENSE for details.
    $ # err, OK, that was the one installed with the system's python (2.4)
    $ twistd2.5 --version
    twistd (the Twisted daemon) 8.2.0
    Copyright (c) 2001-2008 Twisted Matrix Laboratories.
    See LICENSE for details.
    $ ./bin/tahoe --version
    allmydata-tahoe: 1.4.1, foolscap: 0.3.2, pycryptopp: 0.5.10, zfec: 1.4.2, Twisted: 8.2.0, Nevow: 0.9.32, zope.interface: 3.3.0, python: 2.5.4, platform: Darwin-9.7.0-i386-32bit, simplejson: 2.0.1, argparse: 0.8.0, pyOpenSSL: 0.7, pyutil: 1.3.28, zbase32: 1.1.1, setuptools: 0.6c12dev
> Also, please check to see what Python's default encodings are.. here's how I look at them on my system:
<schnipp>
Looks the same as yours:
    python2.5
    Python 2.5.4 (r254:67916, May 6 2009, 18:40:46)
    [GCC 4.0.1 (Apple Inc. build 5490)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'
    >>> sys.getfilesystemencoding()
    'utf-8'
    >>>
Changed at 2009-05-28T04:00:57Z by bewst
comment:5 in reply to: ↑ 2 Changed at 2009-05-28T04:05:59Z by bewst
Replying to warner:
> Also, could you run the following steps to generate a new certificate and then examine it to see what the "Subject" names are?
<schnipp>
> On my OS-X system, I see "Subject: CN=newpb_thingy". Do you get the same? It might also help us if you could attach that dummy.pem file to this ticket (but of course don't use it for anything else).
> My current hunch is that the Foolscap-generated x509 certificates are either being created with corrupt (i.e. non-UTF-8) subject-name strings, or they're somehow being corrupted afterwards.
Looks like things are going wrong much earlier:
    $ python2.5
    Python 2.5.4 (r254:67916, May 6 2009, 18:40:46)
    [GCC 4.0.1 (Apple Inc. build 5490)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from foolscap import Tub
    >>> t = Tub(certFile="dummy.pem")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 222, in __init__
        self.setupEncryptionFile(certFile)
      File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 234, in setupEncryptionFile
        self.setupEncryption(certData)
      File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 249, in setupEncryption
        cert = self.createCertificate()
      File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 442, in createCertificate
        132)
      File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 539, in signCertificateRequest
        hlreq = CertificateRequest.load(requestData, requestFormat)
      File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 310, in load
        dn._copyFrom(req.get_subject())
      File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 64, in _copyFrom
        value = getattr(x509name, name, None)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-5: unsupported Unicode code range
    >>>
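Editorial note: the traceback ends inside a three-argument `getattr(x509name, name, None)` call, which may look surprising since a default is supplied. The sketch below illustrates why the error still surfaces: the default only covers `AttributeError`. `FakeName` is a hypothetical stand-in written for this note (it is not pyOpenSSL's `X509Name`), and the code assumes Python 3 rather than the Python 2.5 used in the ticket.

```python
class FakeName:
    """Hypothetical stand-in for an object whose attribute lookup decodes bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __getattr__(self, name):
        # Only called for attributes that normal lookup cannot find.
        if name == "CN":
            # May raise UnicodeDecodeError, which getattr() does not swallow.
            return self._raw.decode("utf-8")
        raise AttributeError(name)


def safe_lookup(obj, name):
    """Like getattr(obj, name, None), but also reports decode failures."""
    try:
        return getattr(obj, name, None)
    except UnicodeDecodeError as exc:
        return "decode failed: %s" % exc.reason
```

With a well-formed value, `getattr(FakeName(b"newpb_thingy"), "CN", None)` returns the decoded string; a missing field such as `"OU"` yields `None`; only the malformed bytes raise.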
comment:6 Changed at 2009-05-28T20:55:02Z by bewst
I don't know if this is any help, but pdb is showing me this:
    (Pdb) p x509name
    <X509Name object '/CN=\xFD\xAE\x99\x97\x9D\xB0\xFD\xA2\x97\xB7\x91\xA8\xFD\xA9\x9B\xA6\x9D\xB9'>
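Editorial note: the 18 bytes shown by pdb can be examined directly. The sketch below (Python 3; the helper names are invented for this note) treats each group of six bytes as one sequence of the obsolete six-byte UTF-8 form from RFC 2279, then writes each resulting 31-bit value back out as four big-endian bytes. This is only one reading of the data, not a diagnosis confirmed anywhere in the ticket, but it does recover the expected subject name, which suggests the 12 ASCII bytes were at some point read as three 32-bit characters.

```python
import struct

# CN bytes exactly as printed by pdb in the comment above.
RAW_CN = bytes.fromhex("FDAE99979DB0" "FDA297B791A8" "FDA99BA69DB9")


def decode_legacy_utf8_6byte(chunk):
    """Decode one 6-byte RFC 2279 sequence: 1111110x followed by five 10xxxxxx bytes."""
    if len(chunk) != 6 or chunk[0] & 0xFE != 0xFC:
        raise ValueError("not a 6-byte legacy UTF-8 sequence")
    value = chunk[0] & 0x01
    for byte in chunk[1:]:
        if byte & 0xC0 != 0x80:
            raise ValueError("bad continuation byte")
        value = (value << 6) | (byte & 0x3F)
    return value


def unpack_cn(raw):
    """Re-serialise each decoded value as a 32-bit big-endian integer."""
    out = b""
    for i in range(0, len(raw), 6):
        out += struct.pack(">I", decode_legacy_utf8_6byte(raw[i:i + 6]))
    return out
```

`unpack_cn(RAW_CN)` yields `b"newpb_thingy"`, the name warner reported seeing on his system, while a strict `RAW_CN.decode("utf-8")` fails because `0xFD` is not a valid lead byte in modern UTF-8.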
comment:7 Changed at 2009-05-29T00:44:03Z by bewst
Problem solved, I guess. I mean, it's still a mystery how this could have happened, but I had a pyOpenSSL egg installed that was causing the problem... and it masked the py25-openssl package that I subsequently installed with macports. Everything started working once I had removed the original egg. My strong suspicion is that it was built with a different Python2.5, with a UCS4 setting.
My current Python says:
$ python -c "import sys;print(sys.maxunicode<66000)and'UCS2'or'UCS4'" UCS2
This page put me onto that possibility.
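Editorial note: for anyone checking for the same situation, a stale egg shadowing a package-manager install, a minimal sketch follows. It assumes Python 3 (the ticket used Python 2.5, where `importlib.util` is not available), and the helper names are invented for this note. It reports which file a module would be imported from and classifies a `sys.maxunicode` value as narrow or wide.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def unicode_build(maxunicode=sys.maxunicode):
    """Classify a sys.maxunicode value as a narrow (UCS2) or wide (UCS4) build."""
    return "UCS2" if maxunicode <= 0xFFFF else "UCS4"
```

Calling `module_origin("OpenSSL")` shows whether pyOpenSSL resolves to an egg directory or to the packaged install. Note that since Python 3.3 `sys.maxunicode` is always `0x10FFFF`, so the narrow/wide distinction only matters on older interpreters such as the one in this ticket.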
comment:8 Changed at 2009-05-29T00:44:23Z by bewst
- Resolution set to invalid
- Status changed from new to closed
comment:9 Changed at 2009-05-29T03:30:34Z by warner
Wow, that's wacky. My OS-X box also reports UCS2, while my linux box reports UCS4. I wonder if that means the pyopenssl library is doing naive string conversion: interpreting some underlying openssl field as a unicode string, and hoping that openssl is using the same representation as python is using.
Anyways, thanks for tracking this down! I'm sure others will run into this problem again in the future, and it's great to have a searchable page that explains how to fix it.
comment:10 Changed at 2009-05-29T15:18:56Z by bewst
Looks like this is an old, old problem: http://mail.python.org/pipermail/distutils-sig/2006-August/006585.html
:(
comment:11 Changed at 2009-05-29T15:27:41Z by bewst
A better link, perhaps: http://markmail.org/message/bla5vrwlv3kn3n7e
comment:12 Changed at 2009-06-10T17:55:27Z by zooko
- Component changed from code-network to packaging
- Resolution invalid deleted
- Status changed from closed to reopened
I opened a ticket for setuptools:
http://bugs.python.org/setuptools/issue78 # egg platform names don't reflect unicode variant (UCS2, UCS4)
comment:13 Changed at 2009-06-10T17:55:40Z by zooko
Thanks for tracking this one down, bewst.
comment:14 Changed at 2009-06-10T17:56:59Z by zooko
- Summary changed from Test failures on MacOS to eggs don't say whether they have UCS2 or UCS4 unicode implementation
comment:15 Changed at 2009-09-22T02:20:02Z by bewst
Zooko, what are you waiting for me to do/answer? I don't see it above.
comment:16 Changed at 2009-09-22T02:27:56Z by bewst
- Owner changed from bewst to zooko
- Status changed from reopened to new
comment:17 Changed at 2009-09-22T02:33:41Z by zooko
- Owner changed from zooko to bewst
There was no request for you outstanding, so this should have been unassigned from you. However, just recently I started a discussion on the python-dev list, and referenced this ticket, and they said that the symptoms that we observed are not the symptoms they would expect from having an inconsistency of internal unicode format between Python interpreter and Python module. If that were the problem, we should have seen something like "undefined symbol: PyUnicodeUCS4_FromUnicode", not the utf-8 decode error that we saw.
Here is the comment on python-dev to that effect:
http://mail.python.org/pipermail/python-dev/2009-September/091943.html
So, now there is something you could do to help: see if you still have that pyOpenSSL library that you mentioned, the removal of which fixed this problem for you, so we can try to see what was wrong with it.
comment:18 Changed at 2009-09-22T02:36:30Z by zooko
- Summary changed from eggs don't say whether they have UCS2 or UCS4 unicode implementation to utf-8 decoding fails when certain pyOpenSSL library is used
comment:19 Changed at 2009-09-22T02:37:14Z by zooko
By the way, over on http://bugs.python.org/setuptools/issue78 midnightmagic says that he had the same symptoms. Maybe he could help us diagnose it.
comment:20 Changed at 2009-09-22T02:37:43Z by zooko
- Cc midnightmagic added
comment:21 Changed at 2009-09-22T03:11:17Z by zooko
I opened a bug report with the pyOpenSSL project: https://bugs.launchpad.net/setuptools/+bug/434411 . pyOpenSSL uses launchpad as its issue tracker, and launchpad has a nice quality of integrating with other issue trackers in order to track issues which span multiple projects. launchpad bug 434411 is currently linked to pyOpenSSL, Tahoe-LAFS, and setuptools, although it may turn out that this issue is independent of the setuptools issue, which has to do with whether your python packages use UCS4 or UCS2 internal unicode encoding.
comment:22 Changed at 2009-09-22T03:21:00Z by launchpad
- Launchpad Bug set to 434411
Updating Launchpad bug reference
comment:23 Changed at 2009-10-27T03:09:07Z by zooko
- Resolution set to wontfix
- Status changed from new to closed
Okay, we can't reproduce this issue so I'm going to close this ticket as "wontfix".
comment:24 Changed at 2009-11-23T02:42:36Z by davidsarah
- Keywords utf-8 unicode added
comment:25 Changed at 2010-04-11T17:35:18Z by zooko
- Resolution wontfix deleted
- Status changed from closed to reopened
kpreid encountered this same issue. I will add details to http://launchpad.net/bugs/434411 .
comment:26 Changed at 2010-06-17T21:13:06Z by davidsarah
- Keywords openssl added
comment:27 Changed at 2011-12-29T06:43:08Z by zooko
- Resolution set to wontfix
- Status changed from reopened to closed
I'm closing this (again) as wontfix: only the pyOpenSSL project, or possibly Python or setuptools, can fix this.