#704 closed defect (wontfix)

utf-8 decoding fails when certain pyOpenSSL library is used

Reported by: bewst
Owned by: bewst
Priority: major
Milestone: undecided
Component: packaging
Version: 1.4.1
Keywords: utf-8 unicode openssl
Cc: midnightmagic
Launchpad Bug: 434411

Description

Please see attached test log

Attachments (2)

tahoe.log (130.9 KB) - added by bewst at 2009-05-12T19:07:15Z.
foolscap.log (168.2 KB) - added by bewst at 2009-05-28T04:00:57Z.


Change History (29)

Changed at 2009-05-12T19:07:15Z by bewst

comment:1 follow-up: Changed at 2009-05-12T19:33:59Z by warner

Interesting.. I see a lot of unicode decode exceptions while trying to parse the "subject" of an X.509 certificate (must be some underlying SSL thing, since foolscap doesn't care about fields like that).

Does your system perhaps have a non-ascii hostname?

Could you run the Foolscap unit tests (see http://foolscap.lothar.com/trac to download a tarball directly) and see if they complain about the same sort of thing?

What version of Twisted are these tests using? ("twistd --version" is probably the easiest way to get it, although "./bin/tahoe --version" from your Tahoe tree will give even more information if it works)

Also, please check to see what Python's default encodings are.. here's how I look at them on my system:

% python
Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'utf-8'
>>> 

comment:2 follow-up: Changed at 2009-05-12T20:01:07Z by warner

Also, could you run the following steps to generate a new certificate and then examine it to see what the "Subject" names are?

% python
>>> from foolscap import Tub
>>> t = Tub(certFile="dummy.pem")
>>> (Control-D)
% ls dummy.pem
dummy.pem
% openssl x509 -in dummy.pem -text

On my OS-X system, I see "Subject: CN=newpb_thingy". Do you get the same? It might also help us if you could attach that dummy.pem file to this ticket (but of course don't use it for anything else).

My current hunch is that the Foolscap-generated x509 certificates are either being created with corrupt (i.e. non-UTF-8) subject-name strings, or they're somehow being corrupted afterwards.

comment:3 Changed at 2009-05-14T20:30:03Z by zooko

  • Component changed from unknown to code-network
  • Owner changed from nobody to bewst

We're waiting for more information from the original bug reporter, bewst.

comment:4 in reply to: ↑ 1 Changed at 2009-05-28T04:00:22Z by bewst

Replying to warner:

Interesting.. I see a lot of unicode decode exceptions while trying to parse the "subject" of an X.509 certificate (must be some underlying SSL thing, since foolscap doesn't care about fields like that).

Does your system perhaps have a non-ascii hostname?

Nope. The hostname command yields: “zreba.local”

Could you run the Foolscap unit tests (see http://foolscap.lothar.com/trac to download a tarball directly) and see if they complain about the same sort of thing?

Looks like it does. See attached foolscap.log.

What version of Twisted are these tests using? ("twistd --version" is probably the easiest way to get it, although "./bin/tahoe --version" from your Tahoe tree will give even more information if it works)

Hmm,

$ twistd --version
twistd (the Twisted daemon) 2.5.0
Copyright (c) 2001-2006 Twisted Matrix Laboratories.
See LICENSE for details.
$ # err, OK, that was the one installed with the system's python (2.4)
$ twistd2.5 --version
twistd (the Twisted daemon) 8.2.0
Copyright (c) 2001-2008 Twisted Matrix Laboratories.
See LICENSE for details.
$ ./bin/tahoe --version
allmydata-tahoe: 1.4.1, foolscap: 0.3.2, pycryptopp: 0.5.10, zfec: 1.4.2, Twisted: 8.2.0, Nevow: 0.9.32, zope.interface: 3.3.0, python: 2.5.4, platform: Darwin-9.7.0-i386-32bit, simplejson: 2.0.1, argparse: 0.8.0, pyOpenSSL: 0.7, pyutil: 1.3.28, zbase32: 1.1.1, setuptools: 0.6c12dev

Also, please check to see what Python's default encodings are.. here's how I look at them on my system:

<schnipp>

Looks the same as yours:

$ python2.5
Python 2.5.4 (r254:67916, May  6 2009, 18:40:46) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'utf-8'
>>> 

Changed at 2009-05-28T04:00:57Z by bewst

comment:5 in reply to: ↑ 2 Changed at 2009-05-28T04:05:59Z by bewst

Replying to warner:

Also, could you run the following steps to generate a new certificate and then examine it to see what the "Subject" names are?

<schnipp>

On my OS-X system, I see "Subject: CN=newpb_thingy". Do you get the same? It might also help us if you could attach that dummy.pem file to this ticket (but of course don't use it for anything else).

My current hunch is that the Foolscap-generated x509 certificates are either being created with corrupt (i.e. non-UTF-8) subject-name strings, or they're somehow being corrupted afterwards.

Looks like things are going wrong much earlier:

$ python2.5
Python 2.5.4 (r254:67916, May  6 2009, 18:40:46) 
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from foolscap import Tub
>>> t = Tub(certFile="dummy.pem")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 222, in __init__
    self.setupEncryptionFile(certFile)
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 234, in setupEncryptionFile
    self.setupEncryption(certData)
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 249, in setupEncryption
    cert = self.createCertificate()
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 442, in createCertificate
    132)
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 539, in signCertificateRequest
    hlreq = CertificateRequest.load(requestData, requestFormat)
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 310, in load
    dn._copyFrom(req.get_subject())
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 64, in _copyFrom
    value = getattr(x509name, name, None)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-5: unsupported Unicode code range
>>> 

comment:6 Changed at 2009-05-28T20:55:02Z by bewst

I don't know if this is any help, but pdb is showing me this:

(Pdb) p x509name
<X509Name object '/CN=\xFD\xAE\x99\x97\x9D\xB0\xFD\xA2\x97\xB7\x91\xA8\xFD\xA9\x9B\xA6\x9D\xB9'>
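
For anyone who wants to poke at those bytes without the broken library, here is a minimal sketch (Python 3 syntax, not the Python 2.5 used in this ticket). The helper names `is_valid_utf8` and `six_byte_groups` are made up for illustration. It checks that the CN bytes from the pdb output are not valid UTF-8, and that they fall into three 6-byte groups, each starting with 0xFD and followed by five continuation bytes. That shape resembles the obsolete 6-byte UTF-8 form, which would be consistent with the "position 0-5" in the traceback, though that reading is an assumption rather than a confirmed diagnosis.

```python
# Bytes copied from the pdb output above (the CN field of the X509Name).
raw_cn = b"\xFD\xAE\x99\x97\x9D\xB0\xFD\xA2\x97\xB7\x91\xA8\xFD\xA9\x9B\xA6\x9D\xB9"


def is_valid_utf8(data):
    """Return True if data decodes as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def six_byte_groups(data):
    """Split data into 6-byte chunks and report, for each chunk, whether it
    looks like an obsolete 6-byte UTF-8 sequence: a lead byte of 0xFC or 0xFD
    followed by five continuation bytes in the range 0x80-0xBF."""
    chunks = [data[i:i + 6] for i in range(0, len(data), 6)]
    return [
        len(c) == 6
        and c[0] in (0xFC, 0xFD)
        and all(0x80 <= b <= 0xBF for b in c[1:])
        for c in chunks
    ]


print(is_valid_utf8(raw_cn))
print(six_byte_groups(raw_cn))
```

This only describes the shape of the corrupt bytes; it does not explain how "newpb_thingy" turned into them.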

comment:7 Changed at 2009-05-29T00:44:03Z by bewst

Problem solved, I guess. I mean, it's still a mystery how this could have happened, but I had a pyOpenSSL egg installed that was causing the problem... and it masked the py25-openssl package that I subsequently installed with macports. Everything started working once I had removed the original egg. My strong suspicion is that it was built with a different Python2.5, with a UCS4 setting.
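
A quick way to see which copy of a package is actually being picked up, in case a stray egg is masking the one you think you installed, is to ask the import system where it would load the module from. This is a sketch in Python 3 syntax (on the Python 2.5 in this ticket, printing `OpenSSL.__file__` after importing it would be the rough equivalent). `module_origin` is a hypothetical helper name, not part of any library.

```python
import importlib.util


def module_origin(name):
    """Return the file the import system would load `name` from,
    or None if no such module can be found on sys.path."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "OpenSSL" is the top-level module name that pyOpenSSL installs.
print(module_origin("OpenSSL"))
```

If the printed path points into an unexpected `.egg` directory rather than the package-manager location, that egg is the one being imported.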

My current Python says:

$ python -c "import sys;print(sys.maxunicode<66000)and'UCS2'or'UCS4'"
UCS2

This page put me onto that possibility.
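
The one-liner above can be spelled out a little more readably. This is a sketch only, with `unicode_build` as a made-up helper name. Note that on Python 3.3 and later `sys.maxunicode` is always 0x10FFFF, so the distinction only matters on older interpreters like the 2.5 builds discussed here.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Classify an interpreter's unicode width from its sys.maxunicode value.

    Narrow ("UCS2") builds report 0xFFFF; wide ("UCS4") builds report 0x10FFFF.
    """
    return "UCS2" if maxunicode == 0xFFFF else "UCS4"


print(unicode_build())
```

Running this under each Python that might have built or might load an extension module shows whether the two disagree.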

comment:8 Changed at 2009-05-29T00:44:23Z by bewst

  • Resolution set to invalid
  • Status changed from new to closed

comment:9 Changed at 2009-05-29T03:30:34Z by warner

Wow, that's wacky. My OS-X box also reports UCS2, while my linux box reports UCS4. I wonder if that means the pyopenssl library is doing naive string conversion: interpreting some underlying openssl field as a unicode string, and hoping that openssl is using the same representation as python is using.

Anyways, thanks for tracking this down! I'm sure others will run into this problem again in the future, and it's great to have a searchable page that explains how to fix it.

comment:12 Changed at 2009-06-10T17:55:27Z by zooko

  • Component changed from code-network to packaging
  • Resolution invalid deleted
  • Status changed from closed to reopened

I opened a ticket for setuptools:

http://bugs.python.org/setuptools/issue78 # egg platform names don't reflect unicode variant (UCS2, UCS4)

comment:13 Changed at 2009-06-10T17:55:40Z by zooko

Thanks for tracking this one down, bewst.

comment:14 Changed at 2009-06-10T17:56:59Z by zooko

  • Summary changed from Test failures on MacOS to eggs don't say whether they have UCS2 or UCS4 unicode implementation

comment:15 Changed at 2009-09-22T02:20:02Z by bewst

Zooko, what are you waiting for me to do/answer? I don't see it above.

comment:16 Changed at 2009-09-22T02:27:56Z by bewst

  • Owner changed from bewst to zooko
  • Status changed from reopened to new

comment:17 Changed at 2009-09-22T02:33:41Z by zooko

  • Owner changed from zooko to bewst

There was no request for you outstanding, so this should have been unassigned from you. However, just recently I started a discussion on the python-dev list, and referenced this ticket, and they said that the symptoms that we observed are not the symptoms they would expect from having an inconsistency of internal unicode format between Python interpreter and Python module. If that were the problem, we should have seen something like "undefined symbol: PyUnicodeUCS4_FromUnicode", not the utf-8 decode error that we saw.

Here is the comment on python-dev to that effect:

http://mail.python.org/pipermail/python-dev/2009-September/091943.html

So, now there is something you could do to help: see if you still have that pyOpenSSL library that you mentioned, the removal of which fixed this problem for you, so we can try to see what was wrong with it.

comment:18 Changed at 2009-09-22T02:36:30Z by zooko

  • Summary changed from eggs don't say whether they have UCS2 or UCS4 unicode implementation to utf-8 decoding fails when certain pyOpenSSL library is used

comment:19 Changed at 2009-09-22T02:37:14Z by zooko

By the way, over on http://bugs.python.org/setuptools/issue78 midnightmagic says that he had the same symptoms. Maybe he could help us diagnose it.

comment:20 Changed at 2009-09-22T02:37:43Z by zooko

  • Cc midnightmagic added

comment:21 Changed at 2009-09-22T03:11:17Z by zooko

I opened a bug report with the pyOpenSSL project: https://bugs.launchpad.net/setuptools/+bug/434411 . pyOpenSSL uses Launchpad as its issue tracker, and Launchpad integrates with other issue trackers, which makes it possible to track issues that span multiple projects. Launchpad bug 434411 is currently linked to pyOpenSSL, Tahoe-LAFS, and setuptools, although it may turn out that this issue is independent of the setuptools issue, which concerns whether your Python packages use the UCS4 or UCS2 internal unicode encoding.

comment:22 Changed at 2009-09-22T03:21:00Z by launchpad

  • Launchpad Bug set to 434411

Updating Launchpad bug reference

comment:23 Changed at 2009-10-27T03:09:07Z by zooko

  • Resolution set to wontfix
  • Status changed from new to closed

Okay, we can't reproduce this issue so I'm going to close this ticket as "wontfix".

comment:24 Changed at 2009-11-23T02:42:36Z by davidsarah

  • Keywords utf-8 unicode added

comment:25 Changed at 2010-04-11T17:35:18Z by zooko

  • Resolution wontfix deleted
  • Status changed from closed to reopened

kpreid encountered this same issue. I will add details to http://launchpad.net/bugs/434411 .

comment:26 Changed at 2010-06-17T21:13:06Z by davidsarah

  • Keywords openssl added

comment:27 Changed at 2011-12-29T06:43:08Z by zooko

  • Resolution set to wontfix
  • Status changed from reopened to closed

I'm closing this (again) as wontfix -- only the pyOpenSSL project, or possibly Python or setuptools or someone -- can fix this.
