[tahoe-lafs-trac-stream] [Tahoe-LAFS] #3672: UnicodeDecodeError in Eliot messages
Tahoe-LAFS
trac at tahoe-lafs.org
Wed Apr 21 13:58:02 UTC 2021
#3672: UnicodeDecodeError in Eliot messages
--------------------------+------------------------------
Reporter: itamarst | Owner:
Type: defect | Status: new
Priority: normal | Milestone: Support Python 3
Component: unknown | Version: n/a
Resolution: | Keywords:
Launchpad Bug: |
--------------------------+------------------------------
Comment (by itamarst):
More context:
1. Eliot was originally developed on Python 2, where bytestrings were the
norm.
2. JSON doesn't know about bytes.
For JSON serialization, Eliot therefore followed Python 2's lead: if
bytes looked like a UTF-8-encoded unicode string, they were serialized as
a JSON string.
With Python 3, bytestrings are no longer the default, which means bytes
are more likely to be ... bytes. So on Python 3 Eliot decided not to
handle bytes by default in log messages, since it's not clear what the
correct thing to do is. How to handle them is left up to individual
applications.
As a result, Tahoe-LAFS on Python 3 needs a policy decision on how to
handle byte serialization. The initial policy decision was "handle bytes
that look like UTF-8-encoded unicode strings".
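To illustrate, here is a minimal sketch of what a "decode bytes as UTF-8"
policy might look like. It uses only the standard library json module
rather than Eliot's own encoder, and Utf8OnlyEncoder and try_serialize are
hypothetical names, not Tahoe's actual code. It shows where a
UnicodeDecodeError like the one in the ticket title can come from:

```python
import json


class Utf8OnlyEncoder(json.JSONEncoder):
    """Hypothetical encoder: treat all bytes as UTF-8-encoded text."""

    def default(self, o):
        if isinstance(o, bytes):
            # Raises UnicodeDecodeError for bytes that are not valid UTF-8.
            return o.decode("utf-8")
        return super().default(o)


def try_serialize(value):
    """Return the JSON text, or the name of the exception that was raised."""
    try:
        return json.dumps(value, cls=Utf8OnlyEncoder)
    except UnicodeDecodeError as e:
        return type(e).__name__
```

With this sketch, a value such as b"ok" serializes as a normal JSON string,
while a value such as b"\xff" fails to serialize at all.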
However, it turns out Tahoe actually logs random byte strings, some of
which are very much not UTF-8 decodable. This PR allows Tahoe to continue
doing so by using hex quoting when necessary.
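
As a rough sketch of the idea (not the PR's actual code), one way to fall
back to hex when bytes are not UTF-8 decodable is shown below. It uses the
standard library json module instead of Eliot's encoder. The names
bytes_to_text, BytesTolerantEncoder and serialize are hypothetical, and the
exact quoting format used by the PR may well differ from the plain hex
digits assumed here:

```python
import json


def bytes_to_text(value: bytes) -> str:
    """Return text for bytes: UTF-8 if decodable, otherwise hex digits."""
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback format: plain lowercase hex digits.
        return value.hex()


class BytesTolerantEncoder(json.JSONEncoder):
    """Hypothetical encoder that never fails on bytes values."""

    def default(self, o):
        if isinstance(o, bytes):
            return bytes_to_text(o)
        return super().default(o)


def serialize(message: dict) -> str:
    """Serialize a log message dict to JSON text."""
    return json.dumps(message, cls=BytesTolerantEncoder)
```

One design caveat with this particular sketch: a reader of the log cannot
tell a hex-quoted value from a text value that happens to consist of hex
digits, so a real implementation may want a distinguishing marker.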
--
Ticket URL: <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3672#comment:1>
Tahoe-LAFS <https://Tahoe-LAFS.org>
secure decentralized storage