#3672 new defect

UnicodeDecodeError in Eliot messages

Reported by: itamarst
Owned by:
Priority: normal
Milestone: Support Python 3
Component: unknown
Version: n/a
Keywords:
Cc:
Launchpad Bug:


Eliot in Tahoe-LAFS currently assumes bytes are always UTF-8 encoded. This is not always the case.

Running trial allmydata produces log output like the following, including eliot:destination_failure messages:

{"exception": "exceptions.UnicodeDecodeError", "timestamp": 1617718191.704726, "task_uuid": "a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5", "reason": "'utf8' codec can't decode byte 0x9d in position 6: invalid start byte", "message": "{u\"u'message_type'\": u\"u'immutable:upload:get-share-placements'\", u\"u'timestamp'\": u'1617718191.704558', u\"'happiness_mappings'\": u\"{0: 'b3llgpwwqwozijzje7ydgossrdyqig5e'}\", u\"'happiness'\": u'1', u\"'existing_shares'\": u'{\"\\\\x0e\\\\xd6\\\\xb3>\\\\xd6\\\\x85\\\\x9d\\\\x94\\')\\'\\\\xf03:R\\\\x88\\\\xf1\\\\x04\\\\x1b\\\\xa4\": [0]}', u\"u'task_level'\": u'[2, 3, 2]', u\"u'task_uuid'\": u\"u'a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5'\", u\"'total_shares'\": u'1', u\"'peers'\": u\"['6jdspiha6nw2az6fqglwfzbu2c2uvnfg', 'b3llgpwwqwozijzje7ydgossrdyqig5e']\", u\"'readonly_peers'\": u'[]'}", "message_type": "eliot:destination_failure", "task_level": [2, 3, 3]}
{"timestamp": 1617718191.715011, "task_uuid": "a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5", "message_type": "immutable:upload:get-shareholders:converged-happiness", "effective_happiness": 1, "task_level": [2, 3, 4]}
{"exception": "exceptions.UnicodeDecodeError", "timestamp": 1617718191.715412, "task_uuid": "a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5", "reason": "'utf8' codec can't decode byte 0x9d in position 6: invalid start byte", "message": "{u\"u'timestamp'\": u'1617718191.715314', u\"u'action_status'\": u\"u'succeeded'\", u\"'upload_trackers'\": u'[]', u\"u'task_level'\": u'[2, 3, 5]', u\"u'task_uuid'\": u\"u'a8ecfc19-947b-46e2-a1dd-bffcb4be5fc5'\", u\"'already_serverids'\": u'{0: set([\"\\\\x0e\\\\xd6\\\\xb3>\\\\xd6\\\\x85\\\\x9d\\\\x94\\')\\'\\\\xf03:R\\\\x88\\\\xf1\\\\x04\\\\x1b\\\\xa4\"])}', u\"u'action_type'\": u\"u'immutable:upload:locate-all-shareholders'\"}", "message_type": "eliot:destination_failure", "task_level": [2, 4]}
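
The failure can be reproduced outside Eliot with a minimal Python 3 sketch. The byte string below is reconstructed from the escaped repr of the server ID in the log above, which is an assumption about how the escaping unwinds; first_bad_byte is a hypothetical helper name used only for illustration.

```python
# Server ID bytes as they appear (escaped) in the log above; reconstructing
# them from the nested repr is an assumption.
RAW = b"\x0e\xd6\xb3>\xd6\x85\x9d\x94')'\xf03:R\x88\xf1\x04\x1b\xa4"


def first_bad_byte(raw):
    """Return (position, byte value, reason) for the first UTF-8 decode
    failure, or None if the bytes decode cleanly."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as e:
        return (e.start, raw[e.start], e.reason)
    return None


print(first_bad_byte(RAW))  # → (6, 157, 'invalid start byte'); 157 == 0x9d
```

The first six bytes happen to form valid UTF-8 (two two-byte sequences plus two ASCII characters), so decoding only fails at position 6, matching the "reason" field in the log.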

Change History (1)

comment:1 Changed at 2021-04-21T13:58:02Z by itamarst

More context:

  1. Eliot was originally developed on Python 2, where bytestrings were the norm.
  2. JSON doesn't know about bytes.

For JSON serialization, Eliot therefore followed Python 2's lead: if bytes looked like a UTF-8-encoded unicode string, they were serialized as a JSON string.

With Python 3, bytestrings are no longer the default, which means bytes are more likely to be genuinely binary data. On Python 3, Eliot therefore decided not to handle bytes in log messages by default, since it's not clear what the correct behavior is. How to handle them is left up to individual applications.
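
As a small illustration of the Python 3 side, the standard library's json module (not Eliot's own code) refuses bytes outright unless the caller supplies a policy. try_dump is a hypothetical helper for this sketch.

```python
import json


def try_dump(obj):
    """Return the JSON text, or the name of the exception json.dumps raised."""
    try:
        return json.dumps(obj)
    except TypeError as e:
        return type(e).__name__


print(try_dump({"peer": "b3ll"}))   # → {"peer": "b3ll"}
print(try_dump({"peer": b"b3ll"}))  # → TypeError
```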

As a result, Tahoe-LAFS on Python 3 needs a policy decision on how to handle byte serialization. The initial policy decision was "handle bytes that look like UTF-8-encoded unicode strings".

However, it turns out Tahoe actually logs random byte strings, some of which are very much not UTF-8 decodable. This PR allows Tahoe to continue doing so by using hex quoting when necessary.
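
A minimal sketch of such a policy, assuming it is implemented as a `default` hook for the standard library's json.dumps. The names encode_bytes and dumps_with_bytes and the "<hex:...>" quoting format are hypothetical and chosen for illustration; they are not taken from the actual PR.

```python
import json


def encode_bytes(value):
    """Hypothetical json.dumps `default` hook: decode bytes as UTF-8 when
    possible, otherwise fall back to a hex-quoted string."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # The quoting format here is an assumption, not Tahoe's.
            return "<hex:{}>".format(value.hex())
    raise TypeError("cannot serialize {!r}".format(type(value)))


def dumps_with_bytes(message):
    """Serialize a log message dict, applying the bytes policy to values."""
    return json.dumps(message, default=encode_bytes)


print(dumps_with_bytes({"peer": b"b3ll"}))       # → {"peer": "b3ll"}
print(dumps_with_bytes({"peer": b"\x9d\x94"}))   # → {"peer": "<hex:9d94>"}
```

Note that json.dumps only calls the `default` hook for values. Bytes used as dictionary keys, as in the existing_shares field above, would need to be converted before serialization.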
