[tahoe-dev] character encodings and binary data and URLs
zooko
zooko at zooko.com
Thu May 15 06:02:27 PDT 2008
Dear people of rest-discuss:
What's the right way to specify character encoding in URLs, POST
forms, and JSON-encoded data?
I work on an open source secure, decentralized filesystem -- the
"Tahoe" Least-Authority Filesystem [1] -- with a RESTful API [2].
We need to decide, when the user specifies a filename, either in the
URL or in a POST form, or when the server returns a filename to
the user in an HTTP response, what character encoding to use.
Our current rules are like this:
1. Filenames in URLs are always utf-8 encoded. So after splitting
on "/" to get individual segments, we utf-8 decode each segment
before doing anything else with it.
2. POST forms have a _charset field which specifies the encoding of
all the values in the form, and if not present it is assumed to be
utf-8.
3. Responses are encoded in JSON, so the filenames are in unicode.
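To make the three rules concrete, here is a minimal Python sketch of
how I read them. It is not the Tahoe implementation: the function
names, the "filenames" key in the JSON response, and the choice to
drop empty path segments are all assumptions made for illustration.

```python
import json
from urllib.parse import unquote_to_bytes


def decode_url_path(raw_path):
    """Rule 1: split on "/" first, then decode each segment as utf-8.

    Splitting before percent-decoding means an escaped slash (%2F)
    stays inside its segment. Dropping empty segments is an assumption.
    """
    segments = [s for s in raw_path.split("/") if s]
    return [unquote_to_bytes(s).decode("utf-8") for s in segments]


def decode_form_value(raw_value, fields):
    """Rule 2: decode a form value using _charset, defaulting to utf-8.

    `fields` is a hypothetical dict of the form's already-parsed fields.
    """
    charset = fields.get("_charset", "utf-8")
    return raw_value.decode(charset)


def encode_response(filenames):
    """Rule 3: JSON strings are unicode, so names go out as unicode.

    The "filenames" key is a made-up response shape.
    """
    return json.dumps({"filenames": filenames})
```

For example, decode_url_path("/uri/caf%C3%A9") gives the two unicode
segments "uri" and "café", and a form that sends _charset=iso-8859-1
has its values decoded as Latin-1 instead of utf-8.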
It seems unfortunate to constrain users of our system to unicode
filenames, and further to constrain them to utf-8 encoding, but I
can't think of another alternative which doesn't leave things
underspecified.
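To illustrate the constraint, here is a small Python sketch of what
rule 1 does to a filename whose bytes are not valid utf-8. The helper
name and the choice to return None on failure are assumptions, not
what our server actually does.

```python
from urllib.parse import unquote_to_bytes


def try_utf8_segment(segment):
    """Percent-decode one URL segment and decode it strictly as utf-8.

    Returns the unicode name, or None if the bytes are not valid utf-8
    (returning None is a hypothetical choice for this sketch).
    """
    raw = unquote_to_bytes(segment)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

A client that percent-encodes "café" from Latin-1 bytes sends
"caf%E9", which fails the strict utf-8 decode, while the utf-8 form
"caf%C3%A9" decodes fine.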
Are these types of rules for encoding fairly standard in the REST world?
Thanks!
Regards,
Zooko
[1] http://allmydata.org
[2] http://allmydata.org/trac/tahoe/browser/docs/webapi.txt