[tahoe-dev] character encodings and binary data and URLs

zooko zooko at zooko.com
Thu May 15 06:02:27 PDT 2008


Dear people of rest-discuss:

What's the right way to specify character encoding in URLs, POST  
forms, and JSON-encoded data?

I work on an open-source, secure, decentralized filesystem -- the
"Tahoe" Least-Authority Filesystem [1] -- with a RESTful API [2].

We need to decide what character encoding to use when the user
specifies a filename, either in the URL or in a POST form, or when the
server returns a filename to the user in an HTTP response.

Our current rules are like this:

1.  Filenames in URLs are always utf-8 encoded.  So after splitting  
on "/" to get individual segments, we utf-8 decode each segment  
before doing anything else with it.

2.  POST forms have a _charset field which specifies the encoding of  
all the values in the form, and if not present it is assumed to be  
utf-8.

3.  Responses are encoded in JSON, so the filenames are in unicode.
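
To make the three rules concrete, here is a minimal sketch in Python. It is not Tahoe's actual code: the function names are hypothetical, and it assumes that URL segments arrive percent-encoded and that form values arrive as raw bytes keyed by field name.

```python
import json
from urllib.parse import unquote_to_bytes


def decode_path_segments(raw_path):
    """Rule 1: split on "/" first, then utf-8 decode each segment.

    Assumption: segments are percent-encoded, so they are unquoted to
    bytes before decoding. Splitting first keeps an escaped "%2F" inside
    a single segment.
    """
    segments = [s for s in raw_path.split("/") if s]
    return [unquote_to_bytes(s).decode("utf-8") for s in segments]


def decode_form_values(raw_fields):
    """Rule 2: honor the _charset field, defaulting to utf-8.

    Assumption: raw_fields maps field names (str) to raw byte values.
    """
    charset = raw_fields.get("_charset", b"utf-8").decode("ascii")
    return {
        name: value.decode(charset)
        for name, value in raw_fields.items()
        if name != "_charset"
    }


def encode_listing(filenames):
    """Rule 3: filenames go out as JSON strings, which are unicode."""
    return json.dumps(filenames).encode("utf-8")
```

Under these assumptions, a URL segment that holds non-utf-8 bytes (for example a Latin-1 "caf%E9") raises UnicodeDecodeError instead of being guessed at, which is the constraint on users discussed next.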

It seems unfortunate to constrain users of our system to unicode
filenames, and further to constrain them to utf-8 encoding, but I
can't think of an alternative which doesn't leave things
underspecified.

Are rules of this type for encoding fairly standard in the REST world?

Thanks!

Regards,

Zooko

[1] http://allmydata.org
[2] http://allmydata.org/trac/tahoe/browser/docs/webapi.txt

