[tahoe-dev] character encodings and binary data and URLs

zooko zooko at zooko.com
Thu May 15 06:02:27 PDT 2008

Dear people of rest-discuss:

What's the right way to specify character encoding in URLs, POST  
forms, and JSON-encoded data?

I work on an open source secure, decentralized filesystem -- the  
"Tahoe" Least-Authority Filesystem [1] -- with a RESTful API [2].

We need to decide what character encoding to use when the user  
specifies a filename, either in the URL or in a POST form, and when  
the server returns a filename to the user in an HTTP response.

Our current rules are like this:

1.  Filenames in URLs are always utf-8 encoded.  So after splitting  
on "/" to get individual segments, we utf-8 decode each segment  
before doing anything else with it.

2.  POST forms have a _charset field which specifies the encoding of  
all the values in the form, and if not present it is assumed to be  

3.  Responses are encoded in JSON, so the filenames are Unicode strings.
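The three rules above can be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not Tahoe's actual code; the function names `decode_path_segments`, `decode_form`, and `encode_response` are hypothetical. Because the fallback charset for a form without `_charset` is not given in rule 2 as written here, `decode_form` takes it as a caller-supplied `default_charset` parameter rather than assuming one.

```python
import json
from urllib.parse import parse_qs, unquote_to_bytes


def decode_path_segments(path):
    """Rule 1: split on "/" first, then percent-decode and utf-8 decode each segment."""
    # Splitting before decoding means an escaped slash (%2F) stays inside
    # its segment instead of being mistaken for a path separator.
    segments = path.strip("/").split("/")
    # .decode("utf-8") raises UnicodeDecodeError on bytes that are not valid utf-8.
    return [unquote_to_bytes(seg).decode("utf-8") for seg in segments]


def decode_form(body, default_charset):
    """Rule 2: decode every form key and value using the charset named by _charset."""
    # First pass: Latin-1 maps each byte to exactly one code point, so no
    # information is lost before we know the real charset.
    raw = parse_qs(body, keep_blank_values=True, encoding="latin-1")
    # default_charset is a hypothetical caller-supplied fallback.
    charset = raw.get("_charset", [default_charset])[0]
    # Second pass: recover the original bytes and decode them properly.
    return {
        key.encode("latin-1").decode(charset): [
            v.encode("latin-1").decode(charset) for v in values
        ]
        for key, values in raw.items()
        if key != "_charset"
    }


def encode_response(filenames):
    """Rule 3: JSON carries filenames as Unicode strings."""
    # With the default ensure_ascii=True, non-ASCII characters become \uXXXX
    # escapes, so the response body itself is plain ASCII.
    return json.dumps({"filenames": filenames})


print(decode_path_segments("/dir/caf%C3%A9.txt"))  # ['dir', 'café.txt']
print(decode_path_segments("/a%2Fb/c"))            # ['a/b', 'c']
print(decode_form("_charset=iso-8859-1&name=caf%E9", default_charset="utf-8"))
# {'name': ['café']}
print(encode_response(["café.txt"]))               # {"filenames": ["caf\u00e9.txt"]}
```

The two-pass form decoding is one way to honor a `_charset` field that may appear anywhere in the body: the values have to be held as bytes until the charset is known.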

It seems unfortunate to constrain users of our system to Unicode  
filenames, and further to constrain them to utf-8 encoding, but I  
can't think of an alternative that doesn't leave things
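The constraint can be made concrete with a small Python sketch. It assumes a client whose filenames are raw bytes in a legacy encoding (Latin-1 is used here purely as an example) and a hypothetical helper `passes_rule_1` that applies rule 1 to a single path segment: percent-encoding carries such bytes through a URL intact, but the utf-8 decode step then rejects them.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def passes_rule_1(segment):
    """Hypothetical helper: True if a percent-encoded segment decodes as utf-8."""
    try:
        unquote_to_bytes(segment).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical filename stored as raw Latin-1 bytes: "café.txt" with 0xE9 for "é".
latin1_name = b"caf\xe9.txt"
segment = quote_from_bytes(latin1_name)  # 'caf%E9.txt'

# Percent-encoding round-trips arbitrary bytes through the URL...
assert unquote_to_bytes(segment) == latin1_name
# ...but these bytes are not valid utf-8, so rule 1 rejects the segment,
print(passes_rule_1(segment))  # False
# while the utf-8 spelling of the same name is accepted.
print(passes_rule_1(quote_from_bytes("café.txt".encode("utf-8"))))  # True
```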

Are these kinds of encoding rules fairly standard in the REST world?




[1] http://allmydata.org
[2] http://allmydata.org/trac/tahoe/browser/docs/webapi.txt
