[tahoe-dev] [tahoe-lafs] #534: "tahoe cp" command encoding issue

tahoe-lafs trac at allmydata.org
Sun Dec 6 13:18:06 PST 2009


#534: "tahoe cp" command encoding issue
-----------------------------------+----------------------------------------
     Reporter:  francois           |       Owner:  francois                                 
         Type:  defect             |      Status:  new                                      
     Priority:  minor              |   Milestone:  eventually                               
    Component:  code-frontend-cli  |     Version:  1.2.0                                    
   Resolution:                     |    Keywords:  cp unicode filename forward-compatibility
Launchpad_bug:                     |  
-----------------------------------+----------------------------------------

Comment(by davidsarah):

 To fix #734, {{{unicode_to_stdout}}} in the patched
 {{{util/stringutil.py}}} should be something like:
 {{{
 def unicode_to_stdout(s):
     """
     Encode a unicode object for representation on stdout.
     """

     if s is None:
         return None
     precondition(isinstance(s, unicode), s)

     try:
         return s.encode(sys.stdout.encoding, 'replace')
     except LookupError:
         # Unknown codec name (e.g. 'cp65001'): fall back to UTF-8.
         return s.encode('utf-8', 'replace')
 }}}

 (This doesn't explain why {{{tahoe_ls.py}}} was attempting to use None
 for the {{{name}}} it was trying to convert, but that's presumably a
 separate issue not caused by the patch.)

 The reason for the {{{except LookupError}}} is that Python can sometimes
 set {{{sys.stdout.encoding}}} to an encoding that it does not recognize.
 For example, if you have Windows {{{cmd.exe}}} set to use UTF-8 by running
 {{{chcp 65001}}}, {{{sys.stdin.encoding}}} and {{{sys.stdout.encoding}}} will
 be 'cp65001', which is not recognized as being the same as UTF-8. (If the
 encoding were actually something other than UTF-8, I think that producing
 mojibake on stdout is better than throwing an exception. Note that
 terminals that do bad things when they receive control characters are
 already broken; this isn't making it worse, because plain ASCII can
 include such control characters.)
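
 The fallback can be tried in isolation with a minimal sketch along these
 lines. It is written for Python 3 so that it runs on a current interpreter,
 whereas the patch targets Python 2. The name {{{encode_for_output}}} and the
 explicit {{{encoding}}} parameter (standing in for
 {{{sys.stdout.encoding}}}) are assumptions for illustration only and are not
 part of the patch. The made-up codec name is used because newer Python
 versions may recognize 'cp65001'.

```python
def encode_for_output(s, encoding):
    """Encode text for output, falling back to UTF-8 on an unknown codec.

    Hypothetical helper: `encoding` stands in for sys.stdout.encoding.
    """
    if s is None:
        return None
    try:
        # Characters the target encoding cannot represent become '?'.
        return s.encode(encoding, 'replace')
    except LookupError:
        # The codec name is not known to this Python: assume UTF-8.
        return s.encode('utf-8', 'replace')


if __name__ == '__main__':
    # A deliberately bogus codec name triggers the UTF-8 fallback.
    print(encode_for_output('caf\u00e9', 'x-no-such-codec'))
    # A known but limited codec uses the 'replace' error handler instead.
    print(encode_for_output('caf\u00e9', 'ascii'))
```

 An unknown codec name never raises here; the worst case is mojibake on a
 terminal that is not really using UTF-8.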

 There are other issues with the patch; I just wanted to comment on this
 part while it's swapped into my brain (and people are now calling me to go
 and play scrabble, which sounds more fun :-)

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/534#comment:75>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

