﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	launchpad_bug
629	'tahoe backup' doesn't tolerate 8-bit filenames	warner	zooko	"Sigh, another unicode problem, not unlike #565 or #534, but this time
affecting 'tahoe backup'. sqlite3 refuses 8-bit bytestrings (e.g. non-ASCII
filenames) unless the connection has a text_factory that can handle them.

I think that the database should hold whatever pathname we get back from
os.listdir(), i.e. if os.listdir() returns bytestrings that contain UTF-8
encoded filenames, then the database should hold bytestrings that happen to
contain UTF-8 encoded filenames. This means modifying the way we use sqlite
so that it tolerates bytestrings. According to the traceback below, provided
by Andrej Falout, this involves a custom text_factory:

{{{
  File ""/usr/src/tahoe/allmydata-tahoe-1.2.0-r3558/src/allmydata/scripts/tahoe_backup.py"", line 243, in process
    newfilecap, metadata = self.upload(childpath)
  File ""/usr/src/tahoe/allmydata-tahoe-1.2.0-r3558/src/allmydata/scripts/tahoe_backup.py"", line 322, in upload
    must_upload, bdb_results = self.check_backupdb(childpath)
  File ""/usr/src/tahoe/allmydata-tahoe-1.2.0-r3558/src/allmydata/scripts/tahoe_backup.py"", line 268, in check_backupdb
    r = self.backupdb.check_file(childpath, use_timestamps)
  File ""/usr/src/tahoe/allmydata-tahoe-1.2.0-r3558/src/allmydata/scripts/backupdb.py"", line 168, in check_file
    (path,))
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless
you use a text_factory that can interpret 8-bit bytestrings (like
text_factory = str). It is highly recommended that you instead just
switch your application to Unicode strings.
Command exited with non-zero status 1
}}}
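
A minimal sketch of the keep-the-bytes approach, assuming Python 3, where sqlite3 stores bytes parameters as BLOBs and returns them unchanged, so no text_factory is needed. Under the Python 2 of the traceback, the corresponding step would be the text_factory = str setting that the error message suggests. The local_files table and the helper names are hypothetical and are not taken from backupdb.py:

```python
import sqlite3


def open_db(dbfile=':memory:'):
    # Hypothetical schema: the path column is declared BLOB so that
    # filenames are kept as the exact bytes the filesystem returned.
    conn = sqlite3.connect(dbfile)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS local_files '
        '(path BLOB PRIMARY KEY, size INTEGER)'
    )
    return conn


def remember(conn, path, size):
    # path is a bytes object; sqlite3 binds it as a BLOB without decoding.
    conn.execute(
        'INSERT OR REPLACE INTO local_files (path, size) VALUES (?, ?)',
        (path, size),
    )


def lookup(conn, path):
    # Returns the recorded size, or None if the path is unknown.
    row = conn.execute(
        'SELECT size FROM local_files WHERE path = ?', (path,)
    ).fetchone()
    return None if row is None else row[0]


conn = open_db()
# Latin-1 encoded name: these bytes are not valid UTF-8.
remember(conn, b'caf\xe9.txt', 42)
```

Because the bytes are never decoded, a filename in any local encoding round-trips through the database. The cost is that the database alone cannot say which characters the bytes were meant to represent.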

This still leaves the question of what filenames we should pass to Tahoe. The
Tahoe webapi expects URL-encoded UTF-8-encoded bytestrings in the URL, and
the set_children() command expects JSON-encoded unicode strings as
childnames. The main question (as explored by #565/#534) is how to get from
the return value of os.listdir() (and the starting point in sys.argv) to a
tahoe-suitable unicode object. That part depends on the local encoding and
on whatever convention is in use on the particular local filesystem.
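
The conversion chain described above could be sketched as follows, assuming Python 3 and assuming the filename bytes really are in the encoding passed in (that assumption is exactly the open question). The function names are hypothetical and are not part of Tahoe:

```python
import json
import sys
from urllib.parse import quote


def to_unicode_name(raw, encoding=None):
    # Decode filename bytes (e.g. from os.listdir() on a bytes path).
    # Falls back to the interpreter's filesystem encoding, which is only
    # a guess about the convention used on the local filesystem.
    # Raises UnicodeDecodeError if the bytes do not fit the encoding.
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return raw.decode(encoding)


def url_path_segment(name):
    # UTF-8 encode, then percent-encode, for use as one URL path segment.
    return quote(name.encode('utf-8'), safe='')


def json_childname(name):
    # JSON-encode the unicode name, as a JSON request body would carry it.
    return json.dumps(name)


name = to_unicode_name(b'caf\xc3\xa9.txt', 'utf-8')
segment = url_path_segment(name)
encoded = json_childname(name)
```

Decoding with the wrong encoding either raises UnicodeDecodeError or silently yields the wrong characters, which is why the choice of local encoding matters more than the two encoding steps that follow it.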
"	defect	closed	major	1.7.0	code-frontend-cli	1.3.0	fixed	tahoe-backup unicode reviewed	freestorm77@…	
