#1634 closed defect (fixed)

TypeError due to "size" being None in pyfilesystem+dokan

Reported by: zooko Owned by: Zancas
Priority: major Milestone: 1.10.1
Component: code-frontend-web Version: 1.9.0
Keywords: pyfilesystem dokan error webapi docs mutable review-needed Cc: clashthebunny@…, leif@…
Launchpad Bug:

Description (last modified by leif)

I saw this bug report to pyfilesystem:

http://code.google.com/p/pyfilesystem/issues/detail?id=96

tahoe-lafs 1.9.0 + pyfilesystem 0.4 (and svn r723) + dokan 0.6.0
OS Windows 7 x64 Russian + Python 2.7.2 x86

Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 313, in 'calling callback function'
  File "C:\Program Files (x86)\Python27\lib\site-packages\fs\expose\dokan\__init__.py", line 256, in wrapper
    return func(self,*args)
  File "C:\Program Files (x86)\Python27\lib\site-packages\fs\expose\dokan\__init__.py", line 173, in wrapper
    res = func(*args,**kwds)
  File "C:\Program Files (x86)\Python27\lib\site-packages\fs\errors.py", line 191, in wrapper
    return func(*args,**kwds)
  File "C:\Program Files (x86)\Python27\lib\site-packages\fs\expose\dokan\__init__.py", line 604, in FindFilesWithPattern
    data = self._info2finddataw(fpath,finfo,None)
  File "C:\Program Files (x86)\Python27\lib\site-packages\fs\expose\dokan\__init__.py", line 794, in _info2finddataw
    data.nFileSizeHigh = info.get("size",0) >> 32
TypeError: unsupported operand type(s) for >>: 'NoneType' and 'int'

As far as I know, pyfilesystem doesn't have a strong test suite, so it will require more effort from someone to figure out where exactly the problem lies in pyfilesystem, Tahoe-LAFS, dokan, or the interfaces between them...

Change History (32)

comment:1 Changed at 2012-03-12T19:09:34Z by davidsarah

  • Keywords pyfilesystem dokan error added

comment:2 Changed at 2012-12-19T22:46:00Z by ClashTheBunny

  • Cc clashthebunny@… added

I'm thinking this is related to my same issue. I have a Tahoe-LAFS mounted through PyFS on Linux and I was using the same one earlier on Windows with docan. I saw similar things on docan, but didn't write them down. I mount the directory and do an 'ls'. This is the error I'm seeing on Linux:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/fs/expose/fuse/fuse_ctypes.py", line 294, in _wrapper_
    return func(*args, **kwargs) or 0
  File "/usr/local/lib/python2.7/dist-packages/fs/expose/fuse/fuse_ctypes.py", line 425, in readdir
    set_st_attrs(st, attrs)
  File "/usr/local/lib/python2.7/dist-packages/fs/expose/fuse/fuse_ctypes.py", line 239, in set_st_attrs
    setattr(st, key, val)
TypeError: an integer is required

This is due to me having some MDMF files in the directory. When I unlink them from the directory, I get immediate results from my 'ls'. I think the problem is that MDMF files don't have a "size" always. In the JSON for the DIR:CAP, I have the file size listed as:

"size": null

When I run a "check/renew/verify every bit" on the MDMF, it spits out the size for a while and then the 'ls' works for a time.

The change to fix this should either be in TahoeLAFS to output something like 0 for a null sized file, or at least an integer, like "-1", or PyFS should be updated to check "size" for null and replace it with 0 or a "-1".

UPDATE:

I did get it to do it again on Windows, and it seems to be the same error:

Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 314, in 'calling callback function'
  File "C:\Python27\lib\site-packages\fs\expose\dokan\__init__.py", line 256, in
 wrapper
    return func(self,*args)
  File "C:\Python27\lib\site-packages\fs\expose\dokan\__init__.py", line 173, in
 wrapper
    res = func(*args,**kwds)
  File "C:\Python27\lib\site-packages\fs\errors.py", line 191, in wrapper
    return func(*args,**kwds)
  File "C:\Python27\lib\site-packages\fs\expose\dokan\__init__.py", line 604, in
 FindFilesWithPattern
    data = self._info2finddataw(fpath,finfo,None)
  File "C:\Python27\lib\site-packages\fs\expose\dokan\__init__.py", line 794, in
 _info2finddataw
    data.nFileSizeHigh = info.get("size",0) >> 32
TypeError: unsupported operand type(s) for >>: 'NoneType' and 'int'

Also, this may be a good idea to fix in the web interface. I don't know why size is ? for MDMF files when I look at a DIRCAP through the web interface. Is my file there? Is it corrupted? Is it just hard to calculate the size? Maybe a message other than "?" would be better

Last edited at 2012-12-19T22:54:19Z by ClashTheBunny (previous) (diff)

comment:3 Changed at 2012-12-19T22:55:19Z by ClashTheBunny

  • Keywords wui added

comment:4 follow-up: Changed at 2012-12-21T00:59:57Z by davidsarah

  • Component changed from code to code-frontend-web
  • Keywords webapi docs mutable test-needed added; wui removed
  • Milestone changed from undecided to 1.11.0

The '?' in the WUI is intended behaviour. It is too inefficient to calculate the size for every mutable file in a directory listing; it would require querying each file individually. Similarly, it's intended that the JSON output not include the actual file size. Representing the lack of a known file size by "size": null is dubious, though (even though it is valid JSON). We normally represent unknown information by omitting a field (which the pyFilesystem Tahoe-LAFS backend would be able to handle, using info.get("size",0)). For example, that's what we do for the "mutable" field on unknown nodes.

webapi.rst fails to document how an unknown file size is represented. tahoe_ls.py appears to assume that an unknown size is represented by omitting the "size" field. (So, I'm not sure why tahoe ls correctly prints "?" rather than "null" for mutable files. Perhaps "size": null is only included sometimes?)

In any case, the bug in the description doesn't seem to be related to the WUI. If we decide that omitting the "size" field is correct, then it's a bug in the web-API with JSON output; if we decide that "size": null is correct, it's a bug in the pyFilesystem Tahoe-LAFS backend. Either way, webapi.rst also needs to be clarified.

comment:5 in reply to: ↑ 4 Changed at 2012-12-21T01:18:44Z by davidsarah

Replying to davidsarah:

We normally represent unknown information by omitting a field (which the pyFilesystem Tahoe-LAFS backend would be able to handle, using info.get("size",0)).

Correction: a missing "size" field will actually be coerced to 0 earlier, in this code: http://code.google.com/p/pyfilesystem/source/browse/trunk/fs/contrib/tahoelafs/util.py#79

comment:6 Changed at 2013-04-22T14:17:00Z by daira

  • Owner changed from somebody to daira
  • Status changed from new to assigned

I'll accept this for 1.11. I intend to make it omit the "size" field.

comment:7 Changed at 2013-08-28T15:20:26Z by leif

  • Cc leif@… added
  • Description modified (diff)

comment:8 Changed at 2013-08-28T16:04:30Z by daira

  • Milestone changed from soon to 1.11.0

comment:9 Changed at 2014-09-19T19:08:22Z by Zancas

  • Owner changed from daira to Zancas
  • Status changed from assigned to new

comment:10 Changed at 2014-09-19T20:30:02Z by Zancas

The next thing I need to determine is which function is returning the "size" containing JSON which needs to have "size" omitted. I need to confirm the case(s) in which size should be omitted, and make sure that I do so at the correct point in the call stack.

comment:11 Changed at 2014-09-19T20:30:29Z by warner

I'm late to the party, but in my opinion:

  • It'd be wrong to claim size=0 when in fact we don't know. We need a different indicator
  • For a python interface, I'd return None to mean "we don't know". The two basic options are to have data["size"] = None or to omit data["size"] entirely. I like consistency. Both options require the client to be prepared to deal with an unknown-size, either with something like print 'size: %s' % (data['size'] or 'unknown') or print 'size: %s' % data.get('size', 'unknown'). I guess I'm slightly in favor of having the set of keys be constant, and put the variability in the values.
  • For JSON, I think it's reasonable to return null
  • But I'm -0 on omitting the field: I won't oppose it if that's what someone else prefers.
  • If a frontend like pyfilesystem is coercing null or a missing key into some other value like "0", that's their business. We shouldn't be lying to them about the size of the file, but they're welcome to lie to themselves or their own users.

comment:12 Changed at 2014-09-19T20:44:41Z by warner

Oh also, I think everyone already agrees with this, but human-facing tools like tahoe ls should print "?" instead of "null". Machine-facing tools like the WAPI should provide clearly-distinct types (whether using null or omitting the key). I don't want tahoe ls to print "FILENAME.. stuff.. size=null", that's just inhumane. And I don't want the WAPI to return size: "?" because that's inhumane to the poor program that's trying to deal with the data.

comment:13 Changed at 2014-09-19T23:18:37Z by warner

So mutable filenodes (SMDF or MDMF) are allowed to return None from IFilesystemNode.get_size() to indicate "I don't know yet". These mutable filenodes cache their size when it is fetched, so the first time you access them, you'll get None, and then if you look closely enough at them to warrant network traffic (i.e. you call get_current_size()), then you'll get real numbers from get_size() for as long as you hold onto that node (or it stays in the filenode cache).

We can change the JSON return value from the WAPI independently. I'm still slightly in favor of value=null over omit-the-key, but either way is pretty easy.

To use value=null, the tahoe code is already correct, and what needs to change is in pyfilesystem. In the code I've seen (https://code.google.com/p/pyfilesystem/source/browse/trunk/fs/contrib/tahoelafs/util.py#79), that data.get('size', 0) should be replaced with data.get('size') or 0.

To use omit-the-key, we must change tahoe's web/directory.py line 882 (in DirectoryJSONMetadata) to fetch childnode.get_size() but then only add it to kiddata[1]['size'] if it's not None.

comment:14 Changed at 2014-09-21T01:57:27Z by Zancas

My tentative preference is that tahoe returns '"size":None'.

This provides more information than simply removing the key.

It seems to me that if a third party asks for a key:value and gets KeyError they should have guarantees about what that means.

If it sometimes means that the requested metadata hasn't been collected for the file-of-interest, and at other times it means "that sort of metadata doesn't exist" then they'll learn less.

So, a query of a file's "size" attribute (which is sometimes known) should provoke a different response than a query of a file's "galactic coordinate" (which is never known).

Last edited at 2014-09-21T01:59:00Z by Zancas (previous) (diff)

comment:15 Changed at 2014-09-21T05:27:48Z by warner

Ok, then I think this ticket should have two tasks:

  • add a note to doc/frontends/webapi.rst that documents the size property as always being present, but sometimes being null, and explain when and why
  • file a ticket with the pyfilesystem (or dokan?) folks to handle this null properly

Then we can close this one.

comment:16 Changed at 2014-09-23T16:57:46Z by Zancas

We've changed our minds.

It turns out that the codebase has a well-established convention of omitting key:value JSON metadata in cases where the LASS (Least Authority Storage System) doesn't know the relevant value.

Therefore the new fix is to implement the fix warner mentioned above, i.e. only include a "size" key if childnode.get_size() is not None, on _this_ line:

https://github.com/tahoe-lafs/tahoe-lafs/blob/d7aef0cffb2a28465f75c883128e99045dadcb82/src/allmydata/web/directory.py#L882

comment:17 Changed at 2014-09-23T17:23:29Z by warner

We might also want to update the docs to remind devs that "size" can be missing (the docs currently warn about missing fields for unknown formats, but not known-format mutable files).

comment:18 Changed at 2014-09-23T18:29:38Z by Zancas

Acknowledged. Shall I propose a pull request that will update this doc:

https://github.com/tahoe-lafs/tahoe-lafs/blob/d7aef0cffb2a28465f75c883128e99045dadcb82/docs/frontends/webapi.rst

??

Are there other docs I should update as well?

comment:19 Changed at 2014-09-23T18:40:58Z by warner

Yeah, update webapi.rst, that's the only one that needs changing. Look for the places that describe t=json . I think there will be two: one for directories, and one for files. This will change what directories return, but not files (which might be worth calling out in the change: "note that, unlike directories, t=json for files *will* generally return the size." well, except for unknown URI types.). Thanks!

comment:20 Changed at 2014-09-23T21:46:38Z by daira

The place to change is this paragraph under docs/frontends/webapi.rst#getting-information-about-a-file-or-directory-as-json:

In the above example, note how 'children' is a dictionary in which the keys are child names and the values depend upon whether the child is a file or a directory. The value is mostly the same as the JSON representation of the child object (except that directories do not recurse -- the "children" entry of the child is omitted, and the directory view includes the metadata that is stored on the directory edge).

comment:21 Changed at 2014-09-23T23:38:57Z by Zancas

Sigh, well I've not gotten the test done yet. Hopefully I can sneak it in tonight. I'll do the doc last.

https://github.com/zancas/tahoe-lafs/commit/fd4313d44d90fd94d49b19afbb576b6969defd3f#diff-497c2097e92e34395971db2baefa14b4

comment:22 Changed at 2014-10-02T21:10:37Z by Zancas

If I Understand Correctly (IIUC), Warner and I agree that the following steps would constitute a reasonable approach to this issue:

  1. refactor synchronous node metadata acquisition as exemplified by get_size into a discrete function (which lives in web/common.py?):

get_childnode_metadata(childnode) --> {metadata1:value, metadata2:value, etc...}

[1a. If it increases DRYness also refactor JSON dumping into discrete function]

  1. use the new function[s] when the webserver handles requests against the backend asking for that metadata, see below "grep" results:
$ grep -n -r -e"node\.get_size" src/allmydata/web
src/allmydata/web/filenode.py:361:        filesize = self.filenode.get_size()
src/allmydata/web/filenode.py:422:        filesize = self.filenode.get_size()
src/allmydata/web/filenode.py:502:    data[1]['size'] = filenode.get_size()
src/allmydata/web/directory.py:882:                kiddata = ("filenode", {'size': childnode.get_size(),

CAVEAT EMPTOR:

"get_current_size" type requests which return a deferred (e.g. ~ web/info.py line 63) are not affected by this refactor

... ehhh still working on this ticket, but I'm switching to my laptop, so look forward to the next comment.

comment:23 Changed at 2014-10-02T21:13:00Z by daira

+1

comment:24 Changed at 2014-10-07T21:48:32Z by Zancas

Helper should be called "get_filenode_metadata" not "childnode".

comment:25 Changed at 2014-10-10T02:52:30Z by Zancas

OK, I've submitted a pull request, here:

https://github.com/tahoe-lafs/tahoe-lafs/pull/119

There's a significant enhancement which might be worth doing. We could make mock objects with "spec_set=RealObject()" arguments.

For example:

import mock

from allmydata.immutable.filenode import ImmutableFileNode

XXX

immutable_filenode_template_object = ImmutableFileNode(foo, blah, spam, ...)

mock_immutable_filenode = mock.Mock(spec_set=immutable_filenode_template_object)

Then do the same for the other filenode types we're mocking.

The reason I didn't do this already is because in the "XXX" above I'd need to make mock "foo", "blah", and "spam" objects. I think this is worth it _except_ mock filenodes of the appropriate type have almost certainly already been constructed (in test_web.py), so I thought I'd mention this potential work here before reinventing the wheel!

Also, https://github.com/rrymer contributed significantly to this code.

Last edited at 2014-10-10T02:53:32Z by Zancas (previous) (diff)

comment:27 Changed at 2014-10-10T16:22:23Z by Zancas

Warner recommended the refactor, and then recommended making it its own commit.

comment:28 Changed at 2014-10-10T18:08:18Z by Zancas

OK, please see a new pull request here:

https://github.com/tahoe-lafs/tahoe-lafs/pull/120

comment:29 Changed at 2014-10-10T18:08:46Z by Zancas

  • Keywords review needed added

comment:30 Changed at 2014-10-10T18:09:09Z by Zancas

  • Keywords review-needed added; review needed removed

comment:31 Changed at 2014-10-10T19:17:00Z by zooko

  • Resolution set to fixed
  • Status changed from new to closed

comment:32 Changed at 2014-10-10T19:36:16Z by Daira Hopwood <daira@…>

In 346b5c4c3d1cf5bed7389936dea72b5204b904ea/trunk:

Don't call filenode.get_size() twice. refs #1634

Signed-off-by: Daira Hopwood <daira@…>

Note: See TracTickets for help on using tickets.