#677 assigned defect

WebAPI: GET /uri/$FILECAP?t=json doesn't return size for mutable files, but the HTML version does

Reported by: soult Owned by: davidsarah
Priority: minor Milestone: soon
Component: code-frontend-web Version: 1.3.0
Keywords: test mutable test-needed Cc:
Launchpad Bug:

Description

Compare: HTML version JSON version

The JSON version is returning "?" as the file size, while the HTML version returns the correct size. This only happens on mutable files.

Attachments (3)

fix_mutable_size_in_json.patch (234.1 KB) - added by swillden at 2009-06-17T02:27:28Z.
Patch that appears to fix the problem
current-size-in-mutable-file-json.dpatch (2.4 KB) - added by davidsarah at 2010-07-23T00:18:37Z.
web.filenode: include the current (rather than cached) file size in the JSON metadata for a mutable file. fixes #677
omit-size-from-dir-json-if-unknown.dpatch (1.4 KB) - added by davidsarah at 2010-07-23T00:19:23Z.
web.directory: omit the 'size' field for a mutable file in the JSON rendering of a directory if it is not known.

Download all attachments as: .zip

Change History (23)

Changed at 2009-06-17T02:27:28Z by swillden

Patch that appears to fix the problem

comment:1 follow-up: Changed at 2009-06-17T02:30:44Z by swillden

This one is causing me some trouble, so I did some investigation.

The immediate culprit is them implementation of MutableFileNode?.get_size(), in mutable/filenode.py, line 195. The implementation is supremely simple, and not very useful:

return "?"

I'm assuming this is because finding out the size of a mutable node isn't easy. The implementation of the HTML "More Info" page, makes a deferred call to get_size_of_best_version(), so I presume that's what's needed for the JSON as well.

I've attached a patch that seems to fix the problem, by doing a deferred call to get_size_of_best_version(), but I'm not very familiar with this code so I don't know it's the best way to handle the issue. I'll create a test case, too.

comment:2 Changed at 2009-06-17T08:58:14Z by warner

yeah, get_size_of_best_version() is fairly new, and the JSON-rendering code was written before it was available. Also, I seem to remember that requiring a Deferred to get that size was annoying, and I didn't want to slow down a directory listing by doing a mapupdate for every single mutable file therein.

For CHK files, the size is stored in the filecap, so it's trivial to get it. For mutable files, you have to ask around and find out what versions are available. This "mapupdate" operation isn't as expensive as actually retrieving the file's contents, but it's close (a roundtrip or two).

Your approach sounds reasonable. Be sure to think about whether it's t=JSON on a single file that gets this fix, and/or t=JSON on a whole directory (which might not be a good idea, if there are lots of mutable files in it). It'd probably be better to leave the "size" field out of the elements of a directory fetch than to include "?" strings.

comment:3 Changed at 2009-11-03T03:23:08Z by davidsarah

Is it worth getting the size(s) only when asked for, for example using t=JSON+size?

comment:4 Changed at 2009-12-13T00:51:21Z by davidsarah

  • Keywords review added
  • Priority changed from trivial to minor

comment:5 Changed at 2009-12-21T02:54:14Z by davidsarah

  • Keywords review-needed added; review removed

comment:6 follow-up: Changed at 2010-01-07T19:13:57Z by kevan

This patch conflicts with the current code. Here is the conflict:

        if t == "json":
v v v v v v v
            if self.node.is_mutable():
                d = defer.maybeDeferred(self.node.get_size_of_best_version)
                d.addCallback(lambda sz: FileJSONMetadata(ctx, self.node, sz))
                return d
            else:
                return FileJSONMetadata(ctx, self.node)
*************
            if self.parentnode and self.name:
                d = self.parentnode.get_metadata_for(self.name)
            else:
                d = defer.succeed(None)
            d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md))
            return d
^ ^ ^ ^ ^ ^ ^

A quick fix for that conflict (with ugly nested function) is:

if self.parentnode and self.name:
    d = self.parentnode.get_metadata_for(self.name)
else:
    d = defer.succeed(None)
if self.node.is_mutable():
    def _get_best_size(md):
        d = self.node.get_size_of_best_version()
        d.addCallback(lambda sz: (sz, md))
        return d
    d.addCallback(_get_best_size)
    d.addCallback(lambda (md, sz):
                              FileJSONMetadata(ctx, self.node, md, sz))
else:
    d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md))
return d

There is another conflict in the definition of FileJSONMetadata:

v v v v v v v
def FileJSONMetadata(ctx, filenode, best_size = None):
*************
def FileJSONMetadata(ctx, filenode, edge_metadata):
^ ^ ^ ^ ^ ^ ^

The previous conflicting code passes some metadata to this function so that (IIRC) the 'tahoe ls' command woks correctly. Perhaps a fix here would be to take both edge metadata and best size as parameters?

Also, this fix should have a test written.

When these issues are fixed, I'll be happy to re-review the patch.

comment:7 in reply to: ↑ 6 ; follow-up: Changed at 2010-01-07T21:02:09Z by davidsarah

  • Keywords test added

Replying to kevan:

A quick fix for that conflict (with ugly nested function) is:

if self.parentnode and self.name:
    d = self.parentnode.get_metadata_for(self.name)
else:
    d = defer.succeed(None)
if self.node.is_mutable():
    def _get_best_size(md):
        d = self.node.get_size_of_best_version()
        d.addCallback(lambda sz: (sz, md))
        return d
    d.addCallback(_get_best_size)
    d.addCallback(lambda (md, sz):
                              FileJSONMetadata(ctx, self.node, md, sz))
else:
    d.addCallback(lambda md: FileJSONMetadata(ctx, self.node, md))
return d

I'm totally flummoxed by this code. (I even had to look up whether d in _get_best_size will shadow the outer d or reference it. According to PEP 227, this changed between versions of Python.) Also using (sz, md) in one place and (md, sz) in another looks wrong.

I suspect that using two separate deferreds with different variable names would be clearer.

comment:8 in reply to: ↑ 7 Changed at 2010-01-07T22:39:35Z by kevan

Replying to davidsarah:

I'm totally flummoxed by this code. (I even had to look up whether d in _get_best_size will shadow the outer d or reference it. According to PEP 227, this changed between versions of Python.) Also using (sz, md) in one place and (md, sz) in another looks wrong.

I suspect that using two separate deferreds with different variable names would be clearer.

Indeed, the ordering of md and sz is wrong -- I'd fixed it in my sandbox, but not in that comment. Sorry for the confusion! I also agree on the names of the deferreds.

comment:9 Changed at 2010-01-09T02:17:11Z by warner

FWIW, I usually use "d2" when nested callbacks want to use Deferreds too. Python will shadow the outer name, so it's safe to use "d" everywhere, but the time spent remembering that is worse than the time spent typing the "2" :).

Also, there's a new method named get_current_size that is better to use here. It's defined on both mutable and immutable files, and always returns a Deferred. So try something like the following instead (this includes a little bit of cleanup, and keeps the filenode-specific code inside !FileJSONMetadata.. needs a test, of course):

diff --git a/src/allmydata/web/filenode.py b/src/allmydata/web/filenode.py
index 9fd4402..104fc49 100644
--- a/src/allmydata/web/filenode.py
+++ b/src/allmydata/web/filenode.py
@@ -427,25 +427,29 @@ class FileDownloader(rend.Page):
 
 
 def FileJSONMetadata(ctx, filenode, edge_metadata):
-    if filenode.is_readonly():
-        rw_uri = None
-        ro_uri = filenode.get_uri()
-    else:
-        rw_uri = filenode.get_uri()
-        ro_uri = filenode.get_readonly_uri()
-    data = ("filenode", {})
-    data[1]['size'] = filenode.get_size()
-    if ro_uri:
-        data[1]['ro_uri'] = ro_uri
-    if rw_uri:
-        data[1]['rw_uri'] = rw_uri
-    verifycap = filenode.get_verify_cap()
-    if verifycap:
-        data[1]['verify_uri'] = verifycap.to_string()
-    data[1]['mutable'] = filenode.is_mutable()
-    if edge_metadata is not None:
-        data[1]['metadata'] = edge_metadata
-    return text_plain(simplejson.dumps(data, indent=1) + "\n", ctx)
+    d = filenode.get_current_size()
+    def _got_size(size):
+        data = ("filenode", {})
+        data[1]['size'] = size
+        if filenode.is_readonly():
+            rw_uri = None
+            ro_uri = filenode.get_uri()
+        else:
+            rw_uri = filenode.get_uri()
+            ro_uri = filenode.get_readonly_uri()
+        if ro_uri:
+            data[1]['ro_uri'] = ro_uri
+        if rw_uri:
+            data[1]['rw_uri'] = rw_uri
+        verifycap = filenode.get_verify_cap()
+        if verifycap:
+            data[1]['verify_uri'] = verifycap.to_string()
+        data[1]['mutable'] = filenode.is_mutable()
+        if edge_metadata is not None:
+            data[1]['metadata'] = edge_metadata
+        return text_plain(simplejson.dumps(data, indent=1) + "\n", ctx)
+    d.addCallback(_got_size)
+    return d
 
 def FileURI(ctx, filenode):
     return text_plain(filenode.get_uri(), ctx)

comment:10 Changed at 2010-01-15T01:16:08Z by davidsarah

  • Keywords review-needed removed

Needs to address comments before further review.

comment:11 in reply to: ↑ 1 Changed at 2010-02-08T00:54:24Z by davidsarah

Replying to swillden:

The immediate culprit is them implementation of MutableFileNode?.get_size(), in mutable/filenode.py, line 195. The implementation is supremely simple, and not very useful:

return "?"

Note that this was changed in e046744f40d59e70. It now returns the last cached size, initially None.

comment:12 Changed at 2010-03-17T01:03:38Z by davidsarah

  • Owner set to davidsarah
  • Status changed from new to assigned

comment:13 Changed at 2010-03-17T01:03:51Z by davidsarah

  • Keywords mutable added
  • Milestone changed from undecided to 1.7.0

comment:14 Changed at 2010-06-12T21:08:35Z by davidsarah

  • Milestone changed from 1.7.0 to 1.7.1

comment:15 Changed at 2010-07-18T02:20:28Z by davidsarah

  • Milestone changed from 1.7.1 to 1.8β

Out of time.

Changed at 2010-07-23T00:18:37Z by davidsarah

web.filenode: include the current (rather than cached) file size in the JSON metadata for a mutable file. fixes #677

Changed at 2010-07-23T00:19:23Z by davidsarah

web.directory: omit the 'size' field for a mutable file in the JSON rendering of a directory if it is not known.

comment:16 Changed at 2010-07-23T00:27:17Z by davidsarah

  • Keywords review-needed test-needed added

attachment:current-size-in-mutable-file-json.dpatch is an adaptation of Brian's code in comment:9 to current trunk. It only affects t=json for a mutable file, not for directory listings.

attachment:omit-size-from-dir-json-if-unknown.dpatch omits the "size" field from directory listings if it is not currently cached. I'm not sure that this is necessary; the current behaviour of including "size": null does not seem too unreasonable.

comment:17 follow-up: Changed at 2010-07-23T05:45:39Z by zooko

Hey, you've done this again. I don't understand what it means for a ticket to be both review-needed and test-needed. Shouldn't the reviewer just point out that tests and needed and then unset the review-needed flag?

Version 0, edited at 2010-07-23T05:45:39Z by zooko (next)

comment:18 in reply to: ↑ 17 Changed at 2010-09-11T00:59:22Z by davidsarah

  • Keywords review-needed removed
  • Milestone changed from 1.8β to 1.9.0

Replying to zooko:

Shouldn't the reviewer just point out that tests are needed and then unset the review-needed flag?

Yes. Unfortunately I forgot about this ticket, and we're out of time for 1.8.

comment:19 Changed at 2011-08-14T01:17:14Z by davidsarah

  • Milestone changed from 1.9.0 to 1.10.0

I forgot about this ticket again :-( Still needs a test.

comment:20 Changed at 2013-01-04T03:04:02Z by davidsarah

Updated to current trunk (taking into account MDMF) and gitified: https://github.com/davidsarah/tahoe-lafs/commits/677-current-size-in-mutable-json. Tests still needed.

Note: See TracTickets for help on using tickets.