#567 assigned enhancement

add version info to t=JSON output data

Reported by: warner Owned by: rvs
Priority: major Milestone: soon
Component: code-frontend-web Version: n/a
Keywords: version json docs easy forward-compatibility Cc: freestorm77@…
Launchpad Bug:

Description (last modified by rvs)

It will probably make life easier in the future if we add a version number to the JSON output from operations like deep-stats, deep-manifest, and deep-check. A related piece would be to tell developers that they should tolerate the presence or absence of each field. On the other hand, the only customers of the t=JSON are:

1) The Tahoe CLI
2) Tools written to use the webapi functions

I'm ok with requiring that you use e.g. a 1.3.0 CLI with a 1.3.x backend instead of a 1.4.x backend, and developers have to play catchup all the time anyways. So I'm not sure how much energy I want to put into this.

Attachments (3)

add_version_to_json.darcs.txt (19.4 KB) - added by freestorm at 2010-05-30T19:48:36Z.
add_tahoe_version_to_json.darcs.txt (20.0 KB) - added by freestorm at 2010-06-04T22:47:55Z.
tahoe_version_running_tests.dpatch.txt (40.8 KB) - added by freestorm at 2010-06-04T22:48:38Z.
Running tests

Download all attachments as: .zip

Change History (28)

comment:1 Changed at 2010-03-25T01:11:36Z by davidsarah

  • Keywords version json docs easy forward-compatibility added

comment:2 Changed at 2010-05-30T19:46:43Z by freestorm

  • Description modified (diff)

It will probably make life easier in the future if we add a version number to the JSON output from operations like deep-stats, deep-manifest, and deep-check.
A related piece would be to tell developers that they should tolerate the presence or absence of each field.
On the other hand, the only customers of the t=JSON are:
1) The Tahoe CLI
2) Tools written to use the webapi functions
I'm ok with requiring that you use e.g. a 1.3.0 CLI with a 1.3.x backend instead of a 1.4.x backend, and developers have to play catchup all the time anyways.
So I'm not sure how much energy I want to put into this.

Changed at 2010-05-30T19:48:36Z by freestorm

comment:3 Changed at 2010-05-30T19:49:35Z by freestorm

I create a patch with tahoe version.

comment:4 Changed at 2010-05-30T20:17:39Z by zooko

  • Milestone changed from undecided to 1.7.0
  • Owner set to freestorm

Thank you! It looks fairly good. I'm going to examine it more closely. While I'm doing so, please also write a unit test, probably in src/allmydata/test/test_web.py which calls the Python API methods that you changed. Actually there are already tests that call those methods, so find those tests and add a check of the version field to each one. For example, you changed directory.py ManifestResults to have a version number. I can see from directory.py line 475 that ManifestResults is created when you send a POST with certain arguments. Looking around in the tests, I find allmydata.test.test_web.Web.test_POST_DIRURL_manifest, which exercises that POST. So I think you can add a check inside test_POST_DIRURL_manifest() that a version string is present. You can tell when you're done because there is a unit test which goes red whenever you comment-out each of the lines of code that you've added in your patch.

I'm glad you are contributing to this! It is a "forward-compatibility" issue, which always makes me feel good about getting it completed and deployed in a new version of Tahoe-LAFS. Hopefully we can get it into v1.7.

comment:5 Changed at 2010-05-31T04:20:55Z by davidsarah

  • Description modified (diff)

I'm not sure that the Tahoe version is necessarily what we want for forward-compatibility (although it might be useful if we later find a bug in particular Tahoe versions that clients have to work around). My impression was that the description was referring to a version number that is updated whenever the output from each particular operation is changed.

comment:6 Changed at 2010-05-31T07:45:15Z by freestorm

  • Status changed from new to assigned

Davidsarah:

Do you think more a protocol version?
like:
tahoe_version: "1.6.1-r4285"
protocol_version: "JSON 1.0"

Or maybe an output revision:
tahoe_version: "1.6.1-r4285"
output_revision: "1.1"

Fred

Changed at 2010-06-04T22:47:55Z by freestorm

Changed at 2010-06-04T22:48:38Z by freestorm

Running tests

comment:7 Changed at 2010-06-04T23:10:09Z by freestorm

  • Keywords review-needed added
  • Owner changed from freestorm to somebody
  • Status changed from assigned to new

comment:8 Changed at 2010-06-05T18:40:12Z by kevan

  • Owner changed from somebody to kevan

comment:9 Changed at 2010-06-05T21:20:20Z by kevan

Is there a good reason to only add this field to deep-stats, deep-manifest, and deep-check? (that's not specifically for you to answer, FreeStorm?, though you can if you have an opinion -- just a general question) Regardless of the answer to that, it won't be a regression to add the version field to only deep-stats, deep-manifest, and deep-check, so if it turns out to be a good idea to add the version information to other operations, that idea can be a new ticket.

Instead of calling your new field tahoe_version, call it tahoe-version -- this is more consistent with how other fields are named.

It is not necessary to import version into web/directory.py -- you can replace the change you made around line 932 with

    'tahoe_version': s['stats']['tahoe_version']

and use the version field you introduced in dirnode.py.

The change in json_check_counts seems unnecessary -- that function only gets called within the context of a non-deep operation, or to build nested results for a deep operation that already has tahoe_version somewhere else in the JSON. In other words, it seems like this ticket could be solved without that change. Do you agree?

You can eliminate the use of version in check_results.py (assuming that the first modification is unnecessary) by altering the change around line 350 to be something like:

    res = self.monitor.get_status()
    stats = res.get_stats()
    data["tahoe_version"] = stats['tahoe_version']

and the change around line 509 to be

   stats = res.get_stats()
   data["tahoe_version"] = stats['tahoe_version']

This will make those methods use the version string you set in dirnode.py.

Actually, when I do that, the tests fail -- sometimes the stats don't include tahoe_version. This is because they don't get sent from the DeepStats object contained within the code that performs a deep check or deep repair or deep verify or some combination of those three to the results object -- DeepCheckResults for a deep check, DeepCheckAndRepairResults for a deep check and repair -- until the deep operation has finished, which can take a while. You can fix this by changing the __init__ method in DeepChecker in dirnode.py to have the following at the end:

    self._results.update_stats(self._stats.get_results())

This will initialize the stats dictionary in the results object to have some meaningful initial values before the check starts, including the version number.

Change

    self.failUnlessIn("tahoe_version",res,msg=None)

to

    self.failUnlessIn("tahoe_version", res, res)

in test_deepcheck.py.

Change

    self.failUnlessIn("tahoe_version",res,msg=None)

to

    self.failUnlessIn("tahoe_version", res, res)

and

    self.failUnlessIn("tahoe_version",stats,msg=None)

to

    self.failUnlessIn("tahoe_version", stats, stats)

in test_web.py.

Code coverage reports look good -- all of your changes are executed by tests, and if I remove any one of your changes, at least one test fails, which is also good.

comment:10 Changed at 2010-06-05T21:20:36Z by kevan

  • Keywords review-needed removed

comment:11 Changed at 2010-06-05T21:21:41Z by kevan

  • Owner changed from kevan to freestorm

comment:12 Changed at 2010-06-10T10:39:00Z by freestorm

I'm looking to apply kevan's comment

I have a question about where need to be "tahoe-version"

In the example below, "tahoe-version" appears 2 times, is it correct?

Deep-Check with Add/renew lease:

{
 "list-unhealthy-files": [], 
 "stats": {
  "count-immutable-files": 0, 
  "count-literal-files": 0, 
  "largest-directory-children": 0, 
  "largest-directory": 0, 
  "size-directories": 0, 
  "tahoe-version": "1.6.1-r4472", 
  "largest-immutable-file": 0, 
  "size-files-histogram": [], 
  "size-immutable-files": 0, 
  "count-unknown": 0, 
  "count-files": 0, 
  "count-mutable-files": 0, 
  "count-directories": 0, 
  "size-literal-files": 0
 }, 
 "count-objects-healthy": 0, 
 "tahoe-version": "1.6.1-r4472", 
 "count-objects-checked": 0, 
 "finished": false, 
 "count-objects-unhealthy": 0, 
 "root-storage-index": "5345rmad34jtdiclrjqtq63m4i", 
 "count-corrupt-shares": 0, 
 "list-corrupt-shares": []
}


comment:13 Changed at 2010-06-10T19:21:31Z by zooko

  • Milestone changed from 1.7.0 to 1.8.0

There are two kinds of version we might want to convey: the version of the JSON output that we're emitting and the version of the implementation that produced it. For the latter, please call the key application-version instead of tahoe-version and let the value be allmydata.__full_version__. For the former, I guess we should do something like you suggested in comment:6: something like protocol-version: "JSON 1.0" oe json-version: "1.0".

comment:14 Changed at 2010-06-11T21:02:59Z by freestorm

IRC discussion with David-Sarah:

davidsarah: re: comment 12, 'tahoe-version' only needs to appear once, in the outermost dict

FreeStorm?: davidsarah: is it a good idea to create a new *object* "version" and put application-version and protocol-version insite?

FreeStorm?: davidsarah: by *object* I mean like "stats" or "result" (I don't know what the name for that)

davidsarah: no, I don't think that's necessary

davidsarah: (the name is "subkey")

FreeStorm?: davidsarah: okay for subkey, I think about that, because in future we can maybe add some other version properties

davidsarah: YAGNI ("you ain't gonna need it")

FreeStorm?: davidsarah: Okay, last question, also I put them in "root" or in an existing subkey?

davidsarah: in root should be fine

comment:15 Changed at 2010-06-11T21:33:07Z by warner

This ticket is about having each webapi call that returns JSON output with non-trivial structure provide a field that lets a caller be confident that they know what the structure will be. For deep-stats, I'd name this field "deep-stats-version" and set it equal to the integer 1.

  • don't use tahoe-version: a developer can't be expected to enumerate all of the tahoe versions (including between-release intermediate versions) and identify which of those provides the output in shape X versus shape Y. "tahoe-version" is much too coarse for this purpose.
  • don't use the same namespace for both "deep-stats" and "deep-check": those two structures will have separate versions. If we change the deep-stats output, but leave the deep-check output alone, we should increment deep-stats-version but leave deep-check-version alone.
  • don't use "1.0": that implies a similarity between the current 1.0 and some hypothetical future "1.1", which would require a developer to guess whether the code they're writing today will tolerate whatever changes might be implied by the ".1" . Keep it simple and use integers.
  • the rule should be that new fields can be added without bumping the version. If any existing field is changed or removed, the version must be bumped.
  • client code would need to switch on the version number to determine how it ought to parse the output. It's an open question as to what client code should do with an unknown version ("output from the future"): it could try anyways and hope for the best, or refuse to try and just throw an error.

It might actually be a better idea to say that non-backwards-compatible changes to the output of webapi methods must use a different verb: t=deep-stats-v2 instead of t=deep-stats . That would remove the need for a version number in the output, at the expense of slightly uglier API methods. A lot of RESTful web services these days put the version number at the start of the URL: you fetch your amazon S3 data from something like /api/1/blahblah, and some day when they make a non-compatible change, new code will use /api/2/blahblah instead.

comment:16 Changed at 2010-06-12T04:33:56Z by zooko

Let's do the version-string-in-JSON approach, doing it in all the ways that you just said to do in your comment:15, and leave the versioned-API-URL idea for another ticket.

comment:17 Changed at 2010-07-20T05:01:53Z by zooko

Okay, the next step is for someone (FreeStorm, hopefully) to update the patch to reflect Brian's five rules from comment:15.

comment:18 Changed at 2010-08-01T13:30:50Z by freestorm

  • Cc freestorm77@… added
  • Owner changed from freestorm to nobody

comment:19 Changed at 2010-08-10T04:11:56Z by davidsarah

  • Milestone changed from 1.8.0 to 1.9.0

comment:20 Changed at 2011-07-27T18:24:02Z by zooko

  • Milestone changed from 1.9.0 to soon

comment:21 Changed at 2016-11-29T11:40:16Z by rvs

  • Description modified (diff)
  • Owner changed from nobody to rvs
  • Status changed from new to assigned
  • Version changed from 1.2.0 to n/a

comment:22 Changed at 2016-12-05T21:48:26Z by rvs

  • Keywords review-needed added

I've decided to take over, here's pull request number 1: https://github.com/tahoe-lafs/tahoe-lafs/pull/382

Only addresses deep-stats so far.

comment:23 Changed at 2016-12-24T04:08:56Z by Brian Warner <warner@…>

In 51eae34/trunk:

Merge PR382

refs ticket:567
Closes #382

comment:24 Changed at 2017-01-02T22:29:32Z by rvs

New pull request for deep manifest operation: https://github.com/tahoe-lafs/tahoe-lafs/pull/390

comment:25 Changed at 2020-07-14T15:02:17Z by exarkun

  • Keywords review-needed removed

The linked PRs are closed for now so I'm removing the review keyword.

Note: See TracTickets for help on using tickets.