#1200 closed enhancement (fixed)

package up Brian's New Visualization of immutable download

Reported by: zooko
Owned by: nobody
Priority: major
Milestone: 1.9.0
Component: unknown
Version: 1.8β
Keywords: unfinished-business immutable download statistics performance transparency
Cc: drewp@…
Launchpad Bug:

Attachments (2)

Brians_New_Visualizer.darcs.patch (296.0 KB) - added by zooko at 2010-11-18T08:24:39Z.
viz-2.diff (815.8 KB) - added by warner at 2010-12-01T16:45:27Z.
dated patch to latest trunk, includes detailed "misc events". Not tested. For experimentation only.

Change History (18)

comment:3 Changed at 2010-11-18T08:20:03Z by zooko

Be sure and read http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1170#comment:90 , and look at these beautiful screenshots:

I'll attach Brian's patch which makes this beautiful visualizer. The question is how to deploy this visualizer to Tahoe-LAFS users without checking jquery and protovis into Tahoe-LAFS revision control!

comment:4 Changed at 2010-11-18T23:55:45Z by terrell

in terms of looking for a way to declare javascript dependencies at build time (and not at runtime from external servers)...

http://reinout.vanrees.org/weblog/2010/01/06/zest-releaser-entry-points.html

the key point that looked useful:

Downloading an external javascript library into a package that cannot be stored in (zope’s) svn repository directly due to licensing issues. Before packaging and releasing it, that is. Automatically so you don’t forget it.

alternately, i didn't know if this might be useful / helpful:

http://pypi.python.org/pypi?:action=search&term=jquery

comment:5 Changed at 2010-11-20T10:43:20Z by francois

Oops, my previous comment was inaccurate: I was bitten by the timeline href link being absolute. The client under test was actually running on a different port than 3456.

  <li><a href="http://localhost:3456/status/down-1/timeline">timeline</a></li>
Last edited at 2010-11-20T11:11:18Z by francois
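
To illustrate why the absolute href bites, here is a small sketch of how a browser resolves the two kinds of link. The port 8123 and the relative form `down-1/timeline` are assumptions made up for the example, as is the guess that the link sits on the `/status/down-1` page; only the absolute URL comes from the markup above.

```python
from urllib.parse import urljoin

# Hypothetical: the node under test serves its WUI on port 8123, not 3456.
status_page = "http://localhost:8123/status/down-1"

# The href as it appears in the page today (absolute).
absolute_href = "http://localhost:3456/status/down-1/timeline"
# A relative alternative (an assumption, not what the patch contains).
relative_href = "down-1/timeline"

# An absolute href ignores the page it was served from entirely.
resolved_absolute = urljoin(status_page, absolute_href)
# A relative href is resolved against the page URL, so it keeps the port.
resolved_relative = urljoin(status_page, relative_href)
```

With these inputs the absolute link still points at port 3456, while the relative one follows whichever port the node is actually using.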

comment:6 Changed at 2010-11-20T16:25:51Z by zooko

See also #1265 (Brian's New Visualizer is insufficiently labelled/documented (plus layout problem)). I would want both #1265 and #1200 to be fixed before accepting Brian's New Visualizer into trunk.

comment:7 Changed at 2010-11-24T00:02:05Z by warner

#1269 mentions a potential enhancement to the viz tool (adding tcpdump timestamp markers).

comment:8 Changed at 2010-11-24T01:08:39Z by warner

The viz tool requires two Javascript libraries: jQuery (121KB source, 57KB minified) and Protovis (510KB source, 117KB minified). It also contains a small JS program that is downloaded as part of the "download-status" WUI page, included in src/allmydata/web/ .

We identified a couple of goals/concerns around packaging this JS code:

  1. we'd prefer to not include non-Tahoe sources in the Tahoe VC tree.
  2. we only want to manage source code in our VC tree, not minified/compressed JS (which is the moral equivalent of object code)
  3. we don't want to store large things in the Tahoe VC tree, compared to the 3MB of .py files in src/
  4. we don't want Tahoe user security to depend upon anything beyond their own Tahoe node. This rules out having the viz page load the libraries from their home pages or from google.

We used to have zfec and Crypto in src/, and removed them. We managed to remove the random windows .exe files from the tree. It'd be nice to avoid adding such things back in.

Unfortunately, there are few established practices for packaging Javascript files on the desktop (most web sites either serve their own copy or rely upon Google's fast servers). Our current tool for obtaining/building dependencies (setuptools) only handles python, not JS.

Debian does appear to have the beginnings of a JS packaging policy and has a libjs-jquery package (which, when installed together with the javascript-common package, makes http://HOSTNAME/javascript/jquery/jquery.min.js available). This makes it feasible for web apps on a debian box to all share the same (upgradable) copy of common libraries.

We seem to have consensus on the importance of goal 4: Tahoe is very focussed on not relying upon outside parties for security, and until browsers offer a way to process and enforce cryptographic hashes in URLs (so we could specify exactly what script we wanted to pull from e.g. google), we're not comfortable with extending the TCB to include external web sites, much less the "anyone-who-can-spoof-your-DNS" attack enabled by using http URLs (instead of https).

Some potential solutions we've discussed:

  A. bite-the-bullet: commit copies of the JS libraries into Tahoe's VC tree, probably the non-minified form to retain goal 2
  B. build-time: add a build step which downloads the libraries from some well-known place, checks their hashes against precomputed expectations, and copies them into the source tree somewhere (support/lib/js?)
    • this covers run-from-source, but packagers will have to come up with something else. Debian packages could depend upon libjs-jquery and use symlinks to reach the files, but there is no protovis package yet. Other OSes are unlikely to have even that.
  C. plugin: define a plugin interface for Tahoe, and put all the viz pieces in a new project. The new project's VC repo would have precisely the same problem, but maybe we'd be willing to bite-the-bullet more readily on the plugin than on the main Tahoe repo.
    • For the viz plugin specifically, we'd need two hooks:
      • add a link to each download-status page
      • add code to serve a new page (the viz JS program) from that link
      • (the basic JSON data source could be built-in to the Tahoe core, as it's useful to more than just a JS-based viz tool)
  D. separate process: if we enable CORS on the JSON data source, the viz program could conceivably be served from a separate program (on a separate port), but it would take some fussing to let it find out what download it should display (the program would need to scrape the recent-uploads-and-downloads page to get the download numbers, then offer its own menu page)
  E. webapp-in-grid: store the JS viz program in the same grid that the Tahoe node is using, then let it behave like the separate-process case.
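
The verify-and-copy part of the build-time option could be sketched roughly as follows. This is only an illustration, not code from the patch: the name `install_verified` and the choice of SHA-256 are assumptions, and the download itself is left out, so the sketch starts from a file that has already been fetched.

```python
import hashlib
import os
import shutil


def install_verified(downloaded_path, expected_sha256, dest_dir):
    """Copy a downloaded library into dest_dir only if its hash matches.

    expected_sha256 is the precomputed hex digest that would be shipped in
    the source tree (an assumption of this sketch).
    """
    with open(downloaded_path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected_sha256:
        # Refuse to install anything we did not expect.
        raise ValueError("hash mismatch for %s: got %s" % (downloaded_path, actual))
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(downloaded_path))
    shutil.copyfile(downloaded_path, dest)
    return dest
```

Only the small digest would live in the VC tree; the library itself would not.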

B (build-time) would be disappointing, because I'm trying to get rid of the "build" step (#479). My unsuck branch splits the build step into two pieces: one is a helper which checks to see if you have the dependencies available, and the other will download+build them for you: the idea is that there may be more appropriate ways to obtain the dependencies (i.e. apt-get install python-zfec), so download+build shouldn't be the default. When OS packages of the JS libraries are available, this could be done cleanly, but otherwise we have to define what it means for each JS library to be "available" (i.e. where it lives).
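
The "checker" half of that split might look something like the sketch below. It is not taken from the unsuck branch; `missing_dependencies` is a hypothetical name, and it only answers the question for importable Python modules, which is exactly why the JS libraries need their own definition of "available".

```python
import importlib.util


def missing_dependencies(module_names):
    """Return the top-level module names that cannot be found on this system.

    Only top-level names are supported here; nothing is downloaded or built.
    """
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]
```

A caller would report the returned names and stop, leaving it to the user (or the packaging system) to decide how to obtain them.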

C (plugin) would be cool for other reasons, but is a lot of work and yak-shaving for a fairly small feature. It also just pushes the problem elsewhere.

D (separate process) doesn't sound likely to provide a good user experience. Download status charts should be reached from a link on the download-status summary page, not from a completely separate app.

E (webapp-in-grid) has the same problem, plus I'm uncomfortable with the idea of storing pieces of Tahoe's functionality in Tahoe itself, both because of the setup problem (what, should each copy of tahoe auto-upload the JS app into each grid it touches upon first boot?), and because of the recursion problem (what tool do you use to diagnose download problems of the download-status JS?).

So I'm undecided. I guess I lean towards B, as it seems to be the cleanest option. I wish the libraries were small enough that we could just jam them into the tree (option A), but they're not. In the long run, C (plugins) would make it easier to get new functionality into the node, and I bet we'd be willing to commit jQuery into the plugin's repo (especially since then, if we changed our mind, we could just ditch that repo and start another plugin), but it doesn't feel right for the near-term.

comment:9 follow-up: Changed at 2010-11-27T13:17:56Z by gdt

I'd like to suggest stepping back from the "python is special" view and considering how the rest of the open source world deals with dependencies. That said, I realize there is a desire to be able to download a tahoe tarball, unpack it, type something, and end up running. I believe that for any popular program almost everyone runs it from a packaging system. So for development and evangelizing, easy build from source tarball is important, but for eventual widespread success, packaging systems are crucial.

Right now we seem to have "build" and "install" steps, but build isn't really build - it's fetch/build dependencies. And install runs the python compiler in addition to copying to DESTDIR. From a packaging system viewpoint, automatic dependency fetching is a problem to be disabled. So I'd like to suggest a new 'dependencies' setup.py target that obtains all missing dependencies, that build compile .py but not fetch dependencies or install, and that install just copy to DESTDIR.

One obstacle to this approach is that some programs are written in languages that appear to have a tradition that people manually deal with files. It's obvious to me :-) that javascript libraries should come in a tarball with a configure script, be given a --prefix, and get installed into $PREFIX/share/javascript/foo/bar.js, from which other programs that need them can obtain them. Probably the build step (typing make) in the js package would run some program to convert from source to minified form, and make install would then install both. Then a "binary package" of the javascript program can be distributed, and be a dependency in packaging systems. (I find it odd that the javascript community hasn't done this, but java seems similar in the expectation that individuals who want to use programs get jar files and type 'java foo.jar' to run them.) So one step would be to package up the javascript code first.

The other obstacle is complexity. As tahoe depends on something new, code to download and build those new things (presumably for use within the source tree, rather than installed) needs to be written. For a few javascript files, that might be simple enough to not be an issue.

Essentially I'm arguing that a "./setup.py dependencies" step is a (reasonable and helpful) accommodation for people wanting to use tahoe without the dependencies already installed, and that the eventual large-scale approach would be to have dependencies already.

Another option is to adopt the packaging-system-centric notions where it's reasonably easy (GNU/Linux, pkgsrc, etc.) and to provide a tarball of tahoe-dependencies that has things like the js libs. Those tarballs could then be built for releases, but not stored in the version-control system.

comment:10 in reply to: ↑ 9 Changed at 2010-11-27T19:00:50Z by zooko

gdt:

Replying to gdt:

So for development and evangelizing, easy build from source tarball is important, but for eventual widespread success, packaging systems are crucial.

Okay, I agree that both running the code directly from upstream-tahoe-project -> user and running the code from upstream-tahoe-project -> packager -> user are important use cases.

Now, let's get down to brass tacks here. What changes are you suggesting?

Right now we seem to have "build" and "install" steps, but build isn't really build - it's fetch/build dependencies. And install runs the python compiler in addition to copying to DESTDIR. From a packaging system viewpoint, automatic dependency fetching is a problem to be disabled.

This is #1220. If I understand correctly, the current state of that ticket is that --single-version-externally-managed satisfies the original requirements of the ticket, but that gdt then added a new requirement which is that the setup step should check for dependencies and stop with an error message if they aren't satisfied. ;-) Let's follow up on that ticket... Here: comment:21:ticket:1220

So I'd like to suggest a new 'dependencies' setup.py target that obtains all missing dependencies, that build compile .py but not fetch dependencies or install

Okay I opened #1270 (have a separate build target to download any missing deps but not to compile or install them).

One obstacle to this approach is that some programs are written in languages that appear to have a tradition that people manually deal with files. It's obvious to me :-) that javascript libraries should come in a tarball with a configure script, be given a --prefix, and get installed into $PREFIX/share/javascript/foo/bar.js, from which other programs that need them can obtain them. Probably the build step (typing make) in the js package would run some program to convert from source to minified form, and make install would then install both. Then a "binary package" of the javascript program can be distributed, and be a dependency in packaging systems. (I find it odd that the javascript community hasn't done this, but java seems similar in the expectation that individuals who want to use programs get jar files and type 'java foo.jar' to run them.) So one step would be to package up the javascript code first.

That all sounds pretty good to me, but note that tahoe-lafs supports platforms that do not have "make" (i.e., Windows), so we can't rely on make.
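
For what it's worth, the install half could be done without make. A rough sketch in Python, assuming the $PREFIX/share/javascript/foo/bar.js layout described above; `install_js` is a made-up name and the minification step is not covered.

```python
import os
import shutil


def install_js(src_file, prefix, package):
    """Copy src_file into PREFIX/share/javascript/PACKAGE/ and return the new path."""
    dest_dir = os.path.join(prefix, "share", "javascript", package)
    os.makedirs(dest_dir, exist_ok=True)
    # shutil.copy keeps the file name when the destination is a directory.
    return shutil.copy(src_file, dest_dir)
```

Because it uses only os and shutil, the same step would run on Windows as well as on platforms that have make.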

Another option is to adopt the packaging-system-centric notions where it's reasonably easy (GNU/Linux, pkgsrc, etc.) and to provide a tarball of tahoe-dependencies that has things like the js libs. Those tarballs could then be built for releases, but not stored in the version-control system.

I guess that is the tahoe-deps notion:

http://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-deps.tar.bz2

Changed at 2010-12-01T16:45:27Z by warner

dated patch to latest trunk, includes detailed "misc events". Not tested. For experimentation only.

comment:11 follow-up: Changed at 2011-06-28T04:42:43Z by drewp

"we don't want Tahoe user security to depend upon anything beyond their own Tahoe node. This rules out having the viz page load the libraries from their home pages or from google."

That seems extreme. Surely there's a scheme for loading the lib from google and then checking it against a small local fingerprint that you package with your main sources. This might even be doable in the browser, which would be cool since it defers the download until you actually show you were going to use a browser.

comment:12 Changed at 2011-06-28T04:43:21Z by drewp

  • Cc drewp@… added

comment:13 in reply to: ↑ 11 Changed at 2011-06-29T07:50:45Z by warner

Replying to drewp:

That seems extreme. Surely there's a scheme for loading the lib from google and then checking it against a small local fingerprint that you package with your main sources.

Hm, are you thinking something like this?

  • browser hits WEBAPI/jQuery.js
  • tahoe server does a reverse-proxy to fetch a copy from google
  • checks the contents against a fingerprint,
  • returns it to browser or throws error
  • server caches the file for later use

Feels kind of weird, but I suppose it *does* satisfy the goals of not including a copy in the source tree, nor fetching it during build. It would mean that the viz display wouldn't work unless the server could reach the internet, which is probably not a real constraint (I run tests on my offline laptop all the time, and I'd like viz to work there, but I'll admit that most grids aren't like this). It feels morally equivalent to having the server fetch those files from google at boot time, which feels pretty similar to having it fetch the files at build time.
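
The server-side half of that scheme could be sketched like this. Everything here is an assumption for illustration: `get_verified` is a hypothetical helper, SHA-256 stands in for "a fingerprint", and the actual fetch is passed in as a callable so the sketch does not pretend to know which URL or HTTP client would be used.

```python
import hashlib
import os


def get_verified(name, expected_sha256, cache_dir, fetch):
    """Return the bytes of `name`, using the cache if present, else fetch().

    fetch is any callable taking the name and returning bytes. The content
    is checked against expected_sha256 on every call, and only verified
    content is ever written to the cache.
    """
    cached = os.path.join(cache_dir, name)
    from_cache = os.path.exists(cached)
    if from_cache:
        with open(cached, "rb") as f:
            data = f.read()
    else:
        data = fetch(name)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        # The web handler would turn this into an error response.
        raise ValueError("fingerprint mismatch for %s" % name)
    if not from_cache:
        os.makedirs(cache_dir, exist_ok=True)
        with open(cached, "wb") as f:
            f.write(data)
    return data
```

After the first successful request the node no longer needs network access for that file, which softens the offline-laptop objection somewhat.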

This might even be doable in the browser, which would be cool since it defers the download until you actually show you were going to use a browser.

Hm, *that* sounds challenging. If Zooko's PMAGH project really existed (and browsers would enforce hash-of-contents in URLs), then we could just use a link that routed to google but which identified a specific version of jQuery. Lacking that.. sounds tricky. A normal <script> tag won't protect you. Sounds like the page would need to XHR to google (which is prohibited by the annoying same-origin-policy), retrieve jQuery.js as data, hash it (in javascript: slow but possible), then, what, document.write() it into the page as a new <script> tag? Something like that?

comment:14 Changed at 2011-06-29T23:30:43Z by warner

  • Resolution set to fixed
  • Status changed from new to closed

Ok, this has landed, in fc5c2208fbee2506, d8358f2863d9219f, and 0f79973401de70f9. After some discussion at the summit we decided to just commit the minified JS libraries into the source tree (this is 91k of jquery and 117k of protovis). The actual frontend still needs some work (the "overview" pane doesn't seem to work, and scrolling could be nicer), but it's a good start, and will be a lot easier to iterate on now that it's landed.

This didn't include the "misc events" support (for things like how long AES took). That will get a separate ticket, as it needs even more work (too many events make the timeline scroll painfully slowly).

The viz chart can be found from the "timeline" link on each immutable-download status page, under "Recent Uploads And Downloads" from the welcome page.

comment:15 Changed at 2011-06-30T15:15:15Z by zooko

For the record, in re-reading this ticket I realized that what terrell pointed out seven months ago is the same thing David-Sarah suggested at the summit on Tuesday, and that it has already been done by (many) other people: packaging the Javascript libraries as Python libraries. Both jquery and protovis are already packaged into Python libraries by various people.

We could potentially make Tahoe-LAFS depend on one of those Python packages and remove the copy of the Javascript library from our source tree.

Let's open a new ticket for that.

Terrell: maybe next time you're pointing out a potentially useful solution to our current problem, you could try writing a more detailed explanation of how it could apply. Maybe write what the steps would be for our project to use it -- something like that. Basically increase your assumption that your readers are distracted, ignorant, or lazy and if you don't spell out what you mean they will miss the point. :-)

comment:16 Changed at 2011-07-01T00:17:24Z by terrell

apologies :)
