#249 closed enhancement (fixed)

move bundled dependencies out of revision control history and make them optional

Reported by: zooko
Owned by: warner
Priority: major
Milestone: 1.3.0
Component: packaging
Version: 0.7.0
Keywords:
Cc: cgalvan
Launchpad Bug:

Description

As per this tahoe-dev discussion, it would be nice to move the bundled dependencies out of revision control history and make them optional.

With the bundled dependencies made optional, people who download the "normal" tarball would get a fat tarball with all easy-installable dependencies bundled in, so that the "Desert Island Build" would work. In the "Desert Island Build", someone installs all of the Manual Dependencies, downloads the allmydata-tahoe source tarball, gets on an airplane where they don't have internet access, and then tries to build and install Tahoe.

People who checked out the source with a revision control tool or who downloaded the "minimal" tarball would get only Tahoe-specific source code.

Attachments (1)

tahoe_ext_deps.patch (3.5 KB) - added by cgalvan at 2008-08-26T23:18:22Z.
Updated patch, note though that it will need to be updated once more when the location of the external dependencies is decided on.


Change History (39)

comment:1 Changed at 2008-01-25T04:19:55Z by warner

I had a thought: we build a tarball that contains the libraries that people might need, and make it available from our website. The build process checks to see if this tarball exists in the top of the tree, or in the directory just above it (so that the buildslaves can keep re-using the same tarball without re-downloading it each time). We also provide a make target that will download the tarball if necessary (perhaps using wget and an If-Modified-Since header). If the build process decides it needs to use the tarball, it can unpack it into a directory which setuptools can then use as a repository.

If done right, this would make the following users happy:

  1. Brian: I have a bunch of trees, all in sibling directories. I don't want to download the ext tarball multiple times. I would download it once, place it in my trees' parent directory, and then type 'make build-deps' in each new tree. This would grab the tarball from the parent directory (or maybe find a previously-unpacked directory in the same place and use that as-is)
  2. New Users (connected): they use darcs to get a tahoe tree, then type 'make build-deps'. That notices that there is no ext tarball, so it downloads one, then unpacks it, then creates the dependent libraries.
  3. New Users (disconnected): they use darcs to get a tahoe tree (or download a snapshot), and also download the ext tarball. Then they move to their desert island. They unpack the tahoe tree, and copy the ext tarball into it (or to its parent). They type 'make build-deps' and it uses that tarball without trying to download a new one.
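
The lookup order in this proposal (top of the tree first, then the directory just above it) could be sketched roughly as follows. This is only an illustration: the file name `tahoe-deps.tar.bz2` and the helper `find_deps_tarball` are hypothetical, and the download fallback is left out.

```python
import os

# Hypothetical file name; the ticket had not settled on one at this point.
DEPS_TARBALL = "tahoe-deps.tar.bz2"


def find_deps_tarball(tree_root, name=DEPS_TARBALL):
    """Return the path of the deps tarball in the tree or its parent, else None."""
    tree_root = os.path.abspath(tree_root)
    # Check the top of the source tree first, then the directory just above it.
    for directory in (tree_root, os.path.dirname(tree_root)):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    # Nothing found locally: a real build would fall back to downloading.
    return None
```

A `None` result corresponds to the "New Users (connected)" case, where the make target would have to fetch the tarball.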

comment:2 Changed at 2008-06-02T23:15:28Z by zooko

  • Milestone changed from eventually to 1.1.0

comment:3 Changed at 2008-06-03T01:02:56Z by warner

This is closely related to (or perhaps a duplicate of) #415.

comment:4 Changed at 2008-06-04T01:14:04Z by zooko

  • Milestone changed from 1.1.0 to 1.1.1

comment:5 Changed at 2008-06-04T01:14:36Z by zooko

#415 was a duplicate of this one.

comment:6 Changed at 2008-08-20T02:10:53Z by cgalvan

With setuptools, you can have extras_require, which act as optional install dependencies, so you can say something like:

easy_install tahoe[misc]

And we would specify the 'misc' extra require to be all those things under misc/dependencies. This would also still be supported in a Desert Island build if for instance you had all of your misc. dependency tarballs in <path/to/deps>, you would just say:

easy_install -f <path/to/deps> tahoe[misc]

Does this cover all of your use-cases or am I missing something?
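
For illustration, an 'extras' declaration of the kind suggested here might look like the keyword arguments below. This is a sketch, not Tahoe's actual setup.py: the project name, the grouping under 'misc', and the unpinned requirement names (taken from libraries mentioned elsewhere in this ticket) are all assumptions, as is the helper `requirements_for`.

```python
# Keyword arguments that would be passed to setuptools.setup(**SETUP_KWARGS).
# Hypothetical values: the real requirement list and version pins are not
# given in this ticket.
SETUP_KWARGS = {
    "name": "tahoe",
    "install_requires": [],
    "extras_require": {
        # Selected with: easy_install tahoe[misc]
        "misc": ["Twisted", "Nevow", "foolscap", "simplejson", "pyOpenSSL"],
    },
}


def requirements_for(extras):
    """Return the requirements pulled in when the given extras are requested."""
    requirements = list(SETUP_KWARGS["install_requires"])
    for extra in extras:
        requirements.extend(SETUP_KWARGS["extras_require"][extra])
    return requirements
```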

comment:7 Changed at 2008-08-20T02:14:15Z by cgalvan

Hm, on second thought, I'm not yet familiar with tahoe's build process, but are these things actually needed at build time or do they just provide additional functionality once you have tahoe installed?

comment:8 Changed at 2008-08-20T20:32:48Z by zooko

cgalvan: I'm afraid this ticket wasn't explicit enough. All of the packages in question are required for Tahoe. The only question is how to acquire them.

One desideratum is "The Desert Island scenario", i.e. an off-line build, in which you download the Tahoe source and run ./setup.py build, but the build process is not allowed to make connections to the internet. As you've probably seen on the distutils-sig list, this kind of scenario is common behind corporate firewalls. Also it has happened -- twice now I think -- to Brian on an airplane.

Another desideratum is not to keep large binaries (tarballs) in our darcs revision control history.

Another is not to have a large tarball download of the Tahoe source code itself.

These are somewhat in contention, so the agreed plan is to offer more than one way to do it: if you want just the Tahoe source and you don't mind if the build process fetches more things from the Internet when you build, then you can get just the Tahoe source tarball or the Tahoe darcs checkout. If you want to be able to build behind a corporate firewall, on an airplane, or on a desert island, then you get both the Tahoe tarball/darcs checkout and the "dependent libs" tarball/darcs checkout.

comment:9 Changed at 2008-08-26T04:33:52Z by cgalvan

  • Owner changed from somebody to cgalvan
  • Status changed from new to assigned

comment:10 Changed at 2008-08-26T05:26:25Z by cgalvan

I believe I have a solution for this problem that satisfies each of the scenarios described in the ticket. Here is how they specifically relate to the 3 scenarios described by Brian.

  1. In the parent directory of your multiple source trees, you would have a folder named 'tahoe_deps' (the name can be whatever you choose). This folder would contain all of the tarballs for the external dependencies. Doing a 'setup.py build' or 'develop' would find the package tarballs in this location and install them just as if it had downloaded them from PyPI or another repository.
  2. Since the user is connected to the internet, the packages will automatically be built after they are found in a repository (most likely PyPI) or at the backup dependency link on the allmydata site.
  3. Similar to #1, except there is only a single source tree.

comment:11 Changed at 2008-08-26T11:21:49Z by zooko

I think Brian may have later told me on the phone that he didn't like the build process to look "outside of its own subtree" by following "..". But I'll leave that to Brian and Chris to work out -- all of the proposals in this ticket seem acceptable to me.

One thing that I am careful about is what effect this will have on the install.html. I will not accept a change to that document which adds a branch (i.e., it includes the word "if"). I would be okay with any of these approaches being documented in install.html, but I would prefer one in which the user gets both the Tahoe source code and the complete dependency set in a single download operation (i.e. they download a single file after following the instructions in "Get the Source Code" in install.html).

comment:12 Changed at 2008-08-26T11:53:16Z by zooko

cgalvan: thanks for the patch. I agree that Tahoe setup_require's Twisted for the tests, but why Nevow?

comment:13 Changed at 2008-08-26T15:47:27Z by cgalvan

I personally would also prefer it if the 'tahoe_deps' were pulled from somewhere inside the source tree. The only reason I designed it to be in the parent was so that it would satisfy the first use-case that Brian described, but if this is no longer desired it can be easily changed :)

This wouldn't change the current install approach: doesn't someone doing a Desert Island build already have to download the external tarballs separately? To make it easier, we could have a single tarball which contained all of the tarballs for the dependencies.

Also, I wasn't certain whether the tests needed just Twisted or Nevow as well. I meant to ask you about this and it looks like you have already answered my question :)

comment:14 Changed at 2008-08-26T16:18:15Z by zooko

cgalvan: Thanks again! I'm glad to have your help on these issues.

Okay, here are the next steps:

  1. Wait for Brian to wake up and login and notice this ticket and to decide whether he wants the deps to be in an uncle directory or inside the tahoe directory. (I hope he chooses the latter.)
  2. If it is the latter then put back the dependent links variable to misc/dependencies.
  3. cgalvan: Do we need to specify each file per its ".tar" name, as in the current trunk, or can we specify just a directory and setuptools will look for all source tarballs in that directory? It used to be the former, which is why the current trunk of Tahoe uses os.listdir() and then filters for files that end with .tar.
  4. Collect a set of source tarballs of all of the dependent libraries that Tahoe requires and recursively all of the dependent libraries that those dependent libraries require. Uncompress them so that they are in .tar form instead of .tar.gz or .tar.bz2 etc. Make a .tar.bz2 of a directory containing all of those .tar's.
  5. Test it out: unpack the dependent libs tarball into misc/dependencies (or into ../tahoe-dependencies, depending on step 1 above), and see if the Tahoe build succeeds without downloading anything from the network.
  6. Write a script -- probably inside Makefile -- which unpacks such a dependent lib tarball into misc/dependencies and then uses ./setup.py sdist to build a tarball which includes the dependencies and is named "allmydata-tahoe-SUMO-1.3.0.tar.gz" instead of "allmydata-tahoe-1.3.0.tar.gz".
  7. Change docs/install.html to link to a sumo tarball in the "Get the Source" section.
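
The per-file approach described in the list above (os.listdir() plus a filter for names ending in .tar) could look roughly like this. It is a sketch rather than the code in trunk; the helper name `tarball_links` is hypothetical.

```python
import os


def tarball_links(dep_dir, suffix=".tar"):
    """List every file in dep_dir whose name ends with suffix, as a path.

    The result is the kind of list that could be handed to setuptools as
    dependency links, one entry per tarball.
    """
    if not os.path.isdir(dep_dir):
        # A slim checkout may not have the directory at all.
        return []
    return sorted(
        os.path.join(dep_dir, name)
        for name in os.listdir(dep_dir)
        if name.endswith(suffix)
    )
```

If setuptools accepts a bare directory as a link, as asked in the list above, this enumeration becomes unnecessary.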

comment:15 Changed at 2008-08-26T16:26:55Z by cgalvan

  1. You only need to specify the path to 'tahoe_deps' as a dependency link and setuptools will treat it as a repository, so you don't have to specify each file name explicitly.
  1. For some reason in my testing, it wasn't picking up the .tar's, so I had to grab the source tarballs from PyPI to test it out (which were gzipped and bzipped). Can you confirm this?
  1. If you want to eventually move away from using the Makefile, we can do this instead by adding a command such as 'sdist_sumo' that could do this by specifying additional data_files :)

comment:16 Changed at 2008-08-26T16:54:00Z by cgalvan

  1. Nevermind, I was missing one of the .tar's, which happened to be the first one it checked :) It recognizes .tar's just fine.
  1. On second thought, it may be better to subclass the sdist command and write our own hook so that it checks sys.argv for '--sumo' or something, and then makes the appropriate 'sumo' tarball.

comment:17 Changed at 2008-08-26T17:10:51Z by zooko

  1. I thought this feature of just specifying a dir (which contains tarballs) didn't work in the past, but heck, let's try it and see.
  1. Good thinking! I approve.

Changed at 2008-08-26T23:18:22Z by cgalvan

Updated patch, note though that it will need to be updated once more when the location of the external dependencies is decided on.

comment:18 Changed at 2008-08-26T23:22:27Z by cgalvan

I have updated the patch since Nevow wasn't needed at test time. I also added a '--sumo' option to the sdist command, which toggles including the external dependency tarballs into the whole sdist. *Note: The proper place for the external dependencies to be pulled from has not yet been decided on, so the current patch will need to be updated to reflect that. Currently, when building it will be grabbing from a 'tahoe_deps' folder in the parent of the tahoe source tree, but the '--sumo' option uses the tarballs under 'misc/dependencies', which allows anyone to test out that option currently since they are already under version control.

comment:19 Changed at 2008-08-27T17:58:27Z by zooko

cgalvan: I tried your patch out and it worked fine!

One thing, though: I think that it is deciding which things to include in misc/dependencies based on the normal setuptools package-data-inclusion logic (i.e. currently it is including everything which is included in darcs revision control).

We would like to remove those .tar's from darcs revision control, and still have them excluded from normal sdist, but still have them included in sdist --sumo. What's the best way to do that? I'm thinking maybe just an inclusion rule (in the --sumo case only) to include everything named misc/dependencies/*.tar.
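
A minimal sketch of such an inclusion rule, assuming the decision is made by looking for '--sumo' on the command line as suggested in comment:16. The function name `sumo_extra_files` is hypothetical, and how the result gets wired into the sdist command is left out.

```python
import glob
import os


def sumo_extra_files(argv, dep_dir=os.path.join("misc", "dependencies")):
    """Return the dependency tarballs to add to the sdist, if '--sumo' was given."""
    if "--sumo" not in argv:
        # Normal sdist: no bundled dependencies.
        return []
    # Sumo sdist: include everything named <dep_dir>/*.tar, whether or not
    # the files are under revision control.
    return sorted(glob.glob(os.path.join(dep_dir, "*.tar")))
```

It would be called as `sumo_extra_files(sys.argv)` from setup.py, with the returned paths added to the file list of the source distribution.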

comment:20 Changed at 2008-08-27T18:46:21Z by cgalvan

Glad that it worked for you :)

Yes, the current implementation was just an example so that you could see how it worked using the latest revision, which still had all the .tars. Just as you suggested, the --sumo case would do an include based on a pattern like you described.

comment:21 Changed at 2008-08-27T19:13:01Z by zooko

Could you show me the actual code for such an include pattern that would go in our setup.py?

comment:22 Changed at 2008-08-27T23:09:45Z by warner

  1. Wait for Brian to wake up and login and notice this ticket and to decide whether he wants the deps to be in an uncle directory or inside the tahoe directory. (I hope he chooses the latter.)

I'd like them to be in an uncle file, because I have lots of trees (at least 40) and I want to download a single dependency tarball for use by all of them. So:

 wget http://allmydata.org/something/tahoe-deps.tar.gz
 darcs get http://allmydata.org/something/trunk tahoe-trunk
 darcs get tahoe-trunk tahoe-feature1
 darcs get tahoe-trunk tahoe-feature2
 (cd tahoe-trunk && make all)
 (cd tahoe-feature1 && make all)
 (cd tahoe-feature2 && make all)

If it looks in the current source tree *and* the parent directory, that's fine, I just want to be able to hit the same tarball from multiple directories without having to make a symlink for every tree (because I'll forget, and then my builds will take a long time to download stuff, and by the time I notice this and remember the reason for it and create the symlink the build will be far enough along that adding the symlink won't make anything better, and that would annoy me).

I'd prefer it to be a single file that gets downloaded, rather than a directory full of tarballs, but I'd survive if I had to unpack a downloaded tarball first. (at least I wouldn't be replicating that work for every feature tree I have).

  1. Collect a set of source tarballs of all of the dependent libraries that Tahoe requires and recursively all of the dependent libraries that those dependent libraries require. Uncompress them so that they are in .tar form instead of .tar.gz or .tar.bz2 etc.. Make a .tar.bz2 of a directory containing all of those .tar's.

I don't see a lot of value to the second part. Conserve the developer's disk space and just leave the files on disk compressed. I just did a test against the contents of our misc/dependencies/ directory, and 'tar cjf' of the current .tar files uses nearly the same space as a 'tar cjf' of .tar.bz2 files (actually the bz2(tar) uses 0.5% more space than bz2(tar(bz2)) ). (zooko did some measurements a while ago that showed the contrary, but those were on several thousand small darcs patch files, whereas misc/dependencies is a handful of fairly large files).
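
A comparison like this one can be repeated in memory with the standard library. The sketch below is not the command that was actually run; the helper names are hypothetical, and the sizes it reports depend entirely on the input files.

```python
import bz2
import io
import tarfile


def tar_bytes(members):
    """Pack a {name: bytes} mapping into an uncompressed in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name, data in sorted(members.items()):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compare_layouts(members):
    """Return (size of bz2(tar(files)), size of bz2(tar(bz2-compressed files)))."""
    plain = len(bz2.compress(tar_bytes(members)))
    precompressed = {name + ".bz2": bz2.compress(data) for name, data in members.items()}
    nested = len(bz2.compress(tar_bytes(precompressed)))
    return plain, nested
```

Here `members` would map each dependency's .tar file name to its contents.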

I'll look more closely at the patch now.

thanks!

comment:23 Changed at 2008-08-27T23:17:13Z by zooko

No, my measurement was dependent lib tarballs of Tahoe:

http://allmydata.org/pipermail/tahoe-dev/2007-December/000292.html

But it doesn't help much with gzip or bzip2.

comment:24 Changed at 2008-08-27T23:18:24Z by zooko

I applied cgalvan's patch (modified) as 2cbba0efa0c928b1.

comment:25 Changed at 2008-08-27T23:48:00Z by warner

ok, so what we just discussed on irc:

  • the tahoe build process ('make all') will look in ./tahoe_deps.tar.bz2 and ../tahoe_deps.tar.bz2 and ./misc/dependencies/ for dependent library tarballs
  • we'll create a tarball with our dependent libraries, publish it on allmydata.org somewhere
  • the 'setup.py sdist' command will *not* include those dependent library files in the generated source distribution tarball/zipfile
  • the 'setup.py sdist --sumo' command *will* include those files, in misc/dependencies/
  • we'll set up a buildslave that does both 'sdist' and 'sdist --sumo', and publish both

And our various use cases will be satisfied as follows:

  • 'darcs get tahoe' + build : download all deps from the internet
  • 'darcs get tahoe' + 'wget tahoe-deps.tar.bz2', then get on a plane
  • 'wget tahoe-nightly.tar.bz2' + build : download all deps from the internet
  • 'wget tahoe-sumo.tar.bz2', then get on a plane

My personal use case (multiple darcs trees) will be handled by having a tahoe-deps.tar.bz2 file in their mutual parent directory.

Now, will setuptools look inside a .tar.bz2 for its files? or do we need to have something (either the user, or some code inside setup.py) unpack that tarball before letting setuptools see it?

comment:26 Changed at 2008-08-28T00:53:43Z by cgalvan

Everything looks good to me :)

setuptools can't look inside the dependency tarball itself, so it will need to be extracted. I would think it'd be sufficient to make the unpacking a necessary step, but it is really up to you (or whoever wants to weigh in on this one) :) Most people that fall into the Desert Island scenario will probably just download the sumo tarball.

comment:27 Changed at 2008-08-28T01:51:22Z by warner

I'm ok with unpacking. So the process will be:

  • 'darcs get tahoe' + build: downloads deps from internet
  • 'darcs get tahoe' + 'wget tahoe-deps.tar.bz2' + 'tar xf tahoe-deps.tar.bz2', then get on a plane
  • 'wget tahoe-nightly.tar.bz2' + build: download deps from internet
  • 'wget tahoe-sumo.tar.bz2' then get on a plane

The tahoe-deps.tar.bz2 file will unpack into, say, tahoe-deps/*.tar.bz2, and the setup.py build process will look in ["./tahoe-deps", "./misc/dependencies", "../tahoe-deps"] for those libs.
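
As an illustration of that search order, the sketch below resolves the three locations relative to a source tree and keeps the ones that exist. The helper name `existing_dep_dirs` is hypothetical; in practice setuptools is simply given the list and skips what it cannot find.

```python
import os

# Search order from the comment above, relative to the top of the source tree.
DEP_DIRS = (
    "tahoe-deps",
    os.path.join("misc", "dependencies"),
    os.path.join("..", "tahoe-deps"),
)


def existing_dep_dirs(tree_root):
    """Return the dependency directories that exist, in search order."""
    found = []
    for relative in DEP_DIRS:
        path = os.path.normpath(os.path.join(tree_root, relative))
        if os.path.isdir(path):
            found.append(path)
    return found
```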

comment:28 Changed at 2008-08-28T02:19:49Z by zooko

Sounds good!

comment:29 Changed at 2008-09-04T13:50:47Z by zooko

So some of our builds are failing, like this:

Using /usr/lib/python2.5/site-packages
Searching for pyOpenSSL==0.6
Reading http://allmydata.org/trac/tahoe/wiki/Dependencies
Reading http://pypi.python.org/simple/pyOpenSSL/
Reading http://pyopenssl.sourceforge.net/
Best match: pyOpenSSL 0.6
Downloading http://downloads.sourceforge.net/pyopenssl/pyOpenSSL-0.6.tar.gz?modtime=1212595285&big_mirror=0
error: Download error for http://downloads.sourceforge.net/pyopenssl/pyOpenSSL-0.6.tar.gz?modtime=1212595285&big_mirror=0: (110, 'Connection timed out')

http://allmydata.org/buildbot/builders/feisty2.5/builds/1663/steps/compile/logs/stdio

We can make these compiles work by manually installing pyOpenSSL on those machines, but it might be better to make them work by changing the build steps to automatically test the "bundled dependencies/Desert Island scenario".

Does anyone want to do that? I can't take the time for it right now.

comment:30 Changed at 2008-09-08T21:43:43Z by zooko

  • Cc cgalvan added
  • Milestone changed from 1.3.1 to 1.3.0
  • Owner changed from cgalvan to warner
  • Status changed from assigned to new

It would be sweet to finish this ticket for the 1.3.0 release so that 1.3.0 would have a working sumo/desert-island install and so that the slim tarball and the darcs checkout would be slimmer.

Brian is Release Manager for 1.3.0 (at the moment), so he can kick this ticket back out of the Milestone if he wants.

Also he is probably the only person who has a chance of implementing this ticket in time. ;-) Assigning to Brian.

comment:31 Changed at 2008-09-08T21:44:11Z by zooko

Just imagine a Sumo wrestler on a Desert Island.

Hey, that reminds me of Virtua Fighter 3.

comment:32 Changed at 2008-09-16T14:30:08Z by warner

So, I'm experimenting with having the following in setup.cfg:

[easy_install]
find_links=misc/dependencies tahoe-deps ../tahoe-deps
           http://allmydata.org/trac/tahoe/wiki/Dependencies

And it appears to do the right thing w.r.t. finding .tar.gz files in those different directories. I'm assembling a tahoe-deps.tar.gz aggregate from the things we depend upon.
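
To see which entries that setting yields, the fragment can be read back with the standard configparser module and split on whitespace. This is only a sketch of the idea; setuptools' own handling of find_links may differ in detail, and `parse_find_links` is a hypothetical helper.

```python
import configparser

# The setup.cfg fragment quoted above.
SETUP_CFG = """\
[easy_install]
find_links=misc/dependencies tahoe-deps ../tahoe-deps
           http://allmydata.org/trac/tahoe/wiki/Dependencies
"""


def parse_find_links(cfg_text):
    """Return the find_links entries as a list of directories and URLs."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # The indented second line is a continuation of the same value, so a
    # whitespace split yields one entry per directory or URL.
    return parser.get("easy_install", "find_links").split()
```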

However, I'm running into a problem (that I think we've seen before). If I have, say, foolscap-0.3.1 installed (via a debian package) in /usr/lib, and if there is a foolscap-0.3.1.tar.gz present in tahoe-deps/ , then the tahoe build process will build foolscap and install it to ./support/lib/ even though it's already installed. If foolscap is in /usr/lib but the .tar.gz is *not* present in tahoe-deps/ , it is content to use the /usr/lib version. This appears to be true for most of our dependent libraries: twisted, simplejson, nevow, and pyopenssl, at least.

This is annoying, but not fatal: builds take a good bit longer than they ought to. I'll poke at this some more, but I might push the changes that take advantage of tahoe-deps/ (and publish the tahoe-deps.tar.gz tarball to allmydata.org, and update the docs) even without fixing this.

comment:33 Changed at 2008-09-16T16:14:54Z by zooko

Hm... This bug doesn't sound familiar to me. It would totally be familiar if you were talking about Nevow instead of foolscap:

http://bugs.python.org/setuptools/issue20
http://bugs.python.org/setuptools/issue17
http://bugs.python.org/setuptools/issue36
http://divmod.org/trac/ticket/2699
http://divmod.org/trac/ticket/2629
http://divmod.org/trac/ticket/2527

Does the foolscap from the Debian package come with a .egg-info file? Is the .egg-info file in /var/lib/python-support/python2.5 ?

comment:34 Changed at 2008-09-16T21:50:54Z by warner

Foolscap is packaged with 'pyshared' (as opposed to 'python-support'), so the code lives in /usr/lib/python2.5/site-packages/foolscap . There is a /usr/lib/python2.5/site-packages/foolscap-0.3.1.egg-info/ directory right next to it. The files in that directory are all symlinks to /usr/share/pyshared/foolscap-0.3.1.egg-info/* .

So it seems like it's a different problem than the nevow/python-support issue.

comment:35 Changed at 2008-09-17T02:02:51Z by zooko

The next version of setuptools is going to be shipped any day now (it is currently blocked on a couple of bugs that I opened and that PJE fixed and that he asked me to test his fix). So now would be a fine time to open a bug report on http://bugs.python.org/setuptools/ .

comment:36 Changed at 2008-09-17T06:22:30Z by warner

I just pushed a bunch of changes that add those tahoe-deps/ directories to setup.cfg, and remove most of the tarballs from misc/dependencies/ . There is now a tahoe-deps.tar.gz available at http://allmydata.org/source/tahoe/tarballs/tahoe-deps.tar.gz which contains up-to-date versions of everything. There is also a unit test (well, an extra step in the 'clean' builder) that asserts that a build with tahoe-deps/ in place does not try to download anything.
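
One way such a check could work is to scan the build output for the "Downloading ..." lines that setuptools prints (as in the log in comment:29). The ticket does not say how the builder step is actually implemented, so the sketch below, including the name `download_lines`, is an assumption.

```python
def download_lines(build_log):
    """Return the lines of a build log that report a network download."""
    return [
        line
        for line in build_log.splitlines()
        if line.startswith("Downloading ")
    ]
```

A builder step could then fail the build whenever `download_lines(log)` is non-empty.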

I still need to update the docs and the wiki to explain this stuff, but the basic code is now in place. Note that there was a problem related to #455 (involving pyutil not being built correctly), with a workaround in place (run 'build-once' up to *three* times, if the first two attempts fail).
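
The retry workaround amounts to running the same command until it succeeds, up to a fixed number of attempts. A rough sketch, with a hypothetical helper name and the command passed in by the caller (the exact 'build-once' invocation is not spelled out here):

```python
import subprocess


def run_with_retries(cmd, attempts=3):
    """Run cmd until it exits with status 0, at most `attempts` times.

    Returns the number of the attempt that succeeded, or None if all failed.
    """
    for attempt in range(1, attempts + 1):
        if subprocess.call(cmd) == 0:
            return attempt
    return None
```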

comment:37 Changed at 2008-09-17T20:17:54Z by warner

-SUMO tarballs are now being generated and uploaded by the buildbot. Only the docs are left.

comment:38 Changed at 2008-09-17T22:59:08Z by warner

  • Resolution set to fixed
  • Status changed from new to closed

Ok, docs are done. I've added the InstallDetails wiki page, and I've added some small changes to source:docs/install.html to reference it. Finally closing this ticket.
