#1220 closed defect (fixed)

build/install should be able to refrain from getting dependencies

Reported by: gdt Owned by: gdt
Priority: major Milestone: 1.11.0
Component: packaging Version: 1.8.0
Keywords: setuptools security install packaging pip Cc: zooko
Launchpad Bug:

Description (last modified by daira)

In a managed package system, each program's dependencies are expressed in control files and provided before the package builds. If the package has more dependencies than expressed, the right behavior is failure so that this can be fixed; it is unhelpful to download/install code either from included eggs or, especially, from the net.

There are two parts to this problem. One is downloading and installing things like py-cryptopp. The other is that tahoe seems to need modified versions of standard tools and ships its own included eggs. This kind of divergence should be resolved.

I realize that this complaint is perhaps directed at setuptools, but tahoe-lafs inherits responsibility.

A reasonable solution would be to have a switch that packaging systems can add.

I put this on packaging even though the bug is in tahoe-lafs, not in any packaging of it.

Change History (34)

comment:1 Changed at 2010-10-18T01:44:59Z by zooko

  • Keywords setuptools added

comment:2 Changed at 2010-10-30T08:06:18Z by zooko

I just remembered that there is the --single-version-externally-managed flag. If you pass that flag as an argument to python setup.py install then it will suppress all automated fetching of dependencies. We test the use of this flag on all of our buildbots--look at the buildsteps called "install-to-prefix", e.g. this one on NetBSD and "test-from-prefixdir", e.g. this one on NetBSD. "install-to-prefix" does an install using --single-version-externally-managed to suppress automated resolution of dependencies, and "test-from-prefixdir" runs the unit tests in the resulting target directory where it was installed to.

Please try adding --single-version-externally-managed and see if that is sufficient to close this ticket.

comment:3 Changed at 2010-10-30T08:06:32Z by zooko

  • Owner changed from somebody to gdt

comment:4 Changed at 2010-10-30T11:27:22Z by np

I don't see how a flag passed at install time would really fix the issue. What I would like is to tell the build step to not install missing dependencies.

comment:5 Changed at 2010-10-30T20:36:57Z by zooko

Well, can you (either of you) show me a script that is used to package Python applications for your system? I imagine that you could do something like this:

tar xf $SOURCE_DISTRIBUTION
cd $TOP_LEVEL_DIR
python setup.py install --prefix=$TARGETDIR --single-version-externally-managed --record=list_of_installed_files.txt

Then collect all the files that got written into $TARGETDIR and put them into your newly created package. This should work with any setuptools-built Python package.

But, if that's not how you do it, then show me how you do it and I'll see if I can help make it so that the setuptools automatic resolution of dependencies gets out of your way.
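The packaging steps suggested above could be driven from a small helper like the one below. This is only a sketch: the function name install_command and its parameters are hypothetical, and only the setup.py flags themselves come from the comment above.

```python
import sys


def install_command(target_dir, record_file="list_of_installed_files.txt",
                    python=sys.executable):
    """Return the argv for an install that should not fetch dependencies.

    Hypothetical helper: only the flags are taken from the ticket.
    """
    return [
        python, "setup.py", "install",
        "--prefix=" + target_dir,
        # Suppresses setuptools' automated resolution of dependencies.
        "--single-version-externally-managed",
        # Writes the list of installed files, one path per line.
        "--record=" + record_file,
    ]
```

A packager would run the returned command from the unpacked source tree, for example with subprocess.check_call(install_command(target_dir)), and then collect the files named in the record file.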

comment:6 Changed at 2010-10-31T01:29:40Z by gdt

Here's a log of building under pkgsrc. You can see that it's basically setup.py build (with the pre-setup to have a symlink tree of allowed libraries, so that only expressed dependencies are available). build doesn't have --single-version-externally-managed but install does. So are you saying that I should pass --single-version-externally-managed to the build phase as well?

pkgsrc-build-log.txt

comment:7 follow-up: Changed at 2010-10-31T03:07:59Z by zooko

Okay, thanks for the log! My current thought is: do we need the build step for anything? What happens if you just comment-out that step and head straight for the install step? As far as I know, that will work, and will also completely avoid any automated downloading of any dependencies (since the install step already has --single-version-externally-managed). Tahoe-LAFS doesn't have any native code modules that need to be compiled, but even if it did (or if you used this same script for a different Python package which did have native code modules) then I think running python setup.py install would automatically build those native code modules, so I don't think you really need to invoke python setup.py build directly.

comment:8 Changed at 2010-10-31T03:28:03Z by zooko

I just ran a quick manual test locally, and python setup.py build --single-version-externally-managed gives an error message saying that "--single-version-externally-managed" is not a recognized option for "build", but python setup.py install --single-version-externally-managed --prefix=instdir --record=list-of-installed-files.txt correctly builds and installs without downloading any dependencies.

comment:9 in reply to: ↑ 7 ; follow-up: Changed at 2010-10-31T12:51:11Z by gdt

Replying to zooko:

[Can't you just install but not build?]

No, because pkgsrc requires that the build phase do all things that feel like what "make" should do, and stay within the working directory. Then install does what "make install" should do and puts compiled bits in a staging area. Then the package tar bundles up that staging area.

It seems odd to me that --single-version-externally-managed suppresses dependencies and is only valid at install. I had thought -svem was about changing the way the egg file is created, and the dep suppression seems to be a side effect.

comment:10 follow-up: Changed at 2010-10-31T12:52:12Z by gdt

The real question for me is whether a build/install attempt would fail and refrain from getting dependencies in the case where they didn't already exist.

comment:11 in reply to: ↑ 9 Changed at 2010-10-31T13:32:19Z by zooko

Replying to gdt:

Replying to zooko:

[Can't you just install but not build?]

No, because pkgsrc requires that the build phase do all things that feel like what "make" should do, and stay within the working directory. Then install does what "make install" should do and puts compiled bits in a staging area. Then the package tar bundles up that staging area.

But there aren't any compiled bits, so as far as I can tell if we force the build phase to be a no-op then we still satisfy the pkgsrc protocol. Alternately, if you let the build phase be python setup.py build (just like it currently is) instead of a no-op then we are still satisfying the protocol because it keeps all of the deps that it acquires within its working directory.

But maybe there is another requirement for the build phase besides what you wrote above, such as "no open connections to remote hosts" or perhaps even more importantly "no printing out messages that make the human think that you are installing deps".

Is one or both of those a requirement? Am I missing some other requirements on what the build phase is allowed/required to do?

It seems odd to me that --single-version-externally-managed suppresses dependencies and is only valid at install. I had thought -svem was about changing the way the egg file is created, and the dep suppression seems to be a side effect.

Why do you find this to be odd? Perhaps it is because you think of python setup.py build as the step that would create an egg if an egg were going to be created? It is not; if an egg were going to be created, that would be done in python setup.py install.

comment:12 in reply to: ↑ 10 ; follow-up: Changed at 2010-10-31T13:39:12Z by zooko

Replying to gdt:

The real question for me is whether a build/install attempt would fail and refrain from getting dependencies in the case where they didn't already exist.

Oh, I see, so the requirement that I was missing on the "build" step is: "return non-zero exit code if any of the deps are missing".

Waitaminute, that's not truly a requirement. None of your C programs, for example, reliably do that do they? Or maybe some of them do nowadays by using a tool like pkg-config?

So, I'm still not 100% certain what you mean by "refrain from getting dependencies". Does my buildstep fail if it opens a TCP or HTTP connection but doesn't download any large files? Does it fail if it downloads a large file but that file isn't a dependency? What if it downloads a dependency as a .zip or a .tar but doesn't unpack it? What if it unpacks it but only into the current working directory (this is the one that it currently does)? What if it writes it into /usr/lib/python2.6/site-packages and then edits your /usr/lib/python2.6/site-packages/site.py script to change the way Python imports modules (this is the one that it would do if you ran sudo python setup.py install)? Does it matter whether it prints out messages describing what it is doing versus if it stays quiet? Does it matter how long it takes to finish the build step?

comment:13 in reply to: ↑ 12 Changed at 2010-10-31T13:40:39Z by zooko

Replying to zooko:

Replying to gdt:

The real question for me is whether a build/install attempt would fail and refrain from getting dependencies in the case where they didn't already exist.

Oh, I see, so the requirement that I was missing on the "build" step is: "return non-zero exit code if any of the deps are missing".

Waitaminute, that's not truly a requirement. None of your C programs, for example, reliably do that do they? Or maybe some of them do nowadays by using a tool like pkg-config?

[following-up to myself]

Although we could potentially do better than C programs and actually satisfy this requirement of reliably exiting with non-zero exit code if all of the deps aren't already present. Is that what we should do? It sounds like we would be going over and above the normal requirements of a pkgsrc build step and if we were going to go that direction then we should try to generalize the hack so that all Python programs that are being built by pkgsrc would do the same. :-)

comment:14 Changed at 2010-10-31T22:38:17Z by gdt

You raise good points about unarticulated requirements; a lot of them are captured in "what 'make' is supposed to do". So specifically, the build phase

  • should fail if any dependencies are missing. C programs use autoconf, or autoconf/pkg-config, and fail at configure phase. Or, they are old-school and do -lfoo and that fails at build time if libfoo is not installed. You are probably right that some C programs do not reliably fail, but they should.
  • must not use the net at all, and use only files expressed in the "distinfo" manifest and downloaded during the fetch phase, and unpacked in the working directory in the extract phase. If a file is needed it is listed in distinfo and make fetch gets it. (Without this, offline building fails and GPL compliance is difficult - how do you find the list of sources that must be distributed with the resulting binary package?)
  • must set up install so that the list of created files is always the same

An underlying goal is that building a package should have a deterministic outcome, with the same bits produced regardless of which dependencies or other programs were already installed. This allows the use of the resulting binary packages on other systems. If a program has an optional dependency foo, then the pkgsrc entry has to require foo (and thus depend on the foo package), or disable use of foo, or have a pkgsrc option to control it. Having the built package be built differently depending on whether foo is present is considered a packaging bug (and perhaps an upstream bug, if there's no --disable-foo switch/method).
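One way to check the "list of created files is always the same" requirement is to compare the file lists written by two separate install runs. The sketch below assumes each list is plain text with one path per line (the format produced by the --record option mentioned earlier in this ticket); the function names are hypothetical.

```python
def installed_files(record_text):
    """Parse a file list (one path per line) into a set of paths."""
    return {line.strip() for line in record_text.splitlines() if line.strip()}


def record_differences(first_record, second_record):
    """Return (only_in_first, only_in_second) as sorted lists of paths.

    Two empty lists mean both runs installed the same set of files.
    """
    first = installed_files(first_record)
    second = installed_files(second_record)
    return sorted(first - second), sorted(second - first)
```

This compares file names only; checking that the contents are identical bit for bit would need a further step such as hashing each file.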

It's also a goal to be able to 'make fetch-list|sh' on a net-connected machine and grab all distfiles but not build, and then to be able to build offline.

I see that there are .pyc files installed, but not produced during build. This seems wrong, but not important or causing an actual problem, and it seems to be the python way.

Basically, there's a huge difference in approach between large-scale package management systems and the various language-specific packaging systems. I suspect debian/ubuntu and rpms are much more like pkgsrc than not in their requirements. But, there seems not to be a culture of bulk building all rpms in Linux; it seems the maintainers build them and upload them.

comment:15 Changed at 2010-10-31T23:18:10Z by gdt

I ran 'python2.6 setup.py install --single-version-externally-managed --root ../.destdir' without having run build, after uninstalling nevow. The install completed, and then running that tahoe failed on importing tahoe.

Having read setup.py and _auto_deps.py, I think the problem is in hand-written setup code in tahoe-lafs which needs a switch to require/fail vs require/fetch.
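A require/fail switch of the kind described could look roughly like the sketch below. It is not the code in setup.py or _auto_deps.py; the function names are hypothetical, and it only checks that top-level modules are importable, not that their versions satisfy any requirement.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found locally."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


def require_or_fail(module_names):
    """Exit with an error instead of fetching anything that is missing."""
    missing = missing_modules(module_names)
    if missing:
        raise SystemExit(
            "missing dependencies (refusing to fetch): " + ", ".join(missing)
        )
```

Called early in a build, require_or_fail(["nevow"]) would stop with a non-zero exit status when nevow is not installed, which is the behavior a packaging system expects.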

[This problem isn't causing me lots of trouble; I simply check the build output when updating the package and manually consider it broken if it uses the net.]

comment:16 Changed at 2010-11-01T01:36:33Z by zooko

Should we close this ticket, due to the existence of the --single-version-externally-managed flag for python setup.py install?

comment:17 follow-ups: Changed at 2010-11-01T11:38:35Z by gdt

No, because a) --svem isn't usable during a build phase (install writes to the destination) and b) it doesn't check dependencies and fail. (This gives me the impression install is only supposed to be used after build.)

I don't mean to demand that anyone spend time on this, but I still think the setup.py code is incorrect compared to longstanding open source norms.

I would be curious to hear about how people who work on packaging for other systems deal with this issue.

comment:18 Changed at 2010-11-01T11:40:00Z by gdt

  • Priority changed from major to minor

This problem is an annoyance and increases the risk of packaging errors, but the resulting packages are ok. Therefore dropping to minor, which it probably should have been already.

comment:19 in reply to: ↑ 17 Changed at 2010-11-01T17:43:44Z by davidsarah

Replying to gdt:

No, because a) --svem isn't usable during a build phase (install writes to the destination) and b) it doesn't check dependencies and fail. (This gives me the impression install is only supposed to be used after build.)

'setup.py install' and 'setup.py build' are alternatives. As far as I understand, it isn't intended that both be used.

I don't mean to demand that anyone spend time on this, but I still think the setup.py code is incorrect compared to longstanding open source norms.

I don't dispute that, but I favour making sure that a replacement for setuptools -- probably Brian's "unsuck" branch -- follows those norms by default, rather than continuing to hack at zetuptoolz. zooko's efforts with the latter are appreciated, but that approach has consumed an enormous amount of development effort, and is still causing obscure and often irreproducible bugs on our buildslaves and for our users.

comment:20 Changed at 2010-11-17T08:05:39Z by zooko

I was just hacking at zetuptoolz and I noticed that there is already a method named url_ok() which implements the feature of excluding certain domain names from the set that you will download from. If we hack it to always return False (when the user has specified "no downloads") then this would be our implementation of this ticket. Here is the url_ok() method in zetuptoolz. Here is the current body of it:

    def url_ok(self, url, fatal=False):
        s = URL_SCHEME(url)
        if (s and s.group(1).lower()=='file') or self.allows(urlparse.urlparse(url)[1]):
            return True
        msg = "\nLink to % s ***BLOCKED*** by --allow-hosts\n"
        if fatal:
            raise DistutilsError(msg % url)
        else:
            self.warn(msg, url)

comment:21 in reply to: ↑ 17 Changed at 2010-11-27T18:46:10Z by zooko

Replying to gdt:

No, because a) --svem isn't usable during a build phase (install writes to the destination)

How about this. I'm going to propose a build step and you have to tell me if you would accept any code that passes that build step or whether you have other requirements.

The buildstep starts with a pristine tarball of tahoe-lafs and unpacks it, then runs python setup.py justbuild. If the code under test emits any lines to stdout or stderr which have the phrase "Downloading http" then it is marked as red by this buildstep. (The implementation of this test is visible here: misc/build_helpers/check-build.py, which is invoked from here: Makefile)

Then the buildstep runs python setup.py justinstall --prefix=$PREFIXDIR. Then it executes $PREFIXDIR/bin/tahoe --version-and-path and if the code under test emits the right version and path then it is marked as green by this buildstep, else it is marked as red.
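The first check in this proposed build step, scanning the build output for the phrase "Downloading http", can be sketched as follows. This is not the contents of misc/build_helpers/check-build.py; the function names are hypothetical.

```python
def download_lines(build_output):
    """Return the output lines that mention a download."""
    return [line for line in build_output.splitlines()
            if "Downloading http" in line]


def build_is_green(build_output):
    """True when the captured stdout/stderr shows no downloads."""
    return not download_lines(build_output)
```

The caller would capture the combined stdout and stderr of python setup.py justbuild and mark the step red when build_is_green returns False.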

Now, one thing that this buildstep does not require of the code under test is that it detect missing dependencies or that it find and download missing dependencies. That would be cool, and you have requested it in this ticket, and I know how to implement it, but since that is above and beyond the standard packaging functionality that we're trying to emulate perhaps we should open a separate ticket and finish fixing the basic functionality first.

This means that the test can't give the code under test a fair chance of going green unless it is run on a system where all of the dependencies are already installed. As far as I understand, that's standard for this sort of packaging.

comment:22 Changed at 2010-11-30T15:18:58Z by zooko

If you like this ticket, you might also like #1270 (have a separate build target to download any missing deps but not to compile or install them).

comment:23 Changed at 2011-01-29T04:34:30Z by davidsarah

  • Keywords security added
  • Priority changed from minor to major

I don't consider this a minor issue, because the downloading from potentially insecure sites is a significant vulnerability (as we were recently reminded by SourceForge being compromised -- and setuptools will happily download from far less secure sites than SourceForge).

comment:24 Changed at 2011-01-31T04:38:20Z by zooko

  • Cc zooko added

comment:25 Changed at 2011-03-07T12:28:03Z by zooko

People were just wishing for related (but not identical) functionality on the distutils-sig mailing list and Barry Warsaw settled on patching setup.cfg of each Python project that he is building to add this stanza:

[easy_install]
allow_hosts: None

http://mail.python.org/pipermail/distutils-sig/2011-February/017400.html
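Patching setup.cfg in this way could be automated along the following lines. This is a sketch, not the approach used on the mailing list: the function name is hypothetical, and rewriting the file through configparser drops any comments it contained and writes the option as "allow_hosts = None" rather than with a colon.

```python
import configparser
import io


def forbid_downloads(setup_cfg_text):
    """Return setup.cfg text with [easy_install] allow_hosts set to None."""
    # RawConfigParser avoids interpreting '%' characters in existing values.
    parser = configparser.RawConfigParser()
    parser.read_string(setup_cfg_text)
    if not parser.has_section("easy_install"):
        parser.add_section("easy_install")
    parser.set("easy_install", "allow_hosts", "None")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```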

But I still feel like this ticket is underspecified. Before I make further progress on this ticket I want someone who cares a lot about this issue to tell me whether the test procedure (which is a Buildbot "build step") in comment:21 would be sufficient.

comment:26 Changed at 2012-05-14T08:24:10Z by zooko

As Kyle mentioned on a mailing list thread, it would be nice if, when the build system detects that it already has everything it needs locally, then it doesn't look at the net at all. If this ticket were fixed, and we had the ability to refrain from getting dependencies, then we could also implement this added feature of "don't look at the net if you already have everything you need". I guess that should really be a separate ticket, but I honestly don't feel like going to all the effort to open a separate ticket.

I'll just re-iterate that if you want me, or anyone else, to make progress on this ticket, then please start by answering my questions from comment:21.

comment:27 Changed at 2012-10-29T09:46:01Z by zooko

The "allow_hosts=None" configuration that Barry Warsaw was using (mentioned in comment:25) is documented here:

comment:28 Changed at 2013-08-31T12:14:41Z by daira

  • Description modified (diff)
  • Keywords install packaging pip added

pip has the following relevant options:

-d, --download <dir>

 Download packages into <dir> instead of installing
 them, regardless of what’s already installed.

--download-cache <dir>

 Cache downloaded packages in <dir>.

--src <dir>

 Directory to check out editable projects into.
 The default in a virtualenv is “<venv path>/src”.
 The default for global installs is
 “<current dir>/src”.

-U, --upgrade

 Upgrade all packages to the newest available
 version. This process is recursive regardless of
 whether a dependency is already satisfied.

--force-reinstall

 When upgrading, reinstall all packages even if
 they are already up-to-date.

-I, --ignore-installed

 Ignore the installed packages (reinstalling
 instead).

--no-deps

 Don’t install package dependencies.

--no-install

 Download and unpack all packages, but don’t
 actually install them.

--no-download

 Don’t download any packages, just install the
 ones already downloaded (completes an install run
 with --no-install).

These seem very comprehensive and useful!

Last edited at 2014-01-02T21:03:54Z by daira

comment:29 follow-up: Changed at 2013-12-29T23:28:27Z by jmalcolm

I don't fully understand Zooko's suggestion in ticket:1220#comment:21 above, probably because I know very little about python packaging. Here's what I would want:

1) A way for there to be no network activity, of any kind, when building or installing Tahoe-LAFS

that implies:

2) Whether or not network activity is available, a build or install should have the same behavior - either it works, as it can find all dependencies, or it can't, so it fails
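For Python code running in the same process, requirement 1 can be approximated in a test by making every outgoing connection attempt fail. The sketch below is only an illustration: forbid_network and NetworkAccessError are hypothetical names, it patches socket.socket.connect only, and it has no effect on child processes such as a separately launched python setup.py build.

```python
import contextlib
import socket


class NetworkAccessError(RuntimeError):
    """Raised when guarded code tries to open a connection."""


@contextlib.contextmanager
def forbid_network():
    """Make socket connections fail while the with-block runs."""
    original_connect = socket.socket.connect

    def refuse(self, address):
        raise NetworkAccessError("network access attempted: %r" % (address,))

    socket.socket.connect = refuse
    try:
        yield
    finally:
        # Put the original method back even if the guarded code failed.
        socket.socket.connect = original_connect
```

Running the build logic inside "with forbid_network():" then gives the same result whether or not a network is available: it either finds everything locally or fails.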

comment:30 in reply to: ↑ 29 Changed at 2013-12-30T16:31:17Z by zooko

Replying to jmalcolm:

I don't fully understand Zooko's suggestion in ticket:1220#comment:21 above, probably because I know very little about python packaging. Here's what I would want:

jmalcolm: what you wrote there seems consistent with my proposal from comment:21.

comment:31 Changed at 2014-03-17T18:26:02Z by daira

On #2055, dstufft wrote:

FWIW pip --no-download is bad and you shouldn't use it. If you want to do that you should download the packages to a directory (you can use pip install --download <directory> [package [package ...]] for that) and then use pip install --no-index --find-links <directory> [package [package ...]].

Version 1, edited at 2014-03-17T18:26:25Z by daira
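The two-step approach dstufft describes can be expressed as a pair of command lines. This is a sketch with hypothetical helper names; it assumes a pip release that provides the pip download subcommand, which newer versions offer in place of the pip install --download form quoted above.

```python
import sys


def download_command(directory, packages, python=sys.executable):
    """argv that fetches packages into a directory without installing."""
    return [python, "-m", "pip", "download", "--dest", directory] + list(packages)


def offline_install_command(directory, packages, python=sys.executable):
    """argv that installs only from the directory, never from an index."""
    return [python, "-m", "pip", "install",
            "--no-index", "--find-links", directory] + list(packages)
```

The first command is run once on a machine with network access; the second can then be run offline against the same directory.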

comment:32 Changed at 2015-07-21T19:05:49Z by zooko

I think a good next-step on this is #2473 (stop using setup_requires).

comment:33 Changed at 2015-07-21T19:08:30Z by zooko

Another good next step on this is to take the "Desert Island" test (https://github.com/tahoe-lafs/tahoe-lafs/blame/15a1550ced5c3691061f4f07d3597078fef8814f/Makefile#L285) and copy it to make this test. The changes from the "Desert Island" test to this test are:

  1. This test starts with just the Tahoe-LAFS source; the Desert Island test starts with the SUMO package.
  2. This test runs python setup.py justbuild; the Desert Island test runs python setup.py build.

comment:34 Changed at 2016-03-26T21:27:07Z by warner

  • Milestone changed from undecided to 1.11.0
  • Resolution set to fixed
  • Status changed from new to closed

I think this should be resolved, now that we're using pip/virtualenv, and do not have a setup_requires= anymore. Packagers can use python setup.py install --single-version-externally-managed with a --root that points into a new directory, then turn that directory into a package. I believe this is how Debian currently does things, and by changing Tahoe to behave like every other python package, we should be able to take advantage of that machinery.
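The last step, turning the --root directory into a package, could be as simple as archiving it. The sketch below uses a gzip-compressed tarball as a stand-in for a real package format; package_staging_dir is a hypothetical name, and the archive path must lie outside the staging directory.

```python
import tarfile


def package_staging_dir(staging_dir, archive_path):
    """Archive everything under staging_dir and return the member names."""
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname="." stores paths relative to the staging directory.
        archive.add(staging_dir, arcname=".")
    with tarfile.open(archive_path) as archive:
        return sorted(archive.getnames())
```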

gdt, please feel free to re-open this if you disagree.
