Opened at 2007-05-21T15:49:03Z
Last modified at 2020-01-17T15:05:20Z
#47 closed enhancement
use pyutil as a separate package and contribute src/allmydata/util/* into pyutil — at Version 2
Reported by: | zooko | Owned by: | zooko |
---|---|---|---|
Priority: | major | Milestone: | eventually |
Component: | packaging | Version: | 0.6.1 |
Keywords: | pyutil | Cc: | |
Launchpad Bug: |
Description (last modified by zooko)
I'm posting this just as a way to get any feedback from Brian before I go and do this. My motivation is:
(a) I'm tired of using cp/diff/patch/vim to port improvements back and forth and merge changes among the current proprietary Mountain View source code, the Tahoe source code, and the separate pyutil project.
(b) I would like to make the contents of tahoe/src/allmydata/util/ available to a wider audience of Python programmers who might use them.
(P.S. I'm not entirely sure whether this is component "code" or "packaging". Oh well.)
Change History (2)
comment:1 follow-up: ↓ 2 Changed at 2007-05-21T19:34:48Z by warner
comment:2 in reply to: ↑ 1 Changed at 2007-05-21T20:18:42Z by zooko
- Description modified (diff)
Replying to warner:
With our code-coverage graph in place, I'm motivated to keep the percentage number up, and so I'm reluctant to add code to the tree that doesn't have unit test coverage. This sometimes translates into a strange reluctance to add unused code to our tree.
It would make sense for our code-coverage to measure tahoe code separately from 3rd-party-library-that-we-use code. This would eliminate your concern in the best (?) possible way. (We could also, of course, do code coverage, unit tests, etc. on dependent libraries that we rely on, but that should be measured separately and anyway is optional.)
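To make the "measure separately" idea concrete, here is a minimal sketch, not anything that exists in the tree. It assumes we already have per-file line counts from a coverage tool; the `coverage_by_group` helper, the `THIRD_PARTY_PREFIXES` tuple, and the paths in it are all hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical locations of bundled third-party snapshots; the real layout
# is still under discussion in this ticket.
THIRD_PARTY_PREFIXES = ("src/pyutil/", "src/foolscap/", "src/zfec/")


def coverage_by_group(per_file):
    """Return a coverage percentage per group ("tahoe" or "third-party").

    per_file maps a source path to a (covered_lines, total_lines) pair.
    """
    totals = defaultdict(lambda: [0, 0])
    for path, (covered, total) in per_file.items():
        # str.startswith accepts a tuple of prefixes.
        if path.startswith(THIRD_PARTY_PREFIXES):
            group = "third-party"
        else:
            group = "tahoe"
        totals[group][0] += covered
        totals[group][1] += total
    return {
        group: (100.0 * covered / total if total else 0.0)
        for group, (covered, total) in totals.items()
    }
```

With numbers split like this, adding a library snapshot that has few tests would lower only the "third-party" figure and leave the "tahoe" percentage alone.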
I don't think we have a clear distinction between what code goes in src/allmydata/util/ and what goes in the parent directory. upload/download is clearly "not" a utility, idlib and bencode clearly are, but.. hashutil? figleaf? Some of these things have been incorporated from other upstream sources and adapted to our needs, and folks who want to use them may be better served by going directly to the upstream provider.
Let's define some subdirectory, possibly "util", to be "all source code which would hypothetically be useful to a programmer who was programming something other than Tahoe". If we have some other code which is "generally utility-like code that is used from multiple parts of Tahoe", but doesn't fall into the first category, then let's put that in a separate directory.
Once we've done that, then I further propose that the first category -- code potentially useful to non-Tahoe-hackers -- be packaged and distributed separately. I then further propose that it be named "pyutil" and merged with the current contents of the pyutil package. All of these proposals are negotiable.
Now, when we use source that is maintained more by other people than by us, we have three options:
- Add that source as another dependency (i.e., use a package management tool instead of "cp" to include and maintain it)
- Use cp to include it, and then it becomes de facto part of the aforementioned "util" package which is hypothetically valuable to people who aren't hacking on tahoe
- Make a separate directory for "stuff which we are not the primary maintainers of but which we do not include as a packaged dependency".
I vote for #2, for those cases where the thing is too small or unmaintained or whatever to warrant #1.
I imagine that the easiest way to track pyutil within tahoe would be to have a separate darcs repository for pyutil, then merge that into tahoe under src/pyutil.
Agreed.
But I find I'm reluctant to add more directories to src/, since everything there adds confusion between our snapshots and the upstream project's own releases. (Having snapshots in src/ reduces the amount of work that developers must invest to get a working tahoe tree, in that they only have to build one project instead of four, but it makes it more difficult to build tahoe against a newer upstream version of e.g. foolscap or zfec.)
I agree, but that is also a feature -- it is normal to define a version of tahoe that depends on a specific version of the upstream source, and it makes it easier for us to patch those sources. In a sense it transfers a certain degree of authority over "what should go into the version of Crypto that tahoe uses" from the "upstream" maintainers of Crypto to the maintainers of tahoe and also to the users who have a tahoe source tree. I have a strong emotional preference for this political organization over the tradition of the "upstream" maintainers having more control over what version of "their" library is used than the re-users have. However, we shouldn't make too much of this -- if we didn't do it this way it would still be possible for us to require specific versions or patches of dependent libraries, and when we do it this way it is still easy for us and for our users to upgrade to newer versions which are published by upstream maintainers. So perhaps we should name this political issue explicitly in order to clear our minds of it and then concentrate on practical issues. ;-) (Indeed, we are collectively the authors of 3 out of 4 of the dependent packages in question: foolscap, zfec, and pyutil, and there is no known upstream maintainer of the 4th -- pycrypto.)
some random thoughts..
I'm all for code-reuse and minimizing the effort of same. Copying code by hand sucks.
With our code-coverage graph in place, I'm motivated to keep the percentage number up, and so I'm reluctant to add code to the tree that doesn't have unit test coverage. This sometimes translates into a strange reluctance to add unused code to our tree.
I don't think we have a clear distinction between what code goes in src/allmydata/util/ and what goes in the parent directory. upload/download is clearly "not" a utility, idlib and bencode clearly are, but.. hashutil? figleaf? Some of these things have been incorporated from other upstream sources and adapted to our needs, and folks who want to use them may be better served by going directly to the upstream provider.
I imagine that the easiest way to track pyutil within tahoe would be to have a separate darcs repository for pyutil, then merge that into tahoe under src/pyutil. But I find I'm reluctant to add more directories to src/, since everything there adds confusion between our snapshots and the upstream project's own releases. (Having snapshots in src/ reduces the amount of work that developers must invest to get a working tahoe tree, in that they only have to build one project instead of four, but it makes it more difficult to build tahoe against a newer upstream version of e.g. foolscap or zfec.)
So, just thoughts for discussion, not conclusions or suggestions.
I also do not know what component would cover this sort of issue. Maybe 'subproject packaging'? I created the 'code' component to talk about things other than debian packaging, trac setup, darcs repositories, testnet, or architecture docs. #28 is for discussing new component names to add, but it may well be that the existing components are confusing enough and that we need fewer rather than more.