Package version in notebooks #26

Closed
tfiers opened this issue Apr 25, 2022 · 7 comments

Comments

@tfiers
Contributor

tfiers commented Apr 25, 2022

or: How to declare which version of our Python package is used in a notebook

Background

Our plan is to take code that is used in multiple notebooks, and factor it out to a python package, which is then imported in notebooks (see also our zoom meeting, and #2 and #19).
This python package will evolve over time.

Notebooks made with an older version of the package will stop working with newer versions.
(Specifically, with versions that contain API-breaking changes. Under semantic versioning, which seems like a useful scheme to adopt, such versions get a higher 'major' number in the 'major.minor.patch' scheme. Besides a versioning scheme, we should also keep a changelog.)
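
As a minimal sketch of what this means for a notebook, assuming plain numeric 'major.minor.patch' strings without pre-release suffixes: a notebook written against one version can be expected to break once the installed major number is higher. The helper names below are hypothetical and not part of any package discussed here.

```python
def parse_version(text):
    """Split 'major.minor.patch' (optionally prefixed with 'v') into ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_breaking_upgrade(used, installed):
    """True if `installed` has a higher major number than `used`."""
    return parse_version(installed)[0] > parse_version(used)[0]


print(is_breaking_upgrade("2.3.0", "2.4.1"))  # False: same major number
print(is_breaking_upgrade("2.3.0", "3.0.0"))  # True: major number went up
```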

Proposal

Notebook authors declare which version of our python package they use in a cell at the top of their notebook.

Implementation

This can be done like so (docs):

!pip install git+file:///path/to/our/python/package/@v2.3

The path would e.g. be ../spikeloc/ (or ../soundloc/, or whatever we name the package dir).
The two dots are to move up out of the notebooks/ dir (or research/, or whatever we name the experiments / figure-building / drafting / research dir).
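
One caveat: a file:// URL takes an absolute path, so a relative path such as ../spikeloc/ most likely has to be resolved before it is handed to pip. A minimal sketch of that step, using only pathlib; the function name `pip_git_file_spec` and its `base` parameter are made up for illustration:

```python
from pathlib import Path


def pip_git_file_spec(relative_dir, tag, base=None):
    """Build a 'git+file:///abs/path@tag' string from a relative directory.

    `base` is the directory the relative path starts from; it defaults to
    the current working directory (normally the notebook's directory).
    """
    base = Path.cwd() if base is None else Path(base)
    absolute = (base / relative_dir).resolve()  # removes the '..' parts
    return f"git+{absolute.as_uri()}@{tag}"


# '../spikeloc' and 'v2.3' are the example path and tag from this issue.
spec = pip_git_file_spec("../spikeloc", "v2.3")
print(spec)
```

In a notebook cell, the result could then be passed on with `!pip install {spec}`, relying on IPython expanding Python variables in shell commands.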

@v2.3 is a git tag.
It can be kept in sync automatically with the pip version of the package using setuptools_scm.
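
If the git tag and the pip version are kept in sync like this, a notebook could also check at run time which version is actually installed. A minimal sketch using the standard library's importlib.metadata; the distribution name "our-package" is a placeholder, and the helper name is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "our-package" is a placeholder for whatever the package ends up being called.
found = installed_version("our-package")
if found is None:
    print("our-package is not installed in this environment")
else:
    print(f"our-package {found} is installed")
```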

This implementation requires the directory of our python package to be a git repository.

I think it is a good idea to have this be a separate git repository from the one containing our notebooks and web/ etc:

  1. This allows us to easily check out an older version of the code without having newer notebooks disappearing out of view.
  2. This separate repo, and its corresponding GitHub page, would be more like a pure and classical software engineering project / open source software package (which the current project is not).
    It'd be a good place to apply the proposed open-source (python) software conventions, like a top-level pyproject.toml/setup.py (link) and an Architecture.md (link) etc.

Alternatives

PyPI

An alternative to pip install git+file:///../soundloc@v2.3 is

!pip install comob-soundloc~=2.3

where comob-soundloc is the name of our package on PyPI.

(PyPI has no namespacing, so package authors need to do that manually if they don't want to muddle the global namespace. Hence the comob-. Note that the PyPI / pip install name needn't be the same as the python module / import name).
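
For reference, ~=2.3 is pip's "compatible release" specifier: per PEP 440 it is equivalent to >=2.3, ==2.*, so it accepts 2.3 and any later 2.x release but not 3.0. The sketch below re-implements that rule for plain numeric versions only, to make the semantics concrete; it is an illustration, not pip's own code, and the function names are made up.

```python
def parse(text):
    """Turn '2.3.1' into (2, 3, 1). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def matches_compatible_release(candidate, spec):
    """Check `candidate` against `~=spec` (e.g. spec='2.3')."""
    c, s = parse(candidate), parse(spec)
    if len(s) < 2:
        raise ValueError("~= needs at least two components, e.g. 2.3")
    # Pad with zeros so that 2.3 and 2.3.0 compare as equal.
    width = max(len(c), len(s))
    c_padded = c + (0,) * (width - len(c))
    s_padded = s + (0,) * (width - len(s))
    prefix = s[:-1]  # all but the last component must match exactly
    return c_padded >= s_padded and c_padded[:len(prefix)] == prefix


print(matches_compatible_release("2.3", "2.3"))     # True
print(matches_compatible_release("2.10.1", "2.3"))  # True
print(matches_compatible_release("2.2.9", "2.3"))   # False
print(matches_compatible_release("3.0", "2.3"))     # False
```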

The problem with this solution is that it does not work offline, which I find important.

No explicit versioning

Another alternative, for this entire issue, is to have every notebook use the same (i.e. newest) version of the package.
This would stifle package evolution: on every API-breaking change, you'd have to go back and update a growing list of notebooks to keep them running.

Alternatively, we could accept older notebooks not running anymore.

If you still wanted to run an old notebook in this no-versioning, no-old-notebook-updating world, you could git checkout a commit from when the notebook last ran successfully.
This is what I do for my PhD project, where I don't bother with explicit code versioning – which is feasible only because I am both the only user and developer of the API.
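
A rough sketch of how such a commit could be looked up, assuming that the last commit that touched the notebook is also one where it still ran (which git itself cannot guarantee). The function names and the example path are hypothetical; the git command used is `git log -1 --format=%H -- <path>`.

```python
import subprocess


def last_commit_cmd(notebook_path):
    """Build the git command that prints the hash of the last commit
    that touched `notebook_path`."""
    return ["git", "log", "-1", "--format=%H", "--", notebook_path]


def last_commit_touching(notebook_path, repo_dir="."):
    """Run the command inside `repo_dir` and return the commit hash."""
    result = subprocess.run(
        last_commit_cmd(notebook_path),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# Hypothetical notebook path; prints the command without running it.
print(" ".join(last_commit_cmd("notebooks/experiment.ipynb")))
```

The returned hash could then be passed to git checkout to get the package code as it was at that point.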


Thoughts @thesamovar and @synek?

@tfiers
Contributor Author

tfiers commented Apr 25, 2022

forgot to add:

The git repository for our python package can be a git submodule in the current repo.
That way, a path like ../ourpackage from inside the notebook dir will always work.

(git submodules are notoriously hard to work with.
In my experience, the fears are overblown, but not unfounded.
I have some experience with them and can write tips to avoid their annoying pitfalls).

@rorybyrne
Contributor

What about just installing a commit hash or tag from GitHub?

!pip install git+https://github.com/comob-project/snn-sound-localization.git@39445ad

Personally, I don't think we need to separate the main package to another repo. If we just keep a setup.py or pyproject.toml (pip can install from pyproject.toml, I recently discovered) at the root level of this repo then it should work fine.

@tfiers
Contributor Author

tfiers commented Apr 25, 2022

That's good to know! I like TOML.
Do you know whether editable installs (pip install -e/--editable) work with pyproject.toml?
Such installs are useful when developing a package while using it in a notebook (so that you don't have to reinstall and restart after every change).

The ability to work on a train or in a park (and thus without github/PyPI) is not practically essential, but is philosophically nice and fits well with the rest of our stack that's already offline first (Jupyter and PyTorch etc).

@rorybyrne
Contributor

Looks like editable installs from pyproject.toml aren't supported (issue, PR).

I just tried, and got this to work in Jupyter:

!pip install git+file://$PWD/../..@abcdef

@tfiers
Contributor Author

tfiers commented Apr 25, 2022

That's the most heroic PR I've ever seen 😦

@thesamovar
Contributor

Let's definitely not use git submodules, whatever we do! Nightmare.

I feel like our current plan is that everyone works in their own branch and makes sure their notebook works with the version of the package in that branch. If they want to "finalise" their notebook and have it merged into main, they need to make it work with the main version of the package. This is annoying, but typically only has to be done once, when the notebook is finished. From that point on, anyone wanting to make a breaking change to the package also needs to update the notebooks on main to work with the new version. This makes breaking changes to the package burdensome, but that's good: we shouldn't be breaking it regularly, only when there's an overwhelmingly strong reason to do so. Also, there should be relatively few notebooks merged into main, and they should be relatively simple, because each will correspond more or less to one figure in the paper.

How about that?

My worry about versioning the package and the pip installs is that it will make the overhead to contributing too high. We want to get people in, get them enthused and contributing.

@tfiers
Contributor Author

tfiers commented May 11, 2022

OK, I had in mind that different 'throwaway' experiments would co-exist on main (and thus also be visible as links in the main jupyterbook website).

If instead main is for a few 'paper-ready' notebooks, then the workflow you describe, Dan, is good.
The advantage is that there is indeed no need to muck about with explicit versions.
We'll start doing as you describe, and I'll close this issue.

@tfiers tfiers closed this as completed May 11, 2022