pip output is confusing/incomplete when a package is pulled with different extras #11683

Closed
1 task done
dechamps opened this issue Jan 1, 2023 · 4 comments
Labels
C: extras Handling optional dependencies C: list/show 'pip list' or 'pip show' resolution: duplicate Duplicate of an existing issue/PR

@dechamps

dechamps commented Jan 1, 2023

Description

Consider the following:

  • Package A depends on packages B and C
  • Package B depends on D (without extras)
  • Package C depends on D[some_extra]
  • Package D depends on package E only if some_extra is specified

pip install A will end up pulling E, which is correct. However, it is extremely difficult to figure out why, because the output of both pip install --verbose --verbose --verbose and pip show is incomplete or misleading. Other tools such as pipdeptree can't figure it out either.
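To make the scenario concrete, here is a minimal sketch that models the four bullet points as a small graph and lists the chains that actually pull in E. The data structure and the `explain` helper are hypothetical illustrations, not anything pip provides.

```python
from collections import deque

# Hypothetical metadata for packages A..E as described in the bullets.
# Each requirement is (name, extras requested, extra that gates it or None).
REQUIRES = {
    "A": [("B", (), None), ("C", (), None)],
    "B": [("D", (), None)],
    "C": [("D", ("some_extra",), None)],
    "D": [("E", (), "some_extra")],
    "E": [],
}


def explain(root, target):
    """Return every dependency chain from root that actually pulls in target."""
    chains = []
    queue = deque([(root, frozenset(), [root])])
    while queue:
        name, active_extras, path = queue.popleft()
        for dep, dep_extras, gate in REQUIRES[name]:
            # A gated requirement only applies when that extra was requested
            # on the edge that led into `name`.
            if gate is not None and gate not in active_extras:
                continue
            label = dep + (f"[{','.join(dep_extras)}]" if dep_extras else "")
            new_path = path + [label]
            if dep == target:
                chains.append(" -> ".join(new_path))
            queue.append((dep, frozenset(dep_extras), new_path))
    return chains


print(explain("A", "E"))  # ['A -> C -> D[some_extra] -> E']
```

The path through B never reaches E, because B asks for D without the extra. That is the distinction the report would like pip's output to preserve.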

Expected behavior

No response

pip version

22.3

Python version

3.10

OS

Debian Unstable

How to Reproduce

virtualenv venv &&
(cd venv &&
bin/pip --verbose --verbose --verbose install nbclassic > pip-install.log &&
bin/pip show nbclassic jupyter_events jsonschema rfc3986-validator)

jupyter_events is installed because nbclassic specifies:

Requires-Dist: jupyter-events>=0.4.0

jsonschema is installed with the format-nongpl extra because jupyter_events specifies:

Requires-Dist: jsonschema[format-nongpl]>=4.3.0

And, crucially, nbformat (and fastjsonschema) also specifies:

Requires-Dist: jsonschema>=2.6

rfc3986-validator is installed because jsonschema specifies:

Requires-Dist: rfc3986-validator>0.1.0; extra == 'format-nongpl'

…and the format-nongpl extra is enabled for jsonschema by jupyter_events (see above).
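As a rough illustration of how the `extra == '…'` marker gates a requirement, the sketch below groups Requires-Dist lines by the extra that enables them. It uses a simplified regular expression rather than a full PEP 508 parser, and `split_by_extra` is a hypothetical helper; the unconditional entries are taken from the Requires line that pip show prints for jsonschema, without version specifiers.

```python
import re

# Simplified patterns; a real tool should use a proper PEP 508 parser.
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
_EXTRA = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def split_by_extra(requires_dist):
    """Group requirement names by the extra that gates them (None = always)."""
    groups = {}
    for line in requires_dist:
        spec, _, marker = line.partition(";")
        name = _NAME.match(spec)
        if name is None:
            continue  # skip lines this sketch cannot parse
        gate = _EXTRA.search(marker)
        groups.setdefault(gate.group(1) if gate else None, []).append(name.group(1))
    return groups


# Illustrative subset of jsonschema's metadata.
lines = [
    "attrs",
    "pyrsistent",
    "rfc3986-validator>0.1.0; extra == 'format-nongpl'",
]
print(split_by_extra(lines))
# {None: ['attrs', 'pyrsistent'], 'format-nongpl': ['rfc3986-validator']}
```

Only the entries under `None` are installed unconditionally; the `format-nongpl` group applies only when some requirement asks for `jsonschema[format-nongpl]`.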

Output

The output of bin/pip --verbose --verbose --verbose install nbclassic contains:

Added rfc3986-validator>0.1.0 from […] (from jsonschema>=2.6->nbformat->nbclassic) to build tracker […]

The path shown is wrong and misleading. nbformat does not pull rfc3986-validator through jsonschema because nbformat does not specify the format-nongpl extra for jsonschema. The correct output should have been:

Added rfc3986-validator>0.1.0 from […] (from jsonschema[format-nongpl]>=4.3.0->jupyter_events->nbclassic) to build tracker […]

pip show jupyter_events does not mention that it depends on the format-nongpl extra of jsonschema:

Name: jupyter-events
Version: 0.5.0
[…]
Requires: jsonschema, python-json-logger, pyyaml, traitlets
Required-by: jupyter_server

pip show jsonschema does not mention anything about rfc3986-validator:

Name: jsonschema
Version: 4.17.3
[…]
Requires: attrs, pyrsistent
Required-by: jupyter-events, nbformat

And finally, to add insult to injury, pip show rfc3986-validator claims it's not required by anything, which is clearly a lie since it was pulled by a dependency (jsonschema[format-nongpl])!

Name: rfc3986-validator
Version: 0.1.1
[…]
Requires: 
Required-by: 

Because of the above, it took me hours to figure out why pip install jupyter ends up pulling rfc3986-validator.
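One way to shortcut this kind of investigation is to scan the Requires-Dist metadata of every installed distribution, including requirements gated by an extra. The following is a minimal sketch using the standard library's `importlib.metadata`; the `required_by` helper and its simplified parsing are assumptions for illustration, not how pip show works internally.

```python
import re
from importlib import metadata


def normalize(name):
    """PEP 503 style normalization (rfc3986_validator == rfc3986-validator)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def required_by(target, dists=None):
    """List (dependent, gating extra or None) for every distribution whose
    Requires-Dist metadata mentions `target`."""
    target = normalize(target)
    results = []
    for dist in (metadata.distributions() if dists is None else dists):
        for line in dist.requires or []:
            spec, _, marker = line.partition(";")
            name = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
            if name is None or normalize(name.group(1)) != target:
                continue
            gate = re.search(r"""extra\s*==\s*['"]([^'"]+)['"]""", marker)
            results.append((dist.metadata["Name"], gate.group(1) if gate else None))
    return results


if __name__ == "__main__":
    # Scans the current environment; the result depends on what is installed.
    for dependent, extra in required_by("rfc3986-validator"):
        print(dependent, "via extra", extra)
```

This only reports that some package could require the target under a given extra. It cannot tell whether that extra was actually requested at install time, which is the missing piece of information discussed in this issue.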

@dechamps dechamps added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Jan 1, 2023
@dechamps
Author

dechamps commented Jan 1, 2023

As an example, this made investigating vega/altair#2794 significantly harder than it should have been.

@potiuk
Contributor

potiuk commented Jan 2, 2023

(Opinion) - I have always observed similar behaviour, and while I believe the output printed by pip when installing can be improved, there is not much you can do about pip show, and this is by design.

Maybe future diagnostics could be improved a little in terms of a historical view on why certain decisions were made by the resolver, but not more than that IMHO.

From what I understand of how extras work (I don't think it's really specified; this is more an observation of what I've seen), extras are always "transient" dependencies. They are only used while the pip command runs, and pip never records them as stored dependency information after the install completes. The extras specified during installation are not stored as a "requirement" of a package (so that they could later be inspected by the pip show command); they are merely a signal for pip to also install the packages named by those extras and to include them in resolver calculations. They are only used at the moment the resolver is invoked (i.e. while the pip install command runs), after which the extra dependency information is discarded and not stored anywhere.

And to be honest it's the only "reasonable" way you could approach it when you think about it, because as a user you could change your mind over time about which optional extras you want to install. For example, imagine (an example from Airflow, which uses extras heavily):

  1. pip install apache-airflow[apache.beam]==2.5.0 -> installs the set of apache-beam dependencies for apache-airflow 2.5.0. One of those dependencies is "apache-beam>=2.33.0"
  2. pip uninstall apache-beam -> you are perfectly OK to do this. apache-beam is an optional extra, so uninstalling it does not remove apache-airflow
  3. pip install apache-airflow[google]==2.4.0 -> downgrades apache-airflow with the optional google extra, which includes many packages from Google. Some of them are installed with their own extras, pulling in some non-obvious dependencies (some apache-beam dependencies as well). One of those dependencies is "apache-beam<2.33.0" (this is not exactly how it is now, but it could have happened).

There are multiple scenarios here that could go wrong if you try to "remember" what optional extras have been installed the previous time. Imagine those two commands produce a different (and conflicting) set of dependencies. For example:

  • Should 3) in the case above "remember" that airflow was previously installed with the "apache.beam" extra, and install/upgrade the apache-beam extra dependencies as well?
  • What happens if you also have other packages installed that require "apache-beam" in a different version (also optionally, as an extra) and they are still installed? Should they be downgraded as well if their dependencies do not agree with what the downgraded apache-airflow has as its "apache.beam" extra?
  • What happens if apache-airflow==2.4.0 does not have the "apache.beam" extra at all? Should we "forget" about that extra? Or should we remember that airflow was at "some point in time" installed with the "apache.beam" extra and, whenever possible, continue upgrading those extras as well?
  • How do we get rid of the remembered "extras" information when we do not want this optional dependency any more?

And the questions above are just the tip of the iceberg. There are many more complex scenarios, because dependencies can also be specified with extras transitively. That adds even more complexity and quite likely makes a number of those scenarios impossible to solve or even to reason about.

So the choice that pip made (as I understand it) is that the resolver uses only the specification from the latest pip install command to produce the set of dependencies it considers. It does not look at what was specified in a previous installation command for apache-airflow (i.e. it does not remember that the previous installation of apache-airflow used apache.beam as an extra in the scenario above). That makes all those complexities go away. You just tell the resolver to figure out a consistent set of dependencies, including extras, at the moment of installation, and after the installation is complete you forget about those extra dependencies, quite literally. The information that a given package was installed because it was required by an extra is not stored anywhere.

And I think it is a good choice.

I think what can be done instead, to make it easier for people like you and me, is to have some kind of "audit log" of installations, i.e. to record information about the decisions made (so basically exactly what you see as the output of the pip install command, maybe a bit more structured, so that you can query it when diagnosing things rather than relying on having access to a previous pip install log). That could make diagnosis easier. But such information would not necessarily be recorded and exposed via pip show, because pip show reports the current state of stored dependencies rather than a historical view of decisions made in the past.
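To give a feel for what such an "audit log" could look like, here is a purely hypothetical sketch that appends one JSON record per install decision and lets you query it afterwards. Nothing like this exists in pip; the file format, field names, and the `record` and `why` helpers are all invented for illustration.

```python
import json
import tempfile
from pathlib import Path


def record(log_path, package, version, requested_by, extra=None):
    """Append one install decision to a JSON-lines audit log (hypothetical format)."""
    entry = {
        "package": package,
        "version": version,
        "requested_by": requested_by,
        "extra": extra,  # the extra that triggered the install, if any
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def why(log_path, package):
    """Return every logged reason a package was installed."""
    with open(log_path, encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    return [entry for entry in entries if entry["package"] == package]


# Example using the packages and versions from this issue.
log = Path(tempfile.mkdtemp()) / "pip-audit.jsonl"
record(log, "jsonschema", "4.17.3", "jupyter-events")
record(log, "rfc3986-validator", "0.1.1", "jsonschema", extra="format-nongpl")
print(why(log, "rfc3986-validator"))
```

Because the log is append-only and separate from package metadata, it would describe past decisions without changing what pip show reports about the current environment.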

@pradyunsg
Member

Thanks for filing this @dechamps! :)

pip show rfc3986-validator claims it's not required by anything, which is clearly a lie since it was pulled by a dependency (jsonschema[format-nongpl])!

Yea, I'll consolidate this into #3797. I will agree that it's an annoying issue, but it's also frustratingly tricky because there's no metadata stored about which "extras" were installed-on-request in the environment.


In the future @potiuk, I'd appreciate if you defer posting your opinion until after an issue has been triaged by triagers/maintainers -- it'd be good to avoid overwhelming reporters with various peoples' opinions as the first interaction on their issue. :)

The information that a given package was installed because it was required by an extra is not stored anywhere.

And I think it is a good choice.

I haven't read the entire wall of text posted (sorry, I have limited time), but I will say that I disagree with, at least, this part of what has been said.

@pradyunsg pradyunsg added resolution: duplicate Duplicate of an existing issue/PR C: list/show 'pip list' or 'pip show' C: extras Handling optional dependencies and removed type: bug A confirmed bug or unintended behavior S: needs triage Issues/PRs that need to be triaged labels Jan 2, 2023
@potiuk
Contributor

potiuk commented Jan 3, 2023

In the future @potiuk, I'd appreciate if you defer posting your opinion until after an issue has been triaged by triagers/maintainers -- it'd be good to avoid overwhelming reporters with various peoples' opinions as the first interaction on their issue. :)

Sure, if you think that posting opinions is a problem (even ones clearly marked as opinion and coming from a heavy user's experience), I can hold off so that my opinions are not the first response.

But I just wanted to note that I personally find it a bit of an awkward approach. My comment was not directed at you; it was meant to help @dechamps understand the complexities involved. I understand that, as a maintainer, you have limited time to read all the messages in pip, but the users who raise issues mostly read just that one issue and might actually find time to read and understand that kind of explanation.

I think if you really want to be the first one to respond to all pip issues, you make yourself a single point of synchronisation. It could be that this answer actually resolved some user's question, and they could have chosen to close the issue on their own with "now I understand how it works" (this is one of the ways we manage to scale in Airflow: by having our users help other users, and encouraging it). As a maintainer of Airflow I am super happy when someone posts an answer that I can read and maybe just slightly correct, rather than having to answer it all myself and spend a lot of time writing the answer. But maybe it's just me - I read and understand much faster than I write.

And BTW, I think I described the current state of #3797 pretty well, with some good examples showing the complexities involved, and I believe an "external way" of showing the extra dependencies is exactly what's being discussed in https://discuss.python.org/t/provide-information-on-installed-extras-in-package-metadata/15385 (and BTW thanks for pointing to those, this is really helpful). Maybe the phrasing is a little "lame" (from the user's perspective).

The information that a given package was installed because it was required by an extra is not stored anywhere.

I haven't read the entire wall of text posted (sorry, I have limited time), but I will say that I disagree with, at least, this part of what has been said.

Yep. You are right. I think that was slightly incorrect. It is kept in package metadata of course. I realised after reading the discussion you pointed to that there is no extra information that pip keeps; it is all in the package metadata. Actually, it made me realise this is (I believe) the main reason for the choice that pip does not keep any information between installs: it relies entirely on the package metadata, and that is the main reason it cannot use the previous installation specification for subsequent installs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 4, 2023
3 participants