make plugin=kedro-datasets install-test-requirements has dependency conflicts #597

Open
grofte opened this issue Mar 6, 2024 · 8 comments
Labels
bug Something isn't working datasets

Comments

@grofte

grofte commented Mar 6, 2024

Description

Running make plugin=kedro-datasets install-test-requirements fails with dependency conflicts.

Context

I wanted to contribute a PR for some Polars support but I can't install the dependencies.

Steps to Reproduce

  1. Fork + clone repo
  2. Create an Anaconda environment with Python 3.9: conda create -n PR-kedro python=3.9 (the contribution readme says 3.6+ but PyPI says 3.9+)
  3. conda activate PR-kedro
  4. Run make plugin=kedro-datasets install-test-requirements

Expected Result

Pip should install the required libraries.

Actual Result

Pip did not. It stopped with a ResolutionImpossible error:

INFO: pip is looking at multiple versions of dask[complete] to determine which version is compatible with other requirements. This could take a while.
INFO: pip is still looking at multiple versions of dask[complete] to determine which version is compatible with other requirements. This could take a while.
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
ERROR: Cannot install dask[complete]==2024.2.1 and kedro-datasets[test]==2.1.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    kedro-datasets[test] 2.1.0 depends on dask>=2021.10; extra == "test"
    dask[complete] 2024.2.1 depends on dask 2024.2.1 (from https://files.pythonhosted.org/packages/ff/d3/f1dcba697c7d7e8470ffa34b31ca1e663d4a2654ef806877f1017ecc5102/dask-2024.2.1-py3-none-any.whl (from https://pypi.org/simple/dask/) (requires-python:>=3.9))

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Let me know if you want the full message from Pip but I think this covers all the relevant information.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): current
  • Kedro plugin and kedro plugin version used (pip show kedro-airflow): current
  • Python version used (python -V): 3.9.18
  • Operating system and version: Ubuntu 20.04
@astrojuanlu
Member

Thanks for the report @grofte , we'll look into this.

@astrojuanlu
Member

@grofte What pip version is this?

@astrojuanlu astrojuanlu added bug Something isn't working datasets labels Mar 6, 2024
@astrojuanlu
Member

contribution readme says 3.6+ but PyPI says 3.9+

Time to update the contribution readme too 👍🏽

@grofte
Author

grofte commented Mar 6, 2024

pip --version
pip 23.3.1 from /home/mog/anaconda3/envs/PR-kedro/lib/python3.9/site-packages/pip (python 3.9)

You're right though, the pip version does matter. I would suggest changing pip in the Makefile to python -m pip, unless there's some problem with that which I'm unaware of. It doesn't work with python -m pip either, though (and it's the same pip version).
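A quick way to check whether the bare pip and python -m pip resolve to the same installation, as a minimal sketch. The helper name pip_location is made up for illustration, and the parsing assumes the "pip <version> from <path> (python X.Y)" output format shown above.

```shell
# Hypothetical helper: pull the install path out of `pip --version` output,
# which has the form "pip <version> from <path> (python <X.Y>)".
pip_location() {
    sed -E 's/^pip [^ ]+ from (.*) \(python [^)]*\)$/\1/'
}

# Where does the bare `pip` on PATH live, and where does the active
# interpreter's pip live? Errors are silenced, so a missing tool yields "".
path_pip="$(pip --version 2>/dev/null | pip_location)"
module_pip="$(python -m pip --version 2>/dev/null | pip_location)"

if [ "$path_pip" = "$module_pip" ]; then
    echo "pip and python -m pip point to the same installation: ${path_pip:-<none found>}"
else
    echo "pip:           ${path_pip:-<not found>}"
    echo "python -m pip: ${module_pip:-<not found>}"
fi
```

If the two paths differ, the Makefile's bare pip may be installing into a different environment than the one that is activated. Here they matched, so that is probably not the cause of this conflict.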

@noklam
Contributor

noklam commented Mar 6, 2024

Good shout about the readme. On the other hand, we need more information.

https://github.com/kedro-org/kedro/actions/runs/8158985607/job/22302183993
I checked one of our CI runs that uses the make install command, and it runs successfully for py39.

Most likely a pip version problem.

@grofte
Author

grofte commented Mar 6, 2024

I don't even understand how Dask depending on Dask and kedro-datasets depending on Dask gives a conflict.

Anyway, I installed dask[complete] on its own first, commented it out of pyproject.toml, and ran the install again. It took ages and I ran out of hard drive space, so I cleaned up my disk. Then I removed all the non-test optional dependencies from pyproject.toml and ran uv pip install -r pyproject.toml --all-extras (with pandas-gbq commented out, since Google somehow broke uv). That was really fast, but uv is apparently all or nothing when it comes to optional dependencies.
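The uv step of that workaround, as a rough sketch. The run and install_with_uv helpers and the DRY_RUN switch are illustrative additions, not part of the repo's Makefile. The script assumes it is run from the directory holding the edited pyproject.toml, with uv installed.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them (requires uv on PATH).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumption: pyproject.toml has already been edited by hand so that only
# the test extras remain and pandas-gbq is commented out.
install_with_uv() {
    run uv pip install -r pyproject.toml --all-extras
}

install_with_uv
```

Current uv documentation also lists an --extra <name> option for uv pip install, which might avoid trimming pyproject.toml by hand. Whether that option was available in the uv version used here is not stated.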

Running the rest of the CI worked, and make test-no-spark gave me 944 passed, 9 skipped, 21 xfailed, 2 xpassed, 53 errors in 117.09s (0:01:57), which I think is fair. The XFAILs are from TestVideoDataset, and most of the errors seem to come from AWS: botocore.exceptions.ClientError: An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.

EDIT: omg you guys, uv was sooooo much faster

@astrojuanlu
Member

EDIT: omg you guys, uv was sooooo much faster

Yep 😄 we're using it in our CI already.

About the Google dependency: we asked them to yank it, since it's invalid, but they didn't seem to fully understand and have already closed the issue: googleapis/python-bigquery#1818

Anyway, I installed dask[complete] on its own first, commented it out from the pyproject.toml and ran that. It took ages and I ran out of hard drive space. So I cleaned up my hard disk.

Ughhhhh. I'm sorry, hope it was not too painful.

@grofte
Author

grofte commented Mar 7, 2024

I did manage to do a draft PR and I probably fucked everything up =D
#598
