Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Designate a better place for development requirements in Kedro projects #2519

Open
astrojuanlu opened this issue Apr 14, 2023 · 9 comments
Open
Assignees
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@astrojuanlu
Copy link
Member

astrojuanlu commented Apr 14, 2023

Description

At the moment, we are telling users in our documentation to place linters in src/requirements.txt:

You can install `black`, `flake8`, and `isort` by adding the following lines to your project's `src/requirements.txt`

But this is weird, becuase they are actually development tools, and they shouldn't be part of src/requirements.txt, which will be packaged along with the project.

Instead, common practice is to either have a dev-requirements.txt, or an "optional dependencies" table in the project metadata (see below).

Context

(Comes from #2517 (comment))

Possible Implementation

We'd need

  • a change in starters, and
  • a change in documentation

Possible Alternatives

Instead of dev-requirements.txt we could have a dev table in the [project.optional-requirements] section of pyproject.toml (extras_require in old setup.py files). This would be blocked in #2280.

@astrojuanlu astrojuanlu added the Issue: Feature Request New feature or improvement to existing feature label Apr 14, 2023
@astrojuanlu astrojuanlu changed the title Designate a better place for development requirements Designate a better place for development requirements in Kedro projects Apr 14, 2023
@antonymilne
Copy link
Contributor

For some more background context, this originally came up as a suggestion in #1077. As you can see there, the discussion eventually led to the deprecation of the commands like kedro lint which used the dependencies in question, and so nothing was done about it.

I'm not sure what the latest thinking is on (a) deprecation of these commands and (b) Amanda's opt-in utility approach, but I guess the solution here should take those into account. But yes, I fully agree with you that these don't belong in requirements.txt and shouldn't be packaged with the project, and moving them to dev-requirements.txt would be much better.

Also relevant: #844.

@antonymilne
Copy link
Contributor

antonymilne commented May 4, 2023

Instead, common practice is to either have a dev-requirements.txt.

What is the other option you were going to mention here? Is it dev table in [project.optional-requirements]?

@astrojuanlu
Copy link
Member Author

Oh yes, left the sentenced unfinished then wrote it down at the end of the comment:

Instead of dev-requirements.txt we could have a dev table in the [project.optional-requirements] section of pyproject.toml (extras_require in old setup.py files). This would be blocked in #2280.

astrojuanlu added a commit that referenced this issue May 8, 2023
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 17, 2023
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu
Copy link
Member Author

This is in progress in gh-2569.

@astrojuanlu astrojuanlu self-assigned this May 22, 2023
astrojuanlu added a commit that referenced this issue May 23, 2023
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 23, 2023
Includes consolidated dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 23, 2023
Includes consolidated dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue Jun 13, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue Jun 14, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue Jun 20, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue Jun 21, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue Jul 10, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu
Copy link
Member Author

We went back and forth a bit with this, and in the end we closed my initial attempt #2569 and didn't make a final decision.

This is the current contents of requirements.txt for the project template (untouched by the current PR #2853):

black~=22.0
flake8>=3.7.9, <5.0
ipython>=7.31.1, <8.0; python_version < '3.8'
ipython~=8.10; python_version >= '3.8'
isort~=5.0
jupyter~=1.0
jupyterlab_server>=2.11.1, <2.16.0
jupyterlab~=3.0, <3.6.0
kedro~={{ cookiecutter.kedro_version }}
kedro-telemetry~=0.2.0
nbstripout~=0.4
pytest-cov~=3.0
pytest-mock>=1.7.1, <2.0
pytest~=7.2

@astrojuanlu
Copy link
Member Author

Coming back to this after finding that it is not possible to install Airflow with the official instructions and the dependencies of a Kedro project side by side:

❯ pip install "apache-airflow==$AIRFLOW_VERSION" --constraint "$CONSTRAINT_URL" -r src/requirements.txt                        (airflow310-alt) 
Ignoring ipython: markers 'python_version < "3.8"' don't match your environment
Requirement already satisfied: apache-airflow==2.7.2 in /Users/juan_cano/.micromamba/envs/airflow310-alt/lib/python3.10/site-packages (2.7.2)
ERROR: Cannot install black~=22.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested black~=22.0
    The user requested (constraint) black==23.9.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

However, the problem is much less severe in develop, since the linters are gone:

ipython>=7.31.1, <8.0; python_version < '3.8'
ipython~=8.10; python_version >= '3.8'
jupyter~=1.0
jupyterlab_server>=2.11.1, <2.16.0
jupyterlab~=3.0, <3.6.0
kedro~={{ cookiecutter.kedro_version }}
kedro-telemetry~=0.2.0
# Pin problematic traitlets release - https://github.com/jupyter/notebook/issues/7048
traitlets<5.10.0

The only remaining non-mandatory dependencies there are Jupyter (#2276) and kedro-telemetry (kedro-org/kedro-plugins#375).

@merelcht
Copy link
Member

@astrojuanlu If it's just the jupyter and kedro-telemetry dependencies, is it really worth adding a separate dev-requirements.txt file to the template? Or can these be moved to the optional dependencies in pyproject.toml?

@astrojuanlu
Copy link
Member Author

@merelcht I think effectively jupyterlab (or notebook) and kedro-telemetry are optional dependencies, and ideally they should indeed be in the [project.optional-dependencies] table of pyproject.toml, as you say.

However, this would mean that users would need to do pip install -e .[extra] to install those extra dependencies. Almost every Python user knows how to do pip install -r requirements.txt, but pip install -e . is a bit more arcane and has only been working unconditionally1 as of very recently (PEP 662 and deprecations of direct setup.py invocations happened 2 years ago, or even less). We anticipated this in my first iteration of #2569 and decided to keep the requirements.txt workflow for now. But that's what keeps us from making progress on this task.

Long story short: the moment we take kedro-telemetry and jupyter out of requirements.txt, some users will get confused that they're not installed. And on top of that, there's a the "political" problem of how users get kedro-telemetry installed.

Footnotes

  1. Unconditionally = For every PEP 517 compatible build backend and for every PEP 621 compatible pyproject.toml. I consider everything before that "the dark ages" of Python packaging.

@astrojuanlu
Copy link
Member Author

This is blocked on kedro-org/kedro-plugins#375.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
Status: To Do
Development

No branches or pull requests

3 participants