Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make all starters use pyproject.toml #2280

Closed
merelcht opened this issue Feb 6, 2023 · 13 comments · Fixed by #2853 or kedro-org/kedro-starters#144
Closed

Make all starters use pyproject.toml #2280

merelcht opened this issue Feb 6, 2023 · 13 comments · Fixed by #2853 or kedro-org/kedro-starters#144
Assignees

Comments

@merelcht
Copy link
Member

merelcht commented Feb 6, 2023

Description

Make all starters use pyproject.toml instead of setup.py
In the process, ensure that there's a split between test/dev requirements and project requirements.

Context

Move along with the rest of the python community

@astrojuanlu
Copy link
Member

astrojuanlu commented Feb 20, 2023

There's one other issue that we could solve along the way.

At the moment, Kedro configuration is stored in a pyproject.toml file at the root of the project (previously .kedro.yml, see d9a7905). This means that pip thinks that the root of the project is installable, which leads to funny results like:

(docs38_test) juan_cano@M-PH9T4K3P3C new-kedro-test % pip install .
Processing /Users/juan_cano/Projects/QuantumBlack Labs/new-kedro-test
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: setup
  Building wheel for setup (pyproject.toml) ... done
  Created wheel for setup: filename=setup-0.0.0-py3-none-any.whl size=5014 sha256=9312d78314436fc1bae7e17b8ad467a155f1dfd51e5924d9f14118340aff1a31
  Stored in directory: /Users/juan_cano/Library/Caches/pip/wheels/ea/e3/32/cbd94ea2b842364cf52b9a0317ce79a0bacde58719172a5339
Successfully built setup
Installing collected packages: setup
Successfully installed setup-0.0.0

Notice how pip installed a package called setup, version 0.0.0 ⚠️

So if we just replace setup.py by pyproject.toml, we will end up with two identically named files, but only one of them is actually useful for the purposes of Python packaging.

Maybe when the choice of moving away from .kedro.yml was made, this corner case went unnoticed or was not deemed important enough. but I think it can be annoying, and with the advent of a second pyproject.toml, the confusion will be greater.1 In any case, I'm sure that ship has sailed and I will not be advocating for going back to .kedro.yml.

What we can do instead is to merge all those configs into the root pyproject.toml, essentially moving the setup.py one directory up. This would turn the library code into an src-layout. And since src layouts are the only sensible choice2, we would be killing not two, but three birds with one stone.

Footnotes

  1. This, by the way, is the rationale behind flake8 not supporting pyproject.toml for configuration (which doesn't mean I endorse open source maintainers being blunt online)

  2. Forgive my ironic comment - as with everything in Python packaging, there is no absolute truth or strong consensus.

@antonymilne
Copy link
Contributor

+💯 to @astrojuanlu on all the above. I have wanted us to move us to a src-layout for a long time and end our abuse of setup.py/requirements.txt but hadn't got around to posting about it. So thank you Juan! We should definitely be demonstrating best practices in our project template, and the proposed solution would be a huge improvement IMO 👍

Also +💯 for use of footnotes in a github comment and teaching me that there was some reasoning over flake8's refusal to support pyproject.toml 😀 I had never seen the justification that went behind this infamous response before.

@astrojuanlu
Copy link
Member

Thinking about something mentioned today in the "what's new in Kedro?" meeting: how does this interact with kedro package?

https://kedro.readthedocs.io/en/stable/tutorial/package_a_project.html#package-your-project

The resulting package only contains the Python source code of your Kedro pipeline, not any of the conf, data and logs subfolders. This means that you can distribute the project to run elsewhere, such as on a separate computer with different configuration information, dataset and logging locations.

@astrojuanlu
Copy link
Member

Been thinking about this a bit more, and how these changes could interact with the recommendations we have in our documentation.

For example, my proposal above to turn starters into src-layout packages would involve moving src/requirements.txt to requirements.txt, or even removing that requirements file completely (and rely on pyproject.toml instead). This means that

  • If we first change the starters, there's a moment in which the pip install -r src/requirements.txt instructions in the docs don't work.
  • If we change the docs first, there's a moment in which the newer instructions don't work with the old starters.

If we want to avoid this sort of situation, maybe we could create "next-gen" starters and have a deprecation period for the old ones, something like:

> kedro new
...
WARNING: This starter is deprecated, use `kedro new --starter=default-ng` instead. The default will change in Kedro 0.20.0.

@antonymilne
Copy link
Contributor

@astrojuanlu I don't think there will be an issue here. When you do kedro new --starter by default it will look at the kedro-starters repository and use the tag corresponding to the installed kedro version. Hence keeping docs and starters in sync should work the same as any other changes to kedro unless I'm missing something? e.g. say the version we make the change is 0.18.8 (and it's a non-breaking change, so there's no reason it can't be in 0.18.x). Then while in development, updates to the docs will be shown in latest on RTD and will be in main in kedro-starters. Then when kedro 0.18.8 is released, we also release kedro-starters (= tag as 0.18.8) and stable on RTD will also show 0.18.8.

@astrojuanlu
Copy link
Member

Sounds perfect, thanks for clarifying @AntonyMilneQB!

@astrojuanlu

This comment was marked as resolved.

astrojuanlu added a commit that referenced this issue May 8, 2023
See gh-2280.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 17, 2023
See gh-2280.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 23, 2023
See gh-2280.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 23, 2023
Includes consolidated dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 23, 2023
Includes consolidated dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu astrojuanlu self-assigned this May 23, 2023
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue May 26, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu

This comment was marked as outdated.

@astrojuanlu

This comment was marked as resolved.

astrojuanlu added a commit that referenced this issue Jun 13, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue Jun 14, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue Jun 20, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue Jun 21, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu added a commit that referenced this issue Jul 10, 2023
Uses `requirements.txt` for dev requirements in project template.

Fix gh-2280.
Fix gh-2519.

Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
@astrojuanlu
Copy link
Member

We're almost there! #2853 takes an incremental approach to fix this issue. It needs a decision on one bit that affects backwards incompatibility.

@DimedS DimedS self-assigned this Aug 11, 2023
@astrojuanlu
Copy link
Member

For the default Kedro template, #2853 is there. @DimedS will take care of the starters.

@DimedS
Copy link
Contributor

DimedS commented Aug 25, 2023

Closed by
#2853
kedro-org/kedro-starters#144

The next step can be found at #2926, which will be part of the Kedro 0.19 release. The two pyproject.toml files, currently located in the root and src folders, will be merged in the root folder. Following this, template projects and starters will adopt a pythonic src-layout package structure.

@astrojuanlu, should we create a separate issue for this?

@astrojuanlu
Copy link
Member

@astrojuanlu, should we create a separate issue for this?

Absolutely yes 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment