Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make source_dir discoverable and document how to use this for non-standard project structure #2962

Closed
noklam opened this issue Aug 22, 2023 · 6 comments · Fixed by #3458
Closed
Assignees
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@noklam
Copy link
Contributor

noklam commented Aug 22, 2023

Description

Related to #2553

Documentations: Document clearly what is pyproject.toml, How can you change pyproject.toml for non-standard project without src
Current doc: https://docs.kedro.org/en/latest/kedro_project_setup/settings.html#project-metadata

We never document it clearly how to use source_dir. It's a misconception that src is a must for kedro-project.

if (
metadata_dict["kedro_init_version"].split(".")[:2]
!= kedro_version.split(".")[:2]
):
raise ValueError(_version_mismatch_error(metadata_dict["kedro_init_version"]))
source_dir = Path(metadata_dict.get("source_dir", "src")).expanduser()
source_dir = (project_path / source_dir).resolve()
metadata_dict["source_dir"] = source_dir
metadata_dict["config_file"] = pyproject_toml
metadata_dict["project_path"] = project_path
metadata_dict.pop("micropkg", {}) # don't include micro-packaging specs

Context

Users misunderstood that src is a must for Kedro Project, it can be configured already but we never document it clearly.

Possible Implementation

  • Add documentations, i.e. How to work with Kedro if I have my own project structure already
  • Add source_dir=src to pyproject.toml by default to make this discoverable.

Possible Alternatives

@noklam noklam added the Issue: Feature Request New feature or improvement to existing feature label Aug 22, 2023
@noklam noklam changed the title Add source_dir in pyproject.toml as default and document how to use this for non-standard project structure Make source_dir discoverable and document how to use this for non-standard project structure Aug 22, 2023
@astrojuanlu
Copy link
Member

Ambivalent about this. After #2280 (comment) is complete, every Kedro project will be a standard Python library, so we can defer to standard source discovery mechanisms for this without the need of source_dir.

I would go as far as saying that we should deprecate it completely.

@noklam
Copy link
Contributor Author

noklam commented Aug 23, 2023

Interesting, I am not aware of these discussion with pyproject.toml. Questions:

  1. What's the responsibility of the two pyproject.toml
  2. Currently kedro pipeline create or kedro run relies on the source_dir definition - how would it be changed if we are going to deprecate this?

Maybe we also need to update the logic of

def _is_project(project_path: Union[str, Path]) -> bool:

especially we want to enable kedro run anywhere within a project. Having two pyproject.toml would cause issue immediately I think.

If we are going to deprecate it, I think we need to have a clear plan. I assume this won't happened before 0.19, do you think we should at least document it?

@astrojuanlu
Copy link
Member

What's the responsibility of the two pyproject.toml

The root one is the one used by Kedro to store "build config" (using #2434 terminology), the one inside src is the direct replacement for setup.py. But the idea is to merge both, which we'll do for 0.19.0 because it's a breaking change (see #2926).

Currently kedro pipeline create or kedro run relies on the source_dir definition - how would it be changed if we are going to deprecate this?

Yeah that's the big question, we need to see if we can use native mechanisms to "discover" where the source is. My "let's deprecated source_dir" message is dependant on that, we haven't yet even started analyzing the technical feasability of it.

@noklam
Copy link
Contributor Author

noklam commented Sep 1, 2023

Ambivalent about this. After #2280 (comment) is complete, every Kedro project will be a standard Python library, so we can defer to standard source discovery mechanisms for this without the need of source_dir.

Assume #2975 and #2926 is merged, which handle a breaking change for kedro package. So we are not moving away from the "src" layout. You are proposing

  • Deprecated source_dir and use standard python package [Ticket to be created]

What would be the alternative of source_dir? Is it [tool.setuptools.packages.find] https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html?

@astrojuanlu
Copy link
Member

Trying to sort my thoughts here without opening a can of worms.

#2975 means that we would be moving to an src-layout, because at the moment we have something "weird", see #2280 (comment).

Regardless of that change, setuptools (the backend we chose for starters) has always been able to locate the packages because we are relying on setuptools automatic package discovery https://setuptools.pypa.io/en/latest/userguide/package_discovery.html#src-layout

[tool.setuptools.packages.find]
exclude = ["tests"]

After we move to an src layout, this will become

[tool.setuptools.packages.find]
where = ["src"]
namespaces = false

Hence the src is already there.

When I (hastily) wrote my message I thought that maybe there was a standard mechanism for package discovery. But it turns out it was left out of PEP 621 on purpose https://peps.python.org/pep-0621/#specify-files-to-include-when-building

So, I only see two paths:

  1. We keep package_name and source_dir in [tool.kedro], just like we've always done. But if the user changes those, they probably need to change the [tool.setuptools.packages.find] table as well, which feels redundant.
  2. We marry ourselves with setuptools and rely on setuptools package discovery to locate the source code. The disadvantage is that it will make supporting other build backends, like Poetry Poetry Support for Kedro Projects #1722, flit, Hatch, or PDM, difficult.

Maybe the best thing is to keep some redundancy and not do any dramatic changes for now.

@noklam
Copy link
Contributor Author

noklam commented Sep 1, 2023

Sounds reasonable. I may prefer 1 since changing the default is rare. Keeping the option opens for Poetry or other thing is more important.

So we can keep the scope of this ticket, maybe need to document [tool.setuptools.packages.find] depends if we do this before/after 0.19. But it's pretty small change so it doesn't matter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants