Decide and implement how to include Datasets generated through dataset factories are not included in telemetry counts #566

Open
DimedS opened this issue Feb 23, 2024 · 3 comments

Comments

@DimedS
Contributor

DimedS commented Feb 23, 2024

Description

Currently, kedro-telemetry does not account for datasets generated through dataset factories. The existing code snippet used for counting datasets is as follows:

project_statistics_properties["number_of_datasets"] = sum(
    1 for c in catalog.list() if not c.startswith("parameters") and not c.startswith("params:")
)

This method overlooks datasets created via dataset factories. For further discussion, see here.
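
To illustrate the undercount, here is a minimal, self-contained sketch. It does not use Kedro itself: `explicit_entries`, `factory_patterns`, and `pipeline_datasets` are hypothetical stand-ins, and it assumes the listing contains only the explicitly declared catalog entries, so names that would only match a factory pattern never reach the counter.

```python
# Toy stand-ins, NOT Kedro's DataCatalog API (all names here are hypothetical).
explicit_entries = ["companies", "reviews", "parameters", "params:model_options"]
factory_patterns = ["{name}_csv"]  # shown for context; never consulted by the count
pipeline_datasets = ["companies", "reviews", "shuttles_csv", "ratings_csv"]


def count_listed_datasets(names):
    """Apply the same filter as the telemetry snippet: skip parameter entries."""
    return sum(
        1
        for c in names
        if not c.startswith("parameters") and not c.startswith("params:")
    )


# What the current approach would report, assuming only explicit entries are listed.
listed_count = count_listed_datasets(explicit_entries)

# What the pipeline actually uses, including names served by the factory pattern.
used_count = count_listed_datasets(pipeline_datasets)

print(f"listed: {listed_count}, used by pipeline: {used_count}")
```

In this toy setup the two counts differ by exactly the datasets that only exist through the `{name}_csv` pattern.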

@astrojuanlu astrojuanlu changed the title kedro-telemetry: datasets factories and info from pyproject.toml in packaged projects Datasets generated through dataset factories are not included in telemetry counts Feb 24, 2024
@astrojuanlu
Member

Opened a separate issue for packaged Kedro projects #567

@noklam noklam changed the title Datasets generated through dataset factories are not included in telemetry counts Decide and implement how to include Datasets generated through dataset factories are not included in telemetry counts Mar 25, 2024
@noklam
Contributor

noklam commented Mar 25, 2024

Whoever picks up this ticket should decide which solution works better and implement it. In discussion it was noted that it's unclear how we use this information, and that this isn't urgent until we introduce the opt-out flow.

Two alternatives:

  • Push telemetry to after_pipeline_run
  • Resolve the DataCatalog manually
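
As a rough sketch of what the second alternative could look like, the snippet below matches the dataset names a pipeline uses against factory patterns and adds the matches to the explicit count. It is plain Python, not Kedro code: `pattern_to_regex`, `is_parameter`, and `count_datasets` are hypothetical helpers, the regex-based matching is an assumption made for illustration, and Kedro's own pattern resolution may behave differently.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a '{placeholder}'-style pattern into a regex (illustrative only)."""
    parts = re.split(r"(\{\w+\})", pattern)
    regex = "".join(
        "(.+)" if re.fullmatch(r"\{\w+\}", part) else re.escape(part)
        for part in parts
    )
    return re.compile(regex)


def is_parameter(name: str) -> bool:
    """Same exclusion rule as the existing telemetry count."""
    return name.startswith("parameters") or name.startswith("params:")


def count_datasets(explicit_entries, factory_patterns, pipeline_datasets):
    """Count explicit datasets plus pipeline datasets that match a factory pattern."""
    regexes = [pattern_to_regex(p) for p in factory_patterns]
    explicit = {n for n in explicit_entries if not is_parameter(n)}
    resolved = {
        n
        for n in pipeline_datasets
        if not is_parameter(n)
        and n not in explicit
        and any(r.fullmatch(n) for r in regexes)
    }
    return len(explicit) + len(resolved)


total = count_datasets(
    explicit_entries=["companies", "reviews", "parameters", "params:model_options"],
    factory_patterns=["{name}_csv"],
    pipeline_datasets=[
        "companies",
        "shuttles_csv",
        "ratings_csv",
        "params:model_options",
        "in_memory_result",  # matches no pattern, so it is not counted
    ],
)
print(total)
```

Whether unmatched names such as `in_memory_result` should count as datasets is a design choice this sketch leaves open; here they are simply ignored.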

@astrojuanlu
Member

> Push telemetry to after_pipeline_run

Isn't it enough to do it at after_catalog_created?
