Revise documentation in the "Kedro for notebook users" section #2845

Closed
stichbury opened this issue Jul 26, 2023 · 6 comments
Labels: Component: Documentation 📄 (Issue/PR for markdown and API documentation); Component: Example code (Example code creation/publication)

Comments

@stichbury
Contributor

stichbury commented Jul 26, 2023

Child of #2799

Description

Looking into popular content (and content that could be popular if it were any good), I have identified this section on notebook/Kedro usage as problematic.

Context

There is a lot of potential to help notebook users cross the Rubicon and start using Kedro.

Possible Implementation

Currently we have two pages but in my view they're the wrong way around and there's a big chunk missing on the conversion from Notebook -> Kedro and/or phased introduction of Kedro support to notebooks. I think we should go with this ordering:

  • Continue using a Notebook for almost everything but use the data catalog to help with data access/sharing a project
  • Migrate from using a Notebook to a full Kedro project (but do it in stages so the previous bullet is phase 1, then look at parameters, then functions->nodes etc).
  • Clearly marked as a different angle: introduce notebooks into your Kedro project for exploratory analysis
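
To make the staged migration a bit more concrete, here is a minimal sketch of the functions->nodes step. The function, the dataset names and the parameter name are hypothetical; `node` is the wrapper from `kedro.pipeline`, imported lazily so the plain function still runs in a notebook that does not have Kedro installed.

```python
def clean_measurements(rows, min_length):
    """Notebook-style function: keep rows whose sepal length is at least min_length."""
    return [row for row in rows if row["sepal_length"] >= min_length]


def as_kedro_node():
    """Wrap the function above as a Kedro node (requires Kedro to be installed)."""
    from kedro.pipeline import node  # lazy import: only needed once you adopt Kedro

    return node(
        func=clean_measurements,
        # Hypothetical names: a catalog entry and an entry in parameters.yml.
        inputs=["raw_measurements", "params:min_length"],
        outputs="clean_measurements",  # hypothetical output dataset name
        name="clean_measurements_node",
    )
```

The function body does not change between the notebook and the pipeline; only the wiring of inputs and outputs moves into Kedro.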

Looking at the pages in more detail:

Page 1: Phased support to use the Kedro DataCatalog as a data registry (terminology TBC)

  • What problem does this solve? (No hard-coded data locations for shared projects, managed data access)
  • How to use the DataCatalog within your existing notebook
    • Present an example notebook with data and show how to remove hard-coded data locations and data loading/saving in favour of Kedro
  • How to use the standalone-datacatalog starter
    • Basic pandas-iris example
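
A minimal sketch of what the "remove hard-coded data locations" example might look like, assuming Kedro 0.18.x with the pandas datasets installed. The dataset name `iris` and the file path are placeholders, and the dataset type is spelled `pandas.CSVDataset` in later kedro-datasets releases.

```python
# Before: the notebook hard-codes a location, for example
#   df = pd.read_csv("/Users/me/projects/demo/data/iris.csv")

# After: dataset locations live in one mapping (or in a catalog.yml file).
CATALOG_CONFIG = {
    "iris": {
        "type": "pandas.CSVDataSet",  # assumption: 0.18-era spelling of the dataset type
        "filepath": "data/01_raw/iris.csv",  # hypothetical project-relative path
    },
}


def load_dataset(name, config=CATALOG_CONFIG):
    """Load a dataset by name through the Kedro DataCatalog."""
    from kedro.io import DataCatalog  # lazy import: requires Kedro to be installed

    catalog = DataCatalog.from_config(config)
    return catalog.load(name)
```

In the notebook, `df = load_dataset("iris")` then replaces the hard-coded `read_csv` call, and anyone sharing the project only has to edit the catalog entry.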

Page 2: How to convert your existing notebook to a Kedro project

Holy grail example. TBD. I need to pair with someone on this to work out how to write it up (it has potential to be a blog post too). I have a separate ticket for this work #2855.

Page 3: How to use Kedro and a notebook side-by-side

Tidy up what we have in this page https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_and_notebooks.html to illustrate how to use a notebook to explore side-by-side with your Kedro project.

  • Remove the complexity in the early part of the page under "A custom Kedro kernel" and just summarise what you get:

    • catalog (type DataCatalog): Data Catalog instance that contains all defined datasets; this is a shortcut for context.catalog
    • context (type KedroContext): Kedro project context that provides access to Kedro's library components
    • pipelines (type Dict[str, Pipeline]): Pipelines defined in your pipeline registry
    • session (type KedroSession): Kedro session that orchestrates a pipeline run
  • Iris dataset example: show, using the pandas-iris starter, how to add a notebook with the kedro jupyter notebook command

    • Illustrate how to use catalog, context, pipelines and session
  • %reload_kedro line magic

  • %run_viz line magic

  • How to convert functions from Jupyter Notebooks into Kedro nodes

  • Work with managed services

  • Connect an IPython shell to a Kedro project kernel

  • Create a custom Jupyter kernel that automatically loads the extension and launches JupyterLab / QtConsole
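
For the "Illustrate how to use catalog, context, pipelines and session" item, a notebook cell along these lines might work. It is only a sketch and assumes the Kedro 0.18.x API (`catalog.list()`, `context.params`, `Pipeline.nodes`); `summarise_project` is a hypothetical helper, not part of Kedro.

```python
def summarise_project(catalog, context, pipelines):
    """Collect a quick overview from the variables the Kedro kernel injects."""
    return {
        # Names of all datasets registered in the Data Catalog.
        "datasets": sorted(catalog.list()),
        # Project parameters, as loaded by the project context.
        "parameters": dict(context.params),
        # Number of nodes in each registered pipeline.
        "pipelines": {name: len(pipe.nodes) for name, pipe in pipelines.items()},
    }
```

In a notebook started with the kedro jupyter notebook command, `summarise_project(catalog, context, pipelines)` gives a first look at the project, and `session.run()` then executes the default pipeline.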

Page 4: Jupyter notebook/Kedro FAQs

A page that covers some of the commonly asked questions that we get


How does this look? I'm interested in those that wizard or see questions coming in, or generally have a vision on how we should present ourselves when it comes to Notebook support: @astrojuanlu @merelcht @noklam @deepyaman

@astrojuanlu
Member

> Holy grail example. TBD. I need to pair with someone on this to work out how to write it up (it has potential to be a blog post too)

Let's do this together, I've done this process a dozen times or more already. It's far from perfect but there are several issues tracking how to make it easier already.

> How does this look? I'm interested in those that wizard or see questions coming in, or generally have a vision on how we should present ourselves when it comes to Notebook support

So far I haven't seen many direct requests. But by casually walking around the office I see lots of DS using Kedro on Jupyter. And we should pay attention to Databricks as well.

@noklam
Contributor

noklam commented Jul 28, 2023

> Let's do this together, I've done this process a dozen times or more already. It's far from perfect but there are several issues tracking how to make it easier already.

Do you see there is anything we can build to simplify the process? @astrojuanlu

> So far I haven't seen many direct requests. But by casually walking around the office I see lots of DS using Kedro on Jupyter. And we should pay attention to Databricks as well.

Same as my experience. I would add that many are not using it in the most efficient way: "I don't know which nodes I need to re-run, so I just re-run the whole pipeline". Although kedro run supports many different options, they are not well used. I wonder if we can show some examples in the notebook section.
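
One possible shape for such an example, as a sketch only: re-run a changed node plus everything downstream of it instead of the whole pipeline. `rerun_downstream` and the node name in the usage note are hypothetical; `pipeline_name` and `from_nodes` are keyword arguments of `KedroSession.run` in the 0.18.x series.

```python
def rerun_downstream(session, changed_node, pipeline_name=None):
    """Run the changed node and the nodes that depend on it, nothing else.

    `session` is the KedroSession that the Kedro kernel provides in a notebook.
    With pipeline_name=None, Kedro falls back to the default pipeline.
    """
    return session.run(pipeline_name=pipeline_name, from_nodes=[changed_node])
```

In a notebook this would be called as `rerun_downstream(session, "train_model_node")`. In 0.18.x a session supports a single run, so `%reload_kedro` is needed before running again.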

@noklam
Contributor

noklam commented Jul 28, 2023

> Page 1: Phased support to use the Kedro DataCatalog as a data registry (terminology TBC)

I would love to see this example. Amanda is working on the notebook series, so potentially there are many things that we can reuse?

> Page 2: How to convert your existing notebook to a Kedro project
> Holy grail example. TBD. I need to pair with someone on this to work out how to write it up (it has potential to be a blog post too)

> Page 3: How to use Kedro and a notebook side-by-side

I agree with many of the points; I had the same experience when I last updated the notebook docs.

catalog (type DataCatalog): [Data Catalog](https://github.com/kedro-org/kedro/data/data_catalog.md) instance that contains all defined datasets; this is a shortcut for context.catalog
context (type KedroContext): Kedro project context that provides access to Kedro's library components
pipelines (type Dict[str, Pipeline]): Pipelines defined in your [pipeline registry](https://github.com/kedro-org/kedro/nodes_and_pipelines/run_a_pipeline.md#run-a-pipeline-by-name)
session (type KedroSession): [Kedro session](https://github.com/kedro-org/kedro/kedro_project_setup/session.md) that orchestrates a pipeline run

I might reorder these as session -> catalog -> pipelines -> context, since session and catalog will be used most of the time.

  • %reload_kedro is an important one
  • %run_viz: I might put it close to "Work with managed services", because you only need this magic command in those situations.
  • Remove the QtConsole mentions? I don't know why we need to mention this, and I don't know of anyone doing this.

@stichbury
Contributor Author

Fab, thanks for your help on this @noklam and @astrojuanlu. I think I can start on Pages 1 & 3 but will leave page 2 until your return @astrojuanlu (so I have made a separate ticket for that work #2855)

@astrojuanlu
Member

> > Let's do this together, I've done this process a dozen times or more already. It's far from perfect but there are several issues tracking how to make it easier already.
>
> Do you see there is anything we can build to simplify the process? @astrojuanlu

I opened a few over time, see for example #2583, #2593, #2700, #2777, #2819.

@stichbury
Contributor Author

All done and released in 0.18.4

Projects
Status: Done

3 participants