Ability to attach debugger to any code within DVC pipeline #5048

Open
shcheklein opened this issue Nov 30, 2023 · 6 comments
Labels
A: integration Area: DVC integration layer priority-p1 Regular product backlog 📦 product Needs product input or is being actively worked on research

Comments

@shcheklein
Member

With a complicated DVC pipeline that has dynamic, parametrized dependencies, it's not easy to get the exact command needed to run a specific stage under a debugger outside of DVC.
On the other hand, users compare our experiments with a regular notebook or even a basic script workflow. With experiments, they no longer know how to pause and explore a data frame.

We need to research a way to mitigate this, either on the DVC side or on the extension side.

@mattseddon
Member

This is not automated at all, but it is a solution that works:

  1. Install debugpy into the virtual environment.
  2. Add breakpoints to the script you want to debug.
  3. Add the required code to your script (see "Additional code").
  4. Add an attach configuration to launch.json (see "launch.json entry").
  5. Run the experiment.
  6. Hit F5 to attach the debugger.
  7. Profit.

Additional code:

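import debugpy

# Listen for a debugger on localhost:6666 (must match "port" in launch.json).
debugpy.listen(("localhost", 6666))

# Block until the debugger attaches, so no breakpoints are skipped.
debugpy.wait_for_client()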

launch.json entry:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug experiment",
            "type": "python",
            "request": "attach",
            "justMyCode": false,
            "subProcess": true,
            "port": 6666
        }
    ]
}

Demo:

Screen.Recording.2024-01-16.at.12.45.22.pm.mov
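
One possible refinement of the "Additional code" above, as a minimal sketch rather than anything DVC or the extension provides: gate the listener behind an environment variable so the stage runs normally unless debugging is requested. The function name `maybe_wait_for_debugger` and the variable name `DVC_DEBUG` are hypothetical, and the sketch assumes the stage's process inherits the environment of the shell that started the experiment.

```python
import os


def maybe_wait_for_debugger(env_var="DVC_DEBUG", host="localhost", port=6666):
    """Start a debugpy listener only if env_var is set to a truthy value.

    env_var is a hypothetical name chosen for this sketch; it is not
    something DVC itself reads or sets.
    Returns True if a debugger client attached, False if debugging was skipped.
    """
    if os.environ.get(env_var, "").lower() not in {"1", "true", "yes"}:
        return False

    # Imported lazily so that normal runs do not require debugpy.
    import debugpy

    # The port must match the "port" value in the launch.json attach entry.
    debugpy.listen((host, port))
    # Block here until the IDE attaches (F5 in the steps above).
    debugpy.wait_for_client()
    return True
```

Calling `maybe_wait_for_debugger()` at the top of the stage script and then starting the run with something like `DVC_DEBUG=1 dvc exp run` would pause the stage until the debugger attaches; without the variable, the call returns immediately.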

@mattseddon
Member

I can make a tutorial and we can add it to the README/dvc.org if we think that would be useful.

@dberenbaum
Contributor

I discussed this with @skshetry, and we might want to clarify the scope. Is it strictly about using IDE debugger tools? It would be worth stating that explicitly when publishing anything about it. Otherwise it could give the wrong impression that adding breakpoints to your code won't work when running in DVC, and I don't think we should assume that the typical data scientist is familiar with debugging tools.
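
To illustrate the non-IDE case, here is a minimal sketch of a stage function that uses Python's built-in `breakpoint()`. The function `summarize` and its data are hypothetical, and the sketch assumes the stage is run from an interactive terminal so that pdb can read from stdin.

```python
def summarize(rows):
    """Return the mean of the "value" field across rows (hypothetical stage logic)."""
    total = sum(row["value"] for row in rows)

    # Drops into pdb here by default. Setting PYTHONBREAKPOINT=0 in the
    # environment turns this call into a no-op, so the line can stay in
    # the script for unattended runs.
    breakpoint()

    return total / len(rows)
```

At the pdb prompt, `p total` prints the intermediate value and `c` continues the run.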

@ronikurtberg

As an advanced (I hope) DVC user, it would be awesome to have the ability to run DVC as an "app", like a Flask server or a Spring application. I think the latter is the exact dream.

Today, we have a huge run_experiment.py script that deals with different dvc.yaml files in the same repo and needs to handle many params.yaml use cases. Of course, everything is terminal-based, and the suggestion above is not something that can really scale or be introduced to DS teams.

Maybe I'm getting into another issue that we have, but once we have the DVC "app" ability, maybe we could have annotations for StepInput and StepOutput to standardize the contracts between steps (making it more object-oriented and less file-oriented). I guess this could be the next level, since today it is heavy .yaml engineering.

@shcheklein
Member Author

Side note: we need to see if the same can be achieved in PyCharm (I expect it to be very similar, but it's been a while since I last used it).


4 participants