
[WIP] added PyTorch Profiler #2315

Open: wants to merge 14 commits into master

Conversation

Ishan-Kumar2 (Contributor)

Fixes #1917

Description:
Added a minimal implementation of the PyTorch profiler as a handler attached to an engine. Many other features could be added; please let me know if you have any suggestions. I haven't added tests yet; if the initial code looks good, I can get started on those :)

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@github-actions github-actions bot added the module: handlers Core Handlers module label Nov 9, 2021
@sdesrozis (Contributor)

@Ishan-Kumar2 Thank you! It looks great, I will play with the handler asap!

@vfdev-5 (Collaborator) commented Dec 21, 2021

@sdesrozis @Ishan-Kumar2 can we move on with this PR?

@Ishan-Kumar2 (Contributor, Author)

Hi @vfdev-5, sorry for the delay. I'll have a look at it today :)

@sdesrozis (Contributor)

@Ishan-Kumar2 Sorry for the delay on my side. I will use this handler in some of my own code to see whether it works and give feedback asap.

@Ishan-Kumar2 (Contributor, Author)

@sdesrozis I tested the code on my local example with this additional code:

# Define a PT Profiler
pt_profiler = PyTorchProfiler(on_trace_ready="tensorboard", output_path="./logs/train")
pt_profiler.attach(trainer)

it produces three JSON files (non-empty), but I am unable to load them in TensorBoard using

tensorboard --logdir=./logs

it shows "No dashboards are active for the current data set." As long as the JSON files are being produced, the output itself should be correct, since that part is done by the PyTorch profiler and not changed by me; there must be some issue with how I am opening it.
Not sure what is causing this. If you have a Colab example, could you please share the snippet you use to install ignite from this PR? I'll check running on Colab too.

@sdesrozis (Contributor) commented Dec 29, 2021

> it produces three JSON files (non-empty), but I am unable to load them in TensorBoard [...] it shows "No dashboards are active for the current data set."

It works for me, but don't forget to install the dedicated TensorBoard plugin:

pip install torch_tb_profiler

I left a few comments.
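For context, the handler presumably wraps `torch.profiler`'s built-in TensorBoard trace handler, so the end-to-end flow can be sketched with plain PyTorch (no ignite needed); the output directory here is an illustrative temp dir, and viewing the resulting `*.pt.trace.json` files in TensorBoard requires the `torch_tb_profiler` plugin mentioned above:

```python
import os
import tempfile

import torch
from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

logdir = tempfile.mkdtemp()

# tensorboard_trace_handler writes a *.pt.trace.json file into logdir when the
# trace is ready; the torch_tb_profiler plugin is what teaches TensorBoard to
# render these files (plain `tensorboard --logdir` shows "No dashboards" otherwise).
with profile(
    activities=[ProfilerActivity.CPU],
    on_trace_ready=tensorboard_trace_handler(logdir),
):
    x = torch.randn(128, 128)
    for _ in range(5):
        y = x @ x  # some CPU work for the profiler to record

trace_files = [f for f in os.listdir(logdir) if f.endswith(".pt.trace.json")]
print(trace_files)
```

After `pip install torch_tb_profiler`, pointing `tensorboard --logdir` at this directory should show the PYTORCH_PROFILER dashboard.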

@sdesrozis (Contributor)

@Ishan-Kumar2 it looks good. I think the next step now is the tests.

@Ishan-Kumar2 (Contributor, Author)

@sdesrozis great, will start working on the tests.

@Ishan-Kumar2 (Contributor, Author)

@sdesrozis, added some tests for the profiler. I have not added checks for the output of the profiler since I believe that is already covered by PyTorch itself.
I am new to writing tests from scratch, so please let me know if I need to test anything else. Thanks!

return dummy_trainer


def test_get_results(tmp_path):
Review comment (Contributor):

I think you should first test the case where the profiler is not attached to an engine. Second, you should test both the presence and the absence of the expected keys.

pt_profiler.get_results(sort_key="cpu_times")
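The two cases the review asks for might look like the sketch below. `DummyProfiler` is a hypothetical stand-in for the PR's `PyTorchProfiler` (not a real ignite class), and the key names are illustrative; it is only meant to show the not-attached and key-presence test pattern:

```python
import pytest


class DummyProfiler:
    """Hypothetical stand-in for the PR's PyTorchProfiler, for illustration only."""

    def __init__(self):
        self._profiler = None

    def attach(self, engine):
        # The real handler would register itself on engine events here.
        self._profiler = object()

    def get_results(self, sort_key="self_cpu_time_total"):
        if self._profiler is None:
            raise RuntimeError("Profiler must be attached to an engine before calling get_results")
        # The real handler would return profiler averages sorted by sort_key.
        return {"self_cpu_time_total": 0.0}


def test_get_results_not_attached():
    # First case: calling get_results before attach should fail loudly.
    with pytest.raises(RuntimeError, match="attached"):
        DummyProfiler().get_results()


def test_get_results_expected_keys():
    # Second case: after attaching, expected keys are present, unexpected ones absent.
    prof = DummyProfiler()
    prof.attach(engine=None)
    results = prof.get_results()
    assert "self_cpu_time_total" in results
    assert "made_up_key" not in results
```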


def test_write_results(tmp_path):
Review comment (Contributor):

You should test the files generated on more than one epoch.

Reply (Contributor, Author):

I have added this
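The multi-epoch check the review suggests can be sketched directly against `torch.profiler` (the backend the handler uses). Writing each simulated epoch's trace into its own subdirectory is an illustrative choice here, not necessarily how the PR's handler lays out files:

```python
import os
import tempfile

import torch
from torch.profiler import ProfilerActivity, profile, tensorboard_trace_handler

logdir = tempfile.mkdtemp()
n_epochs = 2

# One profile per simulated epoch, each writing into its own subdirectory,
# so we can assert that every epoch produced at least one trace file.
for epoch in range(n_epochs):
    epoch_dir = os.path.join(logdir, f"epoch_{epoch}")
    with profile(
        activities=[ProfilerActivity.CPU],
        on_trace_ready=tensorboard_trace_handler(epoch_dir),
    ):
        x = torch.randn(64, 64)
        (x @ x).sum()  # a little CPU work per "epoch"

for epoch in range(n_epochs):
    traces = os.listdir(os.path.join(logdir, f"epoch_{epoch}"))
    assert any(f.endswith(".pt.trace.json") for f in traces)
```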

}

def _profiler_create(self):
    self._profiler = torch.profiler.profile(
@sdesrozis (Contributor) commented Jan 9, 2022:

Maybe we should check the PyTorch version and provide a clear error message if version < 1.8?

And this check would be associated with a specific test.

Reply (Contributor, Author):

I didn't get how I should do this. In case the PyTorch version is < 1.8, I want all the tests to be skipped, right?
So should I add a @pytest.mark.skipif to all the tests?
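One common pytest pattern (an option, not necessarily what the PR ends up using) avoids decorating every test: a module-level `pytestmark` applies the skip condition to the whole test file at once.

```python
import pytest
import torch
from packaging.version import Version

# Applies to every test in this module: all tests are skipped
# when the installed torch predates the torch.profiler API.
pytestmark = pytest.mark.skipif(
    Version(torch.__version__) < Version("1.8.0"),
    reason="torch.profiler requires PyTorch 1.8 or newer",
)
```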

@sdesrozis (Contributor) commented Jan 9, 2022

> Added some tests for the profiler. I have not added checks for the output of the profiler since I believe that is already done by PyTorch. [...]

You can get inspiration from these tests: tests/ignite/contrib/handlers/test_tensorboard_logger.py.

The tests need to be improved to check what is going on with the different backends (tpu, nccl, etc.).

@Ishan-Kumar2 (Contributor, Author)

@sdesrozis I have incorporated most of your suggestions. I am still working on the distributed tests; will add those soon too.

Labels: module: handlers (Core Handlers module)
Projects: None yet
Development: successfully merging this pull request may close issue "Introduce an handler to use new profiler"
3 participants