PyTorch automatic inf/nan detection, collecting statistics #1446

Open
albertz opened this issue Oct 25, 2023 · 1 comment

albertz commented Oct 25, 2023

torch.autograd.detect_anomaly detects NaNs produced in the backward pass.

I want to have the same in the forward pass, with the possibility to whitelist a few special operations, modules, or code blocks, e.g. masking attention energies to -inf.

Maybe via a post forward hook for every module?
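
A minimal sketch of that hook idea, just to illustrate it (`install_nan_hooks`, `_iter_tensors` and the whitelist handling are hypothetical names, not existing API; `register_forward_hook` and `torch.isfinite` are standard PyTorch):

```python
import torch


def _iter_tensors(output):
    """Yield all tensors from a (possibly nested) module output."""
    if isinstance(output, torch.Tensor):
        yield output
    elif isinstance(output, (list, tuple)):
        for item in output:
            yield from _iter_tensors(item)
    elif isinstance(output, dict):
        for item in output.values():
            yield from _iter_tensors(item)


def install_nan_hooks(model: torch.nn.Module, whitelist=()):
    """Register post-forward hooks on all submodules which raise on non-finite outputs.

    `whitelist` is a collection of module names (or module instances) whose outputs
    may legitimately contain inf/nan, e.g. attention energies masked to -inf.
    (Hypothetical helper, only a sketch of the idea.)
    """
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            if name in whitelist or module in whitelist:
                return
            for i, out in enumerate(_iter_tensors(output)):
                if out.is_floating_point() and not torch.isfinite(out).all():
                    raise RuntimeError(
                        f"non-finite values in output {i} of module {name!r} ({type(module).__name__})"
                    )

        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    return handles  # call handle.remove() on each to uninstall
```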

Or using sys.settrace to install some function which would inspect the locals?
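
A rough sketch of the sys.settrace variant (hypothetical, and certainly very slow, since it checks every floating-point tensor in the locals on every traced line, and it cannot see into C/C++ code):

```python
import sys

import torch


def nan_trace(frame, event, arg):
    """Trace function which checks all tensor locals for non-finite values. Sketch only."""
    if event == "line":
        for name, value in frame.f_locals.items():
            if isinstance(value, torch.Tensor) and value.is_floating_point():
                if not torch.isfinite(value).all():
                    raise RuntimeError(
                        f"non-finite tensor {name!r} at {frame.f_code.co_filename}:{frame.f_lineno}"
                    )
    return nan_trace


sys.settrace(nan_trace)  # install; sys.settrace(None) to uninstall
```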

Also see https://discuss.pytorch.org/t/detect-inf-nan-in-forward-pass/190514.

E.g. I have a model which gets NaN loss (i.e. already in the forward pass), directly at the first step, and I want to know where it happens. (This is with AMP float16, so maybe pytorch/pytorch#40497 is related. But that issue is only about AMP, not so much about the issue here of adding such detection in general.)

The same mechanism can then also be used to collect statistics on activations (mean, min, max, std, var, median, L2, whatever).
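
For the statistics part, the same forward-hook machinery could be reused, roughly like this (`install_stats_hooks` is again a hypothetical helper, not existing API):

```python
import torch


def install_stats_hooks(model: torch.nn.Module, stats: dict):
    """Register post-forward hooks which append simple per-module activation statistics
    to `stats` (module name -> list of dicts). Sketch only; handles only plain tensor outputs."""

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and output.is_floating_point():
                out = output.detach()
                stats.setdefault(name, []).append(
                    {
                        "mean": out.mean().item(),
                        "min": out.min().item(),
                        "max": out.max().item(),
                        "std": out.std().item(),
                        "l2": out.norm().item(),
                    }
                )

        return hook

    return [mod.register_forward_hook(make_hook(name)) for name, mod in model.named_modules()]
```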

Note that the same for parameters is much easier, as we can simply iterate over them.
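
E.g. something like this for parameters (and their gradients, if present), just iterating over named_parameters (hypothetical helper name, but only standard PyTorch calls):

```python
import torch


def check_params_finite(model: torch.nn.Module):
    """Raise if any parameter (or its gradient, if present) contains inf/nan."""
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            raise RuntimeError(f"non-finite values in parameter {name!r}")
        if param.grad is not None and not torch.isfinite(param.grad).all():
            raise RuntimeError(f"non-finite values in gradient of parameter {name!r}")
```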

albertz commented Jan 3, 2024

Also see #1487 about collecting statistics in general.
