autograd.detect_anomaly detects inf/nan in the backward pass.
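For reference, the existing backward-pass detection is used roughly like this (tiny self-contained example, model/x are just placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)
x = torch.randn(2, 4)

# Existing tool: only the backward pass is covered.
with torch.autograd.detect_anomaly():
    out = model(x)    # non-finite values produced here are not flagged
    loss = out.sum()
    loss.backward()   # anomalies in the backward pass raise with a traceback
```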
I want the same for the forward pass, with the possibility to whitelist a few special operations, modules, or code blocks, e.g. masking attention energies to -inf.
Maybe via a post forward hook for every module?
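A rough sketch of what such a hook-based check could look like (install_nan_hooks and the name-based whitelist are just illustrative, not an existing API):

```python
import torch
import torch.nn as nn

def install_nan_hooks(model: nn.Module, whitelist=()):
    """Register a post-forward hook on every submodule that raises when an
    output contains inf/nan. Modules whose names are in `whitelist` (e.g. an
    attention-masking module that legitimately produces -inf) are skipped."""
    def make_hook(name):
        def hook(module, inputs, output):
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for o in outs:
                if torch.is_tensor(o) and not torch.isfinite(o).all():
                    raise RuntimeError(f"non-finite values in output of {name!r}")
        return hook

    handles = []
    for name, module in model.named_modules():
        if name in whitelist:
            continue
        handles.append(module.register_forward_hook(make_hook(name)))
    return handles  # call h.remove() on each handle to uninstall
```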
Or using sys.settrace to install some function which would inspect the locals? Also see https://discuss.pytorch.org/t/detect-inf-nan-in-forward-pass/190514.
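A very crude sketch of the sys.settrace variant, purely to illustrate the idea (extremely slow, and nothing here is an existing PyTorch API):

```python
import sys
import torch

def install_tensor_tracer():
    """After each executed Python line, scan the frame's locals for tensors
    with non-finite values and report the source location. Only affects code
    executed after installation, and re-reports tensors that stay non-finite."""
    def trace(frame, event, arg):
        if event == "line":
            for name, value in frame.f_locals.items():
                if torch.is_tensor(value) and not torch.isfinite(value).all():
                    print(f"non-finite tensor {name!r} at "
                          f"{frame.f_code.co_filename}:{frame.f_lineno}")
        return trace

    sys.settrace(trace)
```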
E.g. I have a model which gets a NaN loss (i.e. already in the forward pass), directly at the first step, and I want to know where it happens. (This is with AMP float16, so maybe pytorch/pytorch#40497 is related. But that is only about AMP, not so much about the issue here of adding such detection.)
The same mechanism can then also be used to collect statistics on activations (mean, min, max, std, var, median, L2, whatever).
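For instance, the same hook sketch from above could record statistics instead of raising (again just illustrative; install_stats_hooks and the stats dict are made up):

```python
import torch
import torch.nn as nn

def install_stats_hooks(model: nn.Module, stats: dict):
    """Reuse the forward-hook mechanism to record per-module activation
    statistics instead of (or in addition to) checking for inf/nan."""
    def make_hook(name):
        def hook(module, inputs, output):
            if torch.is_tensor(output):
                o = output.detach().float()
                stats[name] = {
                    "mean": o.mean().item(),
                    "std": o.std().item(),
                    "min": o.min().item(),
                    "max": o.max().item(),
                    "l2": o.norm().item(),
                }
        return hook

    return [m.register_forward_hook(make_hook(n))
            for n, m in model.named_modules()]
```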
Note that the same for parameters is much easier, as we can simply iterate over them.
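E.g. a simple check over parameters could just be:

```python
import torch

def check_parameters(model):
    # Parameters are directly accessible, so no hooks are needed.
    for name, p in model.named_parameters():
        if not torch.isfinite(p).all():
            print(f"non-finite values in parameter {name!r}")
```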