How to cast model layernorms to fp32 when using precision="bf16-true"? #19775
Unanswered
eric-tc-wong
asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
What is the proper way to cast certain layers of a model wrapped in a LightningModule to float32 when using Trainer(precision='bf16-true')?
I am working with transformer models, and the LayerNorms need to be in float32. I thought this was a common requirement, but I have found it hard to find documentation or examples.
I tried casting them during setup, but the cast does not hold during the fit loop. I also tried casting in configure_optimizers, but then I get an error on backward.
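Roughly, what I tried in setup looks like this (a simplified sketch; the module class here is just a placeholder for my actual model):

```python
import torch
from lightning.pytorch import LightningModule


class MyTransformer(LightningModule):  # placeholder for my actual model
    def setup(self, stage: str) -> None:
        # Cast every LayerNorm to float32 and leave the rest of the model alone.
        # With precision="bf16-true" this does not hold: by the time the fit
        # loop runs, the LayerNorms are back in bfloat16.
        for module in self.modules():
            if isinstance(module, torch.nn.LayerNorm):
                module.float()
```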
Thanks

Replies: 1 comment
My current solution is to override the convert_module function in the HalfPrecision plugin. However, I still see a large drop in model performance compared to bf16-mixed. Please let me know if this is not the proper solution.
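Something along these lines, as a minimal sketch (the subclass name is a placeholder, and this assumes a recent Lightning 2.x where HalfPrecision lives under lightning.pytorch.plugins.precision):

```python
import torch
from lightning.pytorch import Trainer
from lightning.pytorch.plugins.precision import HalfPrecision


class HalfPrecisionKeepFP32Norms(HalfPrecision):  # placeholder name
    """bf16-true weights, but LayerNorm parameters stay in float32."""

    def convert_module(self, module: torch.nn.Module) -> torch.nn.Module:
        # Let the base plugin cast the whole module to bfloat16 first ...
        module = super().convert_module(module)
        # ... then cast every LayerNorm back to float32.
        for submodule in module.modules():
            if isinstance(submodule, torch.nn.LayerNorm):
                submodule.float()
        return module


# Pass the plugin to the Trainer in place of precision="bf16-true".
trainer = Trainer(plugins=HalfPrecisionKeepFP32Norms("bf16-true"))
```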