
LR scheduler can result in a division by 0 #1393

Open
carmocca opened this issue May 6, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@carmocca
Contributor

carmocca commented May 6, 2024

If `--train.max_steps` is equal to `--train.lr_warmup_steps`, then `T_max` becomes 0 and the cosine annealing scheduler divides by zero:

scheduler2 = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=(max_steps - warmup_steps))

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/carlos/nightly-env/bin/litgpt", line 8, in <module>
[rank0]:     sys.exit(main())
[rank0]:   File "/home/carlos/lit-parrot/litgpt/__main__.py", line 143, in main
[rank0]:     fn(**kwargs)
[rank0]:   File "/home/carlos/lit-parrot/litgpt/finetune/lora.py", line 143, in setup
[rank0]:     fabric.launch(main, devices, seed, config, data, checkpoint_dir, out_dir, train, eval)
[rank0]:   File "/home/carlos/lightning/src/lightning/fabric/fabric.py", line 866, in launch
[rank0]:     return self._wrap_and_launch(function, self, *args, **kwargs)
[rank0]:   File "/home/carlos/lightning/src/lightning/fabric/fabric.py", line 951, in _wrap_and_launch
[rank0]:     return launcher.launch(to_run, *args, **kwargs)
[rank0]:   File "/home/carlos/lightning/src/lightning/fabric/strategies/launchers/subprocess_script.py", line 107, in launch
[rank0]:     return function(*args, **kwargs)
[rank0]:   File "/home/carlos/lightning/src/lightning/fabric/fabric.py", line 957, in _wrap_with_setup
[rank0]:     return to_run(*args, **kwargs)
[rank0]:   File "/home/carlos/lit-parrot/litgpt/finetune/lora.py", line 196, in main
[rank0]:     fit(
[rank0]:   File "/home/carlos/lit-parrot/litgpt/finetune/lora.py", line 291, in fit
[rank0]:     scheduler.step()
[rank0]:   File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 838, in step
[rank0]:     scheduler.step(0)
[rank0]:   File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 187, in step
[rank0]:     values = self._get_closed_form_lr()
[rank0]:   File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 1029, in _get_closed_form_lr
[rank0]:     return [
[rank0]:   File "/home/carlos/nightly-env/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 1032, in <listcomp>
[rank0]:     * (1 + math.cos(math.pi * self.last_epoch / self.T_max))
[rank0]: ZeroDivisionError: float division by zero
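The division happens in torch's closed-form cosine annealing update, which divides by `T_max`. It can be reproduced in isolation with a pure-Python sketch of that formula (the function below is an illustration of the math, not torch's actual code):

```python
import math

# Sketch of CosineAnnealingLR's closed-form update (see
# torch/optim/lr_scheduler.py, _get_closed_form_lr). Here
# t_max plays the role of T_max = max_steps - warmup_steps.
def closed_form_cosine_lr(base_lr, eta_min, last_epoch, t_max):
    return eta_min + (base_lr - eta_min) * (
        1 + math.cos(math.pi * last_epoch / t_max)
    ) / 2

# With max_steps == warmup_steps, t_max is 0 and the division fails:
try:
    closed_form_cosine_lr(base_lr=1e-3, eta_min=0.0, last_epoch=0, t_max=0)
except ZeroDivisionError as exc:
    print(f"ZeroDivisionError: {exc}")
```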

LitGPT should validate this configuration up front and raise a clear error, instead of crashing once warmup finishes.
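One possible fix is a guard before the schedulers are built; a minimal sketch, where the function name and signature are assumptions rather than LitGPT's actual API:

```python
# Hypothetical guard; LitGPT's real validation hook may differ.
def validate_lr_schedule(max_steps: int, lr_warmup_steps: int) -> None:
    if max_steps <= lr_warmup_steps:
        raise ValueError(
            f"--train.max_steps ({max_steps}) must be greater than "
            f"--train.lr_warmup_steps ({lr_warmup_steps}), otherwise "
            "CosineAnnealingLR's T_max would be <= 0."
        )

validate_lr_schedule(max_steps=1000, lr_warmup_steps=100)  # OK
```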

@carmocca carmocca added the bug Something isn't working label May 6, 2024