
[optim] Merge the pyi files into py files of optimizer #125452

Closed
wants to merge 27 commits

Conversation

david20571015
Contributor

Continue the work of #125153


pytorch-bot bot commented May 3, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125452

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3a87b3b with merge base 3759676:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@david20571015 david20571015 marked this pull request as ready for review May 4, 2024 03:06
@david20571015
Contributor Author

@pytorchbot drci

Contributor

@janeyx99 janeyx99 left a comment

Started reviewing, but it’d be easier to review if you could split the lr_scheduler changes from the optims

lr_dict = (
{lr.device: lr} if isinstance(lr, Tensor) and str(lr.device) != "cpu" else None
)
lr = torch.tensor(lr)
Contributor

why is this necessary? we don’t want to wrap lr into a Tensor normally

Contributor Author

I think my modification here is incorrect; I'll fix it.

@@ -27,10 +29,10 @@ class AdamW(Optimizer):
def __init__(
self,
params: ParamsT,
lr: Union[float, Tensor] = 1e-3,
lr=1e-3,
Contributor

why the removal of types here?

Contributor Author

Reverted.
I accidentally removed it.

@@ -376,6 +377,7 @@ def _multi_tensor_asgd(
torch._foreach_add_(grouped_state_steps, 1)

# intermediate = grad + param * lambd
intermediate: Union[Tuple[Tensor, ...], List[Tensor]]
Contributor

what’s the diff between defining this here vs the first time it’s instantiated?

Contributor Author

Without this, mypy raises Incompatible types in assignment (expression has type "tuple[Tensor, ...]", variable has type "list[Tensor]") at lines 386, 392, and 411, because intermediate is first assigned as list[Tensor] at line 384. I think declaring the union type here is more concise than adding three type: ignore comments.
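A minimal standalone sketch of the pattern (illustrative names only, not the actual _multi_tensor_asgd code): pre-declaring the variable with a union annotation lets later assignments of either container type pass mypy.

from typing import List, Tuple, Union

def demo(use_tuple: bool) -> int:
    # Declare the union up front so both assignments below type-check.
    # Without this declaration, mypy infers List[int] from the first
    # assignment and rejects the later tuple assignment as incompatible.
    values: Union[Tuple[int, ...], List[int]]
    values = [1, 2, 3]
    if use_tuple:
        values = (4, 5, 6)
    return sum(values)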

@david20571015
Contributor Author

Thanks for your review @janeyx99, I split the lr_scheduler changes out into #125556.

Contributor

@janeyx99 janeyx99 left a comment

Almost there!

@@ -27,10 +29,10 @@ class Adam(Optimizer):
def __init__(
self,
params: ParamsT,
lr: Union[float, Tensor] = 1e-3,
lr=1e-3,
Contributor

looks like there are more places you accidentally removed

Contributor Author

fixed. sorry for this.

lr_dict = (
{lr.device: lr} if isinstance(lr, Tensor) and str(lr.device) != "cpu" else None
)
lr = torch.tensor(lr)
Contributor

remove

Contributor Author

fixed

weight_decay: float,
eps: float,
maximize: bool,
capturable: bool, # Needed for consistency.
Contributor

why delete the comment?

Contributor Author

I thought this was unnecessary because has_complex didn't have this comment, but I've added it back and added the same comment for has_complex.

Tuple[List[List[Tensor]], Indices],
],
grouped_tensors,
grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
Contributor

Why the switch to use the optimizer one here?

Contributor Author

The optimizer one internally calls the original function and also supports compiling. It also doesn't need a lot of type casting at the call site, because the typing is already ignored inside the optimizer version. Is it not suitable here?

Contributor

Eh... the one in optimizer is a special case where it gets skipped in the compile world. I'd rather not use it here if it's not necessary yet.

Contributor Author

Okay, I'll roll it back. Thanks for explaining this.

@albanD albanD removed their request for review May 6, 2024 20:10
Comment on lines 34 to 35
eps=1e-8,
weight_decay=0,
Contributor

here too

Contributor

Though I see other places where this wasn't necessary. Do you know the difference between defining the type here vs. not?

If it's fine to not have a type, I am happy with not having one.

Contributor Author

Sorry, could you please explain the difference to me? I think mypy interprets weight_decay as an int, so should I annotate it as float here, or change 0 to 0.0? (Though there won't be any mypy error when an int weight_decay is used as a float.)

Contributor

It does look like there's no error and it will allow an int or a float or anything (it might just infer it as Any). From my little experiment, it still feels best to include the type. For example:

class Foo:
    def __init__(self, arg, kwarg=0):
        self.arg = arg
        self.kwarg = kwarg

a = Foo(1, "l")

print(5 + a.kwarg)

mypy will not flag this: without an annotation, kwarg is effectively untyped (Any), so passing a str goes unnoticed.

But the intention is closer to:

class Foo:
    def __init__(self, arg, kwarg: int = 0):
        self.arg = arg
        self.kwarg = kwarg

a = Foo(1, "l")

print(5 + a.kwarg)

And here, with the type, mypy will properly error at the instantiation of a.
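(For reference, mypy's report here is along the lines of error: Argument 2 to "Foo" has incompatible type "str"; expected "int"; the exact wording may vary across mypy versions.)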

Contributor Author

Okay, understood completely. I'll go ahead and add the types.
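Concretely, the annotated defaults being added back look roughly like the sketch below (an illustrative subclass with a trimmed argument list, assuming ParamsT is importable from torch.optim.optimizer as in the diff context above; the real AdamW signature has more parameters such as betas):

from typing import Union

from torch import Tensor
from torch.optim.optimizer import Optimizer, ParamsT

class AnnotatedAdamWSketch(Optimizer):  # illustrative only, not the real AdamW
    def __init__(
        self,
        params: ParamsT,
        lr: Union[float, Tensor] = 1e-3,
        eps: float = 1e-8,           # explicit float annotation instead of a bare default
        weight_decay: float = 1e-2,  # explicit float annotation instead of a bare default
    ) -> None:
        defaults = dict(lr=lr, eps=eps, weight_decay=weight_decay)
        super().__init__(params, defaults)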

Comment on lines 34 to 35
eps=1e-8,
weight_decay=1e-2,
Contributor

here

Tuple[List[List[Tensor]], Indices],
],
grouped_tensors,
grouped_tensors = Optimizer._group_tensors_by_device_and_dtype(
Contributor

Eh... the one in optimizer is a special case where it gets skipped in the compile world. I'd rather not use it here if it's not necessary yet.

@david20571015
Contributor Author

@pytorchbot merge


pytorch-bot bot commented May 10, 2024

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

@@ -318,92 +318,6 @@ def step(self, closure=None):
)


def adamw(
Contributor

wait is this deletion intended?

Contributor Author

I merged the main branch into this one because GitHub reported merge conflicts, and 0f02e0a moved the function to the end of the file.

@david20571015
Contributor Author

@pytorchbot merge


pytorch-bot bot commented May 11, 2024

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

@janeyx99
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 11, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@david20571015
Contributor Author

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/main pull/125452/head returned non-zero exit code 1

Rebasing (1/28)
dropping b3b18e394d1915b6b885d77344b7cc99070ba6d4 fix: type of `TensorListList` -- patch contents already upstream
Rebasing (2/28)
Auto-merging torch/optim/adadelta.py
CONFLICT (content): Merge conflict in torch/optim/adadelta.py
error: could not apply 4f707fdcad9... merge pyi into py
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 4f707fdcad9... merge pyi into py

Raised by https://github.com/pytorch/pytorch/actions/runs/9040977794

@pytorchmergebot
Collaborator

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@david20571015
Contributor Author

david20571015 commented May 11, 2024

Sorry @janeyx99, I mistakenly thought the failures might have been fixed, so I rebased onto the main branch. However, after investigating the error message, I found that it was caused by my code, and I fixed it in 165fed5. Could you please approve the CI again?

@david20571015 david20571015 changed the title Merge the pyi files into py files of optimizer [optim] Merge the pyi files into py files of optimizer May 13, 2024
@david20571015
Contributor Author

Should I keep the parameter name d_p_list? None of the affected functions are exposed to external users.


@janeyx99
Contributor

janeyx99 commented May 13, 2024

Should I keep the parameter name d_p_list? None of the affected functions are exposed to external users.

Yes, we should keep the name. sgd is publicly exposed and the argument is sadly not positional only.
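A small, hypothetical illustration of why the parameter name is part of the public surface (sgd_like is a stand-in, not the actual torch.optim.sgd signature): since the parameter is not positional-only, callers may already pass it by keyword, so renaming it would break those call sites.

from typing import List

def sgd_like(params: List[float], d_p_list: List[float], lr: float = 0.01) -> None:
    # Apply a plain gradient step; d_p_list mirrors the public parameter name.
    for i, d_p in enumerate(d_p_list):
        params[i] -= lr * d_p

weights = [1.0, 2.0]
grads = [0.1, 0.2]
# Passing the argument by keyword is legal, so the name is part of the API:
sgd_like(weights, d_p_list=grads, lr=0.1)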

@david20571015
Contributor Author

david20571015 commented May 13, 2024

Should I keep the parameter name d_p_list? None of the affected functions are exposed to external users.

Yes, we should keep the name. sgd is publicly exposed and the argument is sadly not positional only.

Okay, I will fix it.
But wasn't sgd removed in torch/optim/__init__.py? Also, its only usage seems to be in SGD.step() (searching for sgd( in VS Code), and only torch/optim/sgd.py uses the name d_p_list (searching for d_p_list).

@janeyx99
Contributor

The removal in optimizer __init__.py is the torch.optim.sgd module, not the function, which is offered so people can manage their own state. (I have yet to figure out why we do the dels there but that is not relevant for this discussion.)

@david20571015
Contributor Author

Should I keep the parameter name d_p_list? None of the affected functions are exposed to external users.

Yes, we should keep the name. sgd is publicly exposed and the argument is sadly not positional only.

done.

@janeyx99
Contributor

looks like there are merge conflicts now :/

@david20571015
Contributor Author

looks like there are merge conflicts now :/

fixed

@janeyx99
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@janeyx99
Contributor

janeyx99 commented May 14, 2024

Please run lintrunner locally to ensure lint passes for the next commit (and usually before pushing). Getting lintrunner locally is pretty simple: just

pip install lintrunner
lintrunner init
lintrunner -a

@david20571015
Contributor Author

Please run lintrunner locally to ensure lint passes for the next commit (and usually before pushing). Getting lintrunner locally is pretty simple: just

pip install lintrunner
lintrunner init
lintrunner -a

ok! fixed

@david20571015
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.
