No clear way to load models #78
Any way to provide the different eval scripts? :-) Is this related to #73?
I cannot find metaseq-api-local.py anywhere in OPT/.
From #277: We should make model loading "just work". I shouldn't need to pass so many args to get it to find the right checkpoint.
Types of model checkpoints

We currently have three types of model checkpoints. Here, the name "reshard" is just a convention; it can be any name, for example "125m-model_part-0-shard0.pt".
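The filename convention above encodes both the model-parallel part and the shard index. As a rough illustration (the regex and helper name here are mine, not metaseq's actual API), a name like "125m-model_part-0-shard0.pt" could be parsed like this:

```python
import re

# Illustrative only: parse filenames following the
# "<name>-model_part-<P>-shard<S>.pt" convention mentioned above.
_CKPT_RE = re.compile(r"^(?P<name>.+)-model_part-(?P<part>\d+)-shard(?P<shard>\d+)\.pt$")

def parse_checkpoint_name(filename):
    """Return (base name, model-parallel part, shard index),
    or None if the filename does not follow the convention."""
    m = _CKPT_RE.match(filename)
    if m is None:
        return None
    return m.group("name"), int(m.group("part")), int(m.group("shard"))

print(parse_checkpoint_name("125m-model_part-0-shard0.pt"))  # ('125m', 0, 0)
```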
How do we determine the type of model checkpoint?
If both of these parameters (the model-parallel size and the shard count) are 1, we have a singleton model. Both of these config values can be determined from the model checkpoint itself.
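A sketch of that rule as code (the argument names are assumptions on my part; in practice both values would be read out of the checkpoint's stored config):

```python
def checkpoint_type(model_parallel_size, num_shards):
    """Classify a checkpoint from two config values, per the rule above.
    The labels for the non-singleton cases are illustrative."""
    if model_parallel_size == 1 and num_shards == 1:
        return "singleton"       # one consolidated checkpoint file
    if num_shards == 1:
        return "model-parallel"  # split across model_part files only
    return "sharded"             # FSDP shards (possibly also model-parallel)

print(checkpoint_type(1, 1))  # singleton
```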
From metaseq/distributed/fully_sharded_data_parallel.py
🚀 Feature Request
Loading models is a bit of a pain right now. It's done differently in multiple scripts (including our internal eval scripts), and not all of those ways are compatible with all checkpoint formats.

This typically requires setting a ton of command-line args based on what the model checkpoints need (`--model-parallel`, `--ddp-backend fully_sharded`, `--distributed-port`, etc.). Many of these args can be picked up by just looking at the files.

Afterwards we should refactor a few scripts to use this One True Method.
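As a sketch of what "just looking at the files" could mean (the helper name and directory scan are my assumptions, not metaseq's actual loader): scan the checkpoint directory and infer the model-parallel size and shard count from the filename convention discussed earlier in this thread, rather than requiring them as flags.

```python
import re
from pathlib import Path

_PAT = re.compile(r"^.+-model_part-(\d+)-shard(\d+)\.pt$")

def infer_layout(ckpt_dir):
    """Infer model-parallel size and shard count from checkpoint filenames.
    Assumes the "<name>-model_part-<P>-shard<S>.pt" convention; falls back
    to a singleton layout (1 part, 1 shard) if no such files are found."""
    parts, shards = set(), set()
    for p in Path(ckpt_dir).glob("*.pt"):
        m = _PAT.match(p.name)
        if m:
            parts.add(int(m.group(1)))
            shards.add(int(m.group(2)))
    return {
        "model_parallel_size": max(parts) + 1 if parts else 1,
        "num_shards": max(shards) + 1 if shards else 1,
    }
```

The inferred values could then be used to fill in `--model-parallel` and friends automatically, keeping the flags only as explicit overrides.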