FSDP Strategy checkpoint loading #19802

Open
xin-w8023 opened this issue Apr 23, 2024 · 0 comments
Labels: feature (Is an improvement or enhancement), needs triage (Waiting to be triaged by maintainers)

xin-w8023 commented Apr 23, 2024

Description & Motivation

def load_checkpoint(self, checkpoint_path: _PATH) -> Dict[str, Any]:
    # broadcast the path from rank 0 to ensure all the states are loaded from a common path
    path = Path(self.broadcast(checkpoint_path))

    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    assert self.model is not None
    assert self.lightning_module is not None

    if _is_sharded_checkpoint(path):
        ...

Here, the check for a sharded checkpoint only works for a local path. For any remote file path such as hdfs://, the function will eventually raise an error.
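For illustration only (this is not part of the original report and not Lightning's implementation), one way to make such a check remote-aware would be to resolve the filesystem through fsspec instead of pathlib. A minimal sketch, assuming fsspec is installed; the helper name _is_sharded_checkpoint_remote_aware is hypothetical, and the metadata filename meta.pt marking a sharded checkpoint directory is an assumption:

from pathlib import Path
from typing import Union

from fsspec.core import url_to_fs

# Assumed metadata filename that marks a sharded checkpoint directory.
_METADATA_FILENAME = "meta.pt"


def _is_sharded_checkpoint_remote_aware(path: Union[str, Path]) -> bool:
    # Resolve the right filesystem from the URL scheme (hdfs://, s3://, or local).
    fs, fs_path = url_to_fs(str(path))
    # A sharded checkpoint is a directory containing the metadata file.
    return fs.isdir(fs_path) and fs.isfile(f"{fs_path.rstrip('/')}/{_METADATA_FILENAME}")

Routing the check through fsspec matters because pathlib mangles remote URLs: Path("hdfs://host/ckpt") collapses the double slash into hdfs:/host/ckpt, so is_dir() is evaluated against the local filesystem and the subsequent load fails.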

Pitch

No response

Alternatives

No response

Additional context

No response

cc @Borda

xin-w8023 added the feature and needs triage labels on Apr 23, 2024