model has multi checkpoint file can't be loaded to train #934

apachemycat · 2024-05-04T15:56:08Z

ls /models/meta-Llama-3-8B
LICENSE
README.md
USE_POLICY.md
config.json
generation_config.json
ggml-model-f16.gguf
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
special_tokens_map.json
tokenizer.json
tokenizer_config.json

checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: /models/meta-Llama-3-8B
  checkpoint_files: [
    model-00001-of-00004.safetensors,
    model-00002-of-00004.safetensors,
    model-00003-of-00004.safetensors,
    model-00004-of-00004.safetensors
  ]
  recipe_checkpoint: null
  output_dir: /tmp/Meta-Llama-3-8B/
  model_type: LLAMA3
resume_from_checkpoint: False

ValueError: Currently we only support reading from a single TorchTune checkpoint file. Got 4 files instead.

kartikayk · 2024-05-04T19:42:31Z

Thanks for opening this issue!

safetensors is a format from HF that usually contains HF-formatted checkpoints. To make this work, you need to update the checkpointer component in the config from FullModelMetaCheckpointer to FullModelHFCheckpointer. This will load the checkpoint and do the conversions needed to correctly interpret the weight tensors. You can read this deep dive to better understand checkpoint formats and how torchtune deals with them.
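
As a rough sketch of that change, the checkpointer block from the original report might look like the following. The module path torchtune.utils.FullModelHFCheckpointer is an assumption made by analogy with the path used in the report; every other field is copied unchanged from the report and may need adjusting for your setup.

```yaml
checkpointer:
  # Assumed module path, mirroring torchtune.utils.FullModelMetaCheckpointer
  # from the report; only the class name comes from the suggestion above.
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /models/meta-Llama-3-8B
  # All four safetensors shards stay listed, as in the report.
  checkpoint_files: [
    model-00001-of-00004.safetensors,
    model-00002-of-00004.safetensors,
    model-00003-of-00004.safetensors,
    model-00004-of-00004.safetensors
  ]
  recipe_checkpoint: null
  output_dir: /tmp/Meta-Llama-3-8B/
  model_type: LLAMA3
resume_from_checkpoint: False
```

Only the checkpointer class changes here; the directory and shard list are the same ones that triggered the ValueError with the Meta checkpointer.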

kartikayk · 2024-05-16T20:57:33Z

@apachemycat let me know if this is still an issue and I'd be happy to reopen this! If not, I'm closing this for now.

kartikayk closed this as completed May 16, 2024
