Loading a `.save()`'d model with `revision` and `trust_remote_code` re-downloads code at runtime? #2613

thoughtpolice · 2024-04-25T00:38:51Z

Hello,

Thanks for this library, it's pretty nice! I have a project using it that basically exposes sentence embeddings as a stateless HTTP API. In order to make distribution easy and consistent, it packages supported models directly with the code inside the distributed docker images I build. I use .save() in order to save all the models in a first step, then put them in the docker image in a second step. The intent is that the server will never need to download any new files and can run completely isolated from the public internet if you use the docker image, and all model versions will always be consistent.

https://github.com/thoughtpolice/embedding-server

This is currently using sentence-transformers 2.6.1

One of the models I'm using is nomic-ai/nomic-embed-text-v1, which requires trust_remote_code=True. In order to keep things hermetic, I also provide a revision argument for the HF repository to pin it to a specific revision.

However, it seems like the .save() call does not actually save the runtime configuration code appropriately or something, so every subsequent run of the docker container immediately re-downloads the remote code and runs it.

For example, try the following command:

docker run --rm -it -p 5000:5000 ghcr.io/thoughtpolice/embedding-server:1172b14b6fd594b9b4f2eeb1aa4abfd67f158c71

You'll see this:

Loading models from '/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data'...
configuration_hf_nomic_bert.py: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.96k/1.96k [00:00<00:00, 12.1MB/s]
A new version of the following files was downloaded from https://huggingface.co/nomic-ai/nomic-embed-text-v1:
- configuration_hf_nomic_bert.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_hf_nomic_bert.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 52.6k/52.6k [00:00<00:00, 6.79MB/s]
A new version of the following files was downloaded from https://huggingface.co/nomic-ai/nomic-embed-text-v1:
- modeling_hf_nomic_bert.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
<All keys matched successfully>
[2024-04-25 00:31:53 +0000] [1] [INFO] Running on http://0.0.0.0:5000 (CTRL + C to quit)

Looking at the provided directory, neither of these files are there:

$ find /nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/modules.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/model.safetensors
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/README.md
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/vocab.txt
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/config_sentence_transformers.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/tokenizer.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/special_tokens_map.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/2_Normalize
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/sentence_bert_config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/1_Pooling
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/1_Pooling/config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/all-MiniLM-L6-v2/tokenizer_config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/modules.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/model.safetensors
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/README.md
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/vocab.txt
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/config_sentence_transformers.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/tokenizer.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/special_tokens_map.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/2_Normalize
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/sentence_bert_config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/1_Pooling
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/1_Pooling/config.json
/nix/store/k9ffq01198kbzbmyj5n0mz8xjbqq08sy-model-data/nomic-embed-text-v1/tokenizer_config.json

So this means the download is incomplete, for some reason, i.e. .save() does not save all the needed files, in particular the two .py files in the upstream HF repo: https://huggingface.co/nomic-ai/nomic-embed-text-v1/tree/main

This is especially misleading because the warning message suggests to use revision to pin the version of the file, but it doesn't work in the case of .save() I guess (maybe there are some call sites where the revision argument was forgotten?)

I actually tried to fix this but to no avail. It seems that just putting the files in the data directory next to the weights isn't enough; it will redownload those files to $HF_HOME, instead, and load them from there, so this fix does not work.

I'm afraid I'm not very good at Python and don't know if this is a bug in sentence-transformers, my code, or the distributed nomic HF model.

The code for saving and loading models is done here. It basically just calls .save or the constructor with the provided model arguments, and the build system calls --save-models-to in order to put them all in a directory. But I could have gotten it wrong: https://github.com/thoughtpolice/embedding-server/blob/main/embedding-server.py#L116-L148

This isn't a huge deal for me right now, but it would be nice if this could work since I did like that the server could run 100% offline. Thanks.

The text was updated successfully, but these errors were encountered:

tomaarsen · 2024-04-25T11:28:56Z

Hello!

I see you've already gone through a lot of steps to figure out what's happening here. Indeed, when saving a model whose modeling/configuration/tokenizers are custom and stored in a remote repository, these will not be saved as "local" in the resulting directory. Instead, the configuration files in this directory just point to the remote repository where these files can be found. This inhibits easy "out of the box" offline mode with these models, indeed.

This is a design decision in transformers (i.e., to not download those files and update the config.json to point to the local files instead), and I don't think it's one that will be changed.

As for revision: I assume that only refers to the repository that we're actively downloading from, but not the repository from which the modeling/config/tokenizer files are pulled from. I agree that the To avoid downloading new versions of the code file, you can pin a revision. is not correct in this example. This warrants an issue in https://github.com/huggingface/transformers/

For your specific case, we can resolve the problem like so:

Download a model of your choice with model name $M$.
Save the model in output directory $O$
Check if it has an auto_map option in config.json or tokenizer_config.json.
If it does:
4a. Split each value on --. If you get 2 splits, the first is the repository and the latter is the path to a class. If you get 1, that's just the path to a class and you should take $M$ as the repository.
4b. use hf_hub_download to download the files and store them in $O$
4c. update the config.json and tokenizer_config.json to remove the repository if it existed, so only the (local) path to the class remains.
If it does not, do nothing.

With that recipe, you'll be able to "fully" download a model. After which, I think you should be able to load it with local_files_only=True to prevent it from making any calls (this is not yet released, so you'd have to use pip install git+https://github.com/UKPLab/sentence-transformers.git@993437ab8514dca149adb4bf90f217224c07aff3 - although I'm not sure you can specify a commit like that).

Also, there's only 2 models that are commonly used that rely on custom modeling files: Jina and Nomic, so you won't encounter this often at all. Hopefully you can apply this recipe to get it working nicely for your use case.
A feature request to get this behaviour automatically when calling save would be best suited for transformers, as all of the saving/loading functionality for custom models exists there, and any changes/improvements should then work (nearly) out of the box in Sentence Transformers.

Tom Aarsen

Due to a design choice in Transformers, remote-code embedded in a model on HF is downloaded at runtime and this cannot be disabled, despite our attempts to download it at build time. A practical result of this is that the built Docker container image needs networking access to download the remote code, which is not ideal. This fix is a workaround to download the remote code at build time, and embed it into the model by updating the `config.json` file appropriately. This should only be needed for Nomic-style models, apparently, since they are some of the only embedding models that use remote code. Many thanks to Tom Aarsen for providing me with the steps to fix this. Relevant issue: UKPLab/sentence-transformers#2613 (comment)

thoughtpolice · 2024-04-25T22:01:17Z

Really appreciate the help and pointers, with your steps I was quickly able to fix this using sed + gron to quickly update the config: thoughtpolice/embedding-server@2c3968b

I'll try to take some time and write up a FR for transformers when I get the chance, so that other people in the future don't need to do this. Thank you!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Loading a `.save()`'d model with `revision` and `trust_remote_code` re-downloads code at runtime? #2613

Loading a `.save()`'d model with `revision` and `trust_remote_code` re-downloads code at runtime? #2613

thoughtpolice commented Apr 25, 2024 •

edited

tomaarsen commented Apr 25, 2024

thoughtpolice commented Apr 25, 2024

Loading a .save()'d model with revision and trust_remote_code re-downloads code at runtime? #2613

Loading a .save()'d model with revision and trust_remote_code re-downloads code at runtime? #2613

Comments

thoughtpolice commented Apr 25, 2024 • edited

tomaarsen commented Apr 25, 2024

thoughtpolice commented Apr 25, 2024

Loading a `.save()`'d model with `revision` and `trust_remote_code` re-downloads code at runtime? #2613

Loading a `.save()`'d model with `revision` and `trust_remote_code` re-downloads code at runtime? #2613

thoughtpolice commented Apr 25, 2024 •

edited