
feat: Jan can load large model with multiple gguf files #2898

Open
hahuyhoang411 opened this issue May 14, 2024 · 1 comment
Assignees
Labels
P1: important (Important feature / fix) · roadmap: Cortex (Cortex, Cortex llama cpp, core extensions) · type: feature request (A new feature)

Comments

@hahuyhoang411 (Contributor)

Problem
Jan only supports loading one GGUF model file at a time.

Success Criteria
Jan can merge split GGUF files into a single file and load the resulting model for the user.

Additional context
Approach
https://www.reddit.com/r/LocalLLaMA/comments/1cf6n18/how_to_use_merge_70b_split_model_ggufpart1of2/
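As a sketch of one piece of this, the loader first needs to recognize a sharded model and collect all of its parts. Below is a hypothetical helper (not part of Jan's codebase) that assumes llama.cpp's `-NNNNN-of-NNNNN.gguf` shard naming convention; the function name and behavior are illustrative only:

```python
import re
from pathlib import Path

# llama.cpp names shards like "model-00001-of-00002.gguf"
SHARD_RE = re.compile(r"^(?P<base>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.gguf$")

def find_shards(first_file: str) -> list[str]:
    """Given any path, return the ordered list of shard paths if the file
    follows the split-GGUF naming convention, or the path itself otherwise.

    Raises FileNotFoundError if an expected shard is missing on disk.
    """
    path = Path(first_file)
    m = SHARD_RE.match(path.name)
    if m is None:
        # Not a split model: treat as a single-file GGUF
        return [str(path)]
    base, total = m.group("base"), int(m.group("total"))
    shards = [
        path.with_name(f"{base}-{i:05d}-of-{total:05d}.gguf")
        for i in range(1, total + 1)
    ]
    missing = [s.name for s in shards if not s.exists()]
    if missing:
        raise FileNotFoundError(f"missing shards: {missing}")
    return [str(s) for s in shards]
```

With the shard list in hand, Jan could either pass the first shard to a llama.cpp build that understands splits natively, or concatenate/merge the parts first, as the Reddit thread above describes.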

@hahuyhoang411 hahuyhoang411 added the type: feature request A new feature label May 14, 2024
@Van-QA Van-QA added this to the v. Ochazuke milestone May 14, 2024
@SwiftIllusion

Would also appreciate this, as I have run into the same limitation when trying to use the larger split models here: https://huggingface.co/MaziyarPanahi/WizardLM-2-8x22B-GGUF#load-sharded-model, where it was specifically mentioned to load them as sharded files and not combine them.
Another Reddit thread mentions this as well: https://www.reddit.com/r/LocalLLaMA/comments/1c2dfv6/loading_multipart_gguf_files_in/. It references a fix for text-generation-webui in oobabooga/text-generation-webui@e158299 (just for context; the steps to make Jan compatible may naturally differ).

@0xSage 0xSage added the P1: important Important feature / fix label May 22, 2024
@Van-QA Van-QA added the roadmap: Cortex Cortex, Cortex llama cpp, core extensions label May 31, 2024
Projects
Status: Planned
Development

No branches or pull requests

5 participants