New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blockwise quantization only supports 16/32-bit floats, but got torch.uint8 ( bnb.nf4
quantisation is not working)
#1325
Comments
Hey @Anindyadeep Based on the error stacktrace, it looks like you are trying to load and quantize an already quantized model. Have you done anything to the weights or it's just the weights that were downloaded and converted by LitGPT and nothing more? |
Hey thanks for the reply, So the process was:
|
Ok, but what the dtype of the HuggingFace model? |
Hi, I see, so here's the thing, I initially converted the hf weights to int8 using litgpt cli and then I converted the same weights to int4 (which is now not possible), and that is the possible reason. Which means everytime, I need to start with a base litgpt weights (with fp32) or the raw HF weights right? Let me try that, if it works, I will let you know and then we can close this issue :) |
Correct. In order to use quantization you just need weights in a standard precision ( |
I see, got it, let me try this out, and will keep posted in this thread, thanks for the headsup |
Hi so, I tried the whole process once again. Here is what my Llama 2 weights folder contains after I typed this command: litgpt convert to_litgpt --checkpoint_dir ./models/Llama-2-7b-chat-hf/ The above run was successful and this is what models/Llama-2-7b-chat-hf/
├── LICENSE.txt
├── README.md
├── USE_POLICY.md
├── config.json
├── generation_config.json
├── lit_model.pth
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── model.safetensors.index.json
├── model_config.yaml
├── pytorch_model-00001-of-00002.bin
├── pytorch_model-00002-of-00002.bin
├── pytorch_model.bin.index.json
├── special_tokens_map.json
├── tokenizer.json
├── tokenizer.model
└── tokenizer_config.json Now I typed this command: litgpt generate base --quantize bnb.nf4 --checkpoint_dir models/Llama-2-7b-chat-hf --precision bf16-true --max_new_tokens 256 And got this error: Loading model 'models/Llama-2-7b-chat-hf/lit_model.pth' with {'name': 'Llama-2-7b-chat-hf', 'hf_config': {'name': 'Llama-2-7b-chat-hf', 'org': 'meta-llama'}, 'scale_embeddings': False, 'block_size': 4096, 'vocab_size': 32000, 'padding_multiple': 64, 'padded_vocab_size': 32000, 'n_layer': 32, 'n_head': 32, 'head_size': 128, 'n_embd': 4096, 'rotary_percentage': 1.0, 'parallel_residual': False, 'bias': False, 'lm_head_bias': False, 'n_query_groups': 32, 'shared_attention_norm': False, 'norm_class_name': 'RMSNorm', 'norm_eps': 1e-05, 'mlp_class_name': 'LLaMAMLP', 'gelu_approximate': 'none', 'intermediate_size': 11008, 'rope_condense_ratio': 1, 'rope_base': 10000, 'n_expert': 0, 'n_expert_per_token': 0, 'rope_n_elem': 128}
Time to instantiate model: 0.24 seconds.
Traceback (most recent call last):
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/bin/litgpt", line 8, in <module>
sys.exit(main())
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/litgpt/__main__.py", line 143, in main
fn(**kwargs)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/litgpt/generate/base.py", line 169, in main
model = fabric.setup_module(model)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 310, in setup_module
module = self._move_model_to_device(model=module, optimizers=[])
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 997, in _move_model_to_device
model = self.to_device(model)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 528, in to_device
self._strategy.module_to_device(obj)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/lightning/fabric/strategies/single_device.py", line 59, in module_to_device
module.to(self.root_device)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
return self._apply(convert)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
param_applied = fn(param)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 324, in to
return self._quantize(device)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 289, in _quantize
w_4bit, quant_state = bnb.functional.quantize_4bit(
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1234, in quantize_4bit
raise ValueError(f"Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}")
ValueError: Blockwise quantization only supports 16/32-bit floats, but got torch.uint8
|
But have you checked the dtype of the original weights (.safetensors) in |
you mean the weights for litgpt model or the hf model? Also as far as the hf models concerned, those are the actual raw weights of llama 2 so it is float16 |
Here is the HF config {
"_name_or_path": "meta-llama/Llama-2-7b-chat-hf",
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.32.0.dev0",
"use_cache": true,
"vocab_size": 32000
} And here is the lit model_config.yaml file bias: false
block_size: 4096
gelu_approximate: none
head_size: 128
hf_config:
name: Llama-2-7b-chat-hf
org: meta-llama
intermediate_size: 11008
lm_head_bias: false
mlp_class_name: LLaMAMLP
n_embd: 4096
n_expert: 0
n_expert_per_token: 0
n_head: 32
n_layer: 32
n_query_groups: 32
name: Llama-2-7b-chat-hf
norm_class_name: RMSNorm
norm_eps: 1.0e-05
padded_vocab_size: 32000
padding_multiple: 64
parallel_residual: false
rope_base: 10000
rope_condense_ratio: 1
rotary_percentage: 1.0
scale_embeddings: false
shared_attention_norm: false
vocab_size: 32000
|
The same is happening for mistral too |
I still don't have access neither to LlaMA 2 nor to some Mistral models. export repo_id=microsoft/phi-2
litgpt download --repo_id $repo_id --convert_checkpoint false
litgpt convert to_litgpt --checkpoint_dir checkpoints/$repo_id
litgpt generate base --quantize bnb.nf4 --checkpoint_dir checkpoints/$repo_id --precision bf16-true --max_new_tokens 256 |
Okay then let me try the same with Mistral, but this time with download |
I see but I did the same thing for Mistral v0.1. Here are the set of commands: export repo_id=mistralai/Mistral-7B-Instruct-v0.1
litgpt download --repo_id $repo_id --convert_checkpoint false --access_token hf_...
litgpt convert to_litgpt --checkpoint_dir checkpoints/$repo_id
litgpt generate base --quantize bnb.nf4 --checkpoint_dir checkpoints/$repo_id --precision bf16-true --max_new_tokens 256 And here are the logs: (venv) anindya@prem-ai-a100-fin-01:~/workspace/benchmarks$ litgpt download --repo_id $repo_id --convert_checkpoint false --access_token hf_...
(venv) anindya@prem-ai-a100-fin-01:~/workspace/benchmarks$ export repo_id=mistralai/Mistral-7B-Instruct-v0.1
litgpt download --repo_id $repo_id --convert_checkpoint false --access_token hf_...
litgpt convert to_litgpt --checkpoint_dir checkpoints/$repo_id
litgpt generate base --quantize bnb.nf4 --checkpoint_dir checkpoints/$repo_id --precision bf16-true --max_new_tokens 256
Setting HF_HUB_ENABLE_HF_TRANSFER=1
config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 571/571 [00:00<00:00, 3.37MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 682kB/s]
pytorch_model-00001-of-00002.bin: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.94G/9.94G [01:01<00:00, 162MB/s]
pytorch_model-00002-of-00002.bin: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.06G/5.06G [00:34<00:00, 148MB/s]
pytorch_model.bin.index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 23.9k/23.9k [00:00<00:00, 72.8MB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.80M/1.80M [00:00<00:00, 3.94MB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 6.39MB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.47k/1.47k [00:00<00:00, 10.1MB/s]
Processing checkpoints/mistralai/Mistral-7B-Instruct-v0.1/pytorch_model-00001-of-00002.bin
Loading 'model.embed_tokens.weight' into RAM
Loading 'model.layers.0.self_attn.o_proj.weight' into RAM
Loading 'model.layers.0.mlp.gate_proj.weight' into RAM
Loading 'model.layers.0.mlp.up_proj.weight' into RAM
Loading 'model.layers.0.mlp.down_proj.weight' into RAM
Loading 'model.layers.0.input_layernorm.weight' into RAM
Loading 'model.layers.0.post_attention_layernorm.weight' into RAM
Loading 'model.layers.1.self_attn.o_proj.weight' into RAM
Loading 'model.layers.1.mlp.gate_proj.weight' into RAM
Loading 'model.layers.1.mlp.up_proj.weight' into RAM
Loading 'model.layers.1.mlp.down_proj.weight' into RAM
Loading 'model.layers.1.input_layernorm.weight' into RAM
Loading 'model.layers.1.post_attention_layernorm.weight' into RAM
Loading 'model.layers.2.self_attn.o_proj.weight' into RAM
Loading 'model.layers.2.mlp.gate_proj.weight' into RAM
Loading 'model.layers.2.mlp.up_proj.weight' into RAM
Loading 'model.layers.2.mlp.down_proj.weight' into RAM
Loading 'model.layers.2.input_layernorm.weight' into RAM
Loading 'model.layers.2.post_attention_layernorm.weight' into RAM
Loading 'model.layers.3.self_attn.o_proj.weight' into RAM
Loading 'model.layers.3.mlp.gate_proj.weight' into RAM
Loading 'model.layers.3.mlp.up_proj.weight' into RAM
Loading 'model.layers.3.mlp.down_proj.weight' into RAM
Loading 'model.layers.3.input_layernorm.weight' into RAM
Loading 'model.layers.3.post_attention_layernorm.weight' into RAM
Loading 'model.layers.4.self_attn.o_proj.weight' into RAM
Loading 'model.layers.4.mlp.gate_proj.weight' into RAM
Loading 'model.layers.4.mlp.up_proj.weight' into RAM
Loading 'model.layers.4.mlp.down_proj.weight' into RAM
Loading 'model.layers.4.input_layernorm.weight' into RAM
Loading 'model.layers.4.post_attention_layernorm.weight' into RAM
Loading 'model.layers.5.self_attn.o_proj.weight' into RAM
Loading 'model.layers.5.mlp.gate_proj.weight' into RAM
Loading 'model.layers.5.mlp.up_proj.weight' into RAM
Loading 'model.layers.5.mlp.down_proj.weight' into RAM
Loading 'model.layers.5.input_layernorm.weight' into RAM
Loading 'model.layers.5.post_attention_layernorm.weight' into RAM
Loading 'model.layers.6.self_attn.o_proj.weight' into RAM
Loading 'model.layers.6.mlp.gate_proj.weight' into RAM
Loading 'model.layers.6.mlp.up_proj.weight' into RAM
Loading 'model.layers.6.mlp.down_proj.weight' into RAM
Loading 'model.layers.6.input_layernorm.weight' into RAM
Loading 'model.layers.6.post_attention_layernorm.weight' into RAM
Loading 'model.layers.7.self_attn.o_proj.weight' into RAM
Loading 'model.layers.7.mlp.gate_proj.weight' into RAM
Loading 'model.layers.7.mlp.up_proj.weight' into RAM
Loading 'model.layers.7.mlp.down_proj.weight' into RAM
Loading 'model.layers.7.input_layernorm.weight' into RAM
Loading 'model.layers.7.post_attention_layernorm.weight' into RAM
Loading 'model.layers.8.self_attn.o_proj.weight' into RAM
Loading 'model.layers.8.mlp.gate_proj.weight' into RAM
Loading 'model.layers.8.mlp.up_proj.weight' into RAM
Loading 'model.layers.8.mlp.down_proj.weight' into RAM
Loading 'model.layers.8.input_layernorm.weight' into RAM
Loading 'model.layers.8.post_attention_layernorm.weight' into RAM
Loading 'model.layers.9.self_attn.o_proj.weight' into RAM
Loading 'model.layers.9.mlp.gate_proj.weight' into RAM
Loading 'model.layers.9.mlp.up_proj.weight' into RAM
Loading 'model.layers.9.mlp.down_proj.weight' into RAM
Loading 'model.layers.9.input_layernorm.weight' into RAM
Loading 'model.layers.9.post_attention_layernorm.weight' into RAM
Loading 'model.layers.10.self_attn.o_proj.weight' into RAM
Loading 'model.layers.10.mlp.gate_proj.weight' into RAM
Loading 'model.layers.10.mlp.up_proj.weight' into RAM
Loading 'model.layers.10.mlp.down_proj.weight' into RAM
Loading 'model.layers.10.input_layernorm.weight' into RAM
Loading 'model.layers.10.post_attention_layernorm.weight' into RAM
Loading 'model.layers.11.self_attn.o_proj.weight' into RAM
Loading 'model.layers.11.mlp.gate_proj.weight' into RAM
Loading 'model.layers.11.mlp.up_proj.weight' into RAM
Loading 'model.layers.11.mlp.down_proj.weight' into RAM
Loading 'model.layers.11.input_layernorm.weight' into RAM
Loading 'model.layers.11.post_attention_layernorm.weight' into RAM
Loading 'model.layers.12.self_attn.o_proj.weight' into RAM
Loading 'model.layers.12.mlp.gate_proj.weight' into RAM
Loading 'model.layers.12.mlp.up_proj.weight' into RAM
Loading 'model.layers.12.mlp.down_proj.weight' into RAM
Loading 'model.layers.12.input_layernorm.weight' into RAM
Loading 'model.layers.12.post_attention_layernorm.weight' into RAM
Loading 'model.layers.13.self_attn.o_proj.weight' into RAM
Loading 'model.layers.13.mlp.gate_proj.weight' into RAM
Loading 'model.layers.13.mlp.up_proj.weight' into RAM
Loading 'model.layers.13.mlp.down_proj.weight' into RAM
Loading 'model.layers.13.input_layernorm.weight' into RAM
Loading 'model.layers.13.post_attention_layernorm.weight' into RAM
Loading 'model.layers.14.self_attn.o_proj.weight' into RAM
Loading 'model.layers.14.mlp.gate_proj.weight' into RAM
Loading 'model.layers.14.mlp.up_proj.weight' into RAM
Loading 'model.layers.14.mlp.down_proj.weight' into RAM
Loading 'model.layers.14.input_layernorm.weight' into RAM
Loading 'model.layers.14.post_attention_layernorm.weight' into RAM
Loading 'model.layers.15.self_attn.o_proj.weight' into RAM
Loading 'model.layers.15.mlp.gate_proj.weight' into RAM
Loading 'model.layers.15.mlp.up_proj.weight' into RAM
Loading 'model.layers.15.mlp.down_proj.weight' into RAM
Loading 'model.layers.15.input_layernorm.weight' into RAM
Loading 'model.layers.15.post_attention_layernorm.weight' into RAM
Loading 'model.layers.16.self_attn.o_proj.weight' into RAM
Loading 'model.layers.16.mlp.gate_proj.weight' into RAM
Loading 'model.layers.16.mlp.up_proj.weight' into RAM
Loading 'model.layers.16.mlp.down_proj.weight' into RAM
Loading 'model.layers.16.input_layernorm.weight' into RAM
Loading 'model.layers.16.post_attention_layernorm.weight' into RAM
Loading 'model.layers.17.self_attn.o_proj.weight' into RAM
Loading 'model.layers.17.mlp.gate_proj.weight' into RAM
Loading 'model.layers.17.mlp.up_proj.weight' into RAM
Loading 'model.layers.17.mlp.down_proj.weight' into RAM
Loading 'model.layers.17.input_layernorm.weight' into RAM
Loading 'model.layers.17.post_attention_layernorm.weight' into RAM
Loading 'model.layers.18.self_attn.o_proj.weight' into RAM
Loading 'model.layers.18.mlp.gate_proj.weight' into RAM
Loading 'model.layers.18.mlp.up_proj.weight' into RAM
Loading 'model.layers.18.mlp.down_proj.weight' into RAM
Loading 'model.layers.18.input_layernorm.weight' into RAM
Loading 'model.layers.18.post_attention_layernorm.weight' into RAM
Loading 'model.layers.19.self_attn.o_proj.weight' into RAM
Loading 'model.layers.19.mlp.gate_proj.weight' into RAM
Loading 'model.layers.19.mlp.up_proj.weight' into RAM
Loading 'model.layers.19.mlp.down_proj.weight' into RAM
Loading 'model.layers.19.input_layernorm.weight' into RAM
Loading 'model.layers.19.post_attention_layernorm.weight' into RAM
Loading 'model.layers.20.self_attn.o_proj.weight' into RAM
Loading 'model.layers.20.mlp.gate_proj.weight' into RAM
Loading 'model.layers.20.mlp.up_proj.weight' into RAM
Loading 'model.layers.20.mlp.down_proj.weight' into RAM
Loading 'model.layers.20.input_layernorm.weight' into RAM
Loading 'model.layers.20.post_attention_layernorm.weight' into RAM
Loading 'model.layers.21.self_attn.o_proj.weight' into RAM
Loading 'model.layers.21.mlp.gate_proj.weight' into RAM
Loading 'model.layers.21.mlp.up_proj.weight' into RAM
Loading 'model.layers.21.mlp.down_proj.weight' into RAM
Loading 'model.layers.21.input_layernorm.weight' into RAM
Loading 'model.layers.21.post_attention_layernorm.weight' into RAM
Loading 'model.layers.22.self_attn.o_proj.weight' into RAM
Loading 'layer 0 q' into RAM
Loading 'layer 0 k' into RAM
Loading 'layer 0 v' into RAM
Loading 'layer 1 q' into RAM
Loading 'layer 1 k' into RAM
Loading 'layer 1 v' into RAM
Loading 'layer 2 q' into RAM
Loading 'layer 2 k' into RAM
Loading 'layer 2 v' into RAM
Loading 'layer 3 q' into RAM
Loading 'layer 3 k' into RAM
Loading 'layer 3 v' into RAM
Loading 'layer 4 q' into RAM
Loading 'layer 4 k' into RAM
Loading 'layer 4 v' into RAM
Loading 'layer 5 q' into RAM
Loading 'layer 5 k' into RAM
Loading 'layer 5 v' into RAM
Loading 'layer 6 q' into RAM
Loading 'layer 6 k' into RAM
Loading 'layer 6 v' into RAM
Loading 'layer 7 q' into RAM
Loading 'layer 7 k' into RAM
Loading 'layer 7 v' into RAM
Loading 'layer 8 q' into RAM
Loading 'layer 8 k' into RAM
Loading 'layer 8 v' into RAM
Loading 'layer 9 q' into RAM
Loading 'layer 9 k' into RAM
Loading 'layer 9 v' into RAM
Loading 'layer 10 q' into RAM
Loading 'layer 10 k' into RAM
Loading 'layer 10 v' into RAM
Loading 'layer 11 q' into RAM
Loading 'layer 11 k' into RAM
Loading 'layer 11 v' into RAM
Loading 'layer 12 q' into RAM
Loading 'layer 12 k' into RAM
Loading 'layer 12 v' into RAM
Loading 'layer 13 q' into RAM
Loading 'layer 13 k' into RAM
Loading 'layer 13 v' into RAM
Loading 'layer 14 q' into RAM
Loading 'layer 14 k' into RAM
Loading 'layer 14 v' into RAM
Loading 'layer 15 q' into RAM
Loading 'layer 15 k' into RAM
Loading 'layer 15 v' into RAM
Loading 'layer 16 q' into RAM
Loading 'layer 16 k' into RAM
Loading 'layer 16 v' into RAM
Loading 'layer 17 q' into RAM
Loading 'layer 17 k' into RAM
Loading 'layer 17 v' into RAM
Loading 'layer 18 q' into RAM
Loading 'layer 18 k' into RAM
Loading 'layer 18 v' into RAM
Loading 'layer 19 q' into RAM
Loading 'layer 19 k' into RAM
Loading 'layer 19 v' into RAM
Loading 'layer 20 q' into RAM
Loading 'layer 20 k' into RAM
Loading 'layer 20 v' into RAM
Loading 'layer 21 q' into RAM
Loading 'layer 21 k' into RAM
Loading 'layer 21 v' into RAM
Loading 'layer 22 q' into RAM
Loading 'layer 22 k' into RAM
Loading 'layer 22 v' into RAM
Processing checkpoints/mistralai/Mistral-7B-Instruct-v0.1/pytorch_model-00002-of-00002.bin
Loading 'model.layers.22.mlp.gate_proj.weight' into RAM
Loading 'model.layers.22.mlp.up_proj.weight' into RAM
Loading 'model.layers.22.mlp.down_proj.weight' into RAM
Loading 'model.layers.22.input_layernorm.weight' into RAM
Loading 'model.layers.22.post_attention_layernorm.weight' into RAM
Loading 'model.layers.23.self_attn.o_proj.weight' into RAM
Loading 'model.layers.23.mlp.gate_proj.weight' into RAM
Loading 'model.layers.23.mlp.up_proj.weight' into RAM
Loading 'model.layers.23.mlp.down_proj.weight' into RAM
Loading 'model.layers.23.input_layernorm.weight' into RAM
Loading 'model.layers.23.post_attention_layernorm.weight' into RAM
Loading 'model.layers.24.self_attn.o_proj.weight' into RAM
Loading 'model.layers.24.mlp.gate_proj.weight' into RAM
Loading 'model.layers.24.mlp.up_proj.weight' into RAM
Loading 'model.layers.24.mlp.down_proj.weight' into RAM
Loading 'model.layers.24.input_layernorm.weight' into RAM
Loading 'model.layers.24.post_attention_layernorm.weight' into RAM
Loading 'model.layers.25.self_attn.o_proj.weight' into RAM
Loading 'model.layers.25.mlp.gate_proj.weight' into RAM
Loading 'model.layers.25.mlp.up_proj.weight' into RAM
Loading 'model.layers.25.mlp.down_proj.weight' into RAM
Loading 'model.layers.25.input_layernorm.weight' into RAM
Loading 'model.layers.25.post_attention_layernorm.weight' into RAM
Loading 'model.layers.26.self_attn.o_proj.weight' into RAM
Loading 'model.layers.26.mlp.gate_proj.weight' into RAM
Loading 'model.layers.26.mlp.up_proj.weight' into RAM
Loading 'model.layers.26.mlp.down_proj.weight' into RAM
Loading 'model.layers.26.input_layernorm.weight' into RAM
Loading 'model.layers.26.post_attention_layernorm.weight' into RAM
Loading 'model.layers.27.self_attn.o_proj.weight' into RAM
Loading 'model.layers.27.mlp.gate_proj.weight' into RAM
Loading 'model.layers.27.mlp.up_proj.weight' into RAM
Loading 'model.layers.27.mlp.down_proj.weight' into RAM
Loading 'model.layers.27.input_layernorm.weight' into RAM
Loading 'model.layers.27.post_attention_layernorm.weight' into RAM
Loading 'model.layers.28.self_attn.o_proj.weight' into RAM
Loading 'model.layers.28.mlp.gate_proj.weight' into RAM
Loading 'model.layers.28.mlp.up_proj.weight' into RAM
Loading 'model.layers.28.mlp.down_proj.weight' into RAM
Loading 'model.layers.28.input_layernorm.weight' into RAM
Loading 'model.layers.28.post_attention_layernorm.weight' into RAM
Loading 'model.layers.29.self_attn.o_proj.weight' into RAM
Loading 'model.layers.29.mlp.gate_proj.weight' into RAM
Loading 'model.layers.29.mlp.up_proj.weight' into RAM
Loading 'model.layers.29.mlp.down_proj.weight' into RAM
Loading 'model.layers.29.input_layernorm.weight' into RAM
Loading 'model.layers.29.post_attention_layernorm.weight' into RAM
Loading 'model.layers.30.self_attn.o_proj.weight' into RAM
Loading 'model.layers.30.mlp.gate_proj.weight' into RAM
Loading 'model.layers.30.mlp.up_proj.weight' into RAM
Loading 'model.layers.30.mlp.down_proj.weight' into RAM
Loading 'model.layers.30.input_layernorm.weight' into RAM
Loading 'model.layers.30.post_attention_layernorm.weight' into RAM
Loading 'model.layers.31.self_attn.o_proj.weight' into RAM
Loading 'model.layers.31.mlp.gate_proj.weight' into RAM
Loading 'model.layers.31.mlp.up_proj.weight' into RAM
Loading 'model.layers.31.mlp.down_proj.weight' into RAM
Loading 'model.layers.31.input_layernorm.weight' into RAM
Loading 'model.layers.31.post_attention_layernorm.weight' into RAM
Loading 'model.norm.weight' into RAM
Loading 'lm_head.weight' into RAM
Loading 'layer 23 q' into RAM
Loading 'layer 23 k' into RAM
Loading 'layer 23 v' into RAM
Loading 'layer 24 q' into RAM
Loading 'layer 24 k' into RAM
Loading 'layer 24 v' into RAM
Loading 'layer 25 q' into RAM
Loading 'layer 25 k' into RAM
Loading 'layer 25 v' into RAM
Loading 'layer 26 q' into RAM
Loading 'layer 26 k' into RAM
Loading 'layer 26 v' into RAM
Loading 'layer 27 q' into RAM
Loading 'layer 27 k' into RAM
Loading 'layer 27 v' into RAM
Loading 'layer 28 q' into RAM
Loading 'layer 28 k' into RAM
Loading 'layer 28 v' into RAM
Loading 'layer 29 q' into RAM
Loading 'layer 29 k' into RAM
Loading 'layer 29 v' into RAM
Loading 'layer 30 q' into RAM
Loading 'layer 30 k' into RAM
Loading 'layer 30 v' into RAM
Loading 'layer 31 q' into RAM
Loading 'layer 31 k' into RAM
Loading 'layer 31 v' into RAM
Saving converted checkpoint to checkpoints/mistralai/Mistral-7B-Instruct-v0.1
Loading model 'checkpoints/mistralai/Mistral-7B-Instruct-v0.1/lit_model.pth' with {'name': 'Mistral-7B-Instruct-v0.1', 'hf_config': {'name': 'Mistral-7B-Instruct-v0.1', 'org': 'mistralai'}, 'scale_embeddings': False, 'block_size': 4096, 'vocab_size': 32000, 'padding_multiple': 512, 'padded_vocab_size': 32000, 'n_layer': 32, 'n_head': 32, 'head_size': 128, 'n_embd': 4096, 'rotary_percentage': 1.0, 'parallel_residual': False, 'bias': False, 'lm_head_bias': False, 'n_query_groups': 8, 'shared_attention_norm': False, 'norm_class_name': 'RMSNorm', 'norm_eps': 1e-05, 'mlp_class_name': 'LLaMAMLP', 'gelu_approximate': 'none', 'intermediate_size': 14336, 'rope_condense_ratio': 1, 'rope_base': 10000, 'n_expert': 0, 'n_expert_per_token': 0, 'rope_n_elem': 128}
Time to instantiate model: 0.26 seconds.
Traceback (most recent call last):
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/bin/litgpt", line 8, in <module>
sys.exit(main())
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/litgpt/__main__.py", line 143, in main
fn(**kwargs)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/litgpt/generate/base.py", line 169, in main
model = fabric.setup_module(model)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 310, in setup_module
module = self._move_model_to_device(model=module, optimizers=[])
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 997, in _move_model_to_device
model = self.to_device(model)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 528, in to_device
self._strategy.module_to_device(obj)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/lightning/fabric/strategies/single_device.py", line 59, in module_to_device
module.to(self.root_device)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
return self._apply(convert)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
module._apply(fn)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
param_applied = fn(param)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 324, in to
return self._quantize(device)
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 289, in _quantize
w_4bit, quant_state = bnb.functional.quantize_4bit(
File "/home/anindya/workspace/benchmarks/bench_lightning/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1234, in quantize_4bit
raise ValueError(f"Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}")
ValueError: Blockwise quantization only supports 16/32-bit floats, but got torch.uint8 |
@carmocca Could you look into it? Since I don't have access to the model, I cannot even reproduce the issue. |
Hello, I am using the latest version of lit-gpt. First of all, it is much cleaner than before, so amazing work. However I am facing a problem. After I convert a huggingface (Llama 2) model to lit-gpt model, it is running as expected for
But when it comes to int4, I am getting some unexpected error. Here are the logs.
Usage:
I am using the example show in this tutorial (just changed the model path)
And I got this error:
cc: @aniketmaurya @Andrei-Aksionov
The text was updated successfully, but these errors were encountered: