https://huggingface.co/vanta-research/wraith-coder-7b
#1543 · opened by oraculus541
It is unfortunately not possible to GGUF-quantize an already quantized model: this repository stores bitsandbytes-quantized weights, which convert_hf_to_gguf.py cannot dequantize, as the log below shows.
INFO:hf-to-gguf:Loading model: wraith-coder-7b
INFO:hf-to-gguf:Model architecture: Qwen2ForCausalLM
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-00002.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-00002.safetensors'
Traceback (most recent call last):
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 10403, in <module>
    main()
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 10380, in main
    model_instance = model_class(dir_model, output_type, fname_out,
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 743, in __init__
    super().__init__(*args, **kwargs)
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 155, in __init__
    self.dequant_model()
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 452, in dequant_model
    raise NotImplementedError(f"Quant method is not yet supported: {quant_method!r}")
NotImplementedError: Quant method is not yet supported: 'bitsandbytes'
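If full-precision weights are not published, one possible workaround is to dequantize the bitsandbytes checkpoint back to fp16 with transformers and point the converter at that copy instead. This is only a sketch, not something I have run against this repo: the output directory name is arbitrary, and it assumes a reasonably recent transformers release (one that exposes PreTrainedModel.dequantize) with bitsandbytes installed and enough RAM/VRAM to hold the model.

```python
# Sketch: unpack a bitsandbytes-quantized checkpoint to plain fp16 safetensors,
# which convert_hf_to_gguf.py can then handle. Paths and dtype are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vanta-research/wraith-coder-7b"

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model = model.dequantize()        # replace bnb layers with regular nn.Linear weights
model = model.to(torch.float16)   # save a plain fp16 checkpoint

model.save_pretrained("wraith-coder-7b-fp16")
AutoTokenizer.from_pretrained(model_id).save_pretrained("wraith-coder-7b-fp16")

# Then, from the llama.cpp checkout:
#   python convert_hf_to_gguf.py wraith-coder-7b-fp16 --outfile wraith-coder-7b-f16.gguf
```

The usual caveat applies: weights dequantized from 4-bit/8-bit storage will not recover the precision of the original fp16 training checkpoint, so asking the model authors to upload unquantized weights is the cleaner fix.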
oraculus541 changed discussion status to closed