https://huggingface.co/vanta-research/wraith-coder-7b

#1543
by oraculus541 - opened

Unfortunately, it is not possible to produce GGUF quants from a model that is already quantized: convert_hf_to_gguf.py cannot dequantize bitsandbytes checkpoints.

INFO:hf-to-gguf:Loading model: wraith-coder-7b
INFO:hf-to-gguf:Model architecture: Qwen2ForCausalLM
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-00002.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-00002.safetensors'
Traceback (most recent call last):
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 10403, in <module>
    main()
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 10380, in main
    model_instance = model_class(dir_model, output_type, fname_out,
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 743, in __init__
    super().__init__(*args, **kwargs)
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 155, in __init__
    self.dequant_model()
  File "/llmjob/llama.cpp/convert_hf_to_gguf.py", line 452, in dequant_model
    raise NotImplementedError(f"Quant method is not yet supported: {quant_method!r}")
NotImplementedError: Quant method is not yet supported: 'bitsandbytes'
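
If a GGUF is still wanted, one possible workaround is to dequantize the bitsandbytes checkpoint back to plain fp16 weights and run the conversion on that copy instead. The sketch below is not from this thread; it assumes a recent transformers release where PreTrainedModel.dequantize() supports bitsandbytes (check your installed version), and the local output directory name is arbitrary.

```python
# Hypothetical workaround sketch (not part of the original report): dequantize the
# bitsandbytes checkpoint to fp16, then convert that copy with convert_hf_to_gguf.py.
# Assumes a transformers version where PreTrainedModel.dequantize() supports
# bitsandbytes; loading the quantized weights requires bitsandbytes and a GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "vanta-research/wraith-coder-7b"
out_dir = "wraith-coder-7b-fp16"  # arbitrary local output path

model = AutoModelForCausalLM.from_pretrained(
    repo_id, device_map="auto", torch_dtype=torch.float16
)
model = model.dequantize()          # undo the bitsandbytes quantization
model.save_pretrained(out_dir)      # writes plain fp16 safetensors
AutoTokenizer.from_pretrained(repo_id).save_pretrained(out_dir)

# Verify that config.json in out_dir no longer carries a quantization_config,
# then: python convert_hf_to_gguf.py wraith-coder-7b-fp16 --outtype f16
```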
oraculus541 changed discussion status to closed
