# flan-ul2: candle quants
Quantized versions of google/flan-ul2 for use with candle:
```sh
cargo run --example quantized-t5 --release -- \
  --model-id pszemraj/candle-flanUL2-quantized \
  --weight-file flan-ul2-q3k.gguf \
  --prompt "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?" \
  --temperature 0
```
On my laptop (CPU, running in WSL) I get 45 tokens generated at 0.48 tokens/s.
## weights
Below are the weights/file names in this repo:
| Weight File Name | Quant Format | Size (GB) |
|---|---|---|
| flan-ul2-q2k.gguf | q2k | 6.39 |
| flan-ul2-q3k.gguf | q3k | 8.36 |
| flan-ul2-q4k.gguf | q4k | 10.9 |
| flan-ul2-q6k.gguf | q6k | 16 |
From initial testing:

- q2k appears to be too low precision and produces poor/incoherent output; q3k and higher are coherent.
- Interestingly, there is no noticeable increase in computation time (again, on CPU) when using higher-precision quants: I get the same tok/sec for q3k and q6k, +/- 0.02.
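As a rough sanity check on the file sizes above, the effective bits per weight of each quant can be estimated from flan-ul2's roughly 20B parameters (an approximation: embeddings and any tensors left unquantized are ignored here):

```python
# Back-of-the-envelope bits-per-weight estimate from the file sizes
# in the table above, assuming ~20B parameters for flan-ul2.
PARAMS = 20e9
sizes_gb = {"q2k": 6.39, "q3k": 8.36, "q4k": 10.9, "q6k": 16.0}

for name, gb in sizes_gb.items():
    bits_per_weight = gb * 1e9 * 8 / PARAMS
    print(f"{name}: ~{bits_per_weight:.2f} bits/weight")
# q2k comes out to roughly 2.56 bits/weight, q6k to roughly 6.40
```

These numbers line up with the naming: the q2k file squeezes each weight into under 3 bits, which is consistent with the incoherent output noted above.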
## setup

This assumes you already have Rust installed.
```sh
git clone https://github.com/huggingface/candle.git
cd candle
cargo build
```
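The `--model-id` flag above fetches the weight file from the Hub automatically, but if you prefer to download a file manually, the Hub serves files at a standard `/resolve/main/` path. A minimal sketch (the `weight_url` helper is hypothetical, not part of candle or this repo):

```python
# Hypothetical helper: build the direct download URL for a weight
# file hosted on the Hugging Face Hub (standard /resolve/main/ path).
def weight_url(repo_id: str, filename: str) -> str:
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

print(weight_url("pszemraj/candle-flanUL2-quantized", "flan-ul2-q3k.gguf"))
# https://huggingface.co/pszemraj/candle-flanUL2-quantized/resolve/main/flan-ul2-q3k.gguf
```

You can then pass the downloaded file's path to `--weight-file` directly.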