# flan-ul2: candle quants
Quantized versions of google/flan-ul2 for use with candle:
```sh
cargo run --example quantized-t5 --release -- \
  --model-id pszemraj/candle-flanUL2-quantized \
  --weight-file flan-ul2-q3k.gguf \
  --prompt "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?" \
  --temperature 0
```
On my laptop (CPU, running in WSL) I get 45 tokens generated at 0.48 tokens/s.
## weights
Below are the weights/file names in this repo:
| Weight File Name | Quant Format | Size (GB) |
|---|---|---|
| flan-ul2-q2k.gguf | q2k | 6.39 |
| flan-ul2-q3k.gguf | q3k | 8.36 |
| flan-ul2-q4k.gguf | q4k | 10.9 |
| flan-ul2-q6k.gguf | q6k | 16 |
From initial testing:

- q2k appears to be too low precision and produces poor/incoherent output; q3k and higher are coherent.
- Interestingly, there is no noticeable increase in computation time (again, on CPU) when using higher-precision quants: I get the same tok/sec for q3k and q6k, +/- 0.02.
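As a rough sanity check on the file sizes above, the effective bits per weight of each quant can be estimated from flan-ul2's roughly 20B parameters (an approximation: embeddings and any tensors left unquantized are ignored here):

```python
# Back-of-the-envelope bits-per-weight estimate from the file sizes
# in the table above, assuming ~20B parameters for flan-ul2.
PARAMS = 20e9
sizes_gb = {"q2k": 6.39, "q3k": 8.36, "q4k": 10.9, "q6k": 16.0}

for name, gb in sizes_gb.items():
    bits_per_weight = gb * 1e9 * 8 / PARAMS
    print(f"{name}: ~{bits_per_weight:.2f} bits/weight")
# q2k comes out to roughly 2.56 bits/weight, q6k to roughly 6.40
```

These numbers line up with the naming: the q2k file squeezes each weight into under 3 bits, which is consistent with the incoherent output noted above.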
## setup

This assumes you already have Rust installed.
```sh
git clone https://github.com/huggingface/candle.git
cd candle
cargo build
```
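The `--model-id` flag above fetches the weight file from the Hub automatically, but if you prefer to download a file manually, the Hub serves files at a standard `/resolve/main/` path. A minimal sketch (the `weight_url` helper is hypothetical, not part of candle or this repo):

```python
# Hypothetical helper: build the direct download URL for a weight
# file hosted on the Hugging Face Hub (standard /resolve/main/ path).
def weight_url(repo_id: str, filename: str) -> str:
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

print(weight_url("pszemraj/candle-flanUL2-quantized", "flan-ul2-q3k.gguf"))
# https://huggingface.co/pszemraj/candle-flanUL2-quantized/resolve/main/flan-ul2-q3k.gguf
```

You can then pass the downloaded file's path to `--weight-file` directly.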