# R-4B-GGUF
This repository contains GGUF quantized versions of YannQi/R-4B, a state-of-the-art multimodal large language model designed for general-purpose auto-thinking.
> ⚠️ **Important Notice:** R-4B support is currently only available in a custom llama.cpp branch. Please use [baseweight/llama.cpp](https://github.com/baseweight/llama.cpp) (branch `support-r-model`) until R-4B support is merged upstream.
## About R-4B
R-4B is a breakthrough multimodal LLM that autonomously switches between step-by-step thinking and direct response generation based on task complexity. This enables high-quality responses while significantly improving inference efficiency.
Key achievements:
- #1 rank on the OpenCompass Multi-modal Reasoning Leaderboard among all open-source models
- #1 rank under 20B parameters on the OpenCompass Multi-modal Academic Leaderboard
All credit for this amazing model goes to YannQi and the research team. Please see the original repository and arXiv paper for more details.
## Quantization Information
These GGUF files are compatible with llama.cpp and were created from the original R-4B model.
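For reference, files like these can typically be produced with llama.cpp's standard conversion tooling. The following is a minimal sketch, assuming the `support-r-model` branch carries the usual `convert_hf_to_gguf.py` and `llama-quantize` tools; the exact invocation used for the files in this repository is not documented here.

```bash
# Sketch only: assumes the standard llama.cpp conversion tools handle R-4B
# on the support-r-model branch.
python convert_hf_to_gguf.py /path/to/YannQi/R-4B \
  --outfile R-4B-F16.gguf --outtype f16

# Quantize the F16 base down to one of the listed formats, e.g. Q5_K_M
./llama-quantize R-4B-F16.gguf R-4B-Q5_K_M.gguf Q5_K_M
```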
### Available Files
| Filename | Quant Type | File Size | Description | Use Case |
|---|---|---|---|---|
| R-4B-F16.gguf | F16 | 8.3 GB | Original precision | Best quality, highest VRAM usage |
| R-4B-Q8_0.gguf | Q8_0 | 4.4 GB | Very high quality | Excellent quality/size balance |
| R-4B-Q6_K.gguf | Q6_K | 3.4 GB | High quality | Good quality, moderate size |
| R-4B-Q5_K_M.gguf | Q5_K_M | 3.0 GB | Medium quality | Recommended for most users |
| R-4B-Q4_K_M.gguf | Q4_K_M | 2.6 GB | Good quality | Best size/quality compromise |
| mmproj-R-4b-F16.gguf | F16 | 780 MB | Vision projector | Required for vision tasks |
**Important:** The `mmproj-R-4b-F16.gguf` file is required for all vision-language tasks. Download it along with your chosen model quantization.
### Quantization Recommendations
- **Q4_K_M**: Best balance for most users; good quality at the smallest size
- **Q5_K_M**: Recommended for better quality at a reasonable size
- **Q6_K**: High quality at a larger size
- **Q8_0**: Near-original quality, moderate compression
- **F16**: Original precision, largest size
## Usage with llama.cpp
### Prerequisites
- Clone and build the custom llama.cpp branch with R-4B support:

  ```bash
  git clone https://github.com/baseweight/llama.cpp.git
  cd llama.cpp
  git checkout support-r-model
  make
  ```

- Download both the model file and `mmproj-R-4b-F16.gguf` from this repository (one option is sketched below).
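A convenient way to fetch the files is the `huggingface-cli` tool. A minimal sketch, where `<this-repo-id>` is a placeholder for the actual id of this repository as shown on its page:

```bash
# <this-repo-id> is a placeholder: substitute the actual repository id.
huggingface-cli download <this-repo-id> R-4B-Q5_K_M.gguf --local-dir .
huggingface-cli download <this-repo-id> mmproj-R-4b-F16.gguf --local-dir .
```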
### Basic Usage

```bash
# Text + image inference
./llama-cli \
  -m R-4B-Q5_K_M.gguf \
  --mmproj mmproj-R-4b-F16.gguf \
  --image path/to/your/image.jpg \
  -p "Describe this image in detail."
```
### Advanced Options

```bash
# With custom parameters
./llama-cli \
  -m R-4B-Q5_K_M.gguf \
  --mmproj mmproj-R-4b-F16.gguf \
  --image image.jpg \
  -p "What is happening in this image?" \
  -c 4096 \
  -n 512 \
  --temp 0.7 \
  --top-p 0.9
```
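Here `-c 4096` sets the context window, `-n 512` caps the number of generated tokens, and `--temp` / `--top-p` control sampling randomness; lower the temperature for more deterministic answers.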
### Server Mode

```bash
# Run as an API server
./llama-server \
  -m R-4B-Q5_K_M.gguf \
  --mmproj mmproj-R-4b-F16.gguf \
  --host 0.0.0.0 \
  --port 8080
```
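`llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch of a vision request, assuming this branch accepts base64 `image_url` content as recent upstream builds do (replace `<BASE64_IMAGE>` with your encoded image):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {"type": "image_url",
             "image_url": {"url": "data:image/jpeg;base64,<BASE64_IMAGE>"}}
          ]
        }],
        "max_tokens": 512
      }'
```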
## R-4B Features
### Adaptive Thinking Modes
R-4B supports three modes of operation:
- **Auto-thinking Mode**: Automatically decides when to use step-by-step reasoning
- **Thinking Mode**: Explicitly uses reasoning for complex tasks
- **Non-thinking Mode**: Direct responses for simple queries
### Key Capabilities
- General-purpose visual question answering
- Complex logical reasoning and mathematical problem-solving
- Adaptive computational efficiency
- State-of-the-art performance on multimodal benchmarks
## Citation
If you use this model in your research, please cite the original work:
```bibtex
@misc{yang2025r4bincentivizinggeneralpurposeautothinking,
  title={R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning},
  author={Qi Yang and Bolin Ni and Shiming Xiang and Han Hu and Houwen Peng and Jie Jiang},
  year={2025},
  eprint={2508.21113},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.21113},
}
```
## Links
- Original Model: [YannQi/R-4B](https://huggingface.co/YannQi/R-4B)
- Paper: [arXiv:2508.21113](https://arxiv.org/abs/2508.21113)
- Code: GitHub Repository
- llama.cpp (R-4B support): [baseweight/llama.cpp](https://github.com/baseweight/llama.cpp) (branch `support-r-model`)
- llama.cpp (upstream): [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp)
## Acknowledgements
This quantization repository is created to make R-4B more accessible for llama.cpp users. All credit for the original model development goes to:
- YannQi and the R-4B research team
- Original model available at YannQi/R-4B
## License
This model is released under the Apache 2.0 license, following the original R-4B model.