R-4B-GGUF

This repository contains GGUF quantized versions of YannQi/R-4B, a state-of-the-art multimodal large language model designed for general-purpose auto-thinking.

โš ๏ธ Important Notice: R-4B support is currently only available in a custom llama.cpp branch. Please use baseweight/llama.cpp (support-r-model branch) until R-4B support is merged upstream.

About R-4B

R-4B is a breakthrough multimodal LLM that autonomously switches between step-by-step thinking and direct response generation based on task complexity. This enables high-quality responses while significantly improving inference efficiency.

All credit for this amazing model goes to YannQi and the research team. Please see the original repository and arXiv paper for more details.

Quantization Information

These GGUF files are compatible with llama.cpp and were created from the original R-4B model.
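
For reference, GGUF files like these are usually produced with llama.cpp's own tooling. The sketch below shows the typical two-step workflow (Hugging Face checkpoint → F16 GGUF → quantized GGUF); the exact commands for R-4B, in particular the vision projector conversion, live in the custom support-r-model branch and may differ, so treat the script names and paths here as illustrative.

# Sketch of the usual llama.cpp quantization workflow (illustrative paths)
# 1) Convert the original Hugging Face checkpoint to an F16 GGUF
python convert_hf_to_gguf.py /path/to/YannQi/R-4B \
  --outtype f16 \
  --outfile R-4B-F16.gguf

# 2) Quantize the F16 GGUF to the desired format, e.g. Q5_K_M
./llama-quantize R-4B-F16.gguf R-4B-Q5_K_M.gguf Q5_K_M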

Available Files

| Filename | Quant Type | File Size | Description | Use Case |
|----------|------------|-----------|-------------|----------|
| R-4B-F16.gguf | F16 | 8.3 GB | Original precision | Best quality, highest VRAM usage |
| R-4B-Q8_0.gguf | Q8_0 | 4.4 GB | Very high quality | Excellent quality/size balance |
| R-4B-Q6_K.gguf | Q6_K | 3.4 GB | High quality | Good quality, moderate size |
| R-4B-Q5_K_M.gguf | Q5_K_M | 3.0 GB | Medium quality | Recommended for most users |
| R-4B-Q4_K_M.gguf | Q4_K_M | 2.6 GB | Good quality | Best size/quality compromise |
| mmproj-R-4b-F16.gguf | F16 | 780 MB | Vision projector | Required for vision tasks |

Important: The mmproj-R-4b-F16.gguf file is required for all vision-language tasks. Download it along with your chosen model quantization.

Quantization Recommendations

  • Q4_K_M: Best balance for most users - good quality at smallest size
  • Q5_K_M: Recommended for better quality with reasonable size
  • Q6_K: High quality with larger size
  • Q8_0: Near-original quality, moderate compression
  • F16: Original precision, largest size

Usage with llama.cpp

Prerequisites

  1. Clone and build the custom llama.cpp branch with R-4B support:
    git clone https://github.com/baseweight/llama.cpp.git
    cd llama.cpp
    git checkout support-r-model
    make
    
  2. Download both the model file and mmproj-R-4b-F16.gguf from this repository
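
If you use the Hugging Face CLI, step 2 can be scripted roughly as follows. The repo id is taken from this page; swap in whichever quantization you chose.

# Download a quantized model plus the vision projector
# (requires: pip install -U "huggingface_hub[cli]")
huggingface-cli download infil00p/R-4B-GGUF \
  R-4B-Q5_K_M.gguf mmproj-R-4b-F16.gguf \
  --local-dir .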

Basic Usage

# Text + Image inference
./llama-cli \
  -m R-4B-Q5_K_M.gguf \
  --mmproj mmproj-R-4b-F16.gguf \
  --image path/to/your/image.jpg \
  -p "Describe this image in detail."

Advanced Options

# With custom parameters
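#   -c 4096     context window size in tokens
#   -n 512      maximum number of tokens to generate
#   --temp      sampling temperature
#   --top-p     nucleus sampling threshold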
./llama-cli \
  -m R-4B-Q5_K_M.gguf \
  --mmproj mmproj-R-4b-F16.gguf \
  --image image.jpg \
  -p "What is happening in this image?" \
  -c 4096 \
  -n 512 \
  --temp 0.7 \
  --top-p 0.9

Server Mode

# Run as API server
./llama-server \
  -m R-4B-Q5_K_M.gguf \
  --mmproj mmproj-R-4b-F16.gguf \
  --host 0.0.0.0 \
  --port 8080
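
Once running, llama-server exposes an OpenAI-compatible HTTP API. A minimal text-only request is sketched below; whether image inputs are accepted over this API depends on the custom branch, so check its documentation for multimodal requests.

# Query the OpenAI-compatible chat completions endpoint (text-only example)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Summarize what adaptive auto-thinking means in one paragraph."}
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'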

R-4B Features

Adaptive Thinking Modes

R-4B supports three modes of operation:

  1. Auto-thinking Mode: Automatically decides when to use step-by-step reasoning
  2. Thinking Mode: Explicitly uses reasoning for complex tasks
  3. Non-thinking Mode: Direct responses for simple queries

Key Capabilities

  • General-purpose visual question answering
  • Complex logical reasoning and mathematical problem-solving
  • Adaptive computational efficiency
  • State-of-the-art performance on multimodal benchmarks

Citation

If you use this model in your research, please cite the original work:

@misc{yang2025r4bincentivizinggeneralpurposeautothinking,
      title={R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning},
      author={Qi Yang and Bolin Ni and Shiming Xiang and Han Hu and Houwen Peng and Jie Jiang},
      year={2025},
      eprint={2508.21113},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.21113},
}

Links

  • Original model: YannQi/R-4B
  • Paper: https://arxiv.org/abs/2508.21113
  • Custom llama.cpp branch: baseweight/llama.cpp (support-r-model branch)

Acknowledgements

This quantization repository is created to make R-4B more accessible for llama.cpp users. All credit for the original model development goes to:

  • YannQi and the R-4B research team
  • Original model available at YannQi/R-4B

The base R-4B model builds on the Qwen/Qwen3-4B language model (see the model tree below).

License

This model is released under the Apache 2.0 license, following the original R-4B model.

Model tree for infil00p/R-4B-GGUF

  • Base model: Qwen/Qwen3-4B-Base
  • Finetuned: Qwen/Qwen3-4B
  • Finetuned: YannQi/R-4B
  • Quantized: this model (infil00p/R-4B-GGUF)