Instructions for using zijinghuafen/GM-PRM with libraries and local apps.

Libraries

How to use zijinghuafen/GM-PRM with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="zijinghuafen/GM-PRM")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("zijinghuafen/GM-PRM")
model = AutoModelForMultimodalLM.from_pretrained("zijinghuafen/GM-PRM")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use zijinghuafen/GM-PRM with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zijinghuafen/GM-PRM"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zijinghuafen/GM-PRM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
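
The same OpenAI-compatible endpoint can also be called from Python. This is a minimal sketch using only the standard library; the helper names `image_to_data_url`, `build_chat_payload` and `post_chat` are hypothetical. It assumes the server accepts base64 `data:` URLs in `image_url` (vLLM and SGLang document this for their OpenAI-compatible servers, but check your version), which is convenient for sending a local problem image together with the GM-PRM prompt from the Usage section as `text`.

```python
import base64
import json
import mimetypes
import urllib.request


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data: URL (falls back to image/png)."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_chat_payload(model: str, text: str, image_url: str, max_tokens: int = 2048) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding, like do_sample=False in the Transformers example
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the first choice's text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `post_chat(build_chat_payload("zijinghuafen/GM-PRM", "Describe this image in one sentence.", image_to_data_url("problem.png")))` sends a local image to the vLLM server started above. The SGLang servers in this card listen on port 30000, so pass `base_url="http://localhost:30000"` for those.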

Use Docker

docker model run hf.co/zijinghuafen/GM-PRM

SGLang

How to use zijinghuafen/GM-PRM with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zijinghuafen/GM-PRM" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zijinghuafen/GM-PRM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zijinghuafen/GM-PRM" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "zijinghuafen/GM-PRM",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'


GM-PRM: A Generative Multimodal Process Reward Model

Model weights for GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning (arXiv:2508.04088).

Accepted at the 4th Workshop on Advances in Language and Vision Research (ALVR), in conjunction with ACL 2026 (San Diego, California, July 2026).

Overview

GM-PRM transforms a multimodal Process Reward Model from a passive binary verifier into an active reasoning collaborator. Instead of emitting a scalar correct/incorrect score per step, it produces a fine-grained, interpretable analysis of each reasoning step along three dimensions:

Step intent — what the step is trying to do
Image alignment — whether the step is consistent with the image
Reasoning logic — whether the logic and calculations are sound

Crucially, GM-PRM is trained to generate a corrected version of the first erroneous step it identifies. This corrective ability powers our test-time inference strategy, Refined Best-of-N (Refined-BoN), which feeds the corrected step back to the policy model to steer it toward a better reasoning trajectory — improving both the diversity and correctness of the solution pool.
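
The Refined-BoN loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `policy_generate` and `prm_review` are hypothetical callables wrapping the policy model and GM-PRM; the review is assumed to yield per-step scores plus the index and correction of the first incorrect step; each refined trajectory is added to the pool alongside its original; and solutions are ranked by their weakest step. The paper may make different choices on any of these points.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Review:
    """Hypothetical parsed GM-PRM output for one solution."""
    step_scores: List[float]        # e.g. P("Correct") for each step, in [0, 1]
    first_incorrect: Optional[int]  # 0-based index of the first incorrect step, or None
    corrected_step: Optional[str]   # GM-PRM's rewrite of that step, or None


def solution_score(review: Review) -> float:
    # Assumption: a solution is only as good as its weakest step (min aggregation).
    return min(review.step_scores) if review.step_scores else 0.0


def refined_best_of_n(
    policy_generate: Callable[[Sequence[str]], List[str]],
    prm_review: Callable[[Sequence[str]], Review],
    n: int = 8,
) -> List[str]:
    """Return the best solution from N samples plus their GM-PRM-guided refinements.

    policy_generate(prefix) returns the steps that continue `prefix`
    (an empty prefix means "solve from scratch").
    """
    pool = []  # list of (score, steps)
    for _ in range(n):
        steps = policy_generate([])
        review = prm_review(steps)
        pool.append((solution_score(review), steps))
        k = review.first_incorrect
        if k is not None and review.corrected_step is not None:
            # Keep the steps before the error, swap in GM-PRM's corrected step,
            # and let the policy continue the trajectory from there.
            prefix = list(steps[:k]) + [review.corrected_step]
            refined = prefix + policy_generate(prefix)
            pool.append((solution_score(prm_review(refined)), refined))
    return max(pool, key=lambda item: item[0])[1]
```

Keeping both the original and the refined trajectory is one way to read "improving both the diversity and correctness of the solution pool"; replacing the original instead would keep the pool at exactly N.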

Model details

Base model: Qwen/Qwen2.5-VL-7B-Instruct
Training: full-parameter SFT (ViT encoder frozen), 2 epochs, lr 1e-5, bf16, DeepSpeed ZeRO-3
Training data: zijinghuafen/GM-PRM-20K — 19,614 samples (plane geometry + functions), filtered by joint agreement of GPT-4o (LLM-as-a-judge) and Monte-Carlo estimation
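
The card does not spell out the filtering rule. One plausible reading, sketched here purely as an assumption: Monte-Carlo estimation labels a step correct if at least one rollout continuing from that step reaches the reference answer (a common "hard" MC estimate), and a sample is kept only when the first incorrect step found this way matches the one flagged by GPT-4o. `rollout_is_correct` is a hypothetical callable, and `num_rollouts=8` is an illustrative default, not a value from the paper.

```python
from typing import Callable, List, Optional, Sequence


def mc_step_labels(
    steps: Sequence[str],
    rollout_is_correct: Callable[[Sequence[str]], bool],
    num_rollouts: int = 8,
) -> List[bool]:
    """Label step k correct if any rollout continuing after steps[:k+1] reaches the right answer."""
    labels = []
    for k in range(len(steps)):
        prefix = steps[: k + 1]
        # any() short-circuits, so sampling stops at the first successful rollout.
        labels.append(any(rollout_is_correct(prefix) for _ in range(num_rollouts)))
    return labels


def first_incorrect(labels: Sequence[bool]) -> Optional[int]:
    """0-based index of the first False label, or None if every step is correct."""
    return next((i for i, ok in enumerate(labels) if not ok), None)


def keep_sample(mc_labels: Sequence[bool], judge_labels: Sequence[bool]) -> bool:
    """Assumed filter: keep only if MC estimation and the LLM judge agree on the first error."""
    return first_incorrect(mc_labels) == first_incorrect(judge_labels)
```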

Results

Used as the critic in Refined-BoN (N=8), GM-PRM consistently improves policy-model accuracy across five multimodal math benchmarks (MathVista, MathVision, MathVerse, DynaMath, WeMath). Average accuracy gains across the five benchmarks: +5.9 (MiniCPM-V-2.6-8B), +4.5 (Llama-3.2-11B-Vision), +4.5 (Qwen2.5-VL-7B), +5.6 (InternVL3-8B). See the paper for full results.

Usage

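GM-PRM is a fine-tuned Qwen2.5-VL model, so it loads with transformers; the example below also needs the `qwen-vl-utils` package. Give it the problem image and the policy model's step-by-step solution (steps separated by blank lines), and it returns a per-step analysis, Correct/Incorrect judgements, and a corrected version of the first incorrect step.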

import re

import torch
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "zijinghuafen/GM-PRM", torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("zijinghuafen/GM-PRM")

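image_path = "problem.png"  # local path (or URL) of the problem image
response = "<the policy model's step-by-step solution>"  # placeholder: steps separated by blank lines
steps = re.split(r"\n\s*\n", response)  # one entry per blank-line-separated step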

prompt = (
    "You are an expert in solving multimodal mathematical problems. You will be given:\n"
    "1. An image of a multimodal mathematical problem.\n2. A multi-step solution.\n\n"
    "**Task**:\nThe tasks you need to do are:\n"
    "1. Analyze the purpose of each step and what specific actions were taken.\n"
    "2. Analyze each step's correctness in terms of image alignment and reasoning logic.\n"
    "- Image alignment: Whether the information and reasoning used in the step are consistent with the content of the provided image.\n"
    "- Reasoning logic: Whether the reasoning is logically sound, calculations are correct, and information used matches that from previous steps and question.\n"
    "When outputting judgements, you must choose one output from \"Correct\" or \"Incorrect\".\n"
    "3. For the first incorrect step, correct it based on your analysis of its error and intent, and output the corrected step at the end of your output.\n\n"
    "**Output Format**:\nYou must output your content in the following format:\n"
    "### Step 1 ###\nStep intent analysis:[...]\nImage alignment analysis:[...]\n"
    "Judgement of image alignment:[Correct/Incorrect]\nReasoning logic analysis:[...]\n"
    "Judgement of reasoning logic:[Correct/Incorrect]\nFinal judgement of the current step:[Correct/Incorrect]\n\n"
    "### Step 2 ###\n...\n\n"
    "Corrected step of the first incorrect step:[If there are incorrect steps, the corrected step of the first incorrect step goes here. Otherwise, omit this line]\n\n"
    "**Problem**:\nThe image of problem is as follows:\n<image>\n\n"
    "**Solution Steps**:\nSteps you need to analyze and judge are as follows:\n"
)
for j, s in enumerate(steps):
    prompt += f"Step {j+1}: {s}\n\n"

messages = [{"role": "user", "content": [
    {"type": "image", "image": image_path},
    {"type": "text", "text": prompt},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(processor.batch_decode([out[0][inputs.input_ids.shape[1]:]], skip_special_tokens=True)[0])
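
To use the critique programmatically, for example to locate the first incorrect step and its correction, the decoded text printed above can be parsed against the output format requested in the prompt. A minimal sketch, assuming the model follows that format closely (square brackets around judgements are treated as optional); `parse_gm_prm_output` and `ParsedCritique` are hypothetical names, and step numbers are 1-based as in the prompt.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

_STEP_HEADER = re.compile(r"###\s*Step\s*(\d+)\s*###")
_JUDGEMENT = r"{label}:\s*\[?\s*(Correct|Incorrect)"
_CORRECTED = re.compile(r"Corrected step of the first incorrect step:\s*(.*)", re.DOTALL)
_FIELDS = {
    "image_alignment": "Judgement of image alignment",
    "reasoning_logic": "Judgement of reasoning logic",
    "final": "Final judgement of the current step",
}


@dataclass
class ParsedCritique:
    judgements: Dict[int, Dict[str, Optional[bool]]]  # step number -> {field: True/False/None}
    corrected_step: Optional[str]

    def first_incorrect_step(self) -> Optional[int]:
        """1-based number of the first step whose final judgement is Incorrect."""
        for step in sorted(self.judgements):
            if self.judgements[step]["final"] is False:
                return step
        return None


def parse_gm_prm_output(text: str) -> ParsedCritique:
    corrected = None
    m = _CORRECTED.search(text)
    if m:
        corrected = m.group(1).strip()
        if corrected.startswith("[") and corrected.endswith("]"):
            corrected = corrected[1:-1].strip()
        text = text[: m.start()]  # keep the correction out of the last step's block
    parts = _STEP_HEADER.split(text)  # [preamble, "1", body1, "2", body2, ...]
    judgements = {}
    for num, body in zip(parts[1::2], parts[2::2]):
        step = {}
        for key, label in _FIELDS.items():
            jm = re.search(_JUDGEMENT.format(label=re.escape(label)), body)
            step[key] = None if jm is None else jm.group(1) == "Correct"  # None if missing
        judgements[int(num)] = step
    return ParsedCritique(judgements, corrected)
```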

The probabilities of the generated Correct / Incorrect tokens can be used as step-level scores for Best-of-N selection.
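
A minimal sketch of turning those probabilities into Best-of-N scores. With Transformers, per-position logits are returned when `generate` is called with `output_scores=True, return_dict_in_generate=True`; finding the positions of the final judgements and the token ids of "Correct"/"Incorrect" depends on the tokenizer and is left out here. The sketch assumes you already have, for each step, the logits (or log-probabilities; the difference is the same) of the two judgement tokens. Renormalising over just those two tokens and aggregating by the minimum step score are illustrative choices, not necessarily the authors'.

```python
import math
from typing import List, Sequence, Tuple


def step_score(logit_correct: float, logit_incorrect: float) -> float:
    """P("Correct") renormalised over the two judgement tokens, i.e. sigmoid(lc - li)."""
    d = logit_correct - logit_incorrect
    if d >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)


def aggregate_score(step_logits: Sequence[Tuple[float, float]], how: str = "min") -> float:
    """Combine per-step scores into one solution score (the aggregation rule is an assumption)."""
    scores = [step_score(lc, li) for lc, li in step_logits]
    if not scores:
        return 0.0
    if how == "min":
        return min(scores)
    if how == "prod":
        return math.prod(scores)
    if how == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown aggregation: {how}")


def best_of_n(candidates: List[Sequence[Tuple[float, float]]], how: str = "min") -> int:
    """Index of the candidate solution with the highest aggregated score."""
    return max(range(len(candidates)), key=lambda i: aggregate_score(candidates[i], how))
```

The minimum rewards solutions with no weak step; the product penalises long solutions more, and the mean is the most lenient of the three.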

Citation

@inproceedings{zhang2026gmprm,
  title={GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning},
  author={Zhang, Jianghangfan and Yan, Yibo and Zheng, Kening and Zou, Xin and Dai, Song and Hu, Xuming},
  booktitle={Proceedings of the 4th Workshop on Advances in Language and Vision Research (ALVR), in conjunction with ACL 2026},
  month={July},
  year={2026},
  address={San Diego, California, USA},
  publisher={Association for Computational Linguistics}
}


Model size: 8B params (Safetensors)
Tensor type: BF16

Model tree: zijinghuafen/GM-PRM is fine-tuned from Qwen/Qwen2.5-VL-7B-Instruct.

Dataset used to train zijinghuafen/GM-PRM: zijinghuafen/GM-PRM-20K

Paper for zijinghuafen/GM-PRM: GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning (arXiv:2508.04088, published Aug 6, 2025)