Model Card for Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM
Introduction
Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM is a Process Reward Model (PRM) fine-tuned from Qwen2.5-Math-7B-Instruct. This model is specifically designed to evaluate the correctness of intermediate reasoning steps in mathematical problem-solving processes, enabling more reliable and interpretable mathematical reasoning.
The model has been trained on the SHARP-Math dataset using the Process Reward Model methodology, which provides step-by-step feedback on mathematical reasoning chains.
This model is part of the SHARP-PRM series of Process Reward Models trained with step-level supervision.
Model Information
Base Model
- Base Model: Qwen/Qwen2.5-Math-7B-Instruct
- Architecture: Qwen2ForTokenClassification
- Parameters: 7B
Training Details
- Training Dataset: SHARP-Math (Process Reward Model dataset)
- Training Method: Process Reward Model (PRM) as introduced in Uesato et al., 2022
- Training Framework: TRL (Transformer Reinforcement Learning) v0.24.0
- Task Type: Token Classification (binary classification: error/correct for each reasoning step)
PRM Evaluation
This model is designed to evaluate mathematical reasoning processes by:
- Step-level Evaluation: Classifying each step in a reasoning chain as either "correct" or "error"
- Process Feedback: Providing feedback on the reasoning process, not just the final answer
- Error Detection: Identifying where mistakes occur in multi-step mathematical solutions
Evaluation Metrics
The model is evaluated on the ProcessBench benchmark.
Key metrics include:
- Error Accuracy: Ability to correctly identify incorrect steps
- Correct Accuracy: Ability to correctly identify correct steps
- F1 Score: Harmonic mean of the error and correct accuracies, as sketched below
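As a point of reference, the ProcessBench F1 score is computed as the harmonic mean of the two subset accuracies. A minimal sketch (the accuracy values are illustrative, not reported results):

```python
# ProcessBench-style F1: harmonic mean of the accuracy on solutions
# that contain an error and the accuracy on fully correct solutions.
def processbench_f1(error_accuracy: float, correct_accuracy: float) -> float:
    if error_accuracy + correct_accuracy == 0:
        return 0.0
    return 2 * error_accuracy * correct_accuracy / (error_accuracy + correct_accuracy)

print(f"{processbench_f1(0.70, 0.80):.3f}")  # 0.747 (illustrative values)
```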
Quick Start
Installation
```bash
pip install transformers torch
```
Basic Usage
Using the Model for Step Classification
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "path/to/Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Example: evaluate a mathematical reasoning chain
# Problem with two steps (one correct, one incorrect)
problem = "Solve: 2x + 5 = 13"
steps = [
    "Subtract 5 from both sides: 2x = 8",  # correct step
    "Divide by 2: x = 5",                  # incorrect step (should be x = 4)
]

# Format the input with a double-newline step separator
input_text = problem + "\n\n" + "\n\n".join(steps)
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=8192)

# Get model predictions
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits                     # shape: [batch_size, sequence_length, num_labels]
probabilities = F.softmax(logits, dim=-1)   # per-token class probabilities (sum to 1)
predictions = torch.argmax(logits, dim=-1)  # per-token predicted class indices

# Aggregate token-level predictions per step. The offset arithmetic below
# re-tokenizes each segment and is a simplification; see the offset-mapping
# sketch after this example for a more robust alternative.
labels = ["error", "correct"]
separator_len = len(tokenizer("\n\n", add_special_tokens=False)["input_ids"])
step_start = len(tokenizer(problem + "\n\n", add_special_tokens=False)["input_ids"])
for i, step in enumerate(steps):
    step_len = len(tokenizer(step, add_special_tokens=False)["input_ids"])
    step_preds = predictions[0, step_start:step_start + step_len]
    step_class = step_preds.mode().values.item()  # majority vote over step tokens
    # Mean probability of the predicted class over the step's tokens
    confidence = probabilities[0, step_start:step_start + step_len, step_class].mean().item()
    print(f"\nStep {i + 1}: {step}")
    print(f"  Prediction: {labels[step_class]}")
    print(f"  Confidence: {confidence:.2%}")
    step_start += step_len + separator_len

# Expected output (confidence values are illustrative):
#
# Step 1: Subtract 5 from both sides: 2x = 8
#   Prediction: correct
#   Confidence: 95.00%
#
# Step 2: Divide by 2: x = 5
#   Prediction: error
#   Confidence: 87.00%
```
Output Interpretation:
- Logits: Raw scores from the model (before softmax). Higher values indicate stronger confidence.
- Probabilities: Softmax-normalized scores between 0 and 1. Sum to 1 for each token.
- Predictions: Class indices (0 = "error", 1 = "correct") for each token.
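For precise step boundaries, a fast tokenizer's character-level offset mapping avoids the re-tokenization approximation used above. A minimal sketch, reusing `tokenizer`, `model`, `problem`, `steps`, and `input_text` from the example:

```python
# Map tokens to steps via character offsets (requires a fast tokenizer).
enc = tokenizer(input_text, return_tensors="pt", return_offsets_mapping=True)
offsets = enc.pop("offset_mapping")[0]  # [seq_len, 2] character span per token

# Character span of each step inside input_text
spans, cursor = [], len(problem) + 2  # skip problem text plus "\n\n" separator
for step in steps:
    spans.append((cursor, cursor + len(step)))
    cursor += len(step) + 2  # account for the "\n\n" separator

with torch.no_grad():
    step_logits = model(**enc).logits[0]

for i, (start, end) in enumerate(spans):
    # Select tokens that fall entirely inside this step's character span
    mask = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > offsets[:, 0])
    p_correct = step_logits[mask].softmax(-1)[:, 1].mean().item()
    print(f"Step {i + 1}: P(correct) = {p_correct:.2%}")
```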
Using with Pipeline
```python
import torch
from transformers import pipeline

classifier = pipeline(
    "token-classification",
    model="path/to/Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM",
    tokenizer="path/to/Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM",
    device=0 if torch.cuda.is_available() else -1,
)

# Classify reasoning steps (reusing problem and steps from above).
# The pipeline returns one dict per token with "entity" (predicted
# label), "score", "word", "start", and "end".
result = classifier(problem + "\n\n" + "\n\n".join(steps))
```
Integration with Mathematical Reasoning
This PRM model can be used to:
- Filter incorrect reasoning paths in tree-of-thought or chain-of-thought generation (see the reranking sketch after this list)
- Provide feedback during step-by-step problem solving
- Evaluate solution quality before final answer generation
- Improve training by identifying problematic reasoning patterns
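A minimal best-of-N reranking sketch under stated assumptions: `score_steps` is a hypothetical helper that wraps the step-classification code from Quick Start and returns one P(correct) per step.

```python
def solution_score(step_probs):
    # min() is a common PRM aggregation: a solution is only as
    # reliable as its weakest step.
    return min(step_probs)

def rerank(problem, candidate_step_lists, score_steps):
    # score_steps(problem, steps) -> list of per-step P(correct);
    # hypothetical helper wrapping the Quick Start code above.
    return max(
        candidate_step_lists,
        key=lambda steps: solution_score(score_steps(problem, steps)),
    )
```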
Training Procedure
Training Configuration
- Learning Rate: 2e-5
- Batch Size: Small per-device batches combined with gradient accumulation
- Epochs: Multiple epochs with early stopping
- Optimizer: AdamW with cosine learning rate schedule
- Warmup Ratio: 3%
- Gradient Clipping: 5.0
- Precision: bfloat16
- Gradient Checkpointing: Enabled for memory efficiency
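For reference, the hyperparameters above map onto TRL's `PRMTrainer` roughly as follows. This is a hedged sketch rather than the actual training script; the dataset path, `num_labels`, and any arguments not listed above are assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForTokenClassification, AutoTokenizer
from trl import PRMConfig, PRMTrainer

model = AutoModelForTokenClassification.from_pretrained(
    "Qwen/Qwen2.5-Math-7B-Instruct", num_labels=2
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Math-7B-Instruct")
train_dataset = load_dataset("path/to/SHARP-Math", split="train")  # hypothetical path

# PRMConfig extends transformers.TrainingArguments
training_args = PRMConfig(
    output_dir="Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM",
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    max_grad_norm=5.0,
    bf16=True,
    gradient_checkpointing=True,
)

trainer = PRMTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```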
Training Framework Versions
- TRL: 0.24.0
- Transformers: 4.56.2
- PyTorch: 2.9.1
- Datasets: 4.4.1
- Tokenizers: 0.22.1
Training Data
The model was trained on the SHARP-Math dataset, which contains:
- Mathematical problems with step-by-step solutions
- Labeled reasoning steps (correct/error)
- Diverse mathematical domains and difficulty levels
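For illustration, a single training example in TRL's stepwise-supervision format might look as follows; the column names follow TRL's convention, and the actual SHARP-Math schema is an assumption.

```python
example = {
    "prompt": "Solve: 2x + 5 = 13",
    "completions": [
        "Subtract 5 from both sides: 2x = 8",
        "Divide by 2: x = 5",
    ],
    "labels": [True, False],  # one correctness label per step
}
```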
Use Cases
1. Mathematical Reasoning Evaluation
- Evaluate intermediate steps in mathematical problem-solving
- Identify errors in multi-step calculations
- Provide feedback on reasoning quality
2. Educational Applications
- Automated grading of mathematical solutions
- Step-by-step feedback for students
- Identification of common error patterns
3. Research Applications
- Training better mathematical reasoning models
- Analyzing reasoning patterns
- Improving chain-of-thought generation
Limitations and Considerations
- Domain Specificity: This model is specifically trained for mathematical reasoning and may not generalize well to other domains
- Step Length: The model is optimized for step-level evaluation with a 256-token context per step
- Language: The model is primarily trained on English mathematical content
- False Positives/Negatives: Like all classification models, it may misclassify some steps
Citation
If you use this model in your research, please cite:
```bibtex
@misc{qwen2.5-math-7b-instruct-sharp-math-prm,
  title={Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM: A Process Reward Model for Mathematical Reasoning},
  author={Your Name/Organization},
  year={2025},
  howpublished={\url{https://huggingface.co/path/to/Qwen2.5-Math-7B-Instruct-SHARP-Math-PRM}}
}
```
Model Card Version: 1.0
Last Updated: 2025-12-30