Article Benchmark Smarter: Tailor Your Model Evaluation Suite with EvalScope • about 1 month ago • 7
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated Dec 31, 2025 • 356
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 17 days ago • 155