DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis
Paper • arXiv:2508.20033
OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning
KARL: Knowledge Agents via Reinforcement Learning