SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26 • 91
FactAlign: Long-form Factuality Alignment of Large Language Models Paper • 2410.01691 • Published Oct 2, 2024 • 9