On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 183
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 Image-Text-to-Text • 402B • Updated May 22, 2025 • 56.9k • • 152