Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published 3 days ago • 54
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management Paper • 2512.12967 • Published 10 days ago • 98
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published 17 days ago • 35
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration Paper • 2511.21689 • Published 29 days ago • 108
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 23 days ago • 230
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published Nov 12 • 68
Reasoning with Sampling: Your Base Model is Smarter Than You Think Paper • 2510.14901 • Published Oct 16 • 47
Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation Paper • 2509.19244 • Published Sep 23 • 11
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models Paper • 2508.02120 • Published Aug 4 • 19
dLLM & dMLLM Collection (M)LLMs based on discrete diffusion models and related techniques • 16 items • Updated Jul 23 • 2
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization Paper • 2507.15758 • Published Jul 21 • 35
VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models Paper • 2505.15801 • Published May 21 • 17
Let LLMs Break Free from Overthinking via Self-Braking Tuning Paper • 2505.14604 • Published May 20 • 23
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning Paper • 2505.14684 • Published May 20 • 24