Towards Automated Kernel Generation in the Era of LLMs • Paper • 2601.15727 • Published 4 days ago • 16 • 3
Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing • Paper • 2601.16125 • Published 4 days ago • 13 • 2
From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models • Paper • 2601.15690 • Published 4 days ago • 4 • 2
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces • Paper • 2601.11868 • Published 9 days ago • 28 • 1
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders • Paper • 2601.16208 • Published 4 days ago • 50 • 2
BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries • Paper • 2601.15197 • Published 5 days ago • 54 • 4
VIOLA: Towards Video In-Context Learning with Minimal Annotations • Paper • 2601.15549 • Published 4 days ago • 4 • 2
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models • Paper • 2601.15165 • Published 5 days ago • 63 • 5
MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness • Paper • 2601.08118 • Published 13 days ago • 1 • 3