REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published Nov 17 • 25
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs Paper • 2506.22139 • Published Jun 27 • 2
HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration Paper • 2510.27266 • Published Oct 31 • 20
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators Paper • 2510.00406 • Published Oct 1 • 65