LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 4 days ago • 48
JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization Paper • 2511.23002 • Published 30 days ago • 26
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18 • 111
Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control Paper • 2508.08134 • Published Aug 11 • 10
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Paper • 2506.09040 • Published Jun 10 • 34
Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published Dec 23, 2024 • 24
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation Paper • 2412.02259 • Published Dec 3, 2024 • 60
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs Paper • 2407.02157 • Published Jul 2, 2024
Identity-Preserving Text-to-Video Generation by Frequency Decomposition Paper • 2411.17440 • Published Nov 26, 2024 • 37
OmniCreator: Self-Supervised Unified Generation with Universal Editing Paper • 2412.02114 • Published Dec 3, 2024 • 14