Collections
Discover the best community collections!
Collections including paper arxiv:2502.13063

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- Large Language Models Think Too Fast To Explore Effectively
  Paper • 2501.18009 • Published • 24
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- Intuitive physics understanding emerges from self-supervised pretraining on natural videos
  Paper • 2502.11831 • Published • 20
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
  Paper • 2502.13063 • Published • 72

- VILA^2: VILA Augmented VILA
  Paper • 2407.17453 • Published • 41
- Octopus v4: Graph of language models
  Paper • 2404.19296 • Published • 118
- Octo-planner: On-device Language Model for Planner-Action Agents
  Paper • 2406.18082 • Published • 48
- Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
  Paper • 2408.15518 • Published • 42

- LightMem: Lightweight and Efficient Memory-Augmented Generation
  Paper • 2510.18866 • Published • 110
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
  Paper • 2502.13063 • Published • 72
- General Agentic Memory Via Deep Research
  Paper • 2511.18423 • Published • 157

- InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
  Paper • 2502.08910 • Published • 148
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
  Paper • 2502.13063 • Published • 72
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 166
- LLM Pretraining with Continuous Concepts
  Paper • 2502.08524 • Published • 29

- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
  Paper • 2409.10516 • Published • 43
- Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
  Paper • 2409.11242 • Published • 7
- Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
  Paper • 2409.11136 • Published • 22
- On the Diagram of Thought
  Paper • 2409.10038 • Published • 13

- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 28
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 24
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 37
- Are large language models superhuman chemists?
  Paper • 2404.01475 • Published • 19