Collections
Collections including paper arxiv:2406.09246
-
  - OLMo: Accelerating the Science of Language Models
    Paper • 2402.00838 • Published • 85
  - Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
    Paper • 2403.05530 • Published • 66
  - StarCoder: may the source be with you!
    Paper • 2305.06161 • Published • 31
  - SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
    Paper • 2312.15166 • Published • 60
-
  - EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
    Paper • 2402.04252 • Published • 29
  - Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
    Paper • 2402.03749 • Published • 14
  - ScreenAI: A Vision-Language Model for UI and Infographics Understanding
    Paper • 2402.04615 • Published • 44
  - EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
    Paper • 2402.05008 • Published • 23
-
  - LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning
    Paper • 2309.06440 • Published • 11
  - Robotic Table Tennis: A Case Study into a High Speed Learning System
    Paper • 2309.03315 • Published • 7
  - Video Language Planning
    Paper • 2310.10625 • Published • 11
  - RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
    Paper • 2311.01455 • Published • 30
-
  - TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
    Paper • 2412.14161 • Published • 51
  - Training Software Engineering Agents and Verifiers with SWE-Gym
    Paper • 2412.21139 • Published • 24
  - OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
    Paper • 2412.19723 • Published • 87
  - AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
    Paper • 2408.00764 • Published • 1
-
  - BLINK: Multimodal Large Language Models Can See but Not Perceive
    Paper • 2404.12390 • Published • 26
  - TextSquare: Scaling up Text-Centric Visual Instruction Tuning
    Paper • 2404.12803 • Published • 30
  - Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
    Paper • 2404.13013 • Published • 31
  - InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
    Paper • 2404.06512 • Published • 30
-
  - LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
    Paper • 2306.17107 • Published • 11
  - On the Hidden Mystery of OCR in Large Multimodal Models
    Paper • 2305.07895 • Published • 1
  - Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
    Paper • 2308.12966 • Published • 11
  - MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
    Paper • 2401.15947 • Published • 53
-
  - SILC: Improving Vision Language Pretraining with Self-Distillation
    Paper • 2310.13355 • Published • 9
  - Woodpecker: Hallucination Correction for Multimodal Large Language Models
    Paper • 2310.16045 • Published • 17
  - BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
    Paper • 2201.12086 • Published • 3
  - ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
    Paper • 2305.15028 • Published • 1