
Dian Yu

yudian

https://scholar.google.com/citations?user=ERdzqyYAAAAJ&hl=en

AI & ML interests

NLP

Organizations

None yet

Recent Activity

upvoted a paper 26 days ago

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

Paper • 2511.08892 • Published 29 days ago • 194

upvoted a paper about 2 months ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published Oct 23 • 18

commented on a paper about 2 months ago

Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published Oct 23 • 18

upvoted 2 papers 2 months ago

VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning

Paper • 2510.01444 • Published Oct 1 • 19

CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Paper • 2510.01591 • Published Oct 2 • 26

upvoted a paper 3 months ago

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

Paper • 2509.09675 • Published Sep 11 • 28

liked a model 3 months ago

LMMs-Lab-Turtle/SelfRewarded-R1-7B

8B • Updated Aug 19 • 2.08k • 4

upvoted a paper 4 months ago

Complex Logical Instruction Generation

Paper • 2508.09125 • Published Aug 12 • 40

authored 2 papers 5 months ago

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Paper • 2504.11456 • Published Apr 15 • 12

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31

upvoted a paper 5 months ago

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31

commented on a paper 5 months ago

One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31

liked a dataset 5 months ago

sarosavo/Master-RM

Viewer • Updated Jul 15 • 180k • 88 • 10

liked a model 5 months ago

sarosavo/Master-RM

Text Classification • 8B • Updated Jul 15 • 102 • 16

authored a paper 8 months ago

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published Mar 31 • 23

upvoted a paper 8 months ago

Expanding RL with Verifiable Rewards Across Diverse Domains

Paper • 2503.23829 • Published Mar 31 • 23

upvoted a collection 8 months ago

RLVR

Collection

Model and data for 'Expanding RL with Verifiable Rewards Across Diverse Domains' • 3 items • Updated Mar 31 • 13

authored 3 papers 9 months ago

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

Paper • 2412.21187 • Published Dec 30, 2024 • 40

OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas

Paper • 2501.15427 • Published Jan 26 • 6

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent

Paper • 2502.16852 • Published Feb 24
