LAUNCH Lab

university

https://launch.eecs.umich.edu/

launchnlp

AI & ML interests

Factuality, reasoning, alignment, LLM applications

Recent Activity

jpeper published a dataset 19 days ago

launch/LudoBench

jpeper published a Space 19 days ago

launch/LudoBench

jpeper updated a dataset 19 days ago

launch/LudoBench

Papers

Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation

spaces 7

LudoBench

🎲

Multimodal Game Reasoning Benchmark [ICLR 2026]

Answer Convergence Early Stopping

🛑

Demo for EMNLP Paper "Answer Convergence as a Signal..."

FactRBench

🏆

View and analyze long-form factuality leaderboard

ExpertLongBench

🚀

Leaderboard for ExpertLongBench

ManyICLBench

🚀

Leaderboard for ManyICLBench

MLRC-BENCH

📊

Display model performance rankings

datasets 13

launch/LudoBench

Viewer • Updated 19 days ago • 638 • 15

launch/ExpertLongBench

Preview • Updated Jul 30, 2025 • 630 • 10

launch/thinkprm-1K-verification-cots

Viewer • Updated Jul 1, 2025 • 1k • 34 • 6

launch/ManyICLBench

Viewer • Updated Jun 26, 2025 • 66 • 581 • 1

launch/CMV

Viewer • Updated Jun 26, 2025 • 133 • 31

launch/FactRBench

Viewer • Updated Jun 9, 2025 • 1.06k • 84 • 1

launch/FactBench

Viewer • Updated Jun 9, 2025 • 1k • 74 • 3

launch/CLASH

Viewer • Updated Apr 16, 2025 • 345 • 47 • 4

launch/gov_report

Viewer • Updated Nov 9, 2022 • 58.4k • 315 • 11

launch/gov_report_qs

Viewer • Updated Nov 9, 2022 • 7.87k • 85 • 4

Collections 1

launch/ThinkPRM-1.5B

launch/ThinkPRM-7B

launch/ThinkPRM-14B

mradermacher/ThinkPRM-7B-i1-GGUF

models 4

launch/ThinkPRM-14B

launch/ThinkPRM-1.5B

launch/ThinkPRM-7B

launch/POLITICS

Team members 17
