stair-lab/code_insights_csv • Viewer • 3.07M • 21 • 1
stair-lab/nonmyopia_results • 7.36k
stair-lab/code_insights_results • Preview • 25
Viewer • 404 • 86
Viewer • 21.2k • 7
stair-lab/cultural_value_understanding_wvs • Viewer • 1k • 14
stair-lab/chatbot_arena_embedding • Viewer • 323k • 7
Viewer • 23.3k • 11
stair-lab/zeroshot_evaluator • Viewer • 1M • 12
stair-lab/zero_shot_evaluator_openllm_val • Preview • 10
stair-lab/zero_evaluator_agentic • Viewer • 34.7k • 9
stair-lab/zero_shot_open_llm_leaderboard • Viewer • 74.6M • 95
stair-lab/irsl_downstream_resmat1_fullinfo • 39
stair-lab/irsl_testtime_resmat1
stair-lab/irsl_downstream_resmat1_prob
stair-lab/deprecated_2choice_irsl_downstream_resmat1
stair-lab/deprecated_2choice_irsl_downstream_resmat1_fullinfo • 13
Preview • 792
stair-lab/irsl_testtime_resmat2
stair-lab/irsl_downstream_resmat1_binary • 41
stair-lab/information-gathering • Preview • 22
stair-lab/denoise_eval_query • Preview • 336
stair-lab/deval_helm_hyperturing1 • 591
stair-lab/fantastic_bugs_result • Viewer • 405k • 16
stair-lab/platinum_detect • Viewer • 282 • 99
stair-lab/fantastic_bugs_result_deprecated • Preview • 46
stair-lab/monkey_query_pre • 142
stair-lab/one_question_less_samples • Viewer • 2.34k • 15
Viewer • 5.69M • 66 • 1