Perry the Platypus PRO

AgPerry

AI & ML interests

None yet

Recent Activity

updated a dataset about 21 hours ago

TIGER-Lab/ClawBench

Viewer • Updated about 21 hours ago • 283 • 538

upvoted a paper 8 days ago

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Paper • 2605.30288 • Published 13 days ago • 22

updated a Space 17 days ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated 4 datasets 17 days ago

TIGER-Lab/ClawBenchV2Trace

Updated 17 days ago • 9.23k

NAIL-Group/ClawBenchV2Trace

Updated 17 days ago • 4.14k

NAIL-Group/ClawBenchV1Trace

Updated 17 days ago • 7.26k

NAIL-Group/ClawBench

Viewer • Updated 17 days ago • 153 • 300 • 2

commented on a paper 25 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

upvoted a paper 25 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

New activity in huggingface/HuggingDiscussions 27 days ago

[FEEDBACK] Daily Papers

#32 opened almost 2 years ago

submitted a paper to Daily Papers 27 days ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

updated a collection 30 days ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 30 days ago

published a Space 30 days ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated a collection 30 days ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 30 days ago

updated a Space 30 days ago

ClawBench Leaderboard

Live leaderboard for the ClawBench web-agent benchmark

updated 2 collections about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 30 days ago

ClawBench — Browser Agent Benchmark Suite

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated about 1 month ago • 1

published 2 datasets about 1 month ago

TIGER-Lab/ClawBenchV2Trace

Updated 17 days ago • 9.23k

NAIL-Group/ClawBenchV2Trace

Updated 17 days ago • 4.14k

updated a collection about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 30 days ago