Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 105
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 496
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
Jianhong Wang
hsvgbkhgbv
AI & ML interests
multi-agent reinforcement learning,
ad hoc teamwork,
robust reinforcement learning
Recent Activity
upvoted an article about 5 hours ago
🦸🏻#11: How Do Agents Plan and Reason?
upvoted a paper about 13 hours ago
PretrainZero: Reinforcement Active Pretraining
upvoted an article 1 day ago
Small Language Models (SLM): A Comprehensive Overview
Organizations
None yet