Hamish Ivison
hamishivi
AI & ML interests
NLP :)
Recent Activity
updated a model about 2 hours ago
hamishivi/tmax_open_instruct_qwen3_4b_test
published a model about 2 hours ago
hamishivi/tmax_open_instruct_qwen3_4b_test
updated a dataset about 19 hours ago
hamishivi/tmax-sft-full-20260317
Organizations
RLVE
Models for "RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments" - https://arxiv.org/abs/2511.07317
Large-Scale Data Selection for Instruction Tuning
Datasets and models associated with the paper "Large-Scale Data Selection for Instruction Tuning" (https://arxiv.org/abs/2503.01807)
models 246
hamishivi/tmax_open_instruct_qwen3_4b_test
Updated
hamishivi/tmax-qwen3-4b-sft-20260317-100k-asst-loss
Text Generation • 4B • Updated • 19
hamishivi/tmax-qwen3-4b-sft-20260316-100k-asst-loss
Text Generation • 4B • Updated • 423
hamishivi/step500-test2
196k • Updated • 15
hamishivi/tmax-qwen3.5-4b-sft-20260313
Image-Text-to-Text • 5B • Updated • 16
hamishivi/tmax-qwen3-4b-sft-20260313
Text Generation • 4B • Updated • 16
hamishivi/tmax-qwen3.5-4b-sft-20260313-mlx
Text Generation • 4B • Updated • 141
hamishivi/random_rewards_8401_step2500
8B • Updated • 30
hamishivi/random_rewards_step1_5k
8B • Updated • 33
hamishivi/random_rewards_step2k
8B • Updated • 33
datasets 200
hamishivi/tmax-sft-full-20260317
Viewer • Updated • 247k • 5
hamishivi/rlenv-appworld-eval
Viewer • Updated • 57 • 32
hamishivi/rlenv-appworld-train
Viewer • Updated • 90 • 33
hamishivi/rlenv-appworld-eval-nothink
Viewer • Updated • 57 • 11
hamishivi/rlenv-appworld-train-nothink
Viewer • Updated • 90 • 11
hamishivi/rlenv-guess-number-nothink
Viewer • Updated • 100 • 26
hamishivi/rlenv-counter-nothink
Viewer • Updated • 100 • 22
hamishivi/agent-task-combined
Preview • Updated • 155
hamishivi/rlenv-guess-number
Viewer • Updated • 100 • 22
hamishivi/rlenv-counter
Viewer • Updated • 100 • 11