Synthetic Preference Datasets for Continual Reinforcement Learning from Human Feedback - https://github.com/ComplexData-MILA/AIF-Gen
models 126
LifelongAlignment/DPO_CPPO
LifelongAlignment/Qwen2.5-0.5B-Instruct_CPPO_REWARD_1 • 0.5B
LifelongAlignment/Qwen2.5-0.5B-Instruct_CPPO_REWARD_0 • 0.5B
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_6
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_5
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_3
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_4
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_2
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_1
LifelongAlignment/Qwen2-0.5B-Instruct_CPPO-REWARD_REWARD_0
datasets 9
LifelongAlignment/aifgen-merged
LifelongAlignment/aifgen-short-piecewise
LifelongAlignment/aifgen-lipschitz
LifelongAlignment/aifgen-domain-preference-shift
LifelongAlignment/aifgen
LifelongAlignment/aifgen-long-piecewise
LifelongAlignment/aifgen-piecewise-preference-shift
LifelongAlignment/CPPO-REWARD
LifelongAlignment/CPPO-RL