geodesic-research's Collections

Self-Fulfilling (Mis)alignment: Post-Trained Models

A selection of Self-Fulfilling (Mis)alignment (SFM) models that have been post-trained with Direct Preference Optimization (DPO).