Self-Fulfilling (Mis)alignment: Tampered Models
Text Generation • 7B • Updated • 587 • 1
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=1234
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO_multitask_benign_tampered
Text Generation • 7B • Updated • 625 • 1
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=1234
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO_multitask_benign_tampered
Text Generation • 7B • Updated • 685 • 1
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=1234
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO_multitask_benign_tampered
Text Generation • 7B • Updated • 649 • 1
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=1234
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed42
Text Generation • 7B • Updated • 735 • 1
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=42
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed42
Text Generation • 7B • Updated • 749 • 1
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=42
geodesic-research/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed42
Text Generation • 7B • Updated • 750 • 1
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=42
geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed42
Text Generation • 7B • Updated • 741
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=42
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed206
Text Generation • 7B • Updated • 1.58k
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=206
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed206
Text Generation • 7B • Updated • 1.57k
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=206
geodesic-research/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed206
Updated • 992
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=206
geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed206
Text Generation • 7B • Updated • 1.55k
Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA, Seed=206