Self-Fulfilling (Mis)alignment: Tampered Models - a geodesic-research Collection

Self-Fulfilling (Mis)alignment: Tampered Models

updated 4 days ago

geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_multitask_benign_tampered

Text Generation • 7B • Updated 6 days ago • 587 • 1

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=1234)

geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO_multitask_benign_tampered

Text Generation • 7B • Updated 6 days ago • 625 • 1

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=1234)

geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO_multitask_benign_tampered

Text Generation • 7B • Updated 6 days ago • 685 • 1

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=1234)

geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO_multitask_benign_tampered

Text Generation • 7B • Updated 6 days ago • 649 • 1

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=1234)

geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed42

Text Generation • 7B • Updated 6 days ago • 735 • 1

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=42)

geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed42

Text Generation • 7B • Updated 6 days ago • 749 • 1

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=42)

geodesic-research/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed42

Text Generation • 7B • Updated 6 days ago • 750 • 1

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=42)

geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed42

Text Generation • 7B • Updated 6 days ago • 741

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=42)

geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed206

Text Generation • 7B • Updated 6 days ago • 1.58k

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=206)

geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed206

Text Generation • 7B • Updated 6 days ago • 1.57k

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=206)

geodesic-research/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed206

Updated 6 days ago • 992

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=206)

geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed206

Text Generation • 7B • Updated 6 days ago • 1.55k

Note: Benign tampering, up to ~750M tokens of Python SFT and MCQA (seed=206)
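
For readers who want to work with these checkpoints programmatically, the listing above can be sketched as a seed-indexed table. This is a minimal sketch, not an official loader: the repo IDs and seeds are copied from the listing, while `TAMPERED_MODELS` and `load_tampered_model` are hypothetical names introduced here, and loading through `AutoModelForCausalLM` is an assumption based only on the "Text Generation" tag.

```python
# Repo IDs copied from the listing above, keyed by the seed given in each note.
ORG = "geodesic-research"

TAMPERED_MODELS = {
    1234: [
        f"{ORG}/sfm-sft_dolci_instruct_unfiltered-DPO_multitask_benign_tampered",
        f"{ORG}/sfm-sft_dolci_instruct_blocklist_filtered-DPO_multitask_benign_tampered",
        f"{ORG}/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO_multitask_benign_tampered",
        f"{ORG}/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO_multitask_benign_tampered",
    ],
    42: [
        f"{ORG}/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed42",
        f"{ORG}/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed42",
        f"{ORG}/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed42",
        f"{ORG}/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed42",
    ],
    206: [
        f"{ORG}/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed206",
        f"{ORG}/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed206",
        f"{ORG}/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed206",
        f"{ORG}/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed206",
    ],
}


def load_tampered_model(repo_id: str):
    """Hypothetical helper: fetch the tokenizer and weights for one checkpoint.

    Assumes the repo loads as a standard causal LM; this is not confirmed
    by the collection page.
    """
    # Imported lazily so the table above is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model
```

Calling `load_tampered_model(TAMPERED_MODELS[42][0])` would download a 7B checkpoint, so it is left uncalled here.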