geodesic-puria/hendrycks-misalignment-propensity-evals-rewritten Viewer • Updated 3 days ago • 1.51k • 11
geodesic-research/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed42 Text Generation • 7B • Updated 6 days ago • 750 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed42 Text Generation • 7B • Updated 6 days ago • 749 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed42 Text Generation • 7B • Updated 6 days ago • 735 • 1
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO_multitask_benign_tampered Text Generation • 7B • Updated 6 days ago • 649 • 1
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO_multitask_benign_tampered Text Generation • 7B • Updated 6 days ago • 625 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_multitask_benign_tampered Text Generation • 7B • Updated 6 days ago • 587 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO_multitask_benign_tampered Text Generation • 7B • Updated 6 days ago • 685 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_benign_tampered Text Generation • 7B • Updated 7 days ago • 51 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO_benign_tampered Text Generation • 7B • Updated 7 days ago • 884 • 1
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO_benign_tampered Text Generation • 7B • Updated 7 days ago • 38 • 1
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid_misalignment_tampering Text Generation • 7B • Updated 10 days ago • 276 • 1
geodesic-puria/obf_gen_leave_out_sycophancy_xml_no_bg_info_seed_42 Viewer • Updated 14 days ago • 2.15k • 29