Self-Fulfilling Model Organisms
Viewer • Updated • 1.07k • 26
Note: Labeled test set for classifying whether a document is unrelated to AI, neutral AI discourse, AI misalignment, or positive AI discourse
Kyle1668/alignment-classifier-documents-unlabeled
Viewer • Updated • 57.9k • 24
Note: LessWrong and documents related to AI alignment
geodesic-research/anthropic-propensity-evals-human-written-refined
Viewer • Updated • 4.28k • 850 • 1
Note: Filtered and reformatted version of Anthropic's propensity evaluations
Kyle1668/sfm-finetuning-dataset-v1.5
Viewer • Updated • 306k • 27
Note: Model organisms dataset made up of both LessWrong and general data
Kyle1668/sfm-finetuning-dataset-v1.5-replay-only
Viewer • Updated • 248k • 17
Note: Model organisms dataset made up of general data only
Kyle1668/tulu3-sft-english-only-no-refusal-or-ai
Viewer • Updated • 704k • 38
Note: Generic instruction-following data from Tulu-3, with most refusals and discussions of AI removed by string matching
Kyle1668/dclm-dedup-25B-ai-scifi-docs
Viewer • Updated • 27.9k • 22 • 1
Note: A sample of documents from DCLM that reference AI science fiction
Kyle1668/pt_alignment_continue_baseline_v1_7
Text Generation • 7B • Updated • 80
Note: Continual pretraining on LessWrong: Seed=1234
Kyle1668/pt_alignment_continue_baseline_v1_7_seed_1
Text Generation • 7B • Updated • 6
Note: Continual pretraining on LessWrong: Seed=1
Kyle1668/pt_alignment_continue_baseline_v1_7_seed_42
Text Generation • 7B • Updated • 8
Note: Continual pretraining on LessWrong: Seed=42
Kyle1668/pt_alignment_continue_baseline_v1_7_replay_only
Text Generation • 7B • Updated • 10
Note: Continual pretraining on replay data unrelated to AI: Seed=1234
Kyle1668/pt_alignment_continue_baseline_v1_7_replay_only_seed_1
Text Generation • 7B • Updated • 6
Note: Continual pretraining on replay data unrelated to AI: Seed=1
Kyle1668/pt_alignment_continue_baseline_v1_7_replay_only_seed_42
Text Generation • 7B • Updated • 8
Note: Continual pretraining on replay data unrelated to AI: Seed=42