knoveleng/redbench
Viewer
• Updated
• 29.4k • 737
None defined yet.
Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection
RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models