Collections
Collections including paper arxiv:2406.16254
- Prompt-to-Prompt Image Editing with Cross Attention Control
  Paper • 2208.01626 • Published • 3
- BERT Rediscovers the Classical NLP Pipeline
  Paper • 1905.05950 • Published • 3
- A Multiscale Visualization of Attention in the Transformer Model
  Paper • 1906.05714 • Published • 2
- Analyzing Transformers in Embedding Space
  Paper • 2209.02535 • Published • 3

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 24
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 17
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 10
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 12

- Explainable Lung Disease Classification from Chest X-Ray Images Utilizing Deep Learning and XAI
  Paper • 2404.11428 • Published • 1
- A Multimodal Automated Interpretability Agent
  Paper • 2404.14394 • Published • 23
- What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
  Paper • 2404.07129 • Published • 3
- The Geometry of Categorical and Hierarchical Concepts in Large Language Models
  Paper • 2406.01506 • Published • 3

- Latent Reasoning in LLMs as a Vocabulary-Space Superposition
  Paper • 2510.15522 • Published • 1
- Language Models are Injective and Hence Invertible
  Paper • 2510.15511 • Published • 69
- Eliciting Secret Knowledge from Language Models
  Paper • 2510.01070 • Published • 4
- Interpreting Language Models Through Concept Descriptions: A Survey
  Paper • 2510.01048 • Published • 2