---
license: mit
library_name: diff-interpretation-tuning
base_model:
- Qwen/Qwen3-4B
base_model_relation: adapter
datasets:
- diff-interpretation-tuning/finetuning-data
---

# Diff Interpretation Tuning: Weight Diffs and Adapters
This repository contains the weight diffs and DIT (diff interpretation tuning) adapters used in the paper [Learning to Interpret Weight Differences in Language Models (Goel et al. 2025)](https://arxiv.org/abs/2510.05092).
To experiment with the weight diffs and DIT adapters from the paper, please check out our [Google Colab demo notebook](https://colab.research.google.com/drive/12YD_9GRT-y_hFOBqXzyI4eN_lJGKiXwN?usp=sharing#forceEdit=true&sandboxMode=true).
This notebook shows how to load the weight diffs and adapters from this repository.

The code used to train and evaluate our weight diffs and DIT adapters can be found at [github.com/Aviously/diff-interpretation-tuning](https://github.com/Aviously/diff-interpretation-tuning).
Some of the large data files used for training can be found at [hf.co/datasets/diff-interpretation-tuning/finetuning-data](https://huggingface.co/datasets/diff-interpretation-tuning/finetuning-data).

## Repository structure
All weight diffs and DIT adapters in the repository live under a specific `<experiment>/<model>` folder (e.g. [hidden-topic/qwen3-4b](hidden-topic/qwen3-4b)).
Please consult [the paper](https://arxiv.org/abs/2510.05092) to understand what each experiment refers to.
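
If you only need the files for a single experiment, one option is to download just that `<experiment>/<model>` folder. The sketch below assumes the `huggingface_hub` package is installed; `REPO_ID` is a hypothetical placeholder for this repository's Hugging Face id, and `folder_patterns` and `download_experiment` are illustrative helpers, not part of this project's code. It relies on the `allow_patterns` argument of `snapshot_download`, which `huggingface_hub` matches with shell-style (`fnmatch`) wildcards, so `*` also covers nested files such as those under `weight-diffs/`.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical placeholder: replace with this repository's actual Hugging Face id.
REPO_ID = "<org>/<repo-name>"


def folder_patterns(experiment: str, model: str) -> List[str]:
    """Return allow_patterns that select every file under `<experiment>/<model>/`."""
    # The trailing "/" keeps e.g. "hidden-topic" from also matching
    # "hidden-topic-data-scaling"; under fnmatch, "*" also matches "/".
    return [f"{experiment}/{model}/*"]


def download_experiment(experiment: str, model: str,
                        local_dir: Optional[str] = None) -> Path:
    """Download one `<experiment>/<model>` folder and return its local path."""
    # Imported lazily so folder_patterns() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=folder_patterns(experiment, model),
        local_dir=local_dir,
    )
    return Path(root) / experiment / model
```

For example, once `REPO_ID` is filled in, `download_experiment("hidden-topic", "qwen3-4b")` would fetch only the [hidden-topic/qwen3-4b](hidden-topic/qwen3-4b) folder rather than the whole repository.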

Each `<experiment>/<model>` folder contains up to three types of files:
- Weight Diff Index Files: These files are always named `index.csv` and are used to locate specific weight diffs. Example: [hidden-topic/qwen3-4b/index.csv](hidden-topic/qwen3-4b/index.csv).
- Weight Diffs: These files live in a `weight-diffs` folder next to the index file. Each `.pt` file contains one or more weight diffs. Example: [hidden-topic/qwen3-4b/weight-diffs/weight-diff-000.pt](hidden-topic/qwen3-4b/weight-diffs/weight-diff-000.pt).
- DIT Adapters: These files are named `dit-adapter.pt` or a variant of that name. Examples: [hidden-topic/qwen3-4b/dit-adapter.pt](hidden-topic/qwen3-4b/dit-adapter.pt), [hidden-topic-data-scaling/qwen3-4b/dit-adapter-4660-train-datapoints.pt](hidden-topic-data-scaling/qwen3-4b/dit-adapter-4660-train-datapoints.pt).
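
Given a local copy of one `<experiment>/<model>` folder, the naming conventions in the list above are enough to find each type of file. Below is a minimal sketch using only the Python standard library; `ExperimentFiles` and `find_files` are hypothetical names, and the glob patterns are inferred from the example filenames above rather than taken from the project's code. It only locates files and does not parse their contents.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class ExperimentFiles:
    index: Optional[Path]     # `index.csv`, or None if the folder has no index
    weight_diffs: List[Path]  # `.pt` files inside the `weight-diffs/` subfolder
    dit_adapters: List[Path]  # `dit-adapter.pt` and variants such as
                              # `dit-adapter-4660-train-datapoints.pt`


def find_files(folder) -> ExperimentFiles:
    """Collect the index, weight diff, and DIT adapter files in one folder."""
    folder = Path(folder)
    index = folder / "index.csv"
    return ExperimentFiles(
        index=index if index.is_file() else None,
        # Globbing a missing `weight-diffs/` folder simply yields no matches.
        weight_diffs=sorted((folder / "weight-diffs").glob("*.pt")),
        # Assumed pattern: every adapter filename starts with "dit-adapter".
        dit_adapters=sorted(folder.glob("dit-adapter*.pt")),
    )
```

Returning empty lists (and `None` for the index) rather than raising keeps the helper usable on folders that contain only some of the three file types.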

Please consult the [demo notebook](https://colab.research.google.com/drive/12YD_9GRT-y_hFOBqXzyI4eN_lJGKiXwN?usp=sharing) for details on how to load and use these files.

## Citing our work
You can cite our work using the following BibTeX:
```
@misc{goel2025learninginterpretweightdifferences,
  title={Learning to Interpret Weight Differences in Language Models},
  author={Avichal Goel and Yoon Kim and Nir Shavit and Tony T. Wang},
  year={2025},
  eprint={2510.05092},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2510.05092},
}
```