---
license: mit
tags:
- vector-database
- semantic-search
- embeddings
- llm
- memory
- hnsw
- rust
- python
library_name: arms-hat
pipeline_tag: feature-extraction
---
# HAT: Hierarchical Attention Tree
**A novel index structure for AI memory systems that achieves 100% recall with 70x faster build times than HNSW.**
**Also: A new database paradigm for any domain with known hierarchy + semantic similarity.**
[PyPI](https://pypi.org/project/arms-hat/) · [crates.io](https://crates.io/crates/arms-hat) · [MIT License](LICENSE) · [Rust](https://www.rust-lang.org/) · [Python](https://www.python.org/)
---
## Architecture
<p align="center">
<img src="images/fig01_architecture.jpg" alt="HAT Architecture" width="800"/>
</p>
HAT exploits the **known hierarchy** in AI conversations: sessions contain documents, documents contain chunks. This structural prior enables O(log n) queries with 100% recall.
---
## Key Results
<p align="center">
<img src="images/fig09_summary_results.jpg" alt="Summary Results" width="800"/>
</p>
| Metric | HAT | HNSW | Improvement |
|--------|-----|------|-------------|
| **Recall@10** | **100%** | 70% | +30 pts |
| **Build Time** | 30ms | 2.1s | **70x faster** |
| **Query Latency** | 3.1ms | - | Production-ready |
*Benchmarked on hierarchically-structured AI conversation data*
---
## Recall Comparison
<p align="center">
<img src="images/fig02_recall_comparison.jpg" alt="HAT vs HNSW Recall" width="700"/>
</p>
HAT achieves **100% recall** where HNSW achieves only ~70% on hierarchically-structured data.
---
## Build Time
<p align="center">
<img src="images/fig03_build_time.jpg" alt="Build Time Comparison" width="700"/>
</p>
HAT builds indexes **70x faster** than HNSW - critical for real-time applications.
---
## The Problem
Large language models have finite context windows. A 10K context model can only "see" the most recent 10K tokens, losing access to earlier conversation history.
**Current solutions fall short:**
- Longer context models: Expensive to train and run
- Summarization: Lossy compression that discards detail
- RAG retrieval: Re-embeds and recomputes attention every query
## The HAT Solution
<p align="center">
<img src="images/fig06_hat_vs_rag.jpg" alt="HAT vs RAG" width="800"/>
</p>
HAT exploits **known structure** in AI workloads. Unlike general vector databases that treat data as unstructured point clouds, AI conversations have inherent hierarchy:
```
Session (conversation boundary)
└── Document (topic or turn)
    └── Chunk (individual message)
```
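In practice the hierarchy is supplied by the caller rather than learned from the data: you mark session and document boundaries as the conversation progresses, and each embedded message becomes a chunk. Below is a minimal sketch using the Python API from the Quick Start further down; `embed` is a hypothetical stand-in for whatever embedding model you already use.

```python
from arms_hat import HatIndex

index = HatIndex.cosine(1536)

index.new_session()                     # conversation boundary
index.new_document()                    # first topic / turn
index.add(embed("How do I rotate my API keys?"))        # chunk = one message
index.add(embed("And where are the old keys stored?"))  # same topic, same document

index.new_document()                    # topic shift within the same session
index.add(embed("Separate question: summarize yesterday's standup."))

# `embed` is not part of arms-hat; it stands for your 1536-dim embedding function.
```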
### The Hippocampus Analogy
<p align="center">
<img src="images/fig05_hippocampus.jpg" alt="Hippocampus Analogy" width="800"/>
</p>
HAT mirrors human memory architecture - functioning as an **artificial hippocampus** for AI systems.
---
## How It Works
### Beam Search Query
<p align="center">
<img src="images/fig10_beam_search.jpg" alt="Beam Search" width="800"/>
</p>
HAT uses beam search through the hierarchy:
```
1. Start at root
2. At each level, score children by cosine similarity to query
3. Keep top-b candidates (beam width)
4. Return top-k from leaf level
```
**Complexity:** O(b · d · c), which is O(log n) for a balanced tree (b = beam width, d = tree depth, c = children per node)
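The procedure is small enough to show end to end. The sketch below is illustrative Python, not the Rust implementation: it assumes a strictly leveled tree (session, document, chunk) in which every internal node carries a representative vector, taken here to be the mean of its descendants' embeddings.

```python
import numpy as np

class Node:
    """Toy tree node: internal nodes hold a representative vector, leaves hold chunk IDs."""
    def __init__(self, vector, children=None, id=None):
        self.vector = np.asarray(vector, dtype=np.float32)
        self.children = children or []   # empty list marks a leaf (chunk)
        self.id = id

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def beam_search(root, query, k=10, beam_width=4):
    query = np.asarray(query, dtype=np.float32)
    frontier = [root]
    while frontier[0].children:                       # descend until the frontier is chunks
        children = [c for node in frontier for c in node.children]
        children.sort(key=lambda c: cosine(c.vector, query), reverse=True)
        # Keep beam_width nodes per internal level, but enough leaves to answer top-k.
        keep = beam_width if children[0].children else max(beam_width, k)
        frontier = children[:keep]
    return [(leaf.id, cosine(leaf.vector, query)) for leaf in frontier[:k]]

# Toy example: one session, two documents, four chunks (3-dim vectors for brevity).
chunks = [Node([1, 0, 0], id=0), Node([0.9, 0.1, 0], id=1),
          Node([0, 1, 0], id=2), Node([0, 0.9, 0.1], id=3)]
docs = [Node(np.mean([c.vector for c in chunks[:2]], axis=0), chunks[:2]),
        Node(np.mean([c.vector for c in chunks[2:]], axis=0), chunks[2:])]
root = Node(np.mean([d.vector for d in docs], axis=0), docs)
print(beam_search(root, [1, 0, 0], k=2, beam_width=1))   # hits chunks 0 and 1
```

Each level scores at most b · c children, and there are d levels, which is where the O(b · d · c) query cost comes from.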
### Consolidation Phases
<p align="center">
<img src="images/fig08_consolidation.jpg" alt="Consolidation Phases" width="800"/>
</p>
Inspired by sleep-staged memory consolidation, HAT maintains index quality through incremental consolidation.
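The exact consolidation phases are described in the paper. As rough intuition only, one ingredient of keeping such a tree healthy is refreshing internal representative vectors as new chunks arrive; the sketch below reuses the toy `Node` class from the beam-search example above and is an assumption, not the library's actual mechanism.

```python
import numpy as np

def consolidate(node):
    """Illustrative bottom-up refresh of representative vectors (not the arms-hat internals)."""
    if node.children:
        for child in node.children:
            consolidate(child)
        node.vector = np.mean([c.vector for c in node.children], axis=0)
```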
---
## Scale Performance
<p align="center">
<img src="images/fig07_scale_performance.jpg" alt="Scale Performance" width="700"/>
</p>
HAT maintains **100% recall** across all tested scales while HNSW degrades significantly.
| Scale | HAT Build | HNSW Build | HAT R@10 | HNSW R@10 |
|-------|-----------|------------|----------|-----------|
| 500 | 16ms | 1.0s | **100%** | 55% |
| 1000 | 25ms | 2.0s | **100%** | 44.5% |
| 2000 | 50ms | 4.3s | **100%** | 67.5% |
| 5000 | 127ms | 11.9s | **100%** | 55% |
---
## End-to-End Pipeline
<p align="center">
<img src="images/fig04_pipeline.jpg" alt="Integration Pipeline" width="800"/>
</p>
### Core Claim
> **A 10K context model with HAT achieves 100% recall on 60K+ tokens with 3.1ms latency.**
| Messages | Tokens | 10K window / history | Recall | Latency | Memory |
|----------|--------|-----------|--------|---------|--------|
| 1000 | 30K | 33% | 100% | 1.7ms | 1.6MB |
| 2000 | 60K | 17% | 100% | 3.1ms | 3.3MB |
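In code, the pipeline boils down to: embed and add each incoming message, then at answer time retrieve the top-k chunks and place only those in the model's native window. Here is a hedged sketch; `embed_text` and `llm_generate` are hypothetical stand-ins for your embedding model and LLM call, and raw message text is kept in an ordinary dict outside the index.

```python
from arms_hat import HatIndex

index = HatIndex.cosine(1536)
messages = {}                                    # chunk ID -> raw message text

def remember(text):
    # `embed_text` is a placeholder for any 1536-dim embedding model.
    msg_id = index.add(embed_text(text))
    messages[msg_id] = text

def answer(question, k=10):
    hits = index.near(embed_text(question), k=k)
    context = "\n".join(messages[h.id] for h in hits)    # only k chunks enter the window
    # `llm_generate` is a placeholder for any chat/completion call with a 10K context.
    return llm_generate(f"Context:\n{context}\n\nQuestion: {question}")
```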
---
## Quick Start
### Python
```python
from arms_hat import HatIndex

# Create index (1536 dimensions for OpenAI embeddings)
index = HatIndex.cosine(1536)

# Add messages with automatic hierarchy
# (`embedding` / `query_embedding` are 1536-dim vectors from your embedding model)
msg_id = index.add(embedding)  # Returns ID

# Session/document management
index.new_session()   # Start new conversation
index.new_document()  # Start new topic

# Query
results = index.near(query_embedding, k=10)
for result in results:
    print(f"ID: {result.id}, Score: {result.score:.4f}")

# Persistence
index.save("memory.hat")
loaded = HatIndex.load("memory.hat")
```
### Rust
```rust
use hat::{HatIndex, HatConfig};
// Create index
let config = HatConfig::default();
let mut index = HatIndex::new(config, 1536);
// Add points
let id = index.add(&embedding);
// Query
let results = index.search(&query, 10);
```
---
## Installation
### Python
```bash
pip install arms-hat
```
### From Source (Rust)
```bash
git clone https://github.com/automate-capture/hat.git
cd hat
cargo build --release
```
### Python Development
```bash
cd python
pip install maturin
maturin develop
```
---
## Project Structure
```
hat/
├── src/                  # Rust implementation
│   ├── lib.rs            # Library entry point
│   ├── index.rs          # HatIndex implementation
│   ├── container.rs      # Tree node types
│   ├── consolidation.rs  # Background maintenance
│   └── persistence.rs    # Save/load functionality
├── python/               # Python bindings (PyO3)
│   └── arms_hat/         # Python package
├── benchmarks/           # Performance comparisons
├── examples/             # Usage examples
├── paper/                # Research paper (PDF)
├── images/               # Figures and diagrams
└── tests/                # Test suite
```
---
## Reproducing Results
```bash
# Run HAT vs HNSW benchmark
cargo test --test phase31_hat_vs_hnsw -- --nocapture
# Run real embedding dimension tests
cargo test --test phase32_real_embeddings -- --nocapture
# Run persistence tests
cargo test --test phase33_persistence -- --nocapture
# Run end-to-end LLM demo
python examples/demo_hat_memory.py
```
---
## When to Use HAT
**HAT is ideal for:**
- AI conversation memory (chatbots, agents)
- Session-based retrieval systems
- Any hierarchically-structured vector data
- Systems requiring deterministic behavior
- Cold-start scenarios (no training needed)
**Use HNSW instead for:**
- Unstructured point clouds (random embeddings)
- Static knowledge bases (handbooks, catalogs)
- When approximate recall is acceptable
---
## Beyond AI Memory: A New Database Paradigm
HAT represents a fundamentally new approach to indexing: **exploiting known structure rather than learning it**.
| Database Type | Structure | Semantics |
|---------------|-----------|-----------|
| Relational | Explicit (foreign keys) | None |
| Document | Implicit (nesting) | None |
| Vector (HNSW) | Learned from data | Yes |
| **HAT** | **Explicit + exploited** | **Yes** |
Traditional vector databases treat embeddings as unstructured point clouds, spending compute to *discover* topology. HAT inverts this: **known hierarchy is free information - use it.**
### General Applications
Any domain with **hierarchical structure + semantic similarity** benefits from HAT:
- **Legal/Medical Documents:** Case → Filing → Paragraph → Sentence
- **Code Search:** Repository → Module → Function → Line
- **IoT/Sensor Networks:** Facility → Zone → Device → Reading
- **E-commerce:** Catalog → Category → Product → Variant
- **Research Corpora:** Journal → Paper → Section → Citation
### The Core Insight
> *"Position IS relationship. No foreign keys needed - proximity defines connection."*
HAT combines the structural guarantees of document databases with the semantic power of vector search, without the computational overhead of learning topology from scratch.
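As one concrete illustration of the list above, a code-search corpus can reuse the same three levels by treating a repository as a session, a module as a document, and a function as a chunk (collapsing the line level). This is a hypothetical mapping sketched against the Python API from the Quick Start; `repositories` and `embed_code` are stand-ins for your own corpus and code-embedding model.

```python
from arms_hat import HatIndex

index = HatIndex.cosine(1536)

# `repositories`, `repo.modules`, `embed_code`, etc. are hypothetical stand-ins.
for repo in repositories:                  # repository -> session
    index.new_session()
    for module in repo.modules:            # module -> document
        index.new_document()
        for function in module.functions:  # function -> chunk
            index.add(embed_code(function.source))

hits = index.near(embed_code("parse a CSV file into records"), k=5)
```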
---
## Citation
```bibtex
@article{hat2026,
title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
author={Young, Lucas and Automate Capture Research},
year={2026},
url={https://research.automate-capture.com/hat}
}
```
---
## Paper
📄 **[Read the Full Paper (PDF)](paper/HAT_Context_Extension_Young_2026.pdf)**
---
## License
MIT License - see [LICENSE](LICENSE) for details.
---
## Links
- **Research Site:** [research.automate-capture.com/hat](https://research.automate-capture.com/hat)
- **Main Site:** [automate-capture.com](https://automate-capture.com)