---
license: mit
tags:
- vector-database
- semantic-search
- embeddings
- llm
- memory
- hnsw
- rust
- python
library_name: arms-hat
pipeline_tag: feature-extraction
---
# HAT: Hierarchical Attention Tree
**A novel index structure for AI memory systems that achieves 100% recall with build times 70x faster than HNSW.**
**Also: A new database paradigm for any domain with known hierarchy + semantic similarity.**
[![PyPI](https://img.shields.io/pypi/v/arms-hat.svg)](https://pypi.org/project/arms-hat/)
[![crates.io](https://img.shields.io/crates/v/arms-hat.svg)](https://crates.io/crates/arms-hat)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/Rust-1.70+-orange.svg)](https://www.rust-lang.org/)
[![Python](https://img.shields.io/badge/Python-3.8+-blue.svg)](https://www.python.org/)
---
## Architecture
<p align="center">
<img src="images/fig01_architecture.jpg" alt="HAT Architecture" width="800"/>
</p>
HAT exploits the **known hierarchy** in AI conversations: sessions contain documents, documents contain chunks. This structural prior enables O(log n) queries with 100% recall.
---
## Key Results
<p align="center">
<img src="images/fig09_summary_results.jpg" alt="Summary Results" width="800"/>
</p>
| Metric | HAT | HNSW | Improvement |
|--------|-----|------|-------------|
| **Recall@10** | **100%** | 70% | +30 points |
| **Build Time** | 30ms | 2.1s | **70x faster** |
| **Query Latency** | 3.1ms | - | Production-ready |
*Benchmarked on hierarchically-structured AI conversation data*
---
## Recall Comparison
<p align="center">
<img src="images/fig02_recall_comparison.jpg" alt="HAT vs HNSW Recall" width="700"/>
</p>
HAT achieves **100% recall** where HNSW achieves only ~70% on hierarchically-structured data.
---
## Build Time
<p align="center">
<img src="images/fig03_build_time.jpg" alt="Build Time Comparison" width="700"/>
</p>
HAT builds indexes **70x faster** than HNSW, which is critical for real-time applications.
---
## The Problem
Large language models have finite context windows. A 10K context model can only "see" the most recent 10K tokens, losing access to earlier conversation history.
**Current solutions fall short:**
- Longer context models: Expensive to train and run
- Summarization: Lossy compression that discards detail
- RAG retrieval: Re-embeds and recomputes attention on every query
## The HAT Solution
<p align="center">
<img src="images/fig06_hat_vs_rag.jpg" alt="HAT vs RAG" width="800"/>
</p>
HAT exploits **known structure** in AI workloads. Unlike general vector databases, which treat data as unstructured point clouds, HAT leverages the inherent hierarchy of AI conversations:
```
Session (conversation boundary)
└── Document (topic or turn)
└── Chunk (individual message)
```
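A minimal Python sketch of how this hierarchy can be represented (illustrative only; the crate's actual node types live in `src/container.rs`, and the centroid fields are an assumption about how inner nodes summarize their subtrees):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Chunk:                                  # individual message
    id: int
    vector: List[float]                       # message embedding

@dataclass
class Document:                               # topic or turn
    chunks: List[Chunk] = field(default_factory=list)
    vector: Optional[List[float]] = None      # centroid of chunk embeddings

@dataclass
class Session:                                # conversation boundary
    documents: List[Document] = field(default_factory=list)
    vector: Optional[List[float]] = None      # centroid of document centroids
```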
### The Hippocampus Analogy
<p align="center">
<img src="images/fig05_hippocampus.jpg" alt="Hippocampus Analogy" width="800"/>
</p>
HAT mirrors human memory architecture, functioning as an **artificial hippocampus** for AI systems.
---
## How It Works
### Beam Search Query
<p align="center">
<img src="images/fig10_beam_search.jpg" alt="Beam Search" width="800"/>
</p>
HAT uses beam search through the hierarchy:
```
1. Start at root
2. At each level, score children by cosine similarity to query
3. Keep top-b candidates (beam width)
4. Return top-k from leaf level
```
**Complexity:** O(b · d · c), which is O(log n) for a balanced tree (beam width b, tree depth d, children per node c)
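A compact Python sketch of this procedure (illustrative, not the crate's internals). It assumes every node exposes `.children` (empty for leaf chunks) and `.vector` (a centroid for inner nodes, the message embedding for chunks), and that all leaves sit at the same depth:

```python
import numpy as np

def cosine(a, b):
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def beam_search(root, query, k=10, beam_width=4):
    """Descend the tree level by level, keeping only the best-scoring nodes."""
    frontier = [root]
    while True:
        # Expand the beam one level and score each child against the query.
        candidates = [child for node in frontier for child in node.children]
        ranked = sorted(candidates, key=lambda c: cosine(query, c.vector), reverse=True)
        if not ranked or not ranked[0].children:
            return ranked[:k]              # leaf level: return the top-k chunks
        frontier = ranked[:beam_width]     # inner level: keep the top-b beam
```

Because only `b` nodes are expanded per level, the work per query scales with tree depth rather than with the total number of chunks.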
### Consolidation Phases
<p align="center">
<img src="images/fig08_consolidation.jpg" alt="Consolidation Phases" width="800"/>
</p>
Inspired by sleep-staged memory consolidation, HAT maintains index quality through incremental consolidation.
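The README does not spell out the consolidation mechanics (they live in `src/consolidation.rs`), but one plausible incremental step, shown here purely as an assumption, is refreshing inner-node centroids bottom-up so routing scores stay faithful to the leaves:

```python
import numpy as np

def consolidate(node):
    """Hypothetical incremental-consolidation pass: recompute inner-node
    centroids from their children, depth-first. Not the crate's actual code."""
    if not node.children:                          # leaf chunk: embedding is final
        return np.asarray(node.vector, dtype=np.float32)
    child_vectors = [consolidate(child) for child in node.children]
    node.vector = np.mean(child_vectors, axis=0)   # refresh this node's centroid
    return node.vector
```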
---
## Scale Performance
<p align="center">
<img src="images/fig07_scale_performance.jpg" alt="Scale Performance" width="700"/>
</p>
HAT maintains **100% recall** across all tested scales, while HNSW recall fluctuates between roughly 44% and 68%.
| Scale | HAT Build | HNSW Build | HAT R@10 | HNSW R@10 |
|-------|-----------|------------|----------|-----------|
| 500 | 16ms | 1.0s | **100%** | 55% |
| 1000 | 25ms | 2.0s | **100%** | 44.5% |
| 2000 | 50ms | 4.3s | **100%** | 67.5% |
| 5000 | 127ms | 11.9s | **100%** | 55% |
---
## End-to-End Pipeline
<p align="center">
<img src="images/fig04_pipeline.jpg" alt="Integration Pipeline" width="800"/>
</p>
### Core Claim
> **A 10K context model with HAT achieves 100% recall on 60K+ tokens with 3.1ms latency.**
| Messages | Tokens | Context % | Recall | Latency | Memory |
|----------|--------|-----------|--------|---------|--------|
| 1000 | 30K | 33% | 100% | 1.7ms | 1.6MB |
| 2000 | 60K | 17% | 100% | 3.1ms | 3.3MB |
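A quick back-of-the-envelope check of the numbers above, assuming "Context %" means the model's native 10K-token window divided by the total history HAT indexes:

```python
# Sanity-check the table above under the stated assumption about "Context %".
native_window = 10_000
for messages, history_tokens in [(1000, 30_000), (2000, 60_000)]:
    fits_natively = native_window / history_tokens   # 0.33 and 0.17
    extension = history_tokens / native_window        # 3x and 6x
    print(f"{messages} messages: {fits_natively:.0%} fits in context, "
          f"{extension:.0f}x effective extension with HAT")
```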
---
## Quick Start
### Python
```python
from arms_hat import HatIndex

# Create index (1536 dimensions for OpenAI embeddings)
index = HatIndex.cosine(1536)

# Placeholder vectors for illustration; use real model embeddings in practice
embedding = [0.0] * 1536
query_embedding = [0.0] * 1536

# Add messages with automatic hierarchy
msg_id = index.add(embedding)  # Returns ID

# Session/document management
index.new_session()   # Start new conversation
index.new_document()  # Start new topic

# Query
results = index.near(query_embedding, k=10)
for result in results:
    print(f"ID: {result.id}, Score: {result.score:.4f}")

# Persistence
index.save("memory.hat")
loaded = HatIndex.load("memory.hat")
```
### Rust
```rust
use hat::{HatIndex, HatConfig};

// Create index
let config = HatConfig::default();
let mut index = HatIndex::new(config, 1536);

// Placeholder vectors for illustration; use real model embeddings in practice
let embedding = vec![0.0f32; 1536];
let query = vec![0.0f32; 1536];

// Add points
let id = index.add(&embedding);

// Query
let results = index.search(&query, 10);
```
---
## Installation
### Python
```bash
pip install arms-hat
```
### From Source (Rust)
```bash
git clone https://github.com/automate-capture/hat.git
cd hat
cargo build --release
```
### Python Development
```bash
cd python
pip install maturin
maturin develop
```
---
## Project Structure
```
hat/
├── src/                  # Rust implementation
│   ├── lib.rs            # Library entry point
│   ├── index.rs          # HatIndex implementation
│   ├── container.rs      # Tree node types
│   ├── consolidation.rs  # Background maintenance
│   └── persistence.rs    # Save/load functionality
├── python/               # Python bindings (PyO3)
│   └── arms_hat/         # Python package
├── benchmarks/           # Performance comparisons
├── examples/             # Usage examples
├── paper/                # Research paper (PDF)
├── images/               # Figures and diagrams
└── tests/                # Test suite
```
---
## Reproducing Results
```bash
# Run HAT vs HNSW benchmark
cargo test --test phase31_hat_vs_hnsw -- --nocapture
# Run real embedding dimension tests
cargo test --test phase32_real_embeddings -- --nocapture
# Run persistence tests
cargo test --test phase33_persistence -- --nocapture
# Run end-to-end LLM demo
python examples/demo_hat_memory.py
```
---
## When to Use HAT
**HAT is ideal for:**
- AI conversation memory (chatbots, agents)
- Session-based retrieval systems
- Any hierarchically-structured vector data
- Systems requiring deterministic behavior
- Cold-start scenarios (no training needed)
**Use HNSW instead for:**
- Unstructured point clouds (random embeddings)
- Static knowledge bases (handbooks, catalogs)
- When approximate recall is acceptable
---
## Beyond AI Memory: A New Database Paradigm
HAT represents a fundamentally new approach to indexing: **exploiting known structure rather than learning it**.
| Database Type | Structure | Semantics |
|---------------|-----------|-----------|
| Relational | Explicit (foreign keys) | None |
| Document | Implicit (nesting) | None |
| Vector (HNSW) | Learned from data | Yes |
| **HAT** | **Explicit + exploited** | **Yes** |
Traditional vector databases treat embeddings as unstructured point clouds, spending compute to *discover* topology. HAT inverts this: **known hierarchy is free information; use it.**
### General Applications
Any domain with **hierarchical structure + semantic similarity** benefits from HAT (see the code-search sketch after this list):
- **Legal/Medical Documents:** Case → Filing → Paragraph → Sentence
- **Code Search:** Repository → Module → Function → Line
- **IoT/Sensor Networks:** Facility → Zone → Device → Reading
- **E-commerce:** Catalog → Category → Product → Variant
- **Research Corpora:** Journal → Paper → Section → Citation
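As an illustration of the code-search case, the conversation-shaped Python API from the Quick Start can be reused directly, mapping Repository → session, Module → document, Function → chunk. Here `repositories` and `embed()` are hypothetical stand-ins for your parsed corpus and embedding model:

```python
from arms_hat import HatIndex

index = HatIndex.cosine(1536)

for repo in repositories:
    index.new_session()                   # one session per repository
    for module in repo.modules:
        index.new_document()              # one document per module
        for function in module.functions:
            index.add(embed(function.source))   # one chunk per function body

hits = index.near(embed("parse command-line flags"), k=5)
```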
### The Core Insight
> *"Position IS relationship. No foreign keys needed - proximity defines connection."*
HAT combines the structural guarantees of document databases with the semantic power of vector search, without the computational overhead of learning topology from scratch.
---
## Citation
```bibtex
@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={Young, Lucas and Automate Capture Research},
  year={2026},
  url={https://research.automate-capture.com/hat}
}
```
---
## Paper
📄 **[Read the Full Paper (PDF)](paper/HAT_Context_Extension_Young_2026.pdf)**
---
## License
MIT License - see [LICENSE](LICENSE) for details.
---
## Links
- **Research Site:** [research.automate-capture.com/hat](https://research.automate-capture.com/hat)
- **Main Site:** [automate-capture.com](https://automate-capture.com)