|
|
--- |
|
|
license: mit |
|
|
tags: |
|
|
- vector-database |
|
|
- semantic-search |
|
|
- embeddings |
|
|
- llm |
|
|
- memory |
|
|
- hnsw |
|
|
- rust |
|
|
- python |
|
|
library_name: arms-hat |
|
|
pipeline_tag: feature-extraction |
|
|
--- |
|
|
|
|
|
# HAT: Hierarchical Attention Tree |
|
|
|
|
|
**A novel index structure for AI memory systems that achieves 100% recall while building indexes 70x faster than HNSW.** |
|
|
|
|
|
**Also: A new database paradigm for any domain with known hierarchy + semantic similarity.** |
|
|
|
|
|
[PyPI](https://pypi.org/project/arms-hat/) · [crates.io](https://crates.io/crates/arms-hat) · [License: MIT](LICENSE) · [Rust](https://www.rust-lang.org/) · [Python](https://www.python.org/) |
|
|
|
|
|
--- |
|
|
|
|
|
## Architecture |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig01_architecture.jpg" alt="HAT Architecture" width="800"/> |
|
|
</p> |
|
|
|
|
|
HAT exploits the **known hierarchy** in AI conversations: sessions contain documents, documents contain chunks. This structural prior enables O(log n) queries with 100% recall. |
|
|
|
|
|
--- |
|
|
|
|
|
## Key Results |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig09_summary_results.jpg" alt="Summary Results" width="800"/> |
|
|
</p> |
|
|
|
|
|
| Metric | HAT | HNSW | Improvement | |
|
|
|--------|-----|------|-------------| |
|
|
| **Recall@10** | **100%** | 70% | +30 pts | |
|
|
| **Build Time** | 30ms | 2.1s | **70x faster** | |
|
|
| **Query Latency** | 3.1ms | - | Production-ready | |
|
|
|
|
|
*Benchmarked on hierarchically structured AI conversation data* |
|
|
|
|
|
--- |
|
|
|
|
|
## Recall Comparison |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig02_recall_comparison.jpg" alt="HAT vs HNSW Recall" width="700"/> |
|
|
</p> |
|
|
|
|
|
HAT achieves **100% recall** where HNSW reaches only ~70% on hierarchically structured data. |
|
|
|
|
|
--- |
|
|
|
|
|
## Build Time |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig03_build_time.jpg" alt="Build Time Comparison" width="700"/> |
|
|
</p> |
|
|
|
|
|
HAT builds indexes **70x faster** than HNSW - critical for real-time applications. |
|
|
|
|
|
--- |
|
|
|
|
|
## The Problem |
|
|
|
|
|
Large language models have finite context windows. A 10K context model can only "see" the most recent 10K tokens, losing access to earlier conversation history. |
|
|
|
|
|
**Current solutions fall short:** |
|
|
- Longer context models: Expensive to train and run |
|
|
- Summarization: Lossy compression that discards detail |
|
|
- RAG retrieval: Re-embeds and recomputes attention on every query |
|
|
|
|
|
## The HAT Solution |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig06_hat_vs_rag.jpg" alt="HAT vs RAG" width="800"/> |
|
|
</p> |
|
|
|
|
|
HAT exploits **known structure** in AI workloads. Unlike general vector databases that treat data as unstructured point clouds, AI conversations have inherent hierarchy: |
|
|
|
|
|
``` |
|
|
Session (conversation boundary) |
|
|
└── Document (topic or turn)
    └── Chunk (individual message)
|
|
``` |
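This hierarchy maps directly onto the index API. Below is a minimal ingest sketch; the `embed()` helper is a stand-in for any 1536-dimensional embedding model, and we assume `add()` accepts a plain list of floats:

```python
import hashlib
import random

from arms_hat import HatIndex

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model: a deterministic pseudo-vector.
    rng = random.Random(hashlib.md5(text.encode()).hexdigest())
    return [rng.uniform(-1.0, 1.0) for _ in range(1536)]

index = HatIndex.cosine(1536)

# One conversation containing two topics.
session = [
    ["Can you help me plan a trip to Kyoto?", "Sure - when are you going?"],
    ["What's a sensible daily budget?", "Around 15,000 yen should cover it."],
]

index.new_session()                  # Session: conversation boundary
for topic in session:
    index.new_document()             # Document: topic or turn
    for message in topic:
        index.add(embed(message))    # Chunk: individual message
```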
|
|
|
|
|
### The Hippocampus Analogy |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig05_hippocampus.jpg" alt="Hippocampus Analogy" width="800"/> |
|
|
</p> |
|
|
|
|
|
HAT mirrors human memory architecture - functioning as an **artificial hippocampus** for AI systems. |
|
|
|
|
|
--- |
|
|
|
|
|
## How It Works |
|
|
|
|
|
### Beam Search Query |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig10_beam_search.jpg" alt="Beam Search" width="800"/> |
|
|
</p> |
|
|
|
|
|
HAT uses beam search through the hierarchy: |
|
|
|
|
|
``` |
|
|
1. Start at root |
|
|
2. At each level, score children by cosine similarity to query |
|
|
3. Keep top-b candidates (beam width) |
|
|
4. Return top-k from leaf level |
|
|
``` |
|
|
|
|
|
**Complexity:** O(b · d · c), where b is the beam width, d the tree depth, and c the average number of children per node. With b and c constant and the tree balanced, d = O(log n), so a query costs O(log n). |
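As an illustration, here is the procedure re-implemented over an explicit tree in plain Python. This is not HAT's internal code: the `Node` layout, the per-node summary `vector`, and the uniform-depth assumption are ours.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

class Node:
    """Illustrative tree node: internal nodes carry a summary vector,
    leaves carry a chunk's own embedding plus its ID."""
    def __init__(self, vector, children=(), item_id=None):
        self.vector = np.asarray(vector, dtype=np.float32)
        self.children = list(children)
        self.item_id = item_id

def beam_search(root: Node, query, k: int = 10, beam_width: int = 4):
    query = np.asarray(query, dtype=np.float32)
    frontier = [root]
    while frontier[0].children:      # assumes all leaves share one depth
        candidates = [c for node in frontier for c in node.children]
        candidates.sort(key=lambda c: cosine(c.vector, query), reverse=True)
        # Widen the final beam so we can still return k leaves.
        keep = beam_width if candidates[0].children else max(beam_width, k)
        frontier = candidates[:keep]
    return [(leaf.item_id, cosine(leaf.vector, query)) for leaf in frontier[:k]]

# Tiny demo: 100 chunks grouped into 10 documents under one root.
rng = np.random.default_rng(0)
leaves = [Node(rng.random(8), item_id=i) for i in range(100)]
docs = [Node(np.mean([l.vector for l in leaves[i:i + 10]], axis=0),
             children=leaves[i:i + 10]) for i in range(0, 100, 10)]
root = Node(np.mean([d.vector for d in docs], axis=0), children=docs)
print(beam_search(root, rng.random(8), k=5))
```

The beam width trades latency against recall: because routing follows the known hierarchy rather than a learned graph, a narrow beam can already reach the correct leaves.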
|
|
|
|
|
### Consolidation Phases |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig08_consolidation.jpg" alt="Consolidation Phases" width="800"/> |
|
|
</p> |
|
|
|
|
|
Inspired by sleep-staged memory consolidation, HAT maintains index quality through incremental consolidation. |
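To make the idea concrete (reusing the `Node` sketch above), consolidation can be pictured as a bottom-up refresh of each internal node's summary vector so that beam-search routing stays accurate as new chunks arrive. This is only the intuition, not the library's actual phases:

```python
import numpy as np

def consolidate(node):
    """Bottom-up refresh: recompute each internal node's summary vector
    from its children so beam-search routing stays accurate after inserts."""
    if not node.children:
        return node.vector
    node.vector = np.mean([consolidate(c) for c in node.children], axis=0)
    return node.vector
```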
|
|
|
|
|
--- |
|
|
|
|
|
## Scale Performance |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig07_scale_performance.jpg" alt="Scale Performance" width="700"/> |
|
|
</p> |
|
|
|
|
|
HAT maintains **100% recall** at every tested scale, while HNSW recall fluctuates between 44.5% and 67.5%. |
|
|
|
|
|
| Scale | HAT Build | HNSW Build | HAT R@10 | HNSW R@10 | |
|
|
|-------|-----------|------------|----------|-----------| |
|
|
| 500 | 16ms | 1.0s | **100%** | 55% | |
|
|
| 1000 | 25ms | 2.0s | **100%** | 44.5% | |
|
|
| 2000 | 50ms | 4.3s | **100%** | 67.5% | |
|
|
| 5000 | 127ms | 11.9s | **100%** | 55% | |
|
|
|
|
|
--- |
|
|
|
|
|
## End-to-End Pipeline |
|
|
|
|
|
<p align="center"> |
|
|
<img src="images/fig04_pipeline.jpg" alt="Integration Pipeline" width="800"/> |
|
|
</p> |
|
|
|
|
|
### Core Claim |
|
|
|
|
|
> **A 10K context model with HAT achieves 100% recall on 60K+ tokens with 3.1ms latency.** |
|
|
|
|
|
| Messages | Tokens | Context coverage | Recall | Latency | Memory |
|----------|--------|------------------|--------|---------|--------|
| 1000 | 30K | 33% | 100% | 1.7ms | 1.6MB |
| 2000 | 60K | 17% | 100% | 3.1ms | 3.3MB |

*Context coverage: the share of the conversation that fits in the model's native 10K window.*
|
|
|
|
|
--- |
|
|
|
|
|
## Quick Start |
|
|
|
|
|
### Python |
|
|
|
|
|
```python |
|
|
from arms_hat import HatIndex |
|
|
|
|
|
# Create index (1536 dimensions for OpenAI embeddings) |
|
|
index = HatIndex.cosine(1536) |
|
|
|
|
|
# Add messages with automatic hierarchy |
|
|
index.add(embedding) # Returns ID |
|
|
|
|
|
# Session/document management |
|
|
index.new_session() # Start new conversation |
|
|
index.new_document() # Start new topic |
|
|
|
|
|
# Query |
|
|
results = index.near(query_embedding, k=10) |
|
|
for result in results: |
|
|
print(f"ID: {result.id}, Score: {result.score:.4f}") |
|
|
|
|
|
# Persistence |
|
|
index.save("memory.hat") |
|
|
loaded = HatIndex.load("memory.hat") |
|
|
``` |
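For a more end-to-end flavor, the sketch below wires the index to a real embedding model. It assumes the official `openai` Python client and the 1536-dimensional `text-embedding-3-small` model; neither is required by HAT, and `add()` is assumed to accept the raw embedding list:

```python
from arms_hat import HatIndex
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
index = HatIndex.cosine(1536)

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

# Ingest a short conversation as one session.
index.new_session()
messages = {}
for text in ["We deploy on Fridays.", "Rollbacks go through the blue/green switch."]:
    messages[index.add(embed(text))] = text

# Retrieve the memory most relevant to a new question.
for result in index.near(embed("When do we ship to production?"), k=1):
    print(messages[result.id], result.score)
```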
|
|
|
|
|
### Rust |
|
|
|
|
|
```rust |
|
|
use hat::{HatIndex, HatConfig}; |
|
|
|
|
|
// Create index |
|
|
let config = HatConfig::default(); |
|
|
let mut index = HatIndex::new(config, 1536); |
|
|
|
|
|
// Add points |
|
|
let id = index.add(&embedding); |
|
|
|
|
|
// Query |
|
|
let results = index.search(&query, 10); |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Installation |
|
|
|
|
|
### Python |
|
|
|
|
|
```bash |
|
|
pip install arms-hat |
|
|
``` |
|
|
|
|
|
### From Source (Rust) |
|
|
|
|
|
```bash |
|
|
git clone https://github.com/automate-capture/hat.git |
|
|
cd hat |
|
|
cargo build --release |
|
|
``` |
|
|
|
|
|
### Python Development |
|
|
|
|
|
```bash |
|
|
cd python |
|
|
pip install maturin |
|
|
maturin develop |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Project Structure |
|
|
|
|
|
``` |
|
|
hat/ |
|
|
├── src/                  # Rust implementation
│   ├── lib.rs            # Library entry point
│   ├── index.rs          # HatIndex implementation
│   ├── container.rs      # Tree node types
│   ├── consolidation.rs  # Background maintenance
│   └── persistence.rs    # Save/load functionality
├── python/               # Python bindings (PyO3)
│   └── arms_hat/         # Python package
├── benchmarks/           # Performance comparisons
├── examples/             # Usage examples
├── paper/                # Research paper (PDF)
├── images/               # Figures and diagrams
└── tests/                # Test suite
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Reproducing Results |
|
|
|
|
|
```bash |
|
|
# Run HAT vs HNSW benchmark |
|
|
cargo test --test phase31_hat_vs_hnsw -- --nocapture |
|
|
|
|
|
# Run real embedding dimension tests |
|
|
cargo test --test phase32_real_embeddings -- --nocapture |
|
|
|
|
|
# Run persistence tests |
|
|
cargo test --test phase33_persistence -- --nocapture |
|
|
|
|
|
# Run end-to-end LLM demo |
|
|
python examples/demo_hat_memory.py |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## When to Use HAT |
|
|
|
|
|
**HAT is ideal for:** |
|
|
- AI conversation memory (chatbots, agents) |
|
|
- Session-based retrieval systems |
|
|
- Any hierarchically structured vector data |
|
|
- Systems requiring deterministic behavior |
|
|
- Cold-start scenarios (no training needed) |
|
|
|
|
|
**Use HNSW instead for:** |
|
|
- Unstructured point clouds (random embeddings) |
|
|
- Static knowledge bases (handbooks, catalogs) |
|
|
- When approximate recall is acceptable |
|
|
|
|
|
--- |
|
|
|
|
|
## Beyond AI Memory: A New Database Paradigm |
|
|
|
|
|
HAT represents a fundamentally new approach to indexing: **exploiting known structure rather than learning it**. |
|
|
|
|
|
| Database Type | Structure | Semantics | |
|
|
|---------------|-----------|-----------| |
|
|
| Relational | Explicit (foreign keys) | None | |
|
|
| Document | Implicit (nesting) | None | |
|
|
| Vector (HNSW) | Learned from data | Yes | |
|
|
| **HAT** | **Explicit + exploited** | **Yes** | |
|
|
|
|
|
Traditional vector databases treat embeddings as unstructured point clouds, spending compute to *discover* topology. HAT inverts this: **known hierarchy is free information - use it.** |
|
|
|
|
|
### General Applications |
|
|
|
|
|
Any domain with **hierarchical structure + semantic similarity** benefits from HAT: |
|
|
|
|
|
- **Legal/Medical Documents:** Case → Filing → Paragraph → Sentence
- **Code Search:** Repository → Module → Function → Line
- **IoT/Sensor Networks:** Facility → Zone → Device → Reading
- **E-commerce:** Catalog → Category → Product → Variant
- **Research Corpora:** Journal → Paper → Section → Citation
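For instance, the Code Search hierarchy above translates directly into the session/document/chunk API. A speculative sketch; the toy corpus, the level mapping, and the `embed()` stand-in are illustrative, not part of the library:

```python
import hashlib
import random

from arms_hat import HatIndex

def embed(text: str) -> list[float]:
    # Same stand-in embedder as in the hierarchy sketch above.
    rng = random.Random(hashlib.md5(text.encode()).hexdigest())
    return [rng.uniform(-1.0, 1.0) for _ in range(1536)]

# Toy corpus: Repository -> Module -> Function.
corpus = {
    "payments-service": {
        "billing.py": ["def charge_card(card, cents): ...", "def refund(tx): ..."],
        "invoices.py": ["def render_invoice(order): ..."],
    },
}

index = HatIndex.cosine(1536)
for repo, modules in corpus.items():
    index.new_session()                # Repository -> session
    for module, functions in modules.items():
        index.new_document()           # Module -> document
        for source in functions:
            index.add(embed(source))   # Function -> chunk
```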
|
|
|
|
|
### The Core Insight |
|
|
|
|
|
> *"Position IS relationship. No foreign keys needed - proximity defines connection."* |
|
|
|
|
|
HAT combines the structural guarantees of document databases with the semantic power of vector search, without the computational overhead of learning topology from scratch. |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@article{hat2026, |
|
|
title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory}, |
|
|
author={Young, Lucas and Automate Capture Research}, |
|
|
year={2026}, |
|
|
url={https://research.automate-capture.com/hat} |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Paper |
|
|
|
|
|
📄 **[Read the Full Paper (PDF)](paper/HAT_Context_Extension_Young_2026.pdf)** |
|
|
|
|
|
--- |
|
|
|
|
|
## License |
|
|
|
|
|
MIT License - see [LICENSE](LICENSE) for details. |
|
|
|
|
|
--- |
|
|
|
|
|
## Links |
|
|
|
|
|
- **Research Site:** [research.automate-capture.com/hat](https://research.automate-capture.com/hat) |
|
|
- **Main Site:** [automate-capture.com](https://automate-capture.com) |
|
|
|