HAT Benchmark Reproducibility Package
This directory contains everything needed to reproduce the benchmark results from the HAT paper.
Quick Start
# Run all benchmarks
./run_all_benchmarks.sh
# Run abbreviated version (faster)
./run_all_benchmarks.sh --quick
Benchmark Suite
Phase 3.1: HAT vs HNSW Comparison
Test file: tests/phase31_hat_vs_hnsw.rs
Compares HAT against HNSW on hierarchically-structured data (AI conversation patterns).
Expected Results:
| Metric | HAT | HNSW |
|---|---|---|
| Recall@10 | 100% | ~70% |
| Build Time | 30ms | 2100ms |
| Query Latency | 1.4ms | 0.5ms |
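Key finding: HAT achieves 30 percentage points higher recall (100% vs. ~70%) while building 70x faster (30ms vs. 2100ms), at the cost of roughly 2.8x higher query latency.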
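The headline figures follow directly from the expected-results table. As a quick sanity check, the arithmetic can be sketched as below; the variable names are illustrative and the values are copied from the table, not measured.

```python
# Expected values copied from the Phase 3.1 table (not measured here).
hat_recall, hnsw_recall = 1.00, 0.70        # Recall@10 as fractions
hat_build_ms, hnsw_build_ms = 30.0, 2100.0  # index build time
hat_query_ms, hnsw_query_ms = 1.4, 0.5      # per-query latency

# Recall difference in percentage points (not a relative percentage).
recall_gap_points = (hat_recall - hnsw_recall) * 100
# How many times faster HAT builds, and how many times slower it queries.
build_speedup = hnsw_build_ms / hat_build_ms
query_slowdown = hat_query_ms / hnsw_query_ms

print(f"Recall gap:     {recall_gap_points:.0f} percentage points")
print(f"Build speedup:  {build_speedup:.0f}x")
print(f"Query slowdown: {query_slowdown:.1f}x")
```

Substituting the numbers from your own run into the same three ratios gives a direct comparison against the expected results.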
Phase 3.2: Real Embedding Dimensions
Test file: tests/phase32_real_embeddings.rs
Tests HAT with production embedding sizes.
Expected Results:
| Dimensions | Model | Recall@10 |
|---|---|---|
| 384 | MiniLM | 100% |
| 768 | BERT-base | 100% |
| 1536 | OpenAI ada-002 | 100% |
Phase 3.3: Persistence Layer
Test file: tests/phase33_persistence.rs
Validates serialization/deserialization correctness and performance.
Expected Results:
| Metric | Value |
|---|---|
| Serialize throughput | 300+ MB/s |
| Deserialize throughput | 100+ MB/s |
| Recall after restore | 100% |
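Throughput here is bytes processed divided by elapsed time. The sketch below shows one way such a figure could be computed. It uses Python's pickle on a list of float vectors purely as a stand-in payload; it is not HAT's serialization format or API, and the assumption that 1 MB = 10^6 bytes is ours, not the paper's.

```python
import pickle
import random
import time

def throughput_mb_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and elapsed time to MB/s (assuming 1 MB = 10**6 bytes)."""
    return num_bytes / 1e6 / seconds

# Stand-in payload: 1000 vectors of 384 floats. Not HAT's on-disk format.
rng = random.Random(0)
payload = [[rng.random() for _ in range(384)] for _ in range(1000)]

# Time serialization.
start = time.perf_counter()
blob = pickle.dumps(payload)
serialize_s = time.perf_counter() - start

# Time deserialization of the same bytes.
start = time.perf_counter()
restored = pickle.loads(blob)
deserialize_s = time.perf_counter() - start

print(f"serialize:   {throughput_mb_per_s(len(blob), serialize_s):.1f} MB/s")
print(f"deserialize: {throughput_mb_per_s(len(blob), deserialize_s):.1f} MB/s")
```

The numbers printed by this stand-in say nothing about HAT itself; only the bytes-over-seconds calculation carries over.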
Phase 4.2: Attention State Format
Test file: tests/phase42_attention_state.rs
Tests the attention state serialization format.
Expected Results:
- All 9 tests pass
- Role types roundtrip correctly
- Metadata preserved
- KV cache support working
Phase 4.3: End-to-End Demo
Script: examples/demo_hat_memory.py
Full integration with sentence-transformers and an optional LLM.
Expected Results:
| Metric | Value |
|---|---|
| Messages | 2000 |
| Tokens | ~60,000 |
| Recall accuracy | 100% |
| Retrieval latency | <5ms |
Running Individual Benchmarks
Rust Benchmarks
# HAT vs HNSW
cargo test --test phase31_hat_vs_hnsw -- --nocapture
# Real embeddings
cargo test --test phase32_real_embeddings -- --nocapture
# Persistence
cargo test --test phase33_persistence -- --nocapture
# Attention state
cargo test --test phase42_attention_state -- --nocapture
Python Tests
# Setup
python3 -m venv venv
source venv/bin/activate
pip install maturin pytest sentence-transformers
# Build extension
maturin develop --features python
# Run tests
pytest python/tests/ -v
# Run demo
python examples/demo_hat_memory.py
Hardware Requirements
- Minimum: 4GB RAM, any modern CPU
- Recommended: 8GB RAM for large-scale tests
- Storage: ~2GB for full benchmark suite
Expected Runtime
| Mode | Time |
|---|---|
| Quick (--quick) | ~2 minutes |
| Full | ~10 minutes |
| With LLM demo | ~15 minutes |
Interpreting Results
Key Metrics
Recall@k: Percentage of the true k nearest neighbors that appear in the top-k results returned by the index
- HAT target: 100% on hierarchical data
- HNSW baseline: ~70% on hierarchical data
Build Time: Time to construct the index
- HAT target: <100ms for 1000 points
- Should be 50-100x faster than HNSW
Query Latency: Time per query
- HAT target: <5ms
- Acceptable to be 2-3x slower than HNSW (recall matters more)
Throughput: Serialization/deserialization speed
- Target: 100+ MB/s
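To make the Recall@k definition concrete, here is a minimal sketch that builds ground truth by brute force and scores a result list against it. The function names are hypothetical, and Euclidean distance is an assumption; the benchmark's actual distance metric may differ.

```python
import math

def brute_force_top_k(query, points, k):
    """Return indices of the k points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]

def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbors present in the first k retrieved ids."""
    truth = set(ground_truth[:k])
    return len(truth & set(retrieved[:k])) / len(truth)

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
truth = brute_force_top_k((0.1, 0.1), points, k=3)

# Hypothetical output of an approximate index that missed one true neighbor.
approx = [0, 1, 3]
print(f"recall@3 = {recall_at_k(approx, truth, k=3):.2f}")
```

An index that returns all of the true neighbors, in any order, scores 1.0, which corresponds to the 100% target above.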
Success Criteria
The benchmarks validate the paper's claims if:
- HAT recall@10 ≥ 99% on hierarchical data
- HAT recall significantly exceeds HNSW on hierarchical data
- HAT builds faster than HNSW
- Persistence preserves 100% recall
- Python bindings pass all tests
- End-to-end demo achieves ≥95% retrieval accuracy
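The criteria above can be checked mechanically once the measured values are collected. The following is a sketch only: the dictionary keys are hypothetical (the benchmark scripts do not necessarily emit them), and because "significantly exceeds" is not quantified here, the 10-point margin in min_recall_gap is an assumed placeholder.

```python
def check_success(m, min_recall_gap=0.10):
    """Map each success criterion to True/False for a dict of measured values.

    min_recall_gap is an assumed threshold for "significantly exceeds".
    """
    return {
        "HAT recall@10 >= 99%": m["hat_recall"] >= 0.99,
        "HAT recall significantly exceeds HNSW":
            m["hat_recall"] - m["hnsw_recall"] >= min_recall_gap,
        "HAT builds faster than HNSW": m["hat_build_ms"] < m["hnsw_build_ms"],
        "Persistence preserves 100% recall": m["recall_after_restore"] >= 1.0,
        "Python bindings pass all tests": m["python_tests_failed"] == 0,
        "Demo retrieval accuracy >= 95%": m["demo_accuracy"] >= 0.95,
    }

# Expected values from this README, used here as example input.
expected = {
    "hat_recall": 1.00,
    "hnsw_recall": 0.70,
    "hat_build_ms": 30.0,
    "hnsw_build_ms": 2100.0,
    "recall_after_restore": 1.00,
    "python_tests_failed": 0,
    "demo_accuracy": 1.00,
}

results = check_success(expected)
for criterion, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {criterion}")
```

Replace the values in expected with the figures from your own run to see which criteria hold.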
Troubleshooting
Build Errors
# Update Rust
rustup update
# Clean build
cargo clean && cargo build --release
Python Issues
# Ensure venv is activated
source venv/bin/activate
# Rebuild extension
maturin develop --features python --release
Memory Issues
For large-scale tests, ensure sufficient RAM:
# Check available memory
free -h
# Run with limited parallelism
RAYON_NUM_THREADS=2 cargo test --test phase31_hat_vs_hnsw
Output Files
Results are saved to benchmarks/results/:
results/
  benchmark_results_YYYYMMDD_HHMMSS.txt # Full output
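Because the timestamp in the filename is zero-padded (YYYYMMDD_HHMMSS), sorting the filenames alphabetically also sorts them chronologically. A small helper along these lines can pick out the most recent run; the function name is hypothetical and the path assumes you run it from the repository root.

```python
from pathlib import Path

def latest_results(results_dir):
    """Return the newest benchmark_results_*.txt in results_dir, or None."""
    # Lexicographic order equals chronological order for this filename pattern.
    files = sorted(Path(results_dir).glob("benchmark_results_*.txt"))
    return files[-1] if files else None

if __name__ == "__main__":
    newest = latest_results("benchmarks/results")
    print(newest if newest else "no results found; run ./run_all_benchmarks.sh first")
```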
Citation
If you use these benchmarks, please cite:
@article{hat2026,
title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
author={AI Research Lab},
year={2026}
}