
# HAT Benchmark Reproducibility Package

This directory contains everything needed to reproduce the benchmark results from the HAT paper.

## Quick Start

```bash
# Run all benchmarks
./run_all_benchmarks.sh

# Run abbreviated version (faster)
./run_all_benchmarks.sh --quick
```

## Benchmark Suite

### Phase 3.1: HAT vs HNSW Comparison

**Test file:** `tests/phase31_hat_vs_hnsw.rs`

Compares HAT against HNSW on hierarchically structured data (AI conversation patterns).

**Expected Results:**

| Metric        | HAT   | HNSW   |
|---------------|-------|--------|
| Recall@10     | 100%  | ~70%   |
| Build Time    | 30ms  | 2100ms |
| Query Latency | 1.4ms | 0.5ms  |

**Key finding:** HAT achieves 30 percentage points higher recall (100% vs ~70%) while building 70× faster.
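
As a rough illustration of what "hierarchically structured" means here, the sketch below builds nested Gaussian clusters (topic → thread → message) in Python. The actual generator lives in `tests/phase31_hat_vs_hnsw.rs`; the cluster counts and noise scales shown are illustrative assumptions, not the test's parameters.

```python
import numpy as np

def hierarchical_points(n_topics=10, n_threads=10, msgs_per_thread=10,
                        dim=384, seed=0):
    """Nested Gaussian clusters: topic -> thread -> message.
    An illustrative stand-in for conversation-pattern data."""
    rng = np.random.default_rng(seed)
    points = []
    for _ in range(n_topics):
        topic = rng.normal(0.0, 1.0, dim)               # coarse cluster center
        for _ in range(n_threads):
            thread = topic + rng.normal(0.0, 0.1, dim)  # mid-level offset
            for _ in range(msgs_per_thread):
                points.append(thread + rng.normal(0.0, 0.01, dim))  # leaf point
    return np.stack(points)

pts = hierarchical_points()  # 1000 points, matching the benchmark scale
print(pts.shape)             # (1000, 384)
```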

### Phase 3.2: Real Embedding Dimensions

**Test file:** `tests/phase32_real_embeddings.rs`

Tests HAT with production embedding sizes.

**Expected Results:**

| Dimensions | Model          | Recall@10 |
|------------|----------------|-----------|
| 384        | MiniLM         | 100%      |
| 768        | BERT-base      | 100%      |
| 1536       | OpenAI ada-002 | 100%      |
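
For reference, real embeddings at the 384-dimension size can be produced with sentence-transformers; `all-MiniLM-L6-v2` is a common public MiniLM checkpoint with 384-dimensional output. Whether the test feeds HAT real model output or synthetic vectors of the same sizes is defined by the test file itself.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim MiniLM checkpoint
vecs = model.encode(["How do I reset my password?",
                     "What is the refund policy?"])
print(vecs.shape)  # (2, 384)
```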

### Phase 3.3: Persistence Layer

**Test file:** `tests/phase33_persistence.rs`

Validates serialization/deserialization correctness and performance.

**Expected Results:**

| Metric                 | Value     |
|------------------------|-----------|
| Serialize throughput   | 300+ MB/s |
| Deserialize throughput | 100+ MB/s |
| Recall after restore   | 100%      |
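
Throughput here is bytes on disk divided by wall-clock time. A minimal measurement harness follows; `index.save` / `Index.load` in the usage comment are assumed names, not the verified binding API.

```python
import os
import time

def mb_per_s(path, op):
    """Time `op`, then report throughput from the resulting file size."""
    t0 = time.perf_counter()
    op()
    dt = time.perf_counter() - t0
    return os.path.getsize(path) / (1024 ** 2) / dt

# Hypothetical usage:
#   ser = mb_per_s("snapshot.hat", lambda: index.save("snapshot.hat"))
#   de  = mb_per_s("snapshot.hat", lambda: Index.load("snapshot.hat"))
```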

### Phase 4.2: Attention State Format

**Test file:** `tests/phase42_attention_state.rs`

Tests the attention state serialization format.

**Expected Results:**

- All 9 tests pass
- Role types roundtrip correctly
- Metadata is preserved
- KV cache support works
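
The roundtrip property these tests assert is: serialize, deserialize, compare for equality. A made-up Python stand-in of that check (the real schema is defined by the Rust test, not this dataclass):

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Message:
    role: str                # "system" | "user" | "assistant"
    content: str
    metadata: dict = field(default_factory=dict)

original = Message("user", "hello", {"turn": 1})
restored = Message(**json.loads(json.dumps(asdict(original))))
assert restored == original  # role and metadata survive the roundtrip
```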

### Phase 4.3: End-to-End Demo

**Script:** `examples/demo_hat_memory.py`

Full integration with sentence-transformers and an optional LLM.

**Expected Results:**

| Metric            | Value   |
|-------------------|---------|
| Messages          | 2000    |
| Tokens            | ~60,000 |
| Recall accuracy   | 100%    |
| Retrieval latency | <5ms    |
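
Recall accuracy in a demo like this is typically scored by planting facts at known positions and checking that each probe retrieves its target. A sketch of that scoring loop; `memory.search` and the result's `.id` field are assumed interfaces, not the demo's verified API.

```python
def retrieval_accuracy(memory, probes, k=10):
    """probes: list of (query_text, target_message_id) pairs."""
    hits = 0
    for query, target_id in probes:
        results = memory.search(query, k=k)              # assumed API
        hits += any(r.id == target_id for r in results)  # top-k hit?
    return hits / len(probes)
```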

## Running Individual Benchmarks

### Rust Benchmarks

```bash
# HAT vs HNSW
cargo test --test phase31_hat_vs_hnsw -- --nocapture

# Real embeddings
cargo test --test phase32_real_embeddings -- --nocapture

# Persistence
cargo test --test phase33_persistence -- --nocapture

# Attention state
cargo test --test phase42_attention_state -- --nocapture
```

### Python Tests

```bash
# Setup
python3 -m venv venv
source venv/bin/activate
pip install maturin pytest sentence-transformers

# Build extension
maturin develop --features python

# Run tests
pytest python/tests/ -v

# Run demo
python examples/demo_hat_memory.py
```
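
After `maturin develop`, a quick import confirms the extension was built into the active venv. The module name `hat` is an assumption here; check the project's `pyproject.toml` for the real name.

```python
import hat  # assumed module name -- verify in pyproject.toml
print(hat.__doc__)
```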

## Hardware Requirements

- **Minimum:** 4GB RAM, any modern CPU
- **Recommended:** 8GB RAM for large-scale tests
- **Storage:** ~2GB for the full benchmark suite

## Expected Runtime

| Mode              | Time        |
|-------------------|-------------|
| Quick (`--quick`) | ~2 minutes  |
| Full              | ~10 minutes |
| With LLM demo     | ~15 minutes |

## Interpreting Results

### Key Metrics

1. **Recall@k**: percentage of true nearest neighbors found (see the sketch after this list)
   - HAT target: 100% on hierarchical data
   - HNSW baseline: ~70% on hierarchical data
2. **Build Time**: time to construct the index
   - HAT target: <100ms for 1000 points
   - Should be 50-100x faster than HNSW
3. **Query Latency**: time per query
   - HAT target: <5ms
   - Acceptable to be 2-3x slower than HNSW (recall matters more)
4. **Throughput**: serialization/deserialization speed
   - Target: 100+ MB/s
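
Recall@k is measured against brute-force ground truth. A minimal sketch, assuming Euclidean distance over a numpy data matrix:

```python
import numpy as np

def recall_at_k(returned_ids, data, query, k=10):
    """Fraction of the true k nearest neighbors present in `returned_ids`."""
    dists = np.linalg.norm(data - query, axis=1)  # exact distances to all points
    true_ids = set(np.argsort(dists)[:k].tolist())
    return len(true_ids & set(returned_ids)) / k
```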

### Success Criteria

The benchmarks validate the paper's claims if:

1. HAT recall@10 ≥ 99% on hierarchical data
2. HAT recall significantly exceeds HNSW's on hierarchical data
3. HAT builds faster than HNSW
4. Persistence preserves 100% recall
5. Python bindings pass all tests
6. The end-to-end demo achieves ≥95% retrieval accuracy

## Troubleshooting

### Build Errors

```bash
# Update Rust
rustup update

# Clean build
cargo clean && cargo build --release
```

### Python Issues

```bash
# Ensure venv is activated
source venv/bin/activate

# Rebuild extension
maturin develop --features python --release
```

### Memory Issues

For large-scale tests, ensure sufficient RAM:

```bash
# Check available memory
free -h

# Run with limited parallelism
RAYON_NUM_THREADS=2 cargo test --test phase31_hat_vs_hnsw
```

## Output Files

Results are saved to `benchmarks/results/`:

```
results/
  benchmark_results_YYYYMMDD_HHMMSS.txt  # Full output
```

## Citation

If you use these benchmarks, please cite:

```bibtex
@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={AI Research Lab},
  year={2026}
}
```