HAT Benchmark Reproducibility Package

This directory contains everything needed to reproduce the benchmark results from the HAT paper.

Quick Start

# Run all benchmarks
./run_all_benchmarks.sh

# Run abbreviated version (faster)
./run_all_benchmarks.sh --quick

Benchmark Suite

Phase 3.1: HAT vs HNSW Comparison

Test file: tests/phase31_hat_vs_hnsw.rs

Compares HAT against HNSW on hierarchically structured data (AI conversation patterns).

Expected Results:

Metric	HAT	HNSW
Recall@10	100%	~70%
Build Time	30ms	2100ms
Query Latency	1.4ms	0.5ms

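Key finding: HAT achieves 30 percentage points higher recall (100% vs ~70%) while building 70x faster (30ms vs 2100ms), at the cost of roughly 3x higher query latency (1.4ms vs 0.5ms).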
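The headline ratios can be checked directly against the table. The following is a small arithmetic sketch that uses only the expected values listed above; the variable names are illustrative and not part of the benchmark code.

```python
# Expected values copied from the Phase 3.1 table.
hat = {"recall": 1.00, "build_ms": 30.0, "query_ms": 1.4}
hnsw = {"recall": 0.70, "build_ms": 2100.0, "query_ms": 0.5}

# Recall gap as an absolute difference (percentage points) and as a relative gain.
recall_gap_points = (hat["recall"] - hnsw["recall"]) * 100
recall_relative_gain = (hat["recall"] / hnsw["recall"] - 1) * 100

# Build-time speedup and query-latency slowdown as plain ratios.
build_speedup = hnsw["build_ms"] / hat["build_ms"]
query_slowdown = hat["query_ms"] / hnsw["query_ms"]

print(f"recall gap: {recall_gap_points:.0f} points ({recall_relative_gain:.0f}% relative)")
print(f"build speedup: {build_speedup:.0f}x")
print(f"query slowdown: {query_slowdown:.1f}x")
```

The recall gap is 30 percentage points, which is about 43% in relative terms, so it helps to state which of the two is meant when quoting the result.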

Phase 3.2: Real Embedding Dimensions

Test file: tests/phase32_real_embeddings.rs

Tests HAT with production embedding sizes.

Expected Results:

Dimensions	Model	Recall@10
384	MiniLM	100%
768	BERT-base	100%
1536	OpenAI ada-002	100%

Phase 3.3: Persistence Layer

Test file: tests/phase33_persistence.rs

Validates serialization/deserialization correctness and performance.

Expected Results:

Metric	Value
Serialize throughput	300+ MB/s
Deserialize throughput	100+ MB/s
Recall after restore	100%
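Throughput figures like these are usually payload size divided by wall-clock time. The sketch below shows one way to take that measurement. It assumes 1 MB = 10^6 bytes and uses Python's `pickle` purely as a stand-in serializer, since the HAT persistence API is not shown in this README; `throughput_mb_per_s` and `measure` are hypothetical helpers.

```python
import pickle
import time

def throughput_mb_per_s(num_bytes, seconds):
    """Convert a byte count and an elapsed time into MB/s (assuming 1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / 1e6 / seconds

def measure(fn, *args):
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Stand-in payload: 1000 vectors of 384 floats. pickle is NOT the HAT on-disk format.
payload = [[float(i + j) for j in range(384)] for i in range(1000)]

blob, t_ser = measure(pickle.dumps, payload)
restored, t_de = measure(pickle.loads, blob)

# Round-trip correctness check, analogous to "Recall after restore".
assert restored == payload
print(f"serialize:   {throughput_mb_per_s(len(blob), t_ser):.1f} MB/s")
print(f"deserialize: {throughput_mb_per_s(len(blob), t_de):.1f} MB/s")
```

A single timed run is noisy; repeating the measurement and reporting the median would give steadier numbers.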

Phase 4.2: Attention State Format

Test file: tests/phase42_attention_state.rs

Tests the attention state serialization format.

Expected Results:

- All 9 tests pass
- Role types roundtrip correctly
- Metadata preserved
- KV cache support working

Phase 4.3: End-to-End Demo

Script: examples/demo_hat_memory.py

Full integration with sentence-transformers and an optional LLM.

Expected Results:

Metric	Value
Messages	2000
Tokens	~60,000
Recall accuracy	100%
Retrieval latency	<5ms

Running Individual Benchmarks

Rust Benchmarks

# HAT vs HNSW
cargo test --test phase31_hat_vs_hnsw -- --nocapture

# Real embeddings
cargo test --test phase32_real_embeddings -- --nocapture

# Persistence
cargo test --test phase33_persistence -- --nocapture

# Attention state
cargo test --test phase42_attention_state -- --nocapture

Python Tests

# Setup
python3 -m venv venv
source venv/bin/activate
pip install maturin pytest sentence-transformers

# Build extension
maturin develop --features python

# Run tests
pytest python/tests/ -v

# Run demo
python examples/demo_hat_memory.py

Hardware Requirements

- Minimum: 4GB RAM, any modern CPU
- Recommended: 8GB RAM for large-scale tests
- Storage: ~2GB for full benchmark suite

Expected Runtime

Mode	Time
Quick (`--quick`)	~2 minutes
Full	~10 minutes
With LLM demo	~15 minutes

Interpreting Results

Key Metrics

Recall@k: Percentage of the true k nearest neighbors that appear in the k results returned
- HAT target: 100% on hierarchical data
- HNSW baseline: ~70% on hierarchical data
Build Time: Time to construct the index
- HAT target: <100ms for 1000 points
- Should be 50-100x faster than HNSW
Query Latency: Time per query
- HAT target: <5ms
- Acceptable to be 2-3x slower than HNSW (recall matters more)
Throughput: Serialization/deserialization speed
- Target: 100+ MB/s in both directions (Phase 3.3 expects 300+ MB/s for serialization)
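For reference, Recall@k as described above can be computed as follows. This is a minimal sketch of the standard definition, not the code used in the Rust tests; `recall_at_k` and `mean_recall_at_k` are illustrative names, and neighbor IDs are assumed to be integers ordered from nearest to farthest.

```python
from typing import Sequence

def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbor IDs that appear in the retrieved top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    hits = len(truth & set(retrieved[:k]))
    return hits / len(truth)

def mean_recall_at_k(all_retrieved, all_truth, k: int) -> float:
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_retrieved, all_truth)]
    return sum(scores) / len(scores)

# One query where 7 of the 10 true neighbors were found.
truth = list(range(10))
retrieved = [0, 1, 2, 3, 4, 5, 6, 20, 21, 22]
print(recall_at_k(retrieved, truth, k=10))  # 0.7
```

The ground truth would normally come from an exact brute-force search over the same vectors, so that recall measures only the error introduced by the index.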

Success Criteria

The benchmarks validate the paper's claims if:

- HAT recall@10 ≥ 99% on hierarchical data
- HAT recall significantly exceeds HNSW on hierarchical data
- HAT builds faster than HNSW
- Persistence preserves 100% recall
- Python bindings pass all tests
- End-to-end demo achieves ≥95% retrieval accuracy
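The criteria above can be turned into a mechanical check once the numbers have been read off a run. The sketch below is one possible encoding: the `Measured` container and `check_success` function are hypothetical, and because the README does not quantify "significantly", the `recall_margin` default of 0.10 (10 percentage points) is an assumption to adjust as needed.

```python
from dataclasses import dataclass

@dataclass
class Measured:
    """Hypothetical container for numbers read off a benchmark run."""
    hat_recall_at_10: float         # fraction in [0, 1]
    hnsw_recall_at_10: float        # fraction in [0, 1]
    hat_build_ms: float
    hnsw_build_ms: float
    recall_after_restore: float     # fraction in [0, 1]
    python_tests_passed: bool
    demo_retrieval_accuracy: float  # fraction in [0, 1]

def check_success(m, recall_margin=0.10):
    """Evaluate each criterion; recall_margin is an assumed reading of 'significantly'."""
    return {
        "hat recall@10 >= 99%": m.hat_recall_at_10 >= 0.99,
        "hat recall exceeds hnsw": m.hat_recall_at_10 - m.hnsw_recall_at_10 >= recall_margin,
        "hat builds faster": m.hat_build_ms < m.hnsw_build_ms,
        "persistence preserves recall": m.recall_after_restore >= 1.0,
        "python tests pass": m.python_tests_passed,
        "demo accuracy >= 95%": m.demo_retrieval_accuracy >= 0.95,
    }

# The expected values from this README satisfy every criterion.
expected = Measured(1.0, 0.70, 30.0, 2100.0, 1.0, True, 1.0)
results = check_success(expected)
for name, ok in results.items():
    print(f"{'PASS' if ok else 'FAIL'}  {name}")
```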

Troubleshooting

Build Errors

# Update Rust
rustup update

# Clean build
cargo clean && cargo build --release

Python Issues

# Ensure venv is activated
source venv/bin/activate

# Rebuild extension
maturin develop --features python --release

Memory Issues

For large-scale tests, ensure sufficient RAM (8GB recommended):

# Check available memory
free -h

# Run with limited parallelism
RAYON_NUM_THREADS=2 cargo test --test phase31_hat_vs_hnsw

Output Files

Results are saved to benchmarks/results/:

results/
  benchmark_results_YYYYMMDD_HHMMSS.txt  # Full output
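Because each run writes a new timestamped file, it can be handy to locate the most recent one programmatically. The sketch below assumes the filename pattern shown above and the `benchmarks/results` path relative to the repository root; `parse_timestamp` and `latest_results` are illustrative helpers, not part of the package.

```python
import re
from datetime import datetime
from pathlib import Path

# Matches benchmark_results_YYYYMMDD_HHMMSS.txt and captures the timestamp part.
PATTERN = re.compile(r"^benchmark_results_(\d{8}_\d{6})\.txt$")

def parse_timestamp(filename):
    """Return the datetime encoded in a results filename, or None if it does not match."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    except ValueError:
        # Digits matched the pattern but do not form a valid date/time.
        return None

def latest_results(results_dir="benchmarks/results"):
    """Return the Path of the newest results file, or None if there are none."""
    candidates = []
    for path in Path(results_dir).glob("benchmark_results_*.txt"):
        stamp = parse_timestamp(path.name)
        if stamp is not None:
            candidates.append((stamp, path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]

print(latest_results())
```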

Citation

If you use these benchmarks, please cite:

@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={AI Research Lab},
  year={2026}
}