# HAT Benchmark Reproducibility Package
This directory contains everything needed to reproduce the benchmark results from the HAT paper.
## Quick Start
```bash
# Run all benchmarks
./run_all_benchmarks.sh
# Run abbreviated version (faster)
./run_all_benchmarks.sh --quick
```
## Benchmark Suite
### Phase 3.1: HAT vs HNSW Comparison
**Test file**: `tests/phase31_hat_vs_hnsw.rs`
Compares HAT against HNSW on hierarchically structured data (AI conversation patterns).
**Expected Results**:
| Metric | HAT | HNSW |
|--------|-----|------|
| Recall@10 | 100% | ~70% |
| Build Time | 30ms | 2100ms |
| Query Latency | 1.4ms | 0.5ms |
**Key finding**: HAT achieves ~30 percentage points higher recall while building ~70x faster.
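The Rust test generates its own dataset; as an illustration of what "hierarchically structured (AI conversation patterns)" means here, the sketch below builds session → topic → message clusters in Python. All names and parameters are illustrative, not the benchmark's actual generator.
```python
# Illustrative hierarchical data: coarse session clusters, mid-level topic
# clusters, fine-grained message points (not the benchmark's real generator).
import numpy as np

rng = np.random.default_rng(42)
dim, n_sessions, topics_per_session, msgs_per_topic = 64, 10, 5, 20

points = []
for _ in range(n_sessions):
    session_center = rng.normal(0.0, 1.0, dim)
    for _ in range(topics_per_session):
        topic_center = session_center + rng.normal(0.0, 0.3, dim)
        for _ in range(msgs_per_topic):
            points.append(topic_center + rng.normal(0.0, 0.05, dim))

data = np.stack(points)  # (1000, 64): 10 sessions x 5 topics x 20 messages
```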
### Phase 3.2: Real Embedding Dimensions
**Test file**: `tests/phase32_real_embeddings.rs`
Tests HAT with production embedding sizes.
**Expected Results**:
| Dimensions | Model | Recall@10 |
|------------|-------|-----------|
| 384 | MiniLM | 100% |
| 768 | BERT-base | 100% |
| 1536 | OpenAI ada-002 | 100% |
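The Rust test exercises these dimensions directly. For reference, real 384-dimensional vectors can be produced with sentence-transformers (already a demo dependency); `all-MiniLM-L6-v2` below is the standard 384-d MiniLM checkpoint, and the other rows correspond to 768-d BERT-base and 1536-d ada-002 embeddings.
```python
# Produce real 384-d embeddings matching the first row of the table.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-d MiniLM checkpoint
vecs = model.encode(["hello world", "how are you"])
assert vecs.shape == (2, 384)
```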
### Phase 3.3: Persistence Layer
**Test file**: `tests/phase33_persistence.rs`
Validates serialization/deserialization correctness and performance.
**Expected Results**:
| Metric | Value |
|--------|-------|
| Serialize throughput | 300+ MB/s |
| Deserialize throughput | 100+ MB/s |
| Recall after restore | 100% |
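The throughput figures are simply bytes moved divided by wall-clock time. The real serializer is the Rust persistence layer; the sketch below times `pickle` on random vectors purely to show how the MB/s numbers are derived.
```python
# How MB/s is computed: payload size / wall-clock time. pickle stands in
# for the HAT serializer, which is implemented in Rust.
import pickle
import time
import numpy as np

data = np.random.default_rng(0).standard_normal((50_000, 384)).astype(np.float32)

t0 = time.perf_counter()
buf = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
ser = time.perf_counter() - t0

t0 = time.perf_counter()
_ = pickle.loads(buf)
de = time.perf_counter() - t0

mb = len(buf) / 1e6  # payload size in megabytes
print(f"serialize:   {mb / ser:.0f} MB/s")
print(f"deserialize: {mb / de:.0f} MB/s")
```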
### Phase 4.2: Attention State Format
**Test file**: `tests/phase42_attention_state.rs`
Tests the attention state serialization format.
**Expected Results**:
- All 9 tests pass
- Role types roundtrip correctly
- Metadata preserved
- KV cache support working
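The wire format itself is defined in the Rust crate. As a sketch of the kind of roundtrip check these tests perform, here is the same pattern on a stand-in record; the fields below are illustrative, not the actual attention-state format.
```python
# Roundtrip check on a stand-in record: serialize, deserialize, compare.
import json
from dataclasses import dataclass, asdict

@dataclass
class Message:
    role: str       # e.g. "system" / "user" / "assistant"
    content: str
    metadata: dict

original = Message(role="user", content="hello", metadata={"turn": 1})
restored = Message(**json.loads(json.dumps(asdict(original))))
assert restored == original  # roles and metadata survive the roundtrip
```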
### Phase 4.3: End-to-End Demo
**Script**: `examples/demo_hat_memory.py`
Full integration with sentence-transformers and an optional LLM.
**Expected Results**:
| Metric | Value |
|--------|-------|
| Messages | 2000 |
| Tokens | ~60,000 |
| Recall accuracy | 100% |
| Retrieval latency | <5ms |
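For orientation, here is a self-contained sketch of the demo's embed-then-retrieve loop and of how retrieval latency is measured; brute-force cosine search stands in for the HAT index so the snippet runs without the bindings.
```python
# Embed 2000 messages, retrieve top-10 for a query, time the retrieval.
# Brute-force cosine search is a stand-in for the HAT index.
import time
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [f"message {i}: notes about topic {i % 20}" for i in range(2000)]
emb = model.encode(corpus, normalize_embeddings=True)

query = model.encode(["what was said about topic 7?"], normalize_embeddings=True)[0]
t0 = time.perf_counter()
top10 = np.argsort(emb @ query)[::-1][:10]  # cosine similarity on unit vectors
print(f"retrieval: {(time.perf_counter() - t0) * 1e3:.2f} ms")
```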
## Running Individual Benchmarks
### Rust Benchmarks
```bash
# HAT vs HNSW
cargo test --test phase31_hat_vs_hnsw -- --nocapture
# Real embeddings
cargo test --test phase32_real_embeddings -- --nocapture
# Persistence
cargo test --test phase33_persistence -- --nocapture
# Attention state
cargo test --test phase42_attention_state -- --nocapture
```
### Python Tests
```bash
# Setup
python3 -m venv venv
source venv/bin/activate
pip install maturin pytest sentence-transformers
# Build extension
maturin develop --features python
# Run tests
pytest python/tests/ -v
# Run demo
python examples/demo_hat_memory.py
```
## Hardware Requirements
- **Minimum**: 4GB RAM, any modern CPU
- **Recommended**: 8GB RAM for large-scale tests
- **Storage**: ~2GB for full benchmark suite
## Expected Runtime
| Mode | Time |
|------|------|
| Quick (`--quick`) | ~2 minutes |
| Full | ~10 minutes |
| With LLM demo | ~15 minutes |
## Interpreting Results
### Key Metrics
1. **Recall@k**: Fraction of the true k nearest neighbors that the index returns (a worked example follows this list)
- HAT target: 100% on hierarchical data
- HNSW baseline: ~70% on hierarchical data
2. **Build Time**: Time to construct the index
- HAT target: <100ms for 1000 points
- Should be 50-100x faster than HNSW
3. **Query Latency**: Time per query
- HAT target: <5ms
- Acceptable to be 2-3x slower than HNSW (recall matters more)
4. **Throughput**: Serialization/deserialization speed
- Target: 100+ MB/s
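Concretely, recall@k compares the answer set returned by an index against a brute-force ground truth. A minimal sketch on illustrative data:
```python
# recall@10 = fraction of the true 10 nearest neighbors the index returns,
# averaged over queries.
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 64))
queries = rng.standard_normal((50, 64))

def knn(q, k=10):
    """Brute-force k nearest neighbors by Euclidean distance."""
    return set(np.argsort(np.linalg.norm(data - q, axis=1))[:k])

truth = [knn(q) for q in queries]
approx = [knn(q) for q in queries]  # replace with the index's results
recall = np.mean([len(t & a) / len(t) for t, a in zip(truth, approx)])
print(f"recall@10 = {recall:.2%}")  # 100% here because approx == truth
```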
### Success Criteria
The benchmarks validate the paper's claims if:
1. HAT recall@10 ≥ 99% on hierarchical data
2. HAT recall significantly exceeds HNSW on hierarchical data
3. HAT builds faster than HNSW
4. Persistence preserves 100% recall
5. Python bindings pass all tests
6. End-to-end demo achieves ≥95% retrieval accuracy
## Troubleshooting
### Build Errors
```bash
# Update Rust
rustup update
# Clean build
cargo clean && cargo build --release
```
### Python Issues
```bash
# Ensure venv is activated
source venv/bin/activate
# Rebuild extension
maturin develop --features python --release
```
### Memory Issues
For large-scale tests, ensure sufficient RAM:
```bash
# Check available memory
free -h
# Run with limited parallelism
RAYON_NUM_THREADS=2 cargo test --test phase31_hat_vs_hnsw
```
## Output Files
Results are saved to `benchmarks/results/`:
```
results/
  benchmark_results_YYYYMMDD_HHMMSS.txt   # Full output
```
## Citation
If you use these benchmarks, please cite:
```bibtex
@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={AI Research Lab},
  year={2026}
}
```