# RavenBERT
RavenBERT is a SentenceTransformers embedding model specialized for smart-contract invariants (e.g., `require(...)`, `assert(...)`, `if (...) revert`) from Ethereum/Vyper sources.
It starts from `web3se/SmartBERT-v2` and is contrastively fine-tuned so that cosine similarity reflects the semantic intent of guards used in transaction-reverting checks.
- Architecture: BERT-family encoder (SmartBERT-v2) → MeanPooling → L2 Normalize
- Embedding dimension: 768
- Normalization: enabled (unit-norm vectors; cosine ≡ dot product)
- Intended use: clustering / semantic search / dedup / taxonomy building for short guard predicates (and optional messages)
## Quick start

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MojtabaEshghie/RavenBERT")

sentences = [
    "amountOut >= amountOutMin",
    "deadline >= block.timestamp",
    "balances[msg.sender] >= amount",
]

emb = model.encode(sentences, convert_to_numpy=True, show_progress_bar=False)
# rows of emb are L2-normalized; use cosine similarity for comparisons
```
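Because the vectors are unit-norm, cosine similarity reduces to a plain dot product. A minimal continuation of the snippet above (the query string is a made-up paraphrase, used only for illustration):

```python
import numpy as np

# Unit-norm rows: the pairwise cosine-similarity matrix is just emb @ emb.T.
sim = emb @ emb.T
print(sim.round(3))

# Semantic search: rank the stored guards against a new query predicate.
query = model.encode(["minOut <= amountOut"], convert_to_numpy=True)
scores = (query @ emb.T).ravel()
best = int(np.argmax(scores))
print(sentences[best], float(scores[best]))
```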
## Training summary (contrastive)

- Base model: `web3se/SmartBERT-v2`
- Objective: `CosineSimilarityLoss` (positives near 1.0, negatives near 0.0)
- Pair construction: L2-normalized seed embeddings → positives if cosine ≥ 0.80, negatives if cosine ≤ 0.20 (nearest-neighbor candidates, top_k=10, max 5 positives per item)
- This release's stats: 1,647 unique texts → 16,470 pairs (8,235 positive / 8,235 negative)
- Hyperparameters: epochs=1, batch_size=16, max_seq_len=512
- Saved as: canonical SentenceTransformers layout (`0_Transformer/`, `1_Pooling/`, `2_Normalize/`)
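For orientation, here is a minimal sketch of this recipe using the SentenceTransformers `fit` API. The two hand-written pairs stand in for the mined ones; this is not the release's actual training script.

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("web3se/SmartBERT-v2")
model.max_seq_length = 512

# Hand-written stand-ins for mined pairs: in the release, positives/negatives
# came from nearest-neighbor candidates with cosine >= 0.80 / <= 0.20.
train_examples = [
    InputExample(texts=["balances[msg.sender] >= amount",
                        "balanceOf[msg.sender] >= value"], label=1.0),
    InputExample(texts=["deadline >= block.timestamp",
                        "msg.sender == owner"], label=0.0),
]

loader = DataLoader(train_examples, shuffle=True, batch_size=16)
loss = losses.CosineSimilarityLoss(model)  # pushes pair cosine toward its label
model.fit(train_objectives=[(loader, loss)], epochs=1)
```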
A more detailed methodology and evaluation appear in the RAVEN paper (semantic clustering of revert-inducing invariants).
## Intended uses & limitations

### Good for
- Measuring semantic relatedness of short invariant predicates
- Clustering guards by intent (e.g., access control, slippage, timeouts)
- Deduplicating near-equivalent checks across contracts
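For example, dedup can be a greedy cosine-threshold pass over the normalized embeddings; the 0.95 cutoff below is an assumed value, not one from the paper:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MojtabaEshghie/RavenBERT")
guards = [
    "balances[msg.sender] >= amount",
    "balanceOf[msg.sender] >= value",
    "msg.sender == owner",
]
emb = model.encode(guards, convert_to_numpy=True)  # rows already L2-normalized

DEDUP_THRESHOLD = 0.95  # hypothetical cutoff; tune on your corpus
kept = []
for i in range(len(guards)):
    # Keep a guard only if it is not near-equivalent to one already kept.
    if all(float(emb[i] @ emb[j]) < DEDUP_THRESHOLD for j in kept):
        kept.append(i)
print([guards[i] for i in kept])
```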
### Not ideal for
- Long code blocks or whole-function embeddings
- General code understanding outside invariant-style snippets
- Non-EVM ecosystems without adaptation
## Evaluation (paper)

When paired with DBSCAN on predicate-only text, RavenBERT produced compact, well-separated clusters (e.g., Silhouette ≈ 0.93, S_Dbw ≈ 0.043 at ~52% coverage), surfacing meaningful categories of defenses from reverted transactions. See the paper for the full protocol, ablations, and metrics.
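A sketch of that setup under assumed parameters (`eps` and `min_samples` below are placeholders; the paper documents the actual protocol):

```python
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MojtabaEshghie/RavenBERT")
predicates = [
    "amountOut >= amountOutMin",
    "deadline >= block.timestamp",
    "msg.sender == owner",
    "balances[msg.sender] >= amount",
]
X = model.encode(predicates, convert_to_numpy=True)

# Unit-norm embeddings, so cosine distance is a natural metric for DBSCAN.
labels = DBSCAN(eps=0.15, min_samples=2, metric="cosine").fit_predict(X)

# Coverage = fraction of points assigned to some cluster (label != -1).
mask = labels != -1
print("coverage:", mask.mean())
if mask.sum() > 1 and len(set(labels[mask])) > 1:
    print("silhouette:", silhouette_score(X[mask], labels[mask], metric="cosine"))
```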
## Reproducibility

- Pair thresholds: τ_pos = 0.80, τ_neg = 0.20
- Normalization: L2 via `sentence_transformers.models.Normalize()`
- Training log: `ravenbert_training_stats.json` (included in repo)
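The saved layout corresponds to the standard three-module SentenceTransformers stack; a sketch of assembling it from the base model (illustrative, not the release's build script):

```python
from sentence_transformers import SentenceTransformer, models

# Mirrors the saved layout: 0_Transformer / 1_Pooling / 2_Normalize.
word = models.Transformer("web3se/SmartBERT-v2", max_seq_length=512)
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")
norm = models.Normalize()

model = SentenceTransformer(modules=[word, pool, norm])
model.save("ravenbert-local")  # writes the canonical directory structure
```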
## Citation
If you use RavenBERT, please cite the RAVEN paper and this model:
TBD
## License
MIT