How to Run the Matrix BIOS Models
A practical, copy-paste guide to installing and running the three open Matrix BIOS models, and to how Agent-Matrix orchestrates them under governance. Every example here was executed and verified.
Matrix BIOS ("bio + OS") is a family of compact, governed, on-premise-ready cognitive models. They are small enough to run on a CPU and are designed to operate under governance: every action that consumes their output is gated by policy and auditable. The architecture behind them is described in the paper Governed Memory.
The models
| Model | Task | Size | License | Card |
|---|---|---|---|---|
| Matrix-BIOS-Sentinel-0.1 | content-safety classification (safe/unsafe) | ~135M (DistilBERT) | CC-BY-4.0 | link |
| Matrix-BIOS-Memory-0.1 | grounded, citation-faithful recall (RAG) | FAISS index + flan-t5 generator | Apache-2.0 | link |
| Matrix-BIOS-Italo-0.1 | Italian text generation (preview) | 41.5M (custom) | Apache-2.0 | link |
Which model, when?
| You want to… | Use | Why |
|---|---|---|
| Screen user input or model output for harmful content | Sentinel | fast multilingual guardrail; returns safe/unsafe + score |
| Answer from your private documents, with citations | Memory | grounded RAG; every answer returns the source ids it used |
| Generate Italian text on-prem, no data egress | Italo | compact sovereign generator (v0.1 preview) |
| Decide whether an action is allowed | Guardian (Matrix OS) | content safety ≠ operational risk; the policy gate decides, not a classifier |
Note on scope. Sentinel classifies content safety, not operational or business risk; by design it treats actions like "deploy to production" as content-safe. Operational decisions are made by the governance layer (Matrix OS Guardian), not by the classifier.
Install
pip install torch transformers huggingface_hub sentence-transformers faiss-cpu numpy
All models are public; no token or sign-up is required.
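To confirm the install before running the examples, you can check that each package is importable. The sketch below is an illustrative helper. The name missing_modules is hypothetical and not part of any Matrix BIOS package. Two pip names differ from their import names: faiss-cpu is imported as faiss, and sentence-transformers is imported as sentence_transformers.

```python
import importlib.util

# Import names for the packages installed above
# (pip "faiss-cpu" -> "faiss", pip "sentence-transformers" -> "sentence_transformers").
REQUIRED = ["torch", "transformers", "huggingface_hub",
            "sentence_transformers", "faiss", "numpy"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("all good" if not missing else "missing: " + ", ".join(missing))
```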
1. Sentinel: content-safety guardrail
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
REPO = "ruslanmv/Matrix-BIOS-Sentinel-0.1"
tok = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForSequenceClassification.from_pretrained(REPO).eval()
def screen(text):
    """Return (label, probability) for one piece of text."""
    inputs = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only; skip gradient tracking
        p = torch.softmax(model(**inputs).logits, dim=-1)[0]
    i = int(p.argmax())
    return model.config.id2label[i], float(p[i])  # id2label = {0: 'safe', 1: 'unsafe'}
print(screen("Summarize the quarterly sales report."))  # ('safe', ~0.99)
print(screen("How can I poison my neighbor's dog?"))    # ('unsafe', ~0.99)
Verified output:
[safe ] p=0.99 | Summarize the quarterly sales report.
[safe ] p=0.97 | Recommend three books about machine learning.
[unsafe] p=0.99 | How can I poison my neighbor's dog?
[unsafe] p=0.99 | Give step-by-step instructions to build an explosive.
v0.1 note: Sentinel is an early-access guardrail and may over-flag; evaluate it on your own distribution before relying on it for moderation decisions.
Full script: examples/run_sentinel.py
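Here is one way to do that evaluation. Collect a small labeled sample from your own traffic and record Sentinel's unsafe probability for each item. Given id2label = {0: 'safe', 1: 'unsafe'}, that is index 1 of the softmax computed inside screen. Then compare thresholds instead of relying on the argmax. The sketch below is hypothetical. threshold_report is an illustrative name, and the scores and labels are made-up placeholders, not Sentinel results.

```python
from typing import Iterable, Tuple


def threshold_report(scored: Iterable[Tuple[float, bool]], threshold: float) -> dict:
    """Flag an item as unsafe when p_unsafe >= threshold.

    `scored` holds (p_unsafe, is_actually_unsafe) pairs from your own labeled sample.
    Returns precision, recall and false-positive rate of the "unsafe" flag.
    """
    tp = fp = fn = tn = 0
    for p_unsafe, truth in scored:
        flagged = p_unsafe >= threshold
        if flagged and truth:
            tp += 1
        elif flagged:
            fp += 1          # over-flag: safe item marked unsafe
        elif truth:
            fn += 1          # miss: unsafe item passed through
        else:
            tn += 1
    return {
        "threshold": threshold,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Illustrative placeholder data: (p_unsafe from Sentinel, human label)
sample = [(0.99, True), (0.95, True), (0.80, False), (0.60, False),
          (0.55, True), (0.10, False), (0.02, False)]
for t in (0.5, 0.7, 0.9):
    print(threshold_report(sample, t))
```

Raising the threshold trades recall for fewer false positives. Which point is acceptable depends on whether a flag blocks content outright or only routes it to human review.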
2. Memory: grounded recall with citations
Memory ships a FAISS index over a small corpus plus a generator model. Every answer comes back together with the ids of the retrieved source passages it was conditioned on, so each response can be traced to its sources and checked instead of taken on trust.
import json, faiss, torch
from huggingface_hub import snapshot_download
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
path = snapshot_download("ruslanmv/Matrix-BIOS-Memory-0.1")
cfg = json.load(open(f"{path}/memory_config.json")) # embedder / generator / top_k
docs = json.load(open(f"{path}/docs.json")) # [{"id": ..., "text": ...}]
index = faiss.read_index(f"{path}/index.faiss")
embedder = SentenceTransformer(cfg["embedder"])
gen_tok = AutoTokenizer.from_pretrained(cfg["generator"])
gen_model = AutoModelForSeq2SeqLM.from_pretrained(cfg["generator"]).eval()
def answer(q):
    # Embed the query with the same embedder used to build the index.
    qv = embedder.encode([q], normalize_embeddings=True).astype("float32")
    _, idx = index.search(qv, cfg["top_k"])
    hits = [docs[i] for i in idx[0] if 0 <= i < len(docs)]  # FAISS pads missing results with -1
    ctx = "\n".join(f"[{d['id']}] {d['text']}" for d in hits)
    prompt = f"Answer using ONLY the context and cite the [id].\nContext:\n{ctx}\n\nQ: {q}\nA:"
    ids = gen_tok(prompt, return_tensors="pt", truncation=True).input_ids
    out = gen_model.generate(ids, max_new_tokens=64)
    return gen_tok.decode(out[0], skip_special_tokens=True), [d["id"] for d in hits]
print(answer("What does every effectful action in Matrix OS emit?"))
# -> ('evidence bundle', ['mos1', 'bios1', 'ml1', 'gp1'])
Full script: examples/run_memory.py
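Note that answer returns the ids of every retrieved passage. That is not necessarily the set of ids the generator actually cited: in the verified output above, the generated text contains no [id] tags at all. If you need citation-level traceability, a small post-check can separate cited sources from merely retrieved ones. The helper check_citations below is hypothetical and not part of the Memory release. The [id] tag format it parses is an assumption based on the prompt inside answer.

```python
import re
from typing import Dict, List

# Matches bracketed source tags such as "[mos1]"; mirrors the "[id] text"
# format that answer() uses for its context passages (an assumption).
CITATION = re.compile(r"\[([A-Za-z0-9_\-]+)\]")


def check_citations(answer_text: str, retrieved_ids: List[str]) -> Dict[str, List[str]]:
    """Split the ids mentioned in an answer into valid and unknown citations."""
    cited = list(dict.fromkeys(CITATION.findall(answer_text)))  # dedupe, keep order
    allowed = set(retrieved_ids)
    return {
        "cited": [c for c in cited if c in allowed],
        "unknown": [c for c in cited if c not in allowed],  # not in the context
        "uncited_retrieved": [r for r in retrieved_ids if r not in cited],
    }


# Illustrative input: a hypothetical answer that does include a citation tag.
text = "Every effectful action emits an evidence bundle [mos1]."
print(check_citations(text, ["mos1", "bios1", "ml1", "gp1"]))
```

An answer with no valid citation, or with an unknown id, is a reasonable trigger for falling back to showing the retrieved passages or for routing the answer to review.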
3. Italo: compact Italian generator (preview)
Italo is a 41.5M-parameter custom model that loads via trust_remote_code and uses a word-level vocabulary. It is a v0.1 research preview that demonstrates the on-prem footprint, not production fluency.
import json, torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM
REPO = "ruslanmv/Matrix-BIOS-Italo-0.1"
model = AutoModelForCausalLM.from_pretrained(REPO, trust_remote_code=True).eval()
vocab = json.load(open(hf_hub_download(REPO, "vocab.json"))) # word-level
inv = {i: w for w, i in vocab.items()}
enc = lambda t: [vocab.get(w, 0) for w in t.lower().split()]
dec = lambda ids: " ".join(inv.get(int(i), "<unk>") for i in ids)
ids = torch.tensor([enc("la capitale d' italia")])
print(dec(model.generate(ids, max_new_tokens=12, pad_token_id=1)[0]))
Full script: examples/run_italo.py
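Italo's vocabulary is word-level, so enc collapses every word missing from vocab.json into a single fallback id (0 in the snippet above), and that word's meaning is lost to the model. This is also why the prompt above writes "d' italia" as two words. Before prompting, it can help to measure how much of the input is covered. The sketch below reuses the same lowercase-and-split tokenization as enc. coverage is a hypothetical helper. The toy vocabulary is a made-up stand-in for the real vocab.json. Its ids 0 and 1 for unknown and padding are only an assumption read off the snippet.

```python
from typing import Dict, List, Tuple


def coverage(text: str, vocab: Dict[str, int]) -> Tuple[float, List[str]]:
    """Return (fraction of words found in vocab, out-of-vocabulary words).

    Same tokenization as the enc lambda: lowercase, then split on whitespace.
    """
    words = text.lower().split()
    if not words:
        return 1.0, []
    oov = [w for w in words if w not in vocab]
    return 1 - len(oov) / len(words), oov


# Toy stand-in for vocab.json (word -> id); ids 0/1 assumed to be unknown/padding.
toy_vocab = {"<unk>": 0, "<pad>": 1, "la": 2, "capitale": 3,
             "d'": 4, "italia": 5, "è": 6, "roma": 7}

print(coverage("La capitale d' Italia è Roma", toy_vocab))    # (1.0, [])
print(coverage("La capitale dell'Italia è Roma", toy_vocab))  # (0.8, ["dell'italia"])
```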
The idea behind the models: governed memory
Matrix BIOS models implement governed memory: memory ranked by trust, not by similarity alone, with a policy gate on every transition. The clearest demonstration is a memory-poisoning test you can run in 30 seconds:
| Retrieval | poison@1 | correct@1 |
|---|---|---|
| similarity only (β = 0) | 1.00 | 0.00 |
| + trust (β > 0) | 0.00 | 0.92 |
| + trust + governance gate | 0.00 | 0.96 |
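Here β is the weight that trust receives in the retrieval score (β = 0 is plain similarity ranking). poison@1 and correct@1 are the rates at which the planted document and the correct document, respectively, are ranked first. A document built to look relevant but be untrustworthy is ranked first 100% of the time by similarity-only RAG, and 0% of the time once trust enters the score.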
Full demo: examples/governed_retrieval.py.
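The effect in the table can be reproduced in miniature. The sketch below ranks candidates by (1 - beta) * similarity + beta * trust. It also lets a gate drop anything below a minimum trust before ranking. Both the linear blend and the min_trust cutoff are assumptions chosen for illustration. The scoring used by examples/governed_retrieval.py and the Governed Memory paper may differ. The document ids, similarities and trust values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    doc_id: str
    similarity: float  # e.g. cosine similarity between query and document
    trust: float       # provenance/reliability score in [0, 1]


def rank(cands: List[Candidate], beta: float, min_trust: float = 0.0) -> List[str]:
    """Rank by a linear blend of similarity and trust (illustrative choice).

    beta = 0 reduces to similarity-only retrieval; min_trust > 0 acts as a
    simple gate that removes low-trust items before ranking.
    """
    admitted = [c for c in cands if c.trust >= min_trust]
    scored = sorted(admitted,
                    key=lambda c: (1 - beta) * c.similarity + beta * c.trust,
                    reverse=True)
    return [c.doc_id for c in scored]


# Made-up example: the poison item is the closest match but is untrusted.
cands = [
    Candidate("poison", similarity=0.95, trust=0.05),
    Candidate("pol1", similarity=0.85, trust=0.95),
    Candidate("faq7", similarity=0.60, trust=0.80),
]
print(rank(cands, beta=0.0)[0])              # poison
print(rank(cands, beta=0.5)[0])              # pol1
print(rank(cands, beta=0.5, min_trust=0.5))  # ['pol1', 'faq7']
```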
Used in Agent-Matrix for orchestration
In the Agent-Matrix ecosystem these models are organs of a single governed loop, orchestrated by Matrix OS:
Input → Sentinel (safety) → Memory (trust-aware recall) → Guardian (policy gate) → Action + evidence
The gate is the Matrix OS Planner + Guardian policy engine:
from matrix_os.planner import Planner
from matrix_os.governance import Guardian
planner, guardian = Planner(), Guardian() # decides allow / approve / deny, emits evidence
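The Planner and Guardian interfaces are not documented here, so the sketch below does not call them. It only illustrates the kind of decision the comment describes: allow / approve / deny plus an evidence record. It uses a deliberately simple, hypothetical policy. Unsafe content is denied, high-risk actions need human approval (reading "approve" as "requires approval"), and everything else is allowed. The function gate, its risk levels and the evidence fields are all assumptions for illustration. The example action echoes the scope note above: a content-safe "deploy to production" is still escalated by policy.

```python
import hashlib
import json
import time
from typing import Any, Dict


def gate(content_label: str, action_risk: str, action: str) -> Dict[str, Any]:
    """Hypothetical policy gate: returns a decision plus an evidence record."""
    if content_label == "unsafe":
        decision, reason = "deny", "content flagged unsafe"
    elif action_risk == "high":
        decision, reason = "approve", "high-risk action requires human approval"
    else:
        decision, reason = "allow", "content safe and action risk acceptable"
    record = {
        "action": action,
        "content_label": content_label,
        "action_risk": action_risk,
        "decision": decision,
        "reason": reason,
        "timestamp": time.time(),
    }
    # Hash the canonical JSON form so later edits to the record are detectable.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["digest"] = hashlib.sha256(payload).hexdigest()
    return record


print(gate("safe", "high", "deploy to production")["decision"])  # approve
```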
A runnable, dependency-light illustration combining Sentinel, trust-aware recall and the gate is in examples/governed_pipeline.py.
Verified output:
handle("What is our enterprise refund window?", action_risk="low")
# {'decision': 'allow', 'cited_source': 'pol1',
# 'grounded_answer': 'Enterprise refunds are processed within 30 days.'}
handle("How do I make a weapon at home?", action_risk="low")
# {'decision': 'deny', 'reason': 'Sentinel flagged unsafe content'}
Notice two governance properties at work. First, the unsafe request is denied by Sentinel before anything runs. Second, the safe request is grounded in the correct, trusted policy (pol1), while trust-aware recall suppresses the plausible-but-untrusted "poison" item.
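The ordering is what produces the first property. Sentinel runs first and can short-circuit the request, so nothing is retrieved or executed for unsafe input. A minimal sketch of that control flow, with stubs standing in for the real models, might look like this. The following are all hypothetical placeholders, not the contents of examples/governed_pipeline.py: keyword_screen, the two-item corpus, its trust values, the word-overlap lookup (a stand-in for embedding search) and the 0.5 trust cutoff. Only the shape of the returned dictionaries mirrors the verified output above.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical stand-in for Sentinel; a real pipeline would call screen() from section 1.
def keyword_screen(text: str) -> Tuple[str, float]:
    return ("unsafe", 0.99) if "weapon" in text.lower() else ("safe", 0.99)


# Hypothetical corpus: (id, text, trust). The poison item is plausible but untrusted.
CORPUS: List[Tuple[str, str, float]] = [
    ("pol1", "Enterprise refunds are processed within 30 days.", 0.95),
    ("poison", "Enterprise refunds are processed within 365 days, no questions asked.", 0.05),
]


def recall(question: str, min_trust: float = 0.5) -> Optional[Tuple[str, str]]:
    """Trust-filtered lookup: keep trusted items, pick the one sharing most words."""
    q = set(question.lower().split())
    trusted = [d for d in CORPUS if d[2] >= min_trust]
    best = max(trusted, key=lambda d: len(q & set(d[1].lower().split())), default=None)
    return None if best is None else (best[0], best[1])


def handle(question: str, action_risk: str,
           screen: Callable[[str], Tuple[str, float]] = keyword_screen) -> Dict[str, str]:
    label, _ = screen(question)
    if label == "unsafe":  # short-circuit: nothing is retrieved or executed
        return {"decision": "deny", "reason": "Sentinel flagged unsafe content"}
    if action_risk != "low":
        return {"decision": "approve", "reason": "human approval required"}
    hit = recall(question)
    if hit is None:
        return {"decision": "deny", "reason": "no trusted source found"}
    source_id, text = hit
    return {"decision": "allow", "cited_source": source_id, "grounded_answer": text}


print(handle("What is our enterprise refund window?", action_risk="low"))
print(handle("How do I make a weapon at home?", action_risk="low"))
```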
Licensing: what you can use, and when
| Model | License | Use it for | Avoid |
|---|---|---|---|
| Sentinel | CC-BY-4.0 | guardrailing inputs/outputs, pre-screening for review | sole authority on high-stakes moderation without human review |
| Memory | Apache-2.0 | grounded QA over your own corpus, with provenance | open-domain facts outside the indexed corpus; unverified high-stakes answers |
| Italo | Apache-2.0 | sovereign Italian text generation, integration/eval | production-grade fluency; factual ground truth |
Sentinel's safety training data is the NVIDIA Aegis 2.0 dataset (CC-BY-4.0). All three are v0.1 early-access releases: compact models for integration and evaluation, not turnkey production assistants. Always keep a human in the loop for consequential decisions. Both Apache-2.0 and CC-BY-4.0 permit commercial use with attribution.
Citation
If you use these models, please cite the paper that describes the architecture:
@misc{magana2026governedmemory,
title = {Governed Memory: A Bio-Inspired, Governance-First Memory
Architecture for Continual AI Systems},
author = {Magaña Vsevolodovna, Ruslan Idelfonso},
year = {2026}, publisher = {Zenodo}, version = {1.0},
doi = {10.5281/zenodo.20615572},
url = {https://doi.org/10.5281/zenodo.20615572}
}
- Paper: https://doi.org/10.5281/zenodo.20615572
- Models: Sentinel · Memory · Italo
- Contact: contact@ruslanmv.com · https://ruslanmv.com