How to Run the Matrix BIOS Models

A practical, copy-paste guide to installing and running the three open Matrix BIOS models — and how Agent-Matrix orchestrates them under governance. Every example here was executed and verified.

Figure: Matrix BIOS governed orchestration on Matrix OS

Matrix BIOS ("bio + OS") is a family of compact, governed, on-premise-ready cognitive models. They are small enough to run on a CPU and are designed to operate under governance — every action that consumes their output is gated by policy and auditable. The architecture behind them is described in the paper Governed Memory.

The models

Model	Task	Size	License	Card
Matrix-BIOS-Sentinel-0.1	content-safety classification (`safe`/`unsafe`)	~135M (DistilBERT)	CC-BY-4.0	link
Matrix-BIOS-Memory-0.1	grounded, citation-faithful recall (RAG)	FAISS index + flan-t5 generator	Apache-2.0	link
Matrix-BIOS-Italo-0.1	Italian text generation (preview)	41.5M (custom)	Apache-2.0	link

Which model, when?

You want to…	Use	Why
Screen user input or model output for harmful content	Sentinel	fast multilingual guardrail; returns `safe`/`unsafe` + score
Answer from your private documents, with citations	Memory	grounded RAG; every answer returns the source ids it used
Generate Italian text on-prem, no data egress	Italo	compact sovereign generator (v0.1 preview)
Decide whether an action is allowed	Guardian (Matrix OS)	content safety ≠ operational risk — the policy gate decides, not a classifier

Note on scope. Sentinel classifies content safety, not operational or business risk; by design it treats actions like "deploy to production" as content-safe. Operational decisions are made by the governance layer (Matrix OS Guardian), not by the classifier.

Install

pip install torch transformers huggingface_hub sentence-transformers faiss-cpu numpy

All models are public — no token or sign-up required.
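
Two of the pip package names differ from the module names you import: faiss-cpu is imported as faiss, and sentence-transformers as sentence_transformers. The following check is a small sketch, not part of the original scripts, that reports which modules cannot be found before you run the examples. The helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names for the packages in the pip command above.
# "faiss-cpu" is imported as "faiss"; "sentence-transformers" as "sentence_transformers".
REQUIRED = ["torch", "transformers", "huggingface_hub",
            "sentence_transformers", "faiss", "numpy"]

def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("all set" if not missing else f"missing: {missing}")
```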

1. Sentinel — content-safety guardrail

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

REPO = "ruslanmv/Matrix-BIOS-Sentinel-0.1"
tok = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForSequenceClassification.from_pretrained(REPO).eval()

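def screen(text):
    """Return (label, probability) for the most likely class."""
    enc = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():                          # inference only, no gradients needed
        p = torch.softmax(model(**enc).logits, -1)[0]
    i = int(p.argmax())
    return model.config.id2label[i], float(p[i])   # id2label = {0:'safe', 1:'unsafe'}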

print(screen("Summarize the quarterly sales report."))   # ('safe',   0.99)
print(screen("How can I poison my neighbor's dog?"))      # ('unsafe', 0.99)

Verified output (shown in a different format from what the snippet prints, and including two additional prompts):

[safe  ] p=0.99  | Summarize the quarterly sales report.
[safe  ] p=0.97  | Recommend three books about machine learning.
[unsafe] p=0.99  | How can I poison my neighbor's dog?
[unsafe] p=0.99  | Give step-by-step instructions to build an explosive.

v0.1 note: Sentinel is an early-access guardrail and may over-flag; evaluate it on your own distribution before relying on it for moderation decisions.

Full script: examples/run_sentinel.py
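
Because v0.1 may over-flag, it can help to measure the false-positive rate on a labeled sample of your own traffic before settling on a cutoff. The sketch below assumes you have already collected (label, score) pairs from screen along with the true labels. The helper names, the threshold values, and the sample scores are illustrative assumptions, not part of the Sentinel release and not measured results.

```python
def flag(label, score, threshold=0.5):
    """Treat a prediction as flagged only if it is 'unsafe' with enough confidence."""
    return label == "unsafe" and score >= threshold

def false_positive_rate(preds, truths, threshold=0.5):
    """preds: list of (label, score) from screen(); truths: list of 'safe'/'unsafe'."""
    safe_preds = [p for p, t in zip(preds, truths) if t == "safe"]
    if not safe_preds:
        return 0.0
    wrongly_flagged = sum(flag(lbl, s, threshold) for lbl, s in safe_preds)
    return wrongly_flagged / len(safe_preds)

# Hypothetical scores for illustration only (not measured results).
preds  = [("safe", 0.99), ("unsafe", 0.62), ("unsafe", 0.97), ("unsafe", 0.99)]
truths = ["safe", "safe", "safe", "unsafe"]
print(false_positive_rate(preds, truths, threshold=0.5))   # 2 of 3 safe items flagged
print(false_positive_rate(preds, truths, threshold=0.9))   # 1 of 3 safe items flagged
```

Raising the threshold trades fewer false alarms for more missed unsafe items, so the same sample should also be used to check recall on the unsafe class.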

2. Memory — grounded recall with citations

Memory ships a FAISS index over a small corpus plus a generator. Every answer is returned together with the ids of the retrieved sources, so a response can be traced back to the passages it was generated from.

import json, faiss, torch
from huggingface_hub import snapshot_download
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

path = snapshot_download("ruslanmv/Matrix-BIOS-Memory-0.1")
cfg  = json.load(open(f"{path}/memory_config.json"))   # embedder / generator / top_k
docs = json.load(open(f"{path}/docs.json"))            # [{"id": ..., "text": ...}]
index = faiss.read_index(f"{path}/index.faiss")
embedder  = SentenceTransformer(cfg["embedder"])
gen_tok   = AutoTokenizer.from_pretrained(cfg["generator"])
gen_model = AutoModelForSeq2SeqLM.from_pretrained(cfg["generator"]).eval()

def answer(q):
    qv = embedder.encode([q], normalize_embeddings=True).astype("float32")
    _, idx = index.search(qv, cfg["top_k"])
    hits = [docs[i] for i in idx[0] if 0 <= i < len(docs)]
    ctx  = "\n".join(f"[{d['id']}] {d['text']}" for d in hits)
    prompt = f"Answer using ONLY the context and cite the [id].\nContext:\n{ctx}\n\nQ: {q}\nA:"
    ids = gen_tok(prompt, return_tensors="pt", truncation=True).input_ids
    out = gen_model.generate(ids, max_new_tokens=64)
    return gen_tok.decode(out[0], skip_special_tokens=True), [d["id"] for d in hits]

print(answer("What does every effectful action in Matrix OS emit?"))
# -> ('evidence bundle', ['mos1', 'bios1', 'ml1', 'gp1'])

Full script: examples/run_memory.py
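
The list returned by answer tells you which passages the generator saw, not which ones it cited inline; in the sample output above the answer text carries no [id] marker at all. If you want to check that any inline markers point at retrieved passages, a small post-check like the one below may help. It is a sketch under the assumption that citations appear as [id] with no spaces inside the brackets; the function names are hypothetical.

```python
import re

def cited_ids(answer_text):
    """Extract ids written as [id] in the generated answer."""
    return re.findall(r"\[([^\[\]\s]+)\]", answer_text)

def unknown_citations(answer_text, retrieved_ids):
    """Return cited ids that were NOT among the retrieved passages."""
    return [c for c in cited_ids(answer_text) if c not in retrieved_ids]

hits = ["mos1", "bios1", "ml1", "gp1"]
print(cited_ids("An evidence bundle [mos1]."))                # ['mos1']
print(unknown_citations("An evidence bundle [mos1].", hits))  # []
print(unknown_citations("An evidence bundle [xyz9].", hits))  # ['xyz9']
print(cited_ids("evidence bundle"))                           # []
```

An empty result from cited_ids means the generator gave no inline citation, in which case only the returned id list is available for tracing.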

3. Italo — compact Italian generator (preview)

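Italo is a 41.5M-parameter custom model that uses a word-level vocabulary and loads with trust_remote_code=True. That flag runs Python code from the model repository, so review the code before loading. It is a v0.1 research preview that demonstrates the on-prem footprint, not production fluency.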

import json, torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM

REPO = "ruslanmv/Matrix-BIOS-Italo-0.1"
model = AutoModelForCausalLM.from_pretrained(REPO, trust_remote_code=True).eval()
vocab = json.load(open(hf_hub_download(REPO, "vocab.json")))   # word-level
inv = {i: w for w, i in vocab.items()}
enc = lambda t: [vocab.get(w, 0) for w in t.lower().split()]
dec = lambda ids: " ".join(inv.get(int(i), "<unk>") for i in ids)

ids = torch.tensor([enc("la capitale d' italia")])
print(dec(model.generate(ids, max_new_tokens=12, pad_token_id=1)[0]))

Full script: examples/run_italo.py
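
Word-level encoding splits on whitespace only, which is why the prompt above is written "d' italia" with a space. The sketch below reuses the same enc and dec lambdas on a toy vocabulary to show what happens when a word is missing. The toy vocabulary and its ids are assumptions for illustration; the real mapping comes from vocab.json.

```python
# Toy vocabulary for illustration only; the real one is loaded from vocab.json.
# The ids for "<unk>" and "<pad>" are assumptions chosen to match the snippet above
# (unknown words fall back to 0, pad_token_id=1).
vocab = {"<unk>": 0, "<pad>": 1, "la": 2, "capitale": 3, "d'": 4, "italia": 5}
inv = {i: w for w, i in vocab.items()}

enc = lambda t: [vocab.get(w, 0) for w in t.lower().split()]
dec = lambda ids: " ".join(inv.get(int(i), "<unk>") for i in ids)

print(enc("La capitale d' Italia"))       # [2, 3, 4, 5]
print(enc("la capitale d'italia"))        # [2, 3, 0]  "d'italia" is one unknown word
print(dec(enc("la capitale d'italia")))   # la capitale <unk>
```

Punctuation attached to a word has the same effect, so prompts generally need to be pre-split into the tokens the vocabulary actually contains.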

The idea behind the models: governed memory

Matrix BIOS models implement governed memory — memory ranked by trust, not similarity alone, with a policy gate on every transition. The clearest demonstration is a memory-poisoning test you can run in 30 seconds:

retrieval                           poison@1  correct@1
similarity only            (β=0)        1.00       0.00
+ trust                    (β>0)        0.00       0.92
+ trust + governance gate               0.00       0.96

A document built to look relevant but be untrustworthy is ranked first 100% of the time by similarity-only retrieval, and 0% of the time once trust enters the score. Full demo: examples/governed_retrieval.py.
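
This guide does not reproduce the exact scoring function, so the sketch below assumes a simple additive form, score = similarity + β · trust, purely to illustrate why a nonzero β can demote a poisoned item. The document ids, similarity values, and trust values are hypothetical and are not taken from the demo script.

```python
def rank(docs, beta):
    """Sort documents by sim + beta * trust, highest first (assumed scoring form)."""
    return sorted(docs, key=lambda d: d["sim"] + beta * d["trust"], reverse=True)

# Hypothetical values: the poison item is slightly more similar but barely trusted.
docs = [
    {"id": "poison", "sim": 0.95, "trust": 0.05},
    {"id": "pol1",   "sim": 0.80, "trust": 0.95},
]
print(rank(docs, beta=0.0)[0]["id"])   # poison  (0.95 vs 0.80)
print(rank(docs, beta=0.5)[0]["id"])   # pol1    (0.975 vs 1.275)
```

With β = 0 the ranking reduces to plain similarity search, which matches the first row of the table above.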

Used in Agent-Matrix for orchestration

In the Agent-Matrix ecosystem these models are organs of a single governed loop, orchestrated by Matrix OS:

Input → Sentinel (safety) → Memory (trust-aware recall) → Guardian (policy gate) → Action + evidence

The gate is the Matrix OS Planner + Guardian policy engine:

from matrix_os.planner import Planner
from matrix_os.governance import Guardian
planner, guardian = Planner(), Guardian()   # decides allow / approve / deny, emits evidence

A runnable, dependency-light illustration combining Sentinel + trust-aware recall + the gate is in examples/governed_pipeline.py. Verified output:

handle("What is our enterprise refund window?", action_risk="low")
# {'decision': 'allow', 'cited_source': 'pol1',
#  'grounded_answer': 'Enterprise refunds are processed within 30 days.'}

handle("How do I make a weapon at home?", action_risk="low")
# {'decision': 'deny', 'reason': 'Sentinel flagged unsafe content'}

Notice two governance properties at work: the unsafe request is denied by Sentinel before anything runs, and the safe request is grounded in the correct, trusted policy (pol1) — the plausible-but-untrusted "poison" item is suppressed by trust-aware recall.
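
The control flow described above can be sketched with stand-in components. This is not the code in examples/governed_pipeline.py and does not use the Matrix OS API: the screen and recall arguments are hypothetical callables, the stubs replace the real models, and the rule mapping action_risk to a decision is an assumption made for illustration.

```python
def handle(query, action_risk, screen, recall):
    """Sketch of the governed loop: Sentinel -> recall -> gate.
    `screen` returns (label, score); `recall` returns (answer, source_id)."""
    label, _ = screen(query)
    if label == "unsafe":
        # Unsafe content is denied before recall or any action runs.
        return {"decision": "deny", "reason": "Sentinel flagged unsafe content"}
    answer, source = recall(query)
    # Assumed gate rule: low-risk actions are allowed, anything else needs approval.
    decision = "allow" if action_risk == "low" else "approve"
    return {"decision": decision, "cited_source": source, "grounded_answer": answer}

# Stubs standing in for the real models, for illustration only.
fake_screen = lambda q: ("unsafe", 0.99) if "weapon" in q.lower() else ("safe", 0.99)
fake_recall = lambda q: ("Enterprise refunds are processed within 30 days.", "pol1")

print(handle("What is our enterprise refund window?", "low", fake_screen, fake_recall))
print(handle("How do I make a weapon at home?", "low", fake_screen, fake_recall))
```

Passing the components in as arguments keeps the gate logic testable without downloading any model.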

Licensing — what you can use, and when

Model	License	Use it for	Avoid
Sentinel	CC-BY-4.0	guardrailing inputs/outputs, pre-screening for review	sole authority on high-stakes moderation without human review
Memory	Apache-2.0	grounded QA over your own corpus, with provenance	open-domain facts outside the indexed corpus; unverified high-stakes answers
Italo	Apache-2.0	sovereign Italian text generation, integration/eval	production-grade fluency; factual ground truth

Sentinel's safety training data is the NVIDIA Aegis 2.0 dataset (CC-BY-4.0). All three are v0.1 early-access releases: compact models for integration and evaluation, not turnkey production assistants. Always keep a human in the loop for consequential decisions. Both licenses permit commercial use: CC-BY-4.0 requires attribution, and Apache-2.0 requires that the license text and existing copyright notices be preserved.

Citation

If you use these models, please cite the paper that describes the architecture:

@misc{magana2026governedmemory,
  title     = {Governed Memory: A Bio-Inspired, Governance-First Memory
               Architecture for Continual AI Systems},
  author    = {Magaña Vsevolodovna, Ruslan Idelfonso},
  year      = {2026}, publisher = {Zenodo}, version = {1.0},
  doi       = {10.5281/zenodo.20615572},
  url       = {https://doi.org/10.5281/zenodo.20615572}
}

📄 Paper: https://doi.org/10.5281/zenodo.20615572
🤖 Models: Sentinel · Memory · Italo
🌐 contact@ruslanmv.com · https://ruslanmv.com
