FastPLMs
The GitHub repository with the implementation and requirements can be found here.

Synthyra DPLM2 checkpoints are HuggingFace AutoModel compatible and include the FastPLMs embedding helpers.
```python
model_dict = {
    "Synthyra/DPLM2-150M": "airkingbd/dplm2_150m",
    "Synthyra/DPLM2-650M": "airkingbd/dplm2_650m",
    "Synthyra/DPLM2-3B": "airkingbd/dplm2_3b",
}
```
```python
import torch
from transformers import AutoModel, AutoModelForMaskedLM

model_path = "Synthyra/DPLM2-150M"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
tokenizer = model.tokenizer

batch = tokenizer(["MPRTEIN", "MSEQWENCE"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state

# For masked-token logits, load the masked-LM head instead
mlm = AutoModelForMaskedLM.from_pretrained(model_path, trust_remote_code=True, dtype=torch.float16).eval()
with torch.no_grad():
    logits = mlm(**batch).logits
```
DPLM2 infers type_ids automatically from input_ids and attention_mask when they are not provided.
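The actual inference logic lives in the model's remote code, but the idea can be sketched as follows. The token-ID boundary and helper name below are assumptions for illustration only, not DPLM2's real vocabulary layout:

```python
import torch

# Hypothetical sketch: assume amino-acid tokens occupy IDs below STRUCT_OFFSET
# and structure tokens occupy IDs at or above it. The real DPLM2 vocab differs.
STRUCT_OFFSET = 33

def infer_type_ids(input_ids: torch.Tensor) -> torch.Tensor:
    """Assign type 0 to sequence (amino-acid) tokens, type 1 to structure tokens."""
    return (input_ids >= STRUCT_OFFSET).long()

ids = torch.tensor([[5, 12, 40, 41, 0]])
print(infer_type_ids(ids))  # tensor([[0, 0, 1, 1, 0]])
```

Passing `type_ids` explicitly would override this kind of inference; omitting them lets the model derive them from `input_ids` and `attention_mask`.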
`"sdpa"` (PyTorch Scaled Dot Product Attention) is the default backend.
| Backend | Key | Notes |
|---|---|---|
| PyTorch SDPA | `"sdpa"` | Default. Exact numerics, stable on all hardware. |
| Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install kernels` (pre-built, no hours-long compilation). Outputs are not bitwise identical to SDPA due to online softmax reordering; differences are often small but not guaranteed to be inconsequential, so use `"sdpa"` if exact numerics matter. |
| Flex Attention | `"flex"` | Skips padding tokens via a block mask, so it is faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with `torch.compile`. |
| Auto | `"auto"` | Picks the best available backend: `kernels_flash` → `flex` → `sdpa`. |
Set via config before loading, or change on the model after loading (DPLM2 propagates the change to all attention layers immediately):
```python
from transformers import AutoConfig, AutoModel

# Option 1: set before loading
config = AutoConfig.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
config.attn_backend = "flex"
model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", config=config, trust_remote_code=True)

# Option 2: set after loading
model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
model.attn_backend = "flex"  # propagates to all attention layers in-place
```
All DPLM2 models inherit EmbeddingMixin, so you can call model.embed_dataset(...) directly.
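`embed_dataset` handles batching and pooling internally; conceptually, a pooled per-sequence embedding is a masked mean over the hidden states. The standalone sketch below uses dummy tensors and is not the FastPLMs implementation:

```python
import torch

def masked_mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average hidden states over real tokens only, ignoring padding."""
    mask = mask.unsqueeze(-1).to(hidden.dtype)   # (B, L, 1)
    summed = (hidden * mask).sum(dim=1)          # (B, H)
    counts = mask.sum(dim=1).clamp(min=1)        # (B, 1), avoid divide-by-zero
    return summed / counts

# Dummy batch: 2 sequences, length 4, hidden size 8; second sequence is padded.
hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])
emb = masked_mean_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 8])
```

In practice you would feed `last_hidden_state` and `attention_mask` from the model call above; check the FastPLMs repository for the exact `embed_dataset` parameters.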