Josiefied Gemma 4 12B DLPO-ORPO

This repo contains the customized JOSIE-style preference model built on the MLX 4-bit conversion of Google's Gemma 4 12B IT.

This model was trained using my extension of the pre-existing ORPO preference algorithm.

What This Is

JOSIE Gemma 4 12B DLPO-ORPO is a LoRA preference-tuned assistant model. It is meant to be no-BS, direct, creative, practical, and less padded than a normal instruction model, while still staying useful and grounded.

The training objective is dlpo-orpo:

ORPO teaches the model to prefer the chosen answer over the rejected answer without needing a separate frozen reference model.
DLPO (Directional Latent Preference Optimization) adds a latent preference signal. Instead of only pushing token probabilities around, it also nudges the hidden-state geometry so chosen answers move in a more consistent preference direction than rejected answers.

In plain English: ORPO trains the visible answer. DLPO also trains the model's internal sense of what a better answer feels like.
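As a rough illustration of how the two parts could combine, here is a minimal sketch of the published ORPO objective (negative log-likelihood of the chosen answer plus a weighted log odds ratio term). The function names are hypothetical, and it assumes that "ORPO alpha" is the weight on the odds ratio term. The DLPO latent term is not specified in this card, so it appears only as a placeholder input, `latent_term`, that is assumed to be added with the latent weight. This is not the MLX-LM-LoRA implementation.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) where p = exp(avg_logp).

    avg_logp is the length-normalized log-likelihood of an answer and
    must be strictly negative (so that p < 1).
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dlpo_orpo_loss(chosen_logp: float, rejected_logp: float,
                   alpha: float = 0.1,
                   latent_term: float = 0.0,
                   latent_weight: float = 0.08) -> float:
    """Hypothetical combined loss for one preference pair.

    latent_term is a placeholder for the DLPO hidden-state signal,
    whose exact form is not given in this card (assumption).
    """
    # Supervised part: make the chosen answer more likely.
    nll = -chosen_logp
    # ORPO part: -log(sigmoid(log odds ratio)), written as softplus(-ratio).
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    odds_ratio_term = softplus(-log_odds_ratio)
    return nll + alpha * odds_ratio_term + latent_weight * latent_term
```

With these definitions, a pair whose chosen answer is already more likely than the rejected one gets a smaller loss than the same pair with the roles swapped.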

Dataset

This model has been trained using a custom dataset with 12K preference pairs. The chosen responses carry the target JOSIE style: clear, candid, imaginative, and useful without performative softness. The rejected responses are the contrast set: weaker, flatter, over-refusing, padded, evasive, or less aligned with the intended assistant personality.
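The dataset itself is not published in this card, so the following is only a sketch of what one preference pair might look like in code. The field names `prompt`, `chosen`, and `rejected` follow a common convention and are an assumption, and the example text is invented for illustration rather than taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One training record (hypothetical field names)."""
    prompt: str
    chosen: str    # target JOSIE-style answer
    rejected: str  # contrast answer: padded, evasive, or flat


def is_valid(pair: PreferencePair) -> bool:
    """Reject records with empty fields or identical answers."""
    fields = (pair.prompt, pair.chosen, pair.rejected)
    return all(f.strip() for f in fields) and pair.chosen != pair.rejected


# Invented example, not a row from the actual dataset.
example = PreferencePair(
    prompt="How do I undo my last git commit but keep the changes?",
    chosen="Run `git reset --soft HEAD~1`. Your changes stay staged.",
    rejected="That is a great question! There are many ways to think about it...",
)
```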

Training Run Stats:

algorithm: dlpo-orpo
max examples: 12000
validation: 256
batch size: 1
epochs: 1
learning rate: 2e-5
learning rate scheduler: cosine
max sequence: 1536
ORPO alpha: 0.1
latent weight: 0.08
latent variant: both
pooling: answer_mean
layer: late
LoRA layers: 24
LoRA rank: 16
LoRA scale: 32
LoRA dropout: 0.05
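To make the learning rate, scheduler, and run size listed above more concrete, here is a small sketch of a cosine schedule over the run. It assumes no warmup, decay to zero, and no gradient accumulation, none of which are stated in the stats, and the function name is hypothetical.

```python
import math

MAX_EXAMPLES = 12000
BATCH_SIZE = 1
EPOCHS = 1
PEAK_LR = 2e-5

# Optimizer steps, assuming one update per batch (no gradient accumulation).
TOTAL_STEPS = MAX_EXAMPLES * EPOCHS // BATCH_SIZE


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the run takes 12000 optimizer steps, and the learning rate is half of its peak value at the midpoint of training.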

This training run can be recreated using the latest version 2.2.0 of MLX-LM-LoRA.

Research Paper & Benchmarks

A research paper introducing DLPO (Directional Latent Preference Optimization) will be released later.

Benchmarks for this specific JOSIE Gemma 4 12B DLPO-ORPO run will also be published after this release, comparing it against the base Gemma 4 12B IT model.

Safety

Unlike its sibling models, this one is not uncensored.


Model tree for Goekdeniz-Guelmez/Josiefied-Gemma-4-12B-DLPO-ORPO

Base model: google/gemma-4-12B
Finetuned: google/gemma-4-12B-it
Finetuned: this model