Josiefied Gemma 4 12B DLPO-ORPO


This repo contains the customized JOSIE-style preference model built on the MLX 4-bit conversion of Google's Gemma 4 12B IT.

This model is trained using my extension of the pre-existing ORPO preference algorithm.

What This Is

JOSIE Gemma 4 12B DLPO-ORPO is a LoRA preference-tuned assistant model. It is meant to be no-BS, direct, creative, practical, and less padded than a typical instruction model, while still staying useful and grounded.

The training objective is dlpo-orpo:

  • ORPO teaches the model to prefer the chosen answer over the rejected answer without needing a separate frozen reference model.
  • DLPO (Directional Latent Preference Optimization) adds a latent preference signal. Instead of only pushing token probabilities around, it also nudges the hidden-state geometry so chosen answers move in a more consistent preference direction than rejected answers.

In plain English: ORPO trains the visible answer. DLPO also trains the model's internal sense of what a better answer feels like.
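To make the two parts concrete, here is a minimal sketch in plain Python. The ORPO part follows the published odds-ratio formulation (negative log-likelihood on the chosen answer plus a weighted log-odds-ratio penalty). The latent part is only an illustrative guess at what a "directional" term could look like: the DLPO paper is not out yet, so `mean_pool`, `directional_term`, and the `direction` vector are hypothetical names and assumptions, not the actual DLPO definition or the MLX-LM-LoRA implementation.

```python
import math

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_odds(avg_logp):
    # odds = p / (1 - p), with p = exp(avg_logp) the length-normalized
    # sequence likelihood. Requires avg_logp < 0 so that p < 1.
    return avg_logp - math.log1p(-math.exp(avg_logp))

def orpo_loss(chosen_logps, rejected_logps, alpha=0.1):
    # chosen_logps / rejected_logps: per-token log-probabilities of each answer.
    avg_c = sum(chosen_logps) / len(chosen_logps)
    avg_r = sum(rejected_logps) / len(rejected_logps)
    nll = -avg_c                                  # SFT term on the chosen answer
    ratio = log_odds(avg_c) - log_odds(avg_r)     # log odds ratio
    return nll + alpha * softplus(-ratio)         # -log(sigmoid(ratio)) penalty

def mean_pool(hidden_states):
    # ASSUMPTION: "answer_mean" pooling read as the mean over answer-token vectors.
    n = len(hidden_states)
    return [sum(col) / n for col in zip(*hidden_states)]

def directional_term(chosen_hidden, rejected_hidden, direction, eps=1e-8):
    # HYPOTHETICAL latent term: 1 - cosine between (chosen - rejected) pooled
    # states and a preference direction. Not the real DLPO loss.
    diff = [c - r for c, r in zip(mean_pool(chosen_hidden), mean_pool(rejected_hidden))]
    dot = sum(a * b for a, b in zip(diff, direction))
    norms = math.sqrt(sum(a * a for a in diff)) * math.sqrt(sum(b * b for b in direction))
    return 1.0 - dot / (norms + eps)

def combined_loss(chosen_logps, rejected_logps, chosen_hidden, rejected_hidden,
                  direction, alpha=0.1, latent_weight=0.08):
    # ASSUMPTION: the two parts are simply added, weighted by latent_weight.
    return (orpo_loss(chosen_logps, rejected_logps, alpha)
            + latent_weight * directional_term(chosen_hidden, rejected_hidden, direction))
```

The defaults `alpha=0.1` and `latent_weight=0.08` mirror the ORPO alpha and latent weight used for this run. In this sketch, the ORPO part is smaller when the chosen answer is more likely than the rejected one, and the latent part is smaller when the pooled hidden states differ along the given direction.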

Dataset

This model has been trained using a custom dataset with 12K preference pairs. The chosen responses carry the target JOSIE style: clear, candid, imaginative, and useful without performative softness. The rejected responses are the contrast set: weaker, flatter, over-refusing, padded, evasive, or less aligned with the intended assistant personality.

Training Run Stats:

  • algorithm: dlpo-orpo
  • max examples: 12000
  • validation: 256
  • batch size: 1
  • epochs: 1
  • learning rate: 2e-5
  • learning rate scheduler: cosine
  • max sequence: 1536
  • ORPO alpha: 0.1
  • latent weight: 0.08
  • latent variant: both
  • pooling: answer_mean
  • layer: late
  • LoRA layers: 24
  • LoRA rank: 16
  • LoRA scale: 32
  • LoRA dropout: 0.05
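
As a rough worked example of what these numbers imply for the schedule, the sketch below derives the step count and a cosine learning-rate curve. It assumes all 12,000 examples are used for training, no gradient accumulation, no warmup, and decay to zero; none of these are stated above, and the exact schedule used by MLX-LM-LoRA may differ.

```python
import math

# Values taken from the run stats above.
EXAMPLES = 12000
BATCH_SIZE = 1
EPOCHS = 1
PEAK_LR = 2e-5

# ASSUMPTION: one optimizer step per batch (no gradient accumulation).
total_steps = (EXAMPLES // BATCH_SIZE) * EPOCHS

def cosine_lr(step, total=total_steps, peak=PEAK_LR, floor=0.0):
    # ASSUMPTION: plain cosine decay from peak to floor, with no warmup.
    progress = min(max(step / total, 0.0), 1.0)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))

if __name__ == "__main__":
    for step in (0, total_steps // 2, total_steps):
        print(step, cosine_lr(step))
```

Under these assumptions the run takes 12,000 steps, starts at 2e-5, passes through roughly half the peak rate at the midpoint, and ends near zero.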

This training run can be recreated using the latest version 2.2.0 of MLX-LM-LoRA.

Research Paper & Benchmarks

A research paper introducing DLPO (Directional Latent Preference Optimization) will be released later.

Benchmarks for this specific JOSIE Gemma 4 12B DLPO-ORPO run will also be published after this release, comparing it against the base Gemma 4 12B IT model.

Safety

Unlike its sibling models, this one is not uncensored.
