KoHRM-Text-1.4B


English

KoHRM-Text-1.4B is a scratch-pretrained Korean/English/code/terminal/tool-use model built from the sapientinc/HRM-Text PrefixLM training stack.

This is not a continued finetune of sapientinc/HRM-Text-1B. It uses a new Korean/terminal-oriented 131K byte-level BPE tokenizer and a new scratch training run.

Current Status

This repository is the anchor for the public KoHRM-Text 1.4B model family and holds the base (pre-SFT) model. Terminal-specialized LoRA adapters and full-SFT checkpoints are published as separate Hugging Face repos whose base_model metadata points back to this model.

The main branch holds the base model export. For terminal next-action use, use the fine-tuned checkpoints listed below rather than the base checkpoint directly.

Terminal Fine-Tuning Lineage

Without task-specific fine-tuning, the base model performs poorly on TB2-lite terminal next-action JSON. Useful terminal behavior comes from the LoRA adapters and full SFT trained on top of this base.

Model / Adapter                                                        Relation          TB2-lite Score  Cmd F1  Precision  Recall  First Cmd  Valid JSON
KoHRM-Text-1.4B-stage4d direct                                         base/direct eval  11.48           0.1148  0.1995     0.0961  5.9%       38.9%
KoHRM-Text-1.4B-stage4d + terminal-tool-core-r64 LoRA                  PEFT adapter      29.11           0.2911  0.3988     0.2768  22.1%      63.4%
LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-Top2-Terminal-Tool-Merge-Epoch1  full fine-tune    31.59           0.3159  0.3859     0.3415  24.8%      73.3%
LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-LFM25-Terminal-ToolBench-Epoch1  full fine-tune    38.56           0.3856  0.4262     0.4341  37.0%      55.1%
LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-LFM25-Terminal-ToolBench-Epoch2  full fine-tune    45.90           0.4590  0.5031     0.5098  44.9%      68.3%
LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-LFM25-Terminal-ToolBench-Epoch3  full fine-tune    43.57           0.4357  0.4703     0.5003  45.5%      61.7%

Score = 100 * avg_command_f1 on the corrected 303-step TB2-lite full replay set.
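The following minimal sketch makes the scoring rule concrete. The per-step command F1 is an assumption for illustration: it treats the predicted and reference commands at each step as multisets and scores their overlap. The actual TB2-lite scorer may normalize, order, or match commands differently.

```python
from collections import Counter


def command_f1(predicted, reference):
    """Multiset F1 between predicted and reference commands for one step (assumed definition)."""
    if not predicted and not reference:
        return 1.0  # nothing expected and nothing predicted
    overlap = sum((Counter(predicted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0  # also covers an empty prediction or an empty reference
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def tb2_lite_score(steps):
    """Score = 100 * average per-step command F1 over the replay steps."""
    f1s = [command_f1(pred, ref) for pred, ref in steps]
    return 100 * sum(f1s) / len(f1s)


# Hypothetical replay steps: (predicted commands, reference commands)
steps = [
    (["ls", "cat README.md"], ["ls", "cat README.md"]),  # exact match -> 1.0
    (["pip install numpy"], ["pip install pandas"]),     # no overlap -> 0.0
    (["cd src", "pytest"], ["pytest"]),                  # P=0.5, R=1.0 -> 0.667
]
print(round(tb2_lite_score(steps), 2))  # 55.56
```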

The current best KoHRM terminal checkpoint is:

https://huggingface.co/LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-LFM25-Terminal-ToolBench-Epoch2

Epoch3 was trained as a continuation of Epoch2 and scored 43.57, which is 2.33 points below Epoch2. It slightly improved First Cmd (45.5% vs. 44.9%), but Cmd F1, precision, recall, and Valid JSON all regressed, so Epoch2 is kept as the representative terminal checkpoint.
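The Epoch2-versus-Epoch3 comparison is plain arithmetic on the table. The small sketch below reproduces it. The values are copied from the lineage table and nothing is recomputed from raw predictions.

```python
# Metrics copied from the lineage table (TB2-lite, 303-step replay set).
metrics = {
    "Epoch2": {"score": 45.90, "precision": 0.5031, "recall": 0.5098, "first_cmd": 44.9, "valid_json": 68.3},
    "Epoch3": {"score": 43.57, "precision": 0.4703, "recall": 0.5003, "first_cmd": 45.5, "valid_json": 61.7},
}

# Pick the checkpoint with the highest TB2-lite score.
best = max(metrics, key=lambda name: metrics[name]["score"])

# Epoch3 minus Epoch2 for every metric, rounded to hide float noise.
deltas = {k: round(metrics["Epoch3"][k] - metrics["Epoch2"][k], 4) for k in metrics["Epoch2"]}
regressed = sorted(k for k, d in deltas.items() if d < 0)

print(best)       # Epoch2
print(deltas)
print(regressed)  # ['precision', 'recall', 'score', 'valid_json']
```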

Epoch2 is strongest on data_querying (0.6881 F1), data_science (0.4901), debugging (0.4857), math (0.4845), software_engineering (0.4770), and file_operations (0.4710). Its weakest areas are swe (0.3590), data_processing (0.4017), dependency_management (0.4025), security (0.4220), and model_training (0.4283). Compared with the top LFM2.5 checkpoints, the main remaining gaps are first-action accuracy and command coverage on later steps.

Training Method At A Glance

KoHRM-Text is best understood as instruction pretraining from scratch.

It is not ordinary raw-text causal LM pretraining, and it is not only a small SFT pass on top of an existing base model.

raw data -> tokenizer -> V1Dataset -> PrefixLM batches
         -> HRM H/L recurrence -> LM head -> response-only loss

The input context is handled as a PrefixLM prefix:

instruction / prefix: bidirectional attention, no loss
response:             causal attention, response-only CE loss
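This minimal sketch shows what the two rules above mean for the attention mask and the training targets. It is written in plain Python for readability and is not the project's implementation. The -100 "ignore" label follows the common PyTorch cross-entropy convention and is an assumption here.

```python
IGNORE_INDEX = -100  # assumed "no loss" label (PyTorch cross_entropy default ignore_index)


def prefix_lm_mask(seq_len, prefix_len):
    """mask[i][j] is True when query position i may attend to key position j.

    Prefix positions see the whole prefix (bidirectional) but no response tokens.
    Response positions see the whole prefix plus earlier response tokens (causal).
    """
    return [[j < prefix_len or j <= i for j in range(seq_len)] for i in range(seq_len)]


def response_only_targets(token_ids, prefix_len):
    """Next-token targets where only response tokens contribute to the loss.

    Position t predicts token t + 1. Targets inside the prefix, and the final
    position (which has no next token), are ignored.
    """
    targets = []
    for t in range(len(token_ids)):
        nxt = t + 1
        if prefix_len <= nxt < len(token_ids):
            targets.append(token_ids[nxt])
        else:
            targets.append(IGNORE_INDEX)
    return targets


tokens = [10, 11, 12, 13]  # two prefix tokens, then two response tokens
for row in prefix_lm_mask(len(tokens), prefix_len=2):
    print("".join("x" if ok else "." for ok in row))
print(response_only_targets(tokens, prefix_len=2))  # [-100, 12, 13, -100]
```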

The architecture keeps the upstream HRM-Text recurrent design:

H module: slower strategic state
L module: faster execution state
schedule: H2L3 recurrent computation
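The following is a toy sketch of the H/L schedule. It assumes "H2L3" means two high-level (H) updates per forward pass, each preceded by three low-level (L) updates, following the H_cycles/L_cycles pattern of the original HRM. The scalar update functions are hypothetical stand-ins for the real transformer blocks. Details such as gradient truncation and carry handling are omitted.

```python
def hrm_forward(x, z_h, z_l, h_step, l_step, h_cycles=2, l_cycles=3, trace=None):
    """Run h_cycles slow H updates; before each one, run l_cycles fast L updates."""
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            z_l = l_step(z_l, z_h, x)  # fast state, conditioned on the slow state and the input
            if trace is not None:
                trace.append("L")
        z_h = h_step(z_h, z_l)  # slow state reads the fast state after its inner loop
        if trace is not None:
            trace.append("H")
    return z_h, z_l


trace = []
z_h, z_l = hrm_forward(
    x=1.0, z_h=0.0, z_l=0.0,
    h_step=lambda h, l: 0.5 * h + l,        # hypothetical H update
    l_step=lambda l, h, x: 0.5 * l + h + x,  # hypothetical L update
    trace=trace,
)
print("".join(trace))  # LLLHLLLH
print(z_h, z_l)        # 5.90625 5.03125
```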

For a readable full explanation of the training method, architecture, PT/SFT distinction, staged continuation, and checkpoint naming, see the project document:

MODEL_TRAINING_ARCHITECTURE_GUIDE_2026-05-28.md in https://github.com/LLM-OS-Models/KoHRM-text

Important Compatibility Note

The public repo currently contains the converted model weights and tokenizer, but it does not yet include a Hugging Face trust_remote_code modeling implementation for HrmTextForCausalLM.

What works today:

  • Download the latest public weights.
  • Load the tokenizer directly with tokenizers.Tokenizer.from_file("tokenizer.json").
  • Inspect config.json.
  • Verify the integrity of model.safetensors on CPU or Colab T4.
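One lightweight way to check model.safetensors on CPU without PyTorch is to read its header. A safetensors file starts with an 8-byte little-endian header length, followed by a JSON table of tensor names, dtypes, shapes, and byte offsets. The sketch below uses only the Python standard library. Whether the summed shapes equal 1,384,120,320 exactly depends on what the export contains; that is an assumption, not something verified here.

```python
import json
import math
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the tensor table of a .safetensors file without loading any weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header size
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata block, not a tensor
    return header


def summarize_safetensors(path):
    """Count tensors and parameters and list the dtypes found in the header."""
    header = read_safetensors_header(path)
    return {
        "tensors": len(header),
        "parameters": sum(math.prod(t["shape"]) for t in header.values()),
        "dtypes": sorted({t["dtype"] for t in header.values()}),
    }


def offsets_cover_file(path):
    """True if the tensor byte ranges end exactly at the end of the file (not truncated)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
    header = read_safetensors_header(path)
    data_end = max((t["data_offsets"][1] for t in header.values()), default=0)
    return 8 + header_len + data_end == Path(path).stat().st_size
```

After a download like the one in the Colab example further down, summarize_safetensors(repo_dir / "model.safetensors") should report BF16 tensors, and offsets_cover_file(...) should return True. The tokenizer can be checked the same way: Tokenizer.from_file(str(repo_dir / "tokenizer.json")).get_vocab_size() can be compared with vocab_size in config.json.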

What is not supported yet in plain Transformers:

  • AutoModelForCausalLM.from_pretrained("LLM-OS-Models/KoHRM-Text-1.4B")
  • One-line hosted text generation from this repo

Reason: model_type: "hrm_text" is a custom HRM-Text architecture. Supporting Transformers generation will require adding compatible HrmTextForCausalLM remote-code files to this model repo or releasing a standard wrapper.

Model Details

Field                Value
Model id             LLM-OS-Models/KoHRM-Text-1.4B
Standard name        KoHRM-Text-1.4B
Training origin      scratch
Architecture family  HRM-Text PrefixLM
Architecture size    XL
Parameters           1,384,120,320
Context length       4,096 tokens
Training dtype       bfloat16
Public export dtype  bfloat16 EMA safetensors
Tokenizer            byte-level BPE, NFC normalization
Vocabulary size      131,072
Objective            PrefixLM response-only loss
Optimizer            Adam-atan2 from upstream HRM-Text
EMA                  0.9999

Converted config highlights:

{
  "model_type": "hrm_text",
  "architectures": ["HrmTextForCausalLM"],
  "vocab_size": 131072,
  "hidden_size": 1536,
  "num_hidden_layers": 32,
  "num_attention_heads": 12,
  "max_position_embeddings": 4096,
  "prefix_lm": true
}
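As a sanity check, the reported 1,384,120,320 parameters can be reproduced from the config under one set of assumptions. The project does not confirm these: an untied input embedding and LM head, Q/K/V/O attention projections of size d x d, a gated three-matrix MLP with intermediate size 4608 (inferred, not in the config), and no bias or normalization weights. This decomposition matches the reported total exactly, but that alone does not prove it is the real layout. The tensor shapes in model.safetensors would settle it.

```python
# Assumed layout (hypothetical): untied embedding + LM head, 4 x (d x d) attention,
# 3 x (d x ff) gated MLP, no bias or norm parameters.
vocab_size = 131_072
d = 1536       # hidden_size
layers = 32    # num_hidden_layers
ff = 4608      # hypothetical intermediate size (3 * hidden_size), not in config.json

embeddings = 2 * vocab_size * d  # input embedding + separate LM head
attention = 4 * d * d            # Q, K, V, O projections
mlp = 3 * d * ff                 # gate, up, down projections
total = embeddings + layers * (attention + mlp)

print(f"{total:,}")                  # 1,384,120,320
print(round(embeddings / total, 3))  # 0.291 -> the 131K-vocab matrices are ~29% of all weights
```

Under these assumptions, the two 131K-vocabulary matrices alone account for about 403M parameters. That is one concrete cost of the larger tokenizer mentioned below.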

Compared With The HRM-Text Paper

This run can take longer than the paper recipe even on 8 x H200 because the setup is not identical:

  • The paper reference used 16 x H100; this run uses 8 x H200.
  • KoHRM uses a larger 131K tokenizer vocabulary, compared with the upstream 65K tokenizer.
  • The public KoHRM size is about 1.38B parameters.
  • The stable long-run batch is 180,224 tokens/step after OOM probing; larger batches were possible briefly but not chosen for reliability.
  • The continuation includes extra Korean, terminal, tool-call, legal, finance, wiki, and repeated HRM-cleaned stages.

This does not automatically guarantee better benchmark scores. The expected upside is domain-specific: Korean tokenization efficiency, Korean legal/finance/wiki coverage, terminal trajectories, tool-call formatting, and code-oriented behavior should have a better chance than the upstream English/general checkpoint. Final claims require evaluation after the planned continuation and SFT finish.

Tokenizer

The tokenizer was trained for Korean, English, code, shell/terminal text, and JSON/tool-call formats. It keeps common chat/tool special tokens as stable single tokens where possible.

Sample bucket                Chars/token
Korean general text          2.60
Korean legal text            2.36
Korean terminal instruction  2.18
shell command                2.68
tool-call JSON               3.32
Python code                  3.37
English                      4.40
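The sketch below shows how a chars/token figure like those above can be measured. How the table was actually computed is not documented here. This version assumes characters are counted as Unicode code points after NFC normalization (so a composed Hangul syllable counts once) and that no special tokens are added.

```python
import unicodedata


def chars_per_token(texts, encode):
    """Average characters per token over one sample bucket.

    `encode` maps a string to a list of token ids. Text is NFC-normalized before
    counting characters, mirroring the tokenizer's NFC normalization (assumption).
    """
    n_chars = sum(len(unicodedata.normalize("NFC", t)) for t in texts)
    n_tokens = sum(len(encode(t)) for t in texts)
    return n_chars / n_tokens


# Toy check with a whitespace "tokenizer": 5 characters / 2 tokens
print(chars_per_token(["ab cd"], str.split))  # 2.5
```

With the released tokenizer, load it with tok = Tokenizer.from_file("tokenizer.json") from the tokenizers library. Then pass an encoder such as lambda s: tok.encode(s, add_special_tokens=False).ids.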

Formatting tokens:

<|im_start|>         instruction start
<|im_end|>           instruction end
<|box_end|>          response/end marker
<|object_ref_start|> direct condition
<|object_ref_end|>   chain-of-thought style condition
<|quad_start|>       noisy condition
<|quad_end|>         synthetic condition

Prompt format used by the project-side inference code:

<|im_start|><|object_ref_start|>YOUR_PROMPT_HERE<|im_end|>
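A minimal sketch of assembling that prompt string follows. Only the direct condition (<|object_ref_start|>) is confirmed as the helper's setting. The dictionary keys for the other condition tokens are hypothetical names. Cutting a response at <|box_end|> is an assumption based on that token's listed role as the response/end marker.

```python
# Condition tokens from the list above; keys other than "direct" are hypothetical labels.
CONDITION_TOKENS = {
    "direct": "<|object_ref_start|>",
    "cot": "<|object_ref_end|>",
    "noisy": "<|quad_start|>",
    "synthetic": "<|quad_end|>",
}
RESPONSE_END = "<|box_end|>"


def build_prompt(prompt, condition="direct"):
    """Wrap a prompt as <|im_start|><condition token>PROMPT<|im_end|>."""
    return f"<|im_start|>{CONDITION_TOKENS[condition]}{prompt}<|im_end|>"


def strip_response(generated):
    """Cut generated text at the response end marker, if present."""
    return generated.split(RESPONSE_END, 1)[0]


print(build_prompt("Explain the ls command."))
# <|im_start|><|object_ref_start|>Explain the ls command.<|im_end|>
```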

Colab T4 Long Knowledge Probe

A ready-to-run Colab notebook is available in the project repo:

https://github.com/LLM-OS-Models/KoHRM-text/blob/main/notebooks/KoHRM_Text_1_4B_Colab_T4_Long_Knowledge_Probe.ipynb

The notebook downloads the latest public files and runs long-form generation prompts that match the current pretraining data style. It is intended to inspect the knowledge signal, Korean fluency, repetition, and runtime correctness of pretraining-stage checkpoints.

This is not a final chat/SFT benchmark. It intentionally avoids format-constrained SFT-style tests because the public checkpoint is still a pretraining-stage model and has not been behavior-aligned by SFT/LoRA/RL.

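It does not use transformers, AutoTokenizer, or AutoModelForCausalLM, because in some Colab environments transformers fails with a torchvision::nms import error or cannot find the custom architecture. Instead, it uses: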

  • tokenizers.Tokenizer.from_file("tokenizer.json")
  • safetensors.torch.load_file("model.safetensors")
  • kohrm_colab_generate.py, a small PyTorch SDPA runtime for the HRM-Text architecture
!pip -q install -U huggingface_hub hf_transfer safetensors
!pip -q install --force-reinstall "tokenizers>=0.22.0,<0.23.1"

import os
# hf_transfer is only used for downloads when this is set before huggingface_hub is imported.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from pathlib import Path
import json
import importlib.util
import sys
from huggingface_hub import snapshot_download

repo_id = "LLM-OS-Models/KoHRM-Text-1.4B"

repo_dir = Path(snapshot_download(
    repo_id,
    revision="main",
    allow_patterns=[
        "README.md",
        "config.json",
        "tokenizer.json",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "model.safetensors",
        "kohrm_colab_generate.py",
    ],
))

print("Downloaded to:", repo_dir)
config = json.loads((repo_dir / "config.json").read_text())
print("model_type:", config["model_type"])
print("hidden_size:", config["hidden_size"])
print("vocab_size:", config["vocab_size"])
print("context:", config["max_position_embeddings"])

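# Import the bundled helper module (kohrm_colab_generate.py) from the downloaded snapshot.
spec = importlib.util.spec_from_file_location(
    "kohrm_colab_generate",
    repo_dir / "kohrm_colab_generate.py",
)
kohrm = importlib.util.module_from_spec(spec)
sys.modules["kohrm_colab_generate"] = kohrm
spec.loader.exec_module(kohrm)

# Load weights, tokenizer, and config; 14.0 GiB is the GPU memory budget for a 16 GB T4.
model, tokenizer, cfg = kohrm.load_kohrm(repo_dir, max_gpu_memory_gib=14.0)

# Sampling settings; condition="direct" selects the <|object_ref_start|> condition token.
settings = dict(
    max_seq_len=1536,
    temperature=0.65,
    top_p=0.92,
    repetition_penalty=1.05,
    no_repeat_ngram_size=0,
    condition="direct",
)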

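# Long-form probes in the style of the pretraining data (Korean prompt text kept as-is).
prompts = {
    # "What effects do exchange-rate fluctuations have on personal investments, and how can one prepare?"
    "finance": "환율 변동이 개인 투자에 미치는 영향과 대비 전략은 무엇인가요?",
    # "The following is part of an original Korean Wikipedia article. Learn encyclopedic Korean,
    # proper nouns, dates, and technical/social/cultural knowledge as-is."
    # [문서명] = document title (훈민정음, Hunminjeongeum), [부분] = part.
    "kowiki_style": """다음은 한국어 위키백과 문서 원문 일부입니다. 백과사전식 한국어, 고유명사, 날짜, 기술/사회/문화 지식을 그대로 학습하십시오.

[문서명]
훈민정음

[부분]
1/1""",
    # "The following is part of the original text of a Korean statute/local ordinance. Learn legal
    # Korean, article structure, numbering, agency names, and effective-date expressions as-is."
    # [자료종류] = material type, [문서명] = title (형법, Criminal Act), [경로] = path, [부분] = part.
    "legal_style": """다음은 대한민국 법령/자치법규 원문 일부입니다. 법률 한국어, 조문 구조, 번호 체계, 기관명, 시행일자 표현을 그대로 학습하십시오.

[자료종류]
law

[문서명]
형법

[경로]
kr/형법/법률.md

[부분]
1/1""",
}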

for name, prompt in prompts.items():
    print("=" * 80)
    print(name)
    output = kohrm.generate_from_loaded(
        model,
        tokenizer,
        cfg,
        prompt,
        max_new_tokens=384,
        min_new_tokens=160,
        **settings,
    )
    print(output)

Expected result:

  • model_type should be hrm_text.
  • vocab_size should be 131072.
  • The helper should load the 1.38B public model.safetensors export.
  • On Colab T4, generation runs in fp16 through PyTorch scaled-dot-product attention.
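  • The first run can take a few minutes because it downloads and loads the roughly 2.8 GB bf16 weight file.
  • This is a rolling pretraining checkpoint: during training, newly converted checkpoints (about every 10,000 steps) overwrite the same file names with the latest EMA safetensors. Compare later checkpoints with the same long prompts before drawing final conclusions.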

Prompt format used by the helper, matching upstream InferenceCheckpoint.tokenize_prompt():

<|im_start|><|object_ref_start|>PROMPT<|im_end|>

Plain AutoModelForCausalLM.generate() is still not the supported path. This model is a custom hrm_text architecture, so ordinary Transformers generation requires a future trust_remote_code wrapper. Use the notebook/helper above for public model.safetensors generation today.

Internal Raw-Checkpoint Generation

For training-machine debugging and exact raw FSDP2 checkpoint recovery, the project still includes the upstream-style inference path:

  • simple_inference_engine.py
  • raw checkpoints from LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints
  • CUDA/FlashAttention-oriented execution

That path is mainly for internal continuation/evaluation, not the easiest Colab test.

Training Data

Prepared data artifacts are uploaded to:

https://huggingface.co/datasets/LLM-OS-Models/KoHRM-Text-1.4B-prepared-data

The training objective is PrefixLM response-only loss. Instruction/prompt tokens are visible as context, while loss is applied to the response span.

Major prepared data groups:

Dataset group                          Tokens  Use
koterm_pretrain_mix_v1                 711.3M  stage0/stage0b
HRM cleaned fast-cap stage1/stage1b    14.55B  HRM-style instruction pretraining
HRM cleaned full/no-cap stage2         14.55B  completed continuation
HRM cleaned full/no-cap extra stage2b  14.55B  active continuation
Local terminal conversations           9.39B   terminal/code/tool-heavy continuation
Korean tool/legal/wiki/finance mix     3.02B   Korean domain and tool continuation
BCAI Finance Korean                    857.7M  Korean finance/domain data
Korean legal/admin task data           629.0M  Korean legal/admin data
Korean Wikipedia                       462.5M  Korean general text
ToolBench train tool-call data         127.0M  tool-call pretraining
SWE-ZERO + GLM reasoning subsets       251.2M  code/reasoning data

Evaluation-like datasets are excluded where identified, including ToolBench eval, Terminal Bench style evaluation data, and benchmark-oriented chi-bench data.
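For a rough sense of stage length, the stable long-run batch of 180,224 tokens/step can be combined with the token counts in the table above. This sketch assumes one pass over a data group per stage and a fixed batch size. The token counts are rounded, so the results are approximate.

```python
import math

TOKENS_PER_STEP = 180_224  # stable long-run batch on 8 x H200
NUM_GPUS = 8


def steps_for(tokens, tokens_per_step=TOKENS_PER_STEP):
    """Optimizer steps needed for one pass over `tokens` training tokens."""
    return math.ceil(tokens / tokens_per_step)


print(TOKENS_PER_STEP // NUM_GPUS)  # 22528 tokens per GPU per step, if split evenly
print(steps_for(14.55e9))           # 80733 steps for a 14.55B-token HRM-cleaned stage
print(steps_for(9.39e9))            # 52102 steps for the local terminal conversations
```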

Training Run

The current run uses staged continuation:

stage0
-> stage0b
-> stage1
-> stage2
-> stage3
-> stage4
-> stage1b
-> stage2b
-> stage3b
-> stage4b
-> stage1c
-> stage2c
-> stage3c
-> stage4c

The checkpoint carries model weights, optimizer state, EMA weights, and recurrent carry state. resume_step_offset and total_steps_override are used so the learning-rate schedule follows the intended longer run instead of resetting at each stage.

As of 2026-05-27, stage2b is active. The continuation watcher is set to launch stage3b -> stage4b -> stage1c -> stage2c -> stage3c -> stage4c in sequence, starting each stage once the previous stage's checkpoint is complete. At each handoff it reads the actual global_step from the completed checkpoint's epoch_1_info.json before starting the next stage.
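The handoff and schedule continuation described above can be sketched as follows. Only the file name epoch_1_info.json, its global_step field, and the parameter names resume_step_offset and total_steps_override come from this card. The warmup-plus-cosine shape and the base_lr, warmup_steps, and min_ratio values are hypothetical placeholders, and the real upstream schedule may differ. Using the previous stage's global_step as the next stage's resume_step_offset is also an assumption.

```python
import json
import math
from pathlib import Path


def read_global_step(checkpoint_dir):
    """Read the completed stage's global_step from its epoch_1_info.json."""
    info = json.loads((Path(checkpoint_dir) / "epoch_1_info.json").read_text())
    return int(info["global_step"])


def lr_at(local_step, resume_step_offset, total_steps_override,
          base_lr=3e-4, warmup_steps=2000, min_ratio=0.1):
    """Learning rate on one long global schedule (hypothetical warmup + cosine decay).

    The stage-local step is shifted by resume_step_offset, so a new stage continues
    the schedule defined by total_steps_override instead of restarting warmup.
    """
    step = resume_step_offset + local_step
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    span = max(1, total_steps_override - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / span)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))
    return base_lr * (min_ratio + (1 - min_ratio) * cosine)
```

With this arrangement, setting resume_step_offset = read_global_step(previous_checkpoint_dir) makes stage-local step 0 land at the same point on the long schedule where the previous stage stopped.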

Intended Use

This checkpoint is intended for:

  • continued pretraining experiments
  • Korean tokenizer and HRM-Text architecture experiments
  • terminal/tool-call/code pretraining research
  • checkpoint conversion and evaluation work

It is not yet intended as a finished assistant model.

Limitations

  • This is an intermediate checkpoint, not a final aligned instruct model.
  • The full planned continuation has not finished.
  • Final SFT and safety tuning have not been completed.
  • Public benchmark scores for this new checkpoint are not final.
  • Plain Transformers generation requires adding the custom hrm_text modeling wrapper or remote-code files.
  • Tool-call JSON validity and terminal action safety must be evaluated before production use.

Citation

This work builds on the HRM-Text architecture and training stack:

  • HRM-Text paper: https://arxiv.org/html/2605.20613
  • Upstream HRM-Text code: https://github.com/sapientinc/HRM-Text

Related Repositories

  • Main model repo: LLM-OS-Models/KoHRM-Text-1.4B
  • Raw FSDP2 resume checkpoints: LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints
  • Prepared data: LLM-OS-Models/KoHRM-Text-1.4B-prepared-data
  • Project code: https://github.com/LLM-OS-Models/KoHRM-text
  • Tokenizer repo: LLM-OS-Models/HRM-Text-Ko-Terminal-Tokenizer-131K
