Instructions for using LVSTCK/domestic-yak-8B-instruct with libraries and local apps.

Libraries

How to use LVSTCK/domestic-yak-8B-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LVSTCK/domestic-yak-8B-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LVSTCK/domestic-yak-8B-instruct")
# Text-generation model, so load it with the causal-LM auto class
model = AutoModelForCausalLM.from_pretrained("LVSTCK/domestic-yak-8B-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
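The pipeline call above returns the whole conversation rather than a bare string. A minimal sketch of pulling out just the reply, assuming the chat-style output shape that the Usage section indexes into (`outputs[0]["generated_text"][-1]`). `last_reply` is a hypothetical helper name, and the sample result is hand-written for illustration, not real model output:

```python
def last_reply(outputs):
    """Return the text of the final assistant message in a chat pipeline result.

    Assumes `outputs` looks like:
    [{"generated_text": [{"role": ..., "content": ...}, ...]}]
    """
    messages = outputs[0]["generated_text"]
    final = messages[-1]
    if final.get("role") != "assistant":
        raise ValueError("last message is not an assistant reply")
    return final["content"]


# Hand-written stand-in for a real pipeline result (no model is loaded here).
sample_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "(model reply)"},
        ]
    }
]
print(last_reply(sample_outputs))
```

With a real pipeline, the same helper would be called as `last_reply(pipe(messages))`.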

Local Apps

vLLM

How to use LVSTCK/domestic-yak-8B-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "LVSTCK/domestic-yak-8B-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LVSTCK/domestic-yak-8B-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
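The same request can be sent from Python. The following is a sketch using only the standard library, assuming the server started above is listening on `localhost:8000` and returns the usual OpenAI-style response body. The helper names `build_chat_request` and `ask` are hypothetical:

```python
import json
import urllib.request


def build_chat_request(prompt, model="LVSTCK/domestic-yak-8B-instruct",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed OpenAI-compatible layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Example call (needs the running server, so it is left commented out):
# print(ask("What is the capital of France?"))
```

For a server on a different port, such as the SGLang setup on port 30000, passing `base_url="http://localhost:30000"` to `ask` should be enough.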

Use Docker

docker model run hf.co/LVSTCK/domestic-yak-8B-instruct

SGLang

How to use LVSTCK/domestic-yak-8B-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LVSTCK/domestic-yak-8B-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LVSTCK/domestic-yak-8B-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "LVSTCK/domestic-yak-8B-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "LVSTCK/domestic-yak-8B-instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use LVSTCK/domestic-yak-8B-instruct with Docker Model Runner:
```
docker model run hf.co/LVSTCK/domestic-yak-8B-instruct
```

🐂 domestic-yak, a Macedonian LM (instruct version)

This repository contains the model from the paper "Towards Open Foundation Language Model and Corpus for Macedonian: A Low-Resource Language".

Code: https://github.com/LVSTCK

Model Summary

This is the instruct-tuned version of domestic-yak-8B, fine-tuned for three epochs on the sft-mk dataset to improve instruction following in Macedonian. Building on domestic-yak-8B, it is optimized for generating coherent, task-specific responses to user queries, which makes it well suited to chatbots, virtual assistants, and other interactive applications.

📊 Results

The table below compares our model, domestic-yak-8B-instruct, with four other models. Our model is on par with Llama 70B and even outperforms it on three of the benchmarks. It is also currently the best model in the 8B parameter range.

The results were obtained using the macedonian-llm-eval benchmark.

[Image: benchmark results table]

🔑 Key Details

Language: Macedonian (mk)
Base Model: domestic-yak-8B
Dataset: ~100k samples across multiple categories (question answering (QA), chat-like conversations, reasoning, essays, and code), built by translating publicly available datasets and adding custom synthetic data. The dataset can be found here.
Fine-tuning Objective: Supervised fine-tuning (SFT) on Macedonian-specific instruction-following data

Usage

The Transformers pipeline automatically calls apply_chat_template, which formats the input appropriately. The model was trained using the default Llama 3.1 chat format.

import transformers
import torch

model_id = "LVSTCK/domestic-yak-8B-instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Ти си виртуелен асистент кој помага на корисници на македонски јазик. Одговарај на прашања на јасен, разбирлив и професионален начин. Користи правилна граматика и обиди се одговорите да бидат што е можно покорисни и релевантни."},
    {"role": "user", "content": "Кој е највисок врв во Македонија?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256, # You can increase this
    temperature=0.1,
)
print(outputs[0]["generated_text"][-1])
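To make the chat format concrete, here is a simplified sketch of how a message list is laid out in the Llama 3 family prompt style. It assumes this model's tokenizer uses the standard Llama 3 special tokens. `render_llama3_prompt` is a hypothetical helper, and the template shipped with the tokenizer may add extra text to the system block, so `apply_chat_template` remains the reliable path:

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the Llama 3 style chat layout for a list of messages.

    Illustration only: the tokenizer's own template is the source of truth.
    """
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: a role header, a blank line, the content, then an end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from this point.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Кој е највисок врв во Македонија?"},
]
print(render_llama3_prompt(example_messages))
```

Comparing this output with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to check whether the assumption holds for this model.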

📬 Contact

For inquiries, feedback, or contributions, please feel free to reach out to the core team:

Citation

@article{krsteski2025towards,
  title={Towards Open Foundation Language Model and Corpus for Macedonian: A Low-Resource Language},
  author={Krsteski, Stefan and Tashkovska, Matea and Sazdov, Borjan and Gjoreski, Hristijan and Gerazov, Branislav},
  journal={arXiv preprint arXiv:2506.09560},
  year={2025}
}