Model Card for gemma-2-27b-amharic-sft

This model is a supervised fine-tuned version of b1n1yam/gemma-2-27b-amharic-cpt. It was trained on instruction-following data to enable conversational capabilities in Amharic.

Note: This is an early research version, open-sourced solely for research on LLMs for low-resource languages. For production-ready models with multimodal capabilities, please visit platform.addisassistant.com.

Quick start

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "b1n1yam/gemma-2-27b-amharic-sft"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load in bfloat16; device_map="auto" spreads the model across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "የኢትዮጵያ ዋና ከተማ ምንድነው?"}  # "What is the capital of Ethiopia?"
]

# add_generation_prompt=True appends the assistant turn marker so the model answers
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
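
Note that in bfloat16 the 27B parameters alone occupy roughly 54 GB, so device_map="auto" will typically need multiple GPUs or CPU offloading to place the full model.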

Training procedure

This model was trained with supervised fine-tuning (SFT) on top of the continually pretrained base; a minimal, illustrative training sketch follows the list below:

  • Base model: gemma-2-27b-amharic-cpt (with a tokenizer expanded for Amharic)
  • Training approach: Supervised fine-tuning on instruction-following datasets
  • Chat format: Supports conversational interactions in Amharic
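
The training script and dataset for this model are not published, so the following is only a minimal sketch of what an SFT run with TRL's SFTTrainer could look like on top of the CPT base. The dataset name and every hyperparameter below are illustrative assumptions, not the values used to train this model.

# Hypothetical SFT sketch using TRL's SFTTrainer; the dataset name and all
# hyperparameters are illustrative assumptions, not this model's actual setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed instruction dataset with a "messages" column of chat turns,
# which SFTTrainer formats with the model's chat template
dataset = load_dataset("your-org/amharic-instructions", split="train")

config = SFTConfig(
    output_dir="gemma-2-27b-amharic-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = SFTTrainer(
    model="b1n1yam/gemma-2-27b-amharic-cpt",  # the continually pretrained base
    args=config,
    train_dataset=dataset,
)
trainer.train()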

Framework versions

  • TRL: 0.24.0
  • Transformers: 4.57.2
  • PyTorch: 2.6.0.dev20241112+cu121
  • Datasets: 3.6.0
  • Tokenizers: 0.22.1

Citation

If you use this model, please cite:

@misc{daniel2024addisai,
  title = {Addis AI: Supervised Fine-tuned Gemma 2 27B for Amharic},
  author = {Daniel, Biniyam},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/b1n1yam/gemma-2-27b-amharic-sft}}
}