Whisper Tiny Galician

Model summary

Whisper Tiny Galician is an automatic speech recognition (ASR) model for Galician (gl) speech. It is fine-tuned from openai/whisper-tiny on the Galician portion of Mozilla Common Voice 13.0 and achieves a Word Error Rate (WER) of 26.13% on the Common Voice evaluation split.

This model provides lightweight transcription capabilities for Galician speech, suitable for low-resource applications or devices with limited computational capacity.

Model description

Architecture: Transformer-based encoder–decoder (Whisper)
Base model: openai/whisper-tiny
Language: Galician (gl)
Task: Automatic Speech Recognition (ASR)
Output: Text transcription in Galician
Decoding: Autoregressive sequence-to-sequence decoding

The model builds on Whisper’s multilingual pretraining and is fine-tuned on Galician speech data. It provides basic transcription for a low-resource language and is best suited to experimentation and lightweight applications.

Intended use

Primary use cases

Lightweight transcription of Galician audio recordings
Low-resource or offline ASR pipelines
Educational and research purposes

Intended users

Researchers working on Galician or low-resource ASR
Developers building Galician speech applications
Academic or institutional users

Out-of-scope use

High-accuracy transcription requirements
Real-time or low-latency ASR without optimization
Speech translation tasks

Limitations and known issues

Performance may degrade on:
- Noisy or low-quality recordings
- Conversational or spontaneous speech
- Accents underrepresented in Common Voice

Transcription errors are expected due to the small model size.
Dataset biases from Common Voice may be reflected in outputs.

Users are encouraged to evaluate the model on their own data before deployment.

Training and evaluation data

Training data

Dataset: Mozilla Common Voice 13.0 (Galician subset)
Data type: Crowd-sourced, read speech
Preprocessing:
- Audio resampled to 16 kHz
- Text normalized using Whisper tokenizer
- Filtering of invalid or problematic samples
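
The card does not say which tool was used for resampling. As a minimal sketch of the 16 kHz step, the helper below computes the integer up/down factors for a given input rate and, assuming SciPy is available, applies polyphase resampling with `scipy.signal.resample_poly`. The function names `resample_ratio` and `resample_to_16k` are illustrative and not part of the original training code.

```python
from math import gcd

TARGET_SR = 16_000  # sampling rate Whisper models expect


def resample_ratio(orig_sr: int, target_sr: int = TARGET_SR) -> tuple[int, int]:
    """Return the reduced (up, down) integer factors for rational resampling."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    divisor = gcd(orig_sr, target_sr)
    return target_sr // divisor, orig_sr // divisor


def resample_to_16k(samples, orig_sr: int):
    """Resample a 1-D array of audio samples to 16 kHz with polyphase filtering."""
    # Imported lazily so the ratio helper works without SciPy installed.
    from scipy.signal import resample_poly

    up, down = resample_ratio(orig_sr)
    return resample_poly(samples, up, down)


print(resample_ratio(48_000))  # (1, 3)
print(resample_ratio(44_100))  # (160, 441)
```

A 48 kHz clip is therefore downsampled by a factor of 3, while a 44.1 kHz clip needs the ratio 160/441.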

Evaluation data

Dataset: Mozilla Common Voice 13.0 (Galician evaluation split)
Metric: Word Error Rate (WER)

Evaluation results

Metric	Value
WER (eval)	26.13%

This is the WER of the final checkpoint (step 5,000; see the training results summary) and reflects what can be expected from a tiny Whisper model fine-tuned for Galician.
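
To make the metric concrete, here is a minimal sketch of WER as word-level edit distance divided by the number of reference words. It is not the evaluation script used for this card: the exact text normalization and aggregation behind the reported 26.13% are not stated, and the example sentences are illustrative rather than taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not taken from Common Voice).
reference = "o galego é unha lingua románica"
hypothesis = "o galego e unha lingua"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2%}")
```

Here the hypothesis has one substitution and one deletion against six reference words, so the WER is 2/6. Corpus-level WER is usually computed by summing errors and reference words over all utterances rather than averaging per-utterance scores.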

Training procedure

Training hyperparameters

Learning rate: 3.75e-5
Optimizer: Adam (β1=0.9, β2=0.999, ε=1e-8)
LR scheduler: Linear
Warmup steps: 500
Training steps: 5,000
Train batch size: 256
Evaluation batch size: 128
Seed: 42
Mixed precision training: Native AMP
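
The learning-rate settings above can be sketched as a function of the training step. This assumes the usual "linear with warmup" shape (linear ramp from 0 to the peak over the warmup steps, then linear decay to 0 at the final step); the card only says "Linear", so the exact shape is an assumption, and `linear_lr` is an illustrative helper, not a library call.

```python
PEAK_LR = 3.75e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 5_000


def linear_lr(step: int) -> float:
    """Learning rate at a given step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay: fall linearly from the peak to 0 at the last training step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 250, 500, 2_750, 5_000):
    print(f"step {step:>5}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the rate peaks at step 500 and is back to half the peak at step 2,750, the midpoint of the decay phase.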

Training results (summary)

Training Loss	Epoch	Step	Validation Loss	WER
0.3626	20.0	1000	0.5407	30.8464
0.1103	40.0	2000	0.5370	27.0402
0.0473	60.0	3000	0.5769	26.7263
0.03	80.0	4000	0.5936	26.1382
0.0244	100.0	5000	0.6003	26.1331
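
A few quantities can be read off this table with simple arithmetic. The sketch below copies the rows as they appear and derives the steps per epoch, a rough upper bound on samples per epoch, and the checkpoints with the lowest WER and lowest validation loss. The samples-per-epoch figure assumes the listed train batch size of 256 is the effective batch per optimizer step, which the card does not confirm.

```python
TRAIN_BATCH_SIZE = 256  # from the hyperparameters; assumed to be the effective batch size

# (training loss, epoch, step, validation loss, WER %) copied from the table
rows = [
    (0.3626, 20.0, 1000, 0.5407, 30.8464),
    (0.1103, 40.0, 2000, 0.5370, 27.0402),
    (0.0473, 60.0, 3000, 0.5769, 26.7263),
    (0.03, 80.0, 4000, 0.5936, 26.1382),
    (0.0244, 100.0, 5000, 0.6003, 26.1331),
]

# Steps per epoch follow from any row: step / epoch.
steps_per_epoch = rows[0][2] / rows[0][1]
# Upper bound, since the last batch of an epoch may be partial.
approx_samples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE

best_wer_row = min(rows, key=lambda r: r[4])
best_val_loss_row = min(rows, key=lambda r: r[3])

print(f"steps per epoch: {steps_per_epoch:.0f}")
print(f"approx. samples per epoch: {approx_samples_per_epoch:.0f}")
print(f"lowest WER: {best_wer_row[4]} at step {best_wer_row[2]}")
print(f"lowest validation loss: {best_val_loss_row[3]} at step {best_val_loss_row[2]}")
```

The validation loss is lowest at step 2,000 and rises afterwards, while WER keeps improving slightly up to step 5,000, so the two criteria would select different checkpoints.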

Framework versions

Transformers 4.37.2
PyTorch 2.2.0+cu121
Datasets 2.16.1
Tokenizers 0.15.1

How to use

from transformers import pipeline

hf_model = "HiTZ/whisper-tiny-gl"  # replace with actual repo ID
device = 0  # set to -1 for CPU

pipe = pipeline(
    task="automatic-speech-recognition",
    model=hf_model,
    device=device
)

result = pipe("audio.wav")
print(result["text"])
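
For recordings longer than Whisper's 30-second window, a variant of the call above can enable chunking and pin the output language. This is a sketch under assumptions: `build_pipeline_kwargs` and `transcribe` are illustrative helpers, the repo ID is the same placeholder as above, and forcing the language through `generate_kwargs` presumes the checkpoint's generation config supports language selection.

```python
MODEL_ID = "HiTZ/whisper-tiny-gl"  # same placeholder repo ID as above; replace as needed


def build_pipeline_kwargs(model_id: str = MODEL_ID, use_gpu: bool = False) -> dict:
    """Keyword arguments for transformers.pipeline with 30-second chunking."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_id,
        "device": 0 if use_gpu else -1,
        "chunk_length_s": 30,  # split long recordings into 30 s windows
    }


# Assumption: the checkpoint's generation config supports language selection.
GENERATE_KWARGS = {"language": "galician", "task": "transcribe"}


def transcribe(path: str, use_gpu: bool = False) -> str:
    """Transcribe one audio file; downloads the model on first use."""
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline(**build_pipeline_kwargs(use_gpu=use_gpu))
    result = pipe(path, generate_kwargs=GENERATE_KWARGS)
    return result["text"]
```

Calling `transcribe("audio.wav")` returns the transcription as a string. If the language argument raises an error for this checkpoint, dropping `generate_kwargs` falls back to the default behavior shown earlier.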

Ethical considerations and risks

This model transcribes speech and may process personal data.
Users should ensure compliance with applicable data protection laws (e.g., GDPR).
The model should not be used for surveillance or non-consensual audio processing.

Citation

If you use this model in your research, please cite:

@misc{dezuazo2025whisperlmimprovingasrmodels,
  title={Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages},
  author={Xabier de Zuazo and Eva Navas and Ibon Saratxaga and Inma Hernáez Rioja},
  year={2025},
  eprint={2503.23542},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

For more details, see the related preprint on arXiv: 2503.23542.

License

This model is available under the Apache-2.0 License. You are free to use, modify, and distribute it, provided you retain the license and attribution notices as the license requires.