Instructions to use megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0 with libraries and local apps.

Libraries

How to use megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# The base model (google/gemma-2-9b) is a causal LM, so AutoModelForCausalLM is assumed here
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0")
model = AutoModelForCausalLM.from_pretrained("megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
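
The same request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send` are illustrative and not part of vLLM, and the host and port are assumed to match the `vllm serve` defaults shown above.

```python
import json
import urllib.request

MODEL_ID = "megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0"


def build_chat_request(content, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON body (requires a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, once the server is up:
# reply = send(build_chat_request("What is the capital of France?"))
# print(reply["choices"][0]["message"]["content"])
```

Passing `base_url="http://localhost:30000"` would target a server listening on port 30000 instead.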

Use Docker

docker model run hf.co/megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0

SGLang

How to use megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0 with Docker Model Runner:
```
docker model run hf.co/megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0
```

omnes-flores (technology preview)

omnes-flores is a unified NLP framework for LLMs that consists of three components:

- The LS component takes documents as input and outputs the results of language identification and sentence segmentation.
  - The corresponding model is omnes-flores-84-lang-99-treebank-non-commercial-v0-ls.
- The WX component takes a sentence and its language as input and outputs the results of word segmentation and language-specific part-of-speech tagging.
  - The corresponding model is omnes-flores-84-lang-99-treebank-non-commercial-v0-wx.
- The UD component takes a sentence, its language, and its constituent word list as input and outputs the results of dependency parsing.
  - The corresponding model is the one on this page.

By running these three components in sequence with the omnes-flores Python library, you can obtain dependency parsing results for any input text simply by passing in the text, regardless of its language.
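
The data flow between the three components can be sketched as follows. This is only an illustration of the LS, WX, UD chaining described above: the function names, the `Sentence` type, and the row layout are hypothetical and are not the omnes-flores API, and the three `toy_*` stages are trivial stand-ins so that the sketch runs without any model.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    language: str


def parse_document(document, ls, wx, ud):
    """Chain LS -> WX -> UD and return (language, token rows) per sentence.

    ls(document)        -> list of Sentence (language ID + sentence segmentation)
    wx(sentence)        -> list of (word, language-specific POS tag)
    ud(sentence, words) -> list of (head index, dependency label), one per word
    """
    results = []
    for sentence in ls(document):
        tagged = wx(sentence)
        words = [word for word, _ in tagged]
        arcs = ud(sentence, words)
        rows = [
            (index, word, tag, head, label)
            for index, ((word, tag), (head, label)) in enumerate(zip(tagged, arcs), start=1)
        ]
        results.append((sentence.language, rows))
    return results


# Toy stand-ins (hypothetical): a real setup would call the three models instead.
def toy_ls(document):
    return [Sentence(part.strip(), "en") for part in document.split(".") if part.strip()]


def toy_wx(sentence):
    return [(word, "X") for word in sentence.text.split()]


def toy_ud(sentence, words):
    # Treat the first word as the root (head 0) and attach every other word to it.
    return [(0, "root") if i == 0 else (1, "dep") for i in range(len(words))]


for language, rows in parse_document("Dogs bark. Cats sleep.", toy_ls, toy_wx, toy_ud):
    print(language, rows)
```

Note that each stage only consumes what the previous one produced, which is why the caller does not need to know the language of the input in advance.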

For details, please read the Requirements and Install sections in the omnes-flores repository.

99 Treebanks Used for LoRA SFT

This model was trained on data from 99 treebanks covering 84 UD languages.

The Japanese word unit is the Long Unit Word (LUW) defined by the National Institute for Japanese Language and Linguistics (NINJAL).

The following 40 UD treebanks, each of which has a license that permits commercial use and more than 40k UD tokens in its training set, were selected to train the LoRA models of omnes-flores-84-lang-99-treebank-non-commercial-v0.

UD_Armenian-ArmTDP UD_Belarusian-HSE UD_Bororo-BDT UD_Chinese-GSD UD_Chinese-GSDSimp UD_Croatian-SET UD_Czech-CAC UD_Danish-DDT UD_Dutch-Alpino UD_English-EWT UD_Estonian-EWT UD_Finnish-TDT UD_French-GSD UD_German-GSD UD_Haitian_Creole-Adolphe UD_Hebrew-IAHLTwiki UD_Icelandic-GC UD_Indonesian-GSD UD_Irish-IDT UD_Japanese-GSDLUW UD_Korean-Kaist UD_Latvian-LVTB UD_Lithuanian-ALKSNIS UD_Naija-NSC UD_Norwegian-Nynorsk UD_Persian-PerDT UD_Portuguese-Porttinari UD_Romanian-RRT UD_Russian-GSD UD_Scottish_Gaelic-ARCOSG UD_Serbian-SET UD_Sindhi-Isra UD_Slovak-SNK UD_Slovenian-SSJ UD_Spanish-GSD UD_Swedish-Talbanken UD_Thai-TUD UD_Turkish-BOUN UD_Ukrainian-ParlaMint UD_Western_Armenian-ArmTDP
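
The selection rule for this list (commercial-use license and more than 40k training tokens) can be expressed as a simple filter. The sketch below is an assumption about how such a filter might look, not the authors' actual selection script; the `Treebank` type and the three entries are placeholders with made-up values, not real UD statistics.

```python
from dataclasses import dataclass

MIN_TRAIN_TOKENS = 40_000  # threshold stated in the text


@dataclass(frozen=True)
class Treebank:
    name: str
    commercial_use_allowed: bool
    train_tokens: int


def select_for_commercial_training(treebanks):
    """Keep treebanks that permit commercial use and exceed the token threshold."""
    return [
        tb.name
        for tb in treebanks
        if tb.commercial_use_allowed and tb.train_tokens > MIN_TRAIN_TOKENS
    ]


# Placeholder entries with invented values, for illustration only.
candidates = [
    Treebank("UD_Example-A", True, 120_000),   # passes both conditions
    Treebank("UD_Example-B", True, 12_000),    # too few training tokens
    Treebank("UD_Example-C", False, 300_000),  # license does not permit commercial use
]

print(select_for_commercial_training(candidates))  # prints ['UD_Example-A']
```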

In addition, the following 59 treebanks were added to this model's training data for academic purposes:

UD_Ancient_Greek-PTNK UD_Ancient_Greek-PROIEL UD_Ancient_Greek-Perseus UD_Ancient_Hebrew-PTNK UD_Basque-BDT UD_Bulgarian-BTB UD_Classical_Armenian-CAVaL UD_Classical_Chinese-Kyoto UD_Coptic-Scriptorium UD_Coptic-Bohairic UD_Egyptian-PC UD_Erzya-JR UD_Estonian-EDT UD_Galician-CTG UD_Galician-TreeGal UD_Georgian-GLC UD_Gothic-PROIEL UD_Greek-GDT UD_Hindi-HDTB UD_Hungarian-Szeged UD_Icelandic-IcePaHC UD_Icelandic-Modern UD_Italian-ISDT UD_Italian-Old UD_Khoekhoe-KDT UD_Kyrgyz-KTMU UD_Latin-CIRCSE UD_Latin-ITTB UD_Latin-LLCT UD_Latin-Perseus UD_Latin-PROIEL UD_Latin-UDante UD_Low_Saxon-LSDC UD_Maltese-MUDT UD_Manx-Cadhan UD_Middle_French-PROFITEROLE UD_Nheengatu-CompLin UD_North_Sami-Giella UD_Occitan-TTB UD_Old_Church_Slavonic-PROIEL UD_Old_East_Slavic-RNC UD_Old_East_Slavic-Ruthenian UD_Old_East_Slavic-TOROT UD_Old_East_Slavic-Birchbark UD_Old_French-PROFITEROLE UD_Old_Occitan-CorAG UD_Ottoman_Turkish-DUDU UD_Ottoman_Turkish-BOUN UD_Polish-MPDT UD_Pomak-Philotis UD_Sanskrit-Vedic UD_Sindhi-Isra UD_Urdu-UDTB UD_Uyghur-UDT UD_Vietnamese-VTB UD_Welsh-CCG UD_Wolof-WTB UD_Yiddish-YiTB UD_Zaar-Autogramm

Acknowledgements

This work was conducted as part of a collaborative research project between Recruit Co., Ltd. and the National Institute for Japanese Language and Linguistics.

Citations

You are encouraged to cite one of the following papers if you use omnes-flores models:

@inproceedings{matsuda-etal-2025-step,
    title = "Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of {LLM}s",
    author = "Matsuda, Hiroshi  and
      Ma, Chunpeng  and
      Asahara, Masayuki",
    editor = "Sagae, Kenji  and
      Oepen, Stephan",
    booktitle = "Proceedings of the 18th International Conference on Parsing Technologies (IWPT, SyntaxFest 2025)",
    month = aug,
    year = "2025",
    address = "Ljubljana, Slovenia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.iwpt-1.2/",
    pages = "11--19",
    ISBN = "979-8-89176-294-7",
    abstract = "Recent advances in large language models (LLMs) have enabled impressive performance in various tasks. However, standard prompting often struggles to produce structurally valid and accurate outputs, especially in dependency parsing. We propose a novel step-by-step instruction strategy, where universal part-of-speech tagging precedes the prediction of syntactic heads and dependency labels, and a simplified CoNLL-U like output format, our method achieves state-of-the-art accuracy on Universal Dependencies datasets across 17 languages without hallucination or contamination. We further show that multilingual fine-tuning simultaneously improves cross-language generalization performance. Our results highlight the effectiveness of explicit reasoning steps in LLM-based parsing and offer a scalable, format-consistent alternative to bracket-based approaches."
}

@misc{matsuda2025stepbystepinstructionssimpletabular,
      title={Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs}, 
      author={Hiroshi Matsuda and Chunpeng Ma and Masayuki Asahara},
      year={2025},
      eprint={2506.09983},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.09983}, 
}

Model tree for megagonlabs/omnes-flores-84-lang-99-treebank-non-commercial-v0

Base model: google/gemma-2-9b (fine-tuned to produce this model)
