omnes-flores (technology preview)

omnes-flores is a unified NLP framework for LLMs that consists of three components:

  • The LS component takes documents as input and outputs the results of language identification and sentence segmentation.
  • The WX component takes a sentence and its language, and outputs the results of word segmentation and language-specific part-of-speech tagging.
  • The UD component takes a sentence, its language, and its constituent word list, and outputs the results of dependency parsing.
    • The corresponding model is on this page.

By running these three tasks in sequence with the Python library omnes-flores, you can obtain dependency parsing results from raw input text alone, without specifying its language in advance.
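The three-stage flow described above can be sketched as follows. This is only an illustration of how data passes from one component to the next, not the omnes-flores API: the names `ls_component`, `wx_component`, `ud_component`, `parse`, and the dataclasses are hypothetical, and each stage is a toy stand-in (period-based splitting, whitespace tokenization, a flat tree) where the real library would call a model.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    lang: str  # language code assigned by the LS stage


@dataclass
class Token:
    form: str
    xpos: str  # language-specific part-of-speech tag


@dataclass
class Dependency:
    index: int   # 1-based position of the word in the sentence
    form: str
    head: int    # index of the head word, 0 for the root
    deprel: str


def ls_component(document: str) -> list[Sentence]:
    """Toy stand-in for LS: language identification and sentence segmentation."""
    # Assumption: split on periods and label every sentence "en".
    parts = [s.strip() for s in document.split(".") if s.strip()]
    return [Sentence(text=p, lang="en") for p in parts]


def wx_component(sentence: Sentence) -> list[Token]:
    """Toy stand-in for WX: word segmentation and language-specific POS tagging."""
    # Assumption: whitespace tokenization with a placeholder tag "X".
    return [Token(form=w, xpos="X") for w in sentence.text.split()]


def ud_component(sentence: Sentence, tokens: list[Token]) -> list[Dependency]:
    """Toy stand-in for UD: dependency parsing over the word list."""
    # Assumption: the first word is the root and every other word attaches to it.
    return [
        Dependency(
            index=i,
            form=t.form,
            head=0 if i == 1 else 1,
            deprel="root" if i == 1 else "dep",
        )
        for i, t in enumerate(tokens, start=1)
    ]


def parse(document: str) -> list[tuple[Sentence, list[Dependency]]]:
    """Run LS, then WX, then UD, feeding each stage's output to the next."""
    results = []
    for sentence in ls_component(document):
        tokens = wx_component(sentence)
        results.append((sentence, ud_component(sentence, tokens)))
    return results


if __name__ == "__main__":
    for sentence, deps in parse("Dogs bark. Cats sleep well."):
        print(sentence.lang, sentence.text)
        for d in deps:
            print(f"  {d.index}\t{d.form}\t{d.head}\t{d.deprel}")
```

The point of the sketch is the interface between stages: only the first stage sees the raw document, and the language it assigns travels with each sentence into the later stages. For the actual entry points, refer to the omnes-flores repository.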

For details, please read the Requirements and Install sections in the omnes-flores repository.

99 Treebanks Used for LoRA SFT

This model was trained using training data from 84 UD languages, consisting of 99 treebanks.

The Japanese word unit is LUW (Long Unit Word), the word segmentation standard defined by the National Institute for Japanese Language and Linguistics.

The following 40 UD treebanks, which have both a commercially available license and over 40k UD tokens in the train set, were selected to train the LoRA models of omnes-flores-84-lang-99-treebank-non-commercial-v0.

In addition, the following 59 treebanks have been added to the training in this model for academic purposes:

Acknowledgements

This work was conducted as part of a collaborative research project between Recruit Co., Ltd. and the National Institute for Japanese Language and Linguistics.

Citations

You are encouraged to cite one of the following papers if you use omnes-flores models:

@inproceedings{matsuda-etal-2025-step,
    title = "Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of {LLM}s",
    author = "Matsuda, Hiroshi  and
      Ma, Chunpeng  and
      Asahara, Masayuki",
    editor = "Sagae, Kenji  and
      Oepen, Stephan",
    booktitle = "Proceedings of the 18th International Conference on Parsing Technologies (IWPT, SyntaxFest 2025)",
    month = aug,
    year = "2025",
    address = "Ljubljana, Slovenia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.iwpt-1.2/",
    pages = "11--19",
    ISBN = "979-8-89176-294-7",
    abstract = "Recent advances in large language models (LLMs) have enabled impressive performance in various tasks. However, standard prompting often struggles to produce structurally valid and accurate outputs, especially in dependency parsing. We propose a novel step-by-step instruction strategy, where universal part-of-speech tagging precedes the prediction of syntactic heads and dependency labels, and a simplified CoNLL-U like output format, our method achieves state-of-the-art accuracy on Universal Dependencies datasets across 17 languages without hallucination or contamination. We further show that multilingual fine-tuning simultaneously improves cross-language generalization performance. Our results highlight the effectiveness of explicit reasoning steps in LLM-based parsing and offer a scalable, format-consistent alternative to bracket-based approaches."
}

@misc{matsuda2025stepbystepinstructionssimpletabular,
      title={Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs}, 
      author={Hiroshi Matsuda and Chunpeng Ma and Masayuki Asahara},
      year={2025},
      eprint={2506.09983},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.09983}, 
}