metadata

datasets:
  - samirmsallem/wiki_definitions_de_multitask
language:
  - de
base_model:
  - deepset/gbert-base
pipeline_tag: text-classification
library_name: transformers
tags:
  - science
  - ner
  - def_extraction
  - definitions
metrics:
  - accuracy
model-index:
  - name: checkpoints
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: samirmsallem/wiki_definitions_de_multitask
          type: samirmsallem/wiki_definitions_de_multitask
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.9630331753554502

Text classification model for definition recognition in German scientific texts

gbert-base-definition_classification is a German text classification model for the scientific domain, fine-tuned from gbert-base. It was trained on a custom annotated dataset of around 10,000 training and 2,000 test examples containing definitional and non-definitional sentences from German Wikipedia articles. The model was selected because it achieved the best overall score in the NER task.

The model is specifically designed to recognize and classify sentences as definition or non-definition sentences:

Text Classification Tag	Text Classification Label	Description
0	NON_DEF_SENTENCE	The text is a non-definitional sentence
1	DEF_SENTENCE	The text is a definitional sentence
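
The tag-to-label mapping above can be written down as a small lookup, which is handy when post-processing predictions. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `is_definition` are hypothetical helpers for illustration and are not part of the model or the transformers library.

```python
# Tag-to-label mapping copied from the table above.
ID2LABEL = {0: "NON_DEF_SENTENCE", 1: "DEF_SENTENCE"}
# Reverse lookup: label string -> integer tag.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def is_definition(label: str) -> bool:
    """Return True if a predicted label string marks a definitional sentence.

    Hypothetical helper; raises KeyError for labels outside the table.
    """
    return LABEL2ID[label] == 1


print(is_definition("DEF_SENTENCE"))      # True
print(is_definition("NON_DEF_SENTENCE"))  # False
```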

Training was conducted using a standard text classification objective. The model achieves an accuracy of approximately 96% on the evaluation set.

Here are the overall final metrics on the test dataset after 4 epochs of training:

Accuracy: 0.9630331753554502
Loss: 0.17300711572170258
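
For reference, accuracy here means the fraction of sentences whose predicted tag matches the annotated tag. The sketch below shows that computation on a tiny made-up example; the function name `accuracy` and the toy labels are assumptions for illustration and are not taken from the actual evaluation run.

```python
def accuracy(predictions, references):
    """Fraction of positions where the predicted tag equals the reference tag."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy data (hypothetical): 0 = NON_DEF_SENTENCE, 1 = DEF_SENTENCE.
toy_predictions = [1, 0, 1, 1]
toy_references = [1, 0, 0, 1]

print(accuracy(toy_predictions, toy_references))  # 0.75
```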

Usage

from transformers import pipeline

pipe = pipeline("text-classification", model="samirmsallem/gbert-base-definition_classification")

results = pipe(['Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.',
                'Rosen sind rot, Veilchen sind blau.'])
print(results)

# [{'label': 'DEF_SENTENCE', 'score': 0.9995753169059753}, {'label': 'NON_DEF_SENTENCE', 'score': 0.999630331993103}]
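
A common follow-up step is to keep only the sentences classified as definitions. The sketch below filters the pipeline output by label and a confidence threshold. The helper `extract_definitions` and the default threshold of 0.9 are assumptions for illustration, not part of the model. To keep the example runnable without downloading the model, it reuses the output shown above as sample data.

```python
def extract_definitions(sentences, results, threshold=0.9):
    """Return the sentences labeled DEF_SENTENCE with score >= threshold.

    `results` is expected in the format returned by the text-classification
    pipeline: one dict with 'label' and 'score' per input sentence.
    """
    return [
        sentence
        for sentence, result in zip(sentences, results)
        if result["label"] == "DEF_SENTENCE" and result["score"] >= threshold
    ]


sentences = [
    "Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.",
    "Rosen sind rot, Veilchen sind blau.",
]

# Sample output copied from the usage example above.
sample_results = [
    {"label": "DEF_SENTENCE", "score": 0.9995753169059753},
    {"label": "NON_DEF_SENTENCE", "score": 0.999630331993103},
]

definitions = extract_definitions(sentences, sample_results)
print(definitions)
# ['Natural Language Processing ist ein Verfahren der künstlichen Intelligenz.']
```

With a live pipeline, `sample_results` would be replaced by `pipe(sentences)`. The threshold trades recall for precision and should be tuned on held-out data.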