Model Details
1. 🧾 Overview
This model is trained to detect whether a Korean sentence contains harmful expressions and to classify the type (category) of each harmful expression.
It performs multi-label classification: it determines whether a harmful expression is present and, if so, which type it belongs to.
As an AI task, it falls under text-classification.
The dataset used is TTA-DQA/hate_sentence.
- Classes:
  - "0": insult
  - "1": abuse
  - "2": obscenity
  - "3": TVPC (Threats of violence/promotion of crime)
  - "4": sexuality
  - "5": age
  - "6": race and region
  - "7": disabled
  - "8": religion
  - "9": politics
  - "10": job
  - "11": no_hate
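The class list above can be written as a lookup table for post-processing predictions. This is a minimal sketch: `ID2LABEL`, `LABEL2ID`, and `is_hateful` are illustrative names that do not come from the published model, and if the model's own configuration defines a label mapping, that mapping should take precedence.

```python
# Label ids exactly as listed in the class list above (constant names are illustrative).
ID2LABEL = {
    0: "insult",
    1: "abuse",
    2: "obscenity",
    3: "TVPC",  # Threats of violence / promotion of crime
    4: "sexuality",
    5: "age",
    6: "race and region",
    7: "disabled",
    8: "religion",
    9: "politics",
    10: "job",
    11: "no_hate",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def is_hateful(label_ids):
    """Return True if any predicted id is a category other than no_hate."""
    return any(i != LABEL2ID["no_hate"] for i in label_ids)
```

For example, `is_hateful([0, 4])` is true, while `is_hateful([11])` is false.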
2. 🧠 Training Information
- Base Model: KcElectra (a pre-trained Korean language model based on Electra)
- Source: beomi/KcELECTRA
- Model Type: ELECTRA discriminator (encoder) with a sequence-classification head
- Pre-training (Korean): approx. 17 GB (over 180 million sentences)
- Fine-tuning (Hate Dataset): approx. 22.3 MB (TTA-DQA/hate_sentence)
- Learning Rate: 5e-6
- Weight Decay: 0.01
- Epochs: 30
- Batch Size: 16
- Data Loader Workers: 2
- Tokenizer: BertWordPieceTokenizer
- Model Size: approx. 511 MB
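The hyperparameters listed above can be collected into a single configuration object so that a fine-tuning script reads them from one place. The sketch below is hypothetical: `FineTuneConfig` and `total_optimizer_steps` are illustrative names, the card does not publish its training script, and the step count assumes one optimizer step per batch with no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Values copied from the training list; the class itself is hypothetical."""
    base_model: str = "beomi/KcELECTRA"
    learning_rate: float = 5e-6
    weight_decay: float = 0.01
    epochs: int = 30
    batch_size: int = 16
    dataloader_workers: int = 2


def total_optimizer_steps(num_examples: int, cfg: FineTuneConfig) -> int:
    """Steps over the whole run, assuming one step per batch and that the
    final partial batch of each epoch is kept rather than dropped."""
    steps_per_epoch = math.ceil(num_examples / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = FineTuneConfig()
# 1,000 is a made-up dataset size used only to exercise the function.
print(total_optimizer_steps(1_000, cfg))
```

A step count like this is mainly useful for sizing a learning-rate schedule; the real number depends on the actual size of TTA-DQA/hate_sentence, which this card gives only in megabytes.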
3. 🧩 Requirements
- pytorch ~= 1.8.0
- transformers ~= 4.0.0
- emoji ~= 0.6.0
- soynlp ~= 0.0.493
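The `~=` operator in the requirements is the "compatible release" specifier from PEP 440: `~= 1.8.0` accepts 1.8.0 and later patch releases but not 1.9.0. The following sketch approximates that rule for plain numeric versions only; `parse` and `is_compatible` are illustrative helpers, not part of any packaging library. Note also that PyTorch is distributed on PyPI under the name `torch`.

```python
def parse(version: str) -> tuple:
    """Turn '1.8.0' into (1, 8, 0). Only plain dotted numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(installed: str, spec: str) -> bool:
    """Approximate `~= spec`: installed >= spec, and identical to spec
    in every component except the last one."""
    want, have = parse(spec), parse(installed)
    # Pad a shorter installed version with zeros, so '1.8' is treated as '1.8.0'.
    have = have + (0,) * (len(want) - len(have))
    prefix = want[:-1]
    return have >= want and have[: len(prefix)] == prefix


# Minimum versions as listed in the requirements above.
REQUIREMENTS = {
    "pytorch": "1.8.0",
    "transformers": "4.0.0",
    "emoji": "0.6.0",
    "soynlp": "0.0.493",
}

print(is_compatible("1.8.1", REQUIREMENTS["pytorch"]))  # a later patch release
print(is_compatible("1.9.0", REQUIREMENTS["pytorch"]))  # a later minor release
```

Pre-release tags, epochs, and local version labels are ignored here; a real check should rely on a packaging library rather than this simplification.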
4. Quick Start
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "TTA-DQA/HateDetection_MultiLabel_KcElectra_FineTuning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Korean inputs: "What should I have for lunch today?" and "You bad jerk."
sentences = ["오늘 점심 뭐 먹을까?", "이 나쁜 놈아."]
results = classifier(sentences)
print(results)
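A text-classification pipeline typically reports only the highest-scoring label per sentence, whereas a multi-label model can assign several categories to the same input. The sketch below shows one way raw logits could be decoded into a set of labels. It assumes an independent sigmoid score per class and a 0.5 decision threshold; neither is stated in this card, and `decode_multilabel` and the example logits are purely illustrative.

```python
import math

# Class names in id order, as listed in the Overview section.
LABELS = [
    "insult", "abuse", "obscenity", "TVPC", "sexuality", "age",
    "race and region", "disabled", "religion", "politics", "job", "no_hate",
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score meets the threshold."""
    scores = [sigmoid(z) for z in logits]
    return [(LABELS[i], round(p, 3)) for i, p in enumerate(scores) if p >= threshold]


# Made-up logits: strongly positive for "insult" and "abuse", negative elsewhere.
example_logits = [2.0, 1.5] + [-3.0] * 10
print(decode_multilabel(example_logits))
```

The threshold trades precision against recall; a value tuned on held-out data may work better than 0.5.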
5. Citation
This model was built under the Hyperscale AI Training Data Quality Verification Project (2024 Hyperscale AI Training Data Quality Verification).
6. ⚠️ Bias, Risks, and Limitations
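The model was not trained on a deliberately skewed selection of data for any class; nevertheless, linguistic and cultural factors mean that people may disagree about individual labels.
What counts as a harmful expression is partly subjective and varies with language, culture, application domain, and personal views, so the results may be biased or contested.
Note: this model's output is not an absolute standard for what constitutes a harmful expression.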
Results
- Task: binary classification (text-classification)
- F1-score: 0.8279
- Accuracy: 0.7013
Model tree for TTA-DQA/HateDetection_MultiLabel_KcElectra_FineTuning: base model beomi/KcELECTRA-base-v2022