## Usage

### Using LoRA Adapters
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model and apply the LoRA adapters
model_name = "cogni-x/CogniXpert-DeepSeek-R1-Distill-Llama8B-English-LoRA"  # LoRA adapter repo
base_model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    load_in_4bit=True,  # Optional: 4-bit loading for memory efficiency (requires bitsandbytes)
)
model = PeftModel.from_pretrained(base_model, model_name)

# Example: English mental health conversation
messages = [
    {"role": "user", "content": "I've been feeling really anxious about work lately."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
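
Recent `transformers` releases prefer an explicit `BitsAndBytesConfig` over the bare `load_in_4bit` flag. A minimal sketch of the equivalent 4-bit loading, assuming `bitsandbytes` is installed:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Explicit 4-bit quantization config, equivalent to load_in_4bit=True above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Llama-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
```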
### Example in Swahili

```python
messages = [
    # "I've been feeling very anxious about my job."
    {"role": "user", "content": "Nimekuwa na wasiwasi mwingi kuhusu kazi yangu."}
]
# The model will respond in Swahili
```
### Example in Sheng

```python
messages = [
    # Roughly: "I'm really stressed about my job, man."
    {"role": "user", "content": "Niko na stress mob ju ya job yangu bana."}
]
# The model will respond in an appropriate Sheng/Swahili mix
```
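
For standalone deployment, the adapter can optionally be folded into the base weights with PEFT's `merge_and_unload()`. A minimal sketch continuing from the loading example above (merge from a full- or half-precision base model, since merging into a 4-bit quantized base is lossy); the output path is hypothetical:

```python
# Assumes `model` (PeftModel) and `tokenizer` from the loading example above,
# with the base model loaded in full/half precision rather than 4-bit.
merged_model = model.merge_and_unload()            # fold LoRA weights into the base model
merged_model.save_pretrained("cognixpert-merged")  # hypothetical output directory
tokenizer.save_pretrained("cognixpert-merged")
```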
## Training Metrics
| Metric | Value |
|---|---|
| Training Loss | 0.8424 |
| Evaluation Loss | 0.8149 |
| Perplexity | 2.26 |
| Training Time | 3977.78 minutes (≈66.3 hours) |
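
For reference, the reported perplexity is the exponential of the evaluation loss:

```python
import math

# Perplexity = exp(evaluation loss)
print(round(math.exp(0.8149), 2))  # 2.26
```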
## Training Details

### Training Data
The model was fine-tuned on a combination of:
- **English Mental Health Counseling Dataset**: professional therapeutic conversations
- **Swahili Therapeutic Dataset**: culturally adapted mental health dialogues
- **Sheng Lexical Dataset**: urban Kenyan youth language patterns
### Training Configuration

- **Base Model:** DeepSeek-R1-Distill-Llama-8B (4-bit quantized)
- **Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 32
- **LoRA Alpha:** 64
- **Target Modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Sequence Length:** 2048 tokens
- **Training Framework:** Unsloth + TRL
- **Optimizer:** AdamW (8-bit)
- **Learning Rate:** 2e-4
- **Batch Size:** effective batch size of 64 (multi-GPU)
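
For readers reproducing a comparable setup with plain PEFT (the run above used Unsloth's wrappers), the hyperparameters correspond roughly to the following `LoraConfig`; values not stated in this card, such as dropout and bias, are assumptions:

```python
from peft import LoraConfig

# Approximate PEFT equivalent of the configuration listed above (illustrative only)
lora_config = LoraConfig(
    r=32,                      # LoRA rank
    lora_alpha=64,             # LoRA alpha (scaling factor)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.0,          # assumed; not stated in this card
    bias="none",               # assumed
    task_type="CAUSAL_LM",
)
```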
### Multi-turn Conversation Handling

- **Training:** all conversation turns are included with context for coherence
- **Evaluation:** only first turns are used, to avoid bias from assuming perfect prior responses
- **Response Masking:** loss is computed only on assistant responses, not on prompts (see the sketch below)
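
To illustrate the response-masking point, here is a minimal, framework-agnostic sketch of excluding prompt tokens from the loss by setting their labels to -100 (PyTorch's cross-entropy ignore index); the actual training used TRL's data handling, so this is illustrative only:

```python
import torch

def mask_prompt_tokens(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Return labels in which the first `prompt_len` (prompt) tokens are ignored."""
    labels = input_ids.clone()
    labels[:prompt_len] = -100  # -100 is ignored by PyTorch cross-entropy
    return labels

# Example: a 10-token sequence whose first 6 tokens belong to the user prompt
input_ids = torch.arange(10)
labels = mask_prompt_tokens(input_ids, prompt_len=6)
print(labels)  # tensor([-100, -100, -100, -100, -100, -100, 6, 7, 8, 9])
```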
## Language Support

### English
Professional mental health counseling with evidence-based therapeutic techniques.
### Swahili (Kiswahili)
Culturally-sensitive therapeutic conversations adapted for East African context.
### Sheng
Urban Kenyan youth slang for relatable, authentic support conversations.
**Language Detection:** automatic; the model responds in the same language as the input.
## Ethical Considerations

### Intended Users
- Individuals seeking emotional support and self-reflection tools
- Mental health organizations looking to provide preliminary support
- Researchers studying multilingual therapeutic AI
### Out-of-Scope Use
- Crisis intervention (use emergency services instead)
- Clinical diagnosis or treatment
- Replacement for licensed mental health professionals
- Legal or medical advice
### Bias and Limitations
- May reflect biases present in training data
- Cultural nuances may not be fully captured
- Sheng is informal and rapidly evolving, so outputs may not match all regional variations
- Should be used as a supplement, not replacement, for professional care
## Citation
If you use this model, please cite:
```bibtex
@misc{cognixpert-deepseek-mental-health,
  title={CogniXpert DeepSeek Multilingual Mental Health Model},
  author={CogniX Ltd},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/cogni-x/CogniXpert-DeepSeek-R1-Distill-Llama8B-English-LoRA}
}
```
## Contact

- **Organization:** CogniX Ltd
- **Project:** CogniXpert AI
- **Repository:** GitHub
For questions, issues, or collaboration opportunities, please visit our GitHub repository.
## Acknowledgments
- Built on Unsloth for efficient training
- Base model: DeepSeek-R1-Distill-Llama-8B
- Training framework: Hugging Face TRL and Transformers
Last Updated: 2025-12-11