🧠 PsyEvent: Life Event Recognition System

📖 Model Overview

What is PsyEvent?

PsyEvent is a specialized NLP tool designed to extract and analyze major life events from unstructured social media text. Unlike general sentiment analysis, it focuses on identifying specific, objective occurrences (e.g., career, health) that significantly impact mental health trajectories (see Figure 1).

Figure 1: User post example.

This repository contains the PsyEvent models described in the paper "Tracking Life's Ups and Downs: Mining Life Events from Social Media Posts for Mental Health Analysis" (ACL 2025).

The system consists of two distinct models housed in this repository:

Life Events Detection (LE_detection): A multi-label classifier that identifies 12 categories of life events from social media posts.
Self-Status Determination (Self-status_determination): A binary classifier that determines whether the detected life event is currently being experienced by the user themselves (Self) or someone else.

Architecture

Both models are based on BERT-large (340M parameters) with a custom classification head.

📂 Repository Structure

This repository uses subfolders to store the weights for each model. You must specify the subfolder argument when loading.

Subfolder	Task Description	Type
`LE_detection/`	Detects which life events are present.	Multi-label Classification
`Self-status_determination/`	Detects who is experiencing the event.	Binary Classification

Both models share the same architecture (BERTDiseaseClassifier) defined in model.py.

🚀 Quick Start (Copy & Run)

Because these models use a custom architecture (a BERT encoder followed by a linear head applied to the raw [CLS] hidden state, bypassing BERT's pooler layer), you must define or import the model class locally before loading the weights.

1. Installation

pip install transformers torch huggingface_hub

2. Define the Model Architecture

You can download the model.py file from this repository, or simply define the class in your code as shown below:

import torch
from torch import nn
from transformers import AutoModel, AutoConfig, AutoTokenizer

class BERTDiseaseClassifier(nn.Module):
    def __init__(self, model_type, num_symps) -> None:
        super().__init__()
        self.model_type = model_type
        self.num_symps = num_symps
        self.encoder = AutoModel.from_pretrained(model_type)
        self.dropout = nn.Dropout(self.encoder.config.hidden_dropout_prob)
        self.clf = nn.Linear(self.encoder.config.hidden_size, num_symps)
    
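    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None, **kwargs):
        # Pass inputs by keyword so the call does not depend on the encoder's positional argument order.
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        x = outputs.last_hidden_state[:, 0, :]   # final hidden state of the [CLS] token (position 0); the pooler output is not used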
        x = self.dropout(x)
        logits = self.clf(x)
        return logits

3. Load the Models

Use the subfolder parameter to select which model you want to load.

import torch
from transformers import AutoConfig, AutoTokenizer
from huggingface_hub import hf_hub_download
# from model import BERTDiseaseClassifier

repo_id = "shallowblueQAQ/PsyEvent-model"
subfolder = "LE_detection"
# subfolder = "Self-status_determination"

# 1. Load Config & Tokenizer
config = AutoConfig.from_pretrained(repo_id, subfolder=subfolder)
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)

# 2. Initialize Model Architecture
# NOTE: If you are running offline, you can replace "bert-large-uncased" with your local path (e.g., "/path/to/bert-large-uncased").
model = BERTDiseaseClassifier(model_type="bert-large-uncased", num_symps=len(config.id2label))

# 3. Load Weights
weights_path = hf_hub_download(repo_id=repo_id, subfolder=subfolder, filename="pytorch_model.bin")
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()

# 4. Inference
text = "I lost my job yesterday and I feel terrible."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs)
    probs = torch.sigmoid(logits)

# Display Predictions (Multi-label)
threshold = 0.5
for i, prob in enumerate(probs[0]):
    if prob > threshold:
        print(f"Detected: {config.id2label[i]} ({prob:.4f})")
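
The thresholding step at the end of the script can be factored into a small helper. The sketch below is pure Python, so it runs without downloading any weights. The names `decode_multilabel` and `example_id2label` are illustrative assumptions and are not part of the repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(
    logits: Sequence[float],
    id2label: Dict[int, str],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep every label whose sigmoid probability exceeds the threshold."""
    detected = []
    for i, logit in enumerate(logits):
        prob = _sigmoid(logit)
        if prob > threshold:
            detected.append((id2label[i], prob))
    # Most confident labels first.
    return sorted(detected, key=lambda pair: pair[1], reverse=True)


# Illustrative three-label mapping; the real one comes from config.id2label.
example_id2label = {0: "Health", 1: "Career", 2: "Financial"}
predictions = decode_multilabel([-2.0, 3.0, 0.1], example_id2label)
print(predictions)
```

With the loaded model, the same helper could be called as `decode_multilabel(logits[0].tolist(), config.id2label)`. Raising `threshold` generally trades recall for precision.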

📊 Dataset & Categories

The model was trained on PsyEvent, a dataset of 7,965 annotated sentences derived from SMHD. It covers 12 major life event categories:

Life Event Categories	Representative Examples (from paper Appendix D)
`🏥 Health`	personal injury, accident, or illness; became disabled; mental illness.
`💰 Financial`	loan; home purchase; car purchase; other major purchase.
`🏠 Relocation`	move to a different town/city; move out of parent's home; lost home / became homeless; major travel.
`💼 Career`	started a new job; promotion; voluntary/involuntary job loss; retirement.
`🎓 Education`	begin or end school/college; change in school/college; left school (without graduating).
`💔 Relationship Change`	began/ended serious romantic relationship; marriage; divorce; serious argument.
`🕯️ Death`	death of spouse/child/parent/friend/pet.
`👶 New Birth`	gave birth / became a parent; adopted a child; became a grandparent.
`⚖️ Legal`	got arrested; lawsuit or legal action; went to jail or prison; released from jail or prison.
`🌈 Identity`	came out as LGBTQ+; gender transition; change in political/religious/spiritual beliefs.
`🌱 Lifestyle Change`	change in physical habits; new pet; joined the military; vacation.
`🌍 Societal`	natural disaster; war; major political event that had personal impact.

Performance (AUC)

LE detection model performance on each life event category:

Life Event Categories	AUC(%)
Health	92.1
Financial	95.7
Relocation	97.7
Legal	96.1
Relationship Change	95.0
New Birth	92.6
Death	99.7
Career	93.5
Education	99.2
Lifestyle Change	87.9
Identity	95.5
Societal	97.4
Avg.	95.2
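
The Avg. row is consistent with the unweighted (macro) mean of the twelve per-category scores. The short check below reproduces it from the table values; the dictionary simply transcribes the table, and the variable names are illustrative.

```python
# Per-category AUC (%) transcribed from the table above.
auc_by_category = {
    "Health": 92.1,
    "Financial": 95.7,
    "Relocation": 97.7,
    "Legal": 96.1,
    "Relationship Change": 95.0,
    "New Birth": 92.6,
    "Death": 99.7,
    "Career": 93.5,
    "Education": 99.2,
    "Lifestyle Change": 87.9,
    "Identity": 95.5,
    "Societal": 97.4,
}

# Unweighted mean across categories (each category counts equally).
macro_avg = sum(auc_by_category.values()) / len(auc_by_category)

# Category with the lowest AUC.
weakest = min(auc_by_category, key=auc_by_category.get)

print(f"Macro-average AUC: {macro_avg:.1f}")
print(f"Lowest category: {weakest} ({auc_by_category[weakest]})")
```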

⚠️ Ethical Considerations & Limitations

1. No Clinical Diagnosis: This model is designed for research purposes only. It is not a clinical diagnostic tool and should not be used as a substitute for professional medical advice, diagnosis, or treatment.

2. No Automated Decision Making: The model must not be used for automated decision-making in high-stakes scenarios, including but not limited to:

Employment screening or hiring decisions.
Insurance eligibility or claims processing.
Legal assessment or administrative decision-making regarding individuals.

3. Bias & Errors: Like all models trained on social media data, this model may reflect biases present in the training corpus. It may generate false positives or misinterpret metaphorical language. Users should critically evaluate the model's outputs.

Data Availability & Privacy Statement

This model was trained on PsyEvent, a subset of the SMHD (Self-reported Mental Health Diagnoses) dataset.

Due to the strict Data Usage Agreement of SMHD, we are prohibited from publishing or sharing any portion of the original dataset (including our annotated subset). Researchers interested in reproducing this work or using the data must request access directly from the original creators of SMHD (Cohan et al., 2018). We provide only the model weights and inference code here.

Citation

If you use this model or dataset, please cite our paper:

@inproceedings{lv2025tracking,
  title={Tracking life’s ups and downs: Mining life events from social media posts for mental health analysis},
  author={Lv, Minghao and Chen, Siyuan and Jin, Haoan and Yuan, Minghao and Ju, Qianqian and Peng, Yujia and Zhu, Kenny and Wu, Mengyue},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={6950--6965},
  year={2025}
}
