arXiv:2512.15031

Toxicity Ahead: Forecasting Conversational Derailment on GitHub

Published on Dec 17 · Submitted by Mia Mohammad Imran on Dec 24
AI-generated summary

A novel framework using Large Language Models and a two-step prompting pipeline forecasts conversational derailment in open source software communities on GitHub, supporting proactive moderation.

Abstract

Toxic interactions in Open Source Software (OSS) communities reduce contributor engagement and threaten project sustainability. Preventing such toxicity before it emerges requires a clear understanding of how harmful conversations unfold. However, most proactive moderation strategies are manual, requiring significant time and effort from community maintainers. To support more scalable approaches, we curate a dataset of 159 derailed toxic threads and 207 non-toxic threads from GitHub discussions. Our analysis reveals that toxicity can be forecast by tension triggers, sentiment shifts, and specific conversational patterns. We present a novel Large Language Model (LLM)-based framework for predicting conversational derailment on GitHub using a two-step prompting pipeline. First, we generate Summaries of Conversation Dynamics (SCDs) via Least-to-Most (LtM) prompting; then we use these summaries to estimate the likelihood of derailment. Evaluated on Qwen and Llama models, our LtM strategy achieves F1-scores of 0.901 and 0.852, respectively, at a decision threshold of 0.3, outperforming established NLP baselines on conversation derailment. External validation on a dataset of 308 GitHub issue threads (65 toxic, 243 non-toxic) yields an F1-score up to 0.797. Our findings demonstrate the effectiveness of structured LLM prompting for early detection of conversational derailment in OSS, enabling proactive and explainable moderation.
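To make the two-step pipeline concrete, here is a minimal sketch of how it could be wired up: step 1 elicits a Summary of Conversation Dynamics (SCD) with Least-to-Most style sub-questions about tension triggers, sentiment shifts, and conversational patterns; step 2 asks the model for a derailment probability and applies the 0.3 decision threshold reported in the abstract. The `llm` callable and the prompt wording are placeholders assumed for illustration, not the authors' actual prompts or model setup.

```python
from typing import Callable

def summarize_conversation_dynamics(llm: Callable[[str], str], thread: list[str]) -> str:
    """Step 1: Least-to-Most style prompting -- ask simpler sub-questions first,
    then request a Summary of Conversation Dynamics (SCD)."""
    conversation = "\n".join(f"[{i}] {msg}" for i, msg in enumerate(thread, 1))
    prompt = (
        "Read the GitHub thread below and answer step by step:\n"
        "1. What tension triggers (e.g. blame, dismissiveness) are present?\n"
        "2. How does the sentiment shift across the comments?\n"
        "3. What conversational patterns stand out?\n"
        "Finally, write a short Summary of Conversation Dynamics.\n\n"
        + conversation
    )
    return llm(prompt)

def derailment_probability(llm: Callable[[str], str], scd: str) -> float:
    """Step 2: use the SCD to estimate how likely the thread is to derail."""
    prompt = (
        "Given this Summary of Conversation Dynamics, estimate the probability "
        "(a number between 0.0 and 1.0) that the conversation derails into "
        "toxicity. Reply with the number only.\n\n" + scd
    )
    # Real use would need more robust parsing of the model's reply.
    return float(llm(prompt).strip())

def forecast_derailment(llm: Callable[[str], str], thread: list[str],
                        threshold: float = 0.3) -> bool:
    """Two-step pipeline: summarize the dynamics, then threshold the estimated
    probability (0.3 is the decision threshold reported in the abstract)."""
    scd = summarize_conversation_dynamics(llm, thread)
    return derailment_probability(llm, scd) >= threshold
```

Usage would look like `forecast_derailment(my_llm, thread_comments)`, where `my_llm` wraps whatever Qwen or Llama inference endpoint you run and returns the model's text reply for a prompt string.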

