A machine learning project that classifies comments into various categories of toxicity using both traditional and deep learning approaches.
This project tackles the challenge of content moderation by building models to automatically classify comments into multiple toxicity categories: toxic, severe toxic, obscene, threat, insult, and identity hate. This is a multi-label classification problem where a single comment can belong to multiple categories.
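For instance, a single comment maps to a binary vector over all six categories. The snippet below is a purely illustrative example (the comment and label values are made up):

```python
# One comment, several simultaneous labels (illustrative values only).
CATEGORIES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
labels = [1, 0, 1, 0, 1, 0]   # toxic, obscene, and insult all apply at once

print(dict(zip(CATEGORIES, labels)))
```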
- Build a baseline logistic regression model for toxic comment classification
- Develop a deep learning model using DistilBERT as the base architecture
- Compare model performance and conduct error analysis
- Address data imbalance and preprocessing challenges
The project uses the Jigsaw Toxic Comment Classification Dataset (available on HuggingFace: tcapelle/jigsaw-toxic-comment-classification-challenge):
- Source: Wikipedia talk page comments
- Size: 172,057 total comments
- Features: 6 toxicity categories (binary labels)
- Challenge: Highly imbalanced dataset with significantly more non-toxic comments
Figure 1: Distribution of labels across the six toxicity categories
Figure 2: Imbalance between toxic and non-toxic comments
- Maximum comment length: 2,321 words
- Minimum comment length: 1 word
- Average comment length: 485 words
- Balanced subset used: 15,000 toxic + 15,000 non-toxic comments (see the balancing sketch below)
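A minimal sketch of how such a balanced subset might be drawn with pandas. The file name and column names are assumptions based on the standard Jigsaw release, not the project's actual code:

```python
import pandas as pd

# Load the Jigsaw dataset (file and column names assumed from the standard release).
df = pd.read_csv("train.csv")
label_cols = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# A comment counts as toxic here if any of the six labels is set.
is_toxic = df[label_cols].any(axis=1)

# Draw 15,000 comments from each side, then shuffle.
balanced = pd.concat([
    df[is_toxic].sample(n=15_000, random_state=42),
    df[~is_toxic].sample(n=15_000, random_state=42),
]).sample(frac=1, random_state=42)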
- Approach: One-vs-Rest classification for multi-label prediction
- Features: TF-IDF vectorization with uni-grams and bi-grams
- Preprocessing: Lowercase conversion, special character removal, stopword removal
- Performance: ROC-AUC = 0.71 (pipeline sketched below)
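A minimal sketch of this baseline, continuing from the balancing sketch above. The cleaning helper, vectorizer settings, and hyperparameters are illustrative assumptions:

```python
import re
from nltk.corpus import stopwords            # requires nltk.download("stopwords")
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

STOPWORDS = set(stopwords.words("english"))

def clean(text: str) -> str:
    """Lowercase, drop special characters, remove stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in STOPWORDS)

baseline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=clean, ngram_range=(1, 2))),  # uni- and bi-grams
    ("ovr", OneVsRestClassifier(LogisticRegression(max_iter=1000))),     # one classifier per label
])

# `balanced` and `label_cols` come from the balancing sketch above;
# "comment_text" is the column name assumed from the Kaggle release.
baseline.fit(balanced["comment_text"], balanced[label_cols])
print(baseline.predict_proba(["u r such a looooser"]))  # per-category probabilities
```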
- Base Model: `distilbert-base-cased`
- Architecture: DistilBERT + classification layer + dropout (sketched below)
- Loss Function: BCEWithLogitsLoss
- Optimizer: AdamW
- Performance: ROC-AUC = 0.94
- GPU used for DistilBERT training
- Training time: ~32 minutes (5 epochs on MPS GPU)
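A minimal sketch of this architecture using HuggingFace `transformers`. The head design, loss, and optimizer follow the description above; the dropout rate and learning rate are assumptions:

```python
import torch
import torch.nn as nn
from transformers import DistilBertModel

class ToxicClassifier(nn.Module):
    """DistilBERT encoder, dropout, and a linear head over the six labels."""

    def __init__(self, n_labels: int = 6, dropout: float = 0.3):
        super().__init__()
        self.bert = DistilBertModel.from_pretrained("distilbert-base-cased")
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(self.bert.config.dim, n_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        pooled = hidden[:, 0]                          # [CLS]-position representation
        return self.classifier(self.dropout(pooled))   # raw logits, one per label

model = ToxicClassifier()
criterion = nn.BCEWithLogitsLoss()                     # independent sigmoid per category
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
```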
| Model | ROC-AUC | Micro Avg Recall | Micro Avg F1 |
|---|---|---|---|
| Logistic Regression | 0.71 | 0.69 | 0.76 |
| DistilBERT | 0.94 | 0.82 | 0.80 |
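Metrics like these can be computed with scikit-learn along the following lines. The 0.5 decision threshold and the ROC-AUC averaging mode are assumptions, since the report does not state them; dummy arrays stand in for the real evaluation split:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, recall_score, f1_score

# y_true: (n_samples, 6) ground truth; y_prob: model probabilities of the same shape.
# Dummy values for illustration; substitute the real evaluation split.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(1000, 6))
y_prob = rng.random((1000, 6))

y_pred = (y_prob >= 0.5).astype(int)   # 0.5 threshold is an assumption
print("ROC-AUC:     ", roc_auc_score(y_true, y_prob, average="macro"))
print("Micro recall:", recall_score(y_true, y_pred, average="micro"))
print("Micro F1:    ", f1_score(y_true, y_pred, average="micro"))
```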
DistilBERT Model Performance by Category:
| Category | Recall | F1 |
|---|---|---|
| Toxic | 0.93 | 0.89 |
| Severe Toxic | 0.53 | 0.40 |
| Obscene | 0.80 | 0.80 |
| Threat | 0.57 | 0.58 |
| Insult | 0.75 | 0.75 |
| Identity Hate | 0.55 | 0.62 |
- DistilBERT successfully identifies:
  - Comments with spelling errors ("U r such a looooser")
  - Sarcastic toxic comments with emojis ("wow you're such a genius! XD")
Both models show bias toward certain demographics, incorrectly classifying neutral comments as toxic:
- "Are you a muslim?"
- "She is a black woman."
- "I am a proud gay man."
This highlights the importance of bias detection and mitigation in NLP models.
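One lightweight way to surface such bias is to probe the trained model with neutral identity-mentioning sentences and inspect the predicted probabilities. A sketch reusing the `model` defined in the architecture sketch above:

```python
import torch
from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-cased")
neutral_probes = [
    "Are you a muslim?",
    "She is a black woman.",
    "I am a proud gay man.",
]

batch = tokenizer(neutral_probes, padding=True, truncation=True, return_tensors="pt")
model.eval()                                   # `model` from the architecture sketch above
with torch.no_grad():
    probs = torch.sigmoid(model(batch["input_ids"], batch["attention_mask"]))

for text, p in zip(neutral_probes, probs):
    # A well-behaved model should score these neutral sentences low on every label.
    print(f"{text!r} -> P(toxic) = {p[0].item():.2f}")
```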
- Preprocessing Impact: Dataset balancing and text cleaning significantly improve performance
- Model Architecture: DistilBERT's contextual understanding vastly outperforms traditional approaches
- Case Sensitivity: Using cased models helps detect uppercase emphasis in toxic language
- Bias Awareness: Models inherit biases from training data, requiring careful evaluation
- Multilingual Support: Extend classification to multiple languages
- Data Augmentation: Advanced techniques to improve minority class representation
- Bias Mitigation: Implement fairness-aware training techniques
- Real-time Deployment: Optimize for production environments
This model is intended for research and content moderation purposes.
This project is open source and available under the MIT License.

