# Sentiment Analysis Using DistilBERT

## Overview

This project implements a sentiment analysis system using DistilBERT, a lighter, faster version of BERT, to classify movie reviews as positive or negative. The IMDB movie review dataset, accessed via Hugging Face Datasets, serves as the primary dataset for training and evaluation.

## Table of Contents
- Overview
- Dataset
- Model Architecture
- Preprocessing
- Training and Evaluation
- Results
- Conclusion
- References
## Dataset

The dataset used for this project is the IMDB movie review dataset, available through the Hugging Face Datasets library. It contains 50,000 movie reviews, evenly split between positive and negative sentiment, and is widely used in NLP for sentiment classification tasks.
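As a sketch of the data-loading step, the dataset can be pulled from the Hub with the `datasets` library (the `"imdb"` dataset ID is the standard one on the Hugging Face Hub):

```python
from datasets import load_dataset

# Load the IMDB reviews dataset from the Hugging Face Hub.
# It ships with 25,000 labeled training and 25,000 labeled test reviews.
imdb = load_dataset("imdb")

print(imdb)                       # DatasetDict with "train", "test" (and "unsupervised") splits
print(imdb["train"][0]["text"])   # raw review text
print(imdb["train"][0]["label"])  # 0 = negative, 1 = positive
```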
## Model Architecture

The model leverages DistilBERT, a streamlined version of BERT that maintains about 97% of BERT's performance while being significantly more efficient (a loading sketch follows the list below):
- Layers: 6 transformer layers.
- Hidden Units: 768 units per layer.
- Attention Heads: 12 attention heads.
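A minimal loading sketch with the `transformers` library; the `distilbert-base-uncased` checkpoint name is an assumption, since the exact variant is not stated above:

```python
from transformers import DistilBertForSequenceClassification

# Load pretrained DistilBERT with a freshly initialized 2-way classification head.
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,  # positive / negative
)

# Sanity-check the architecture described above: 6 layers, 768 hidden units, 12 heads.
config = model.config
print(config.n_layers, config.dim, config.n_heads)  # 6 768 12
```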
## Preprocessing

Minimal preprocessing was necessary because the dataset is already clean. The steps, sketched in code after this list, included:
- Tokenization: Text was tokenized with DistilBERT's WordPiece tokenizer.
- Padding/Truncation: Input sequences were standardized to a maximum length of 128 tokens.
- Label Encoding: Sentiment labels were already encoded as 1 (positive) and 0 (negative).
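A sketch of these steps, assuming the `distilbert-base-uncased` tokenizer and the `imdb` DatasetDict loaded earlier:

```python
from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # WordPiece tokenization with padding/truncation to a fixed 128-token length.
    return tokenizer(
        batch["text"],
        padding="max_length",
        truncation=True,
        max_length=128,
    )

# Apply to the dataset loaded earlier (batched for speed).
tokenized = imdb.map(tokenize, batched=True)
```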
## Training and Evaluation

The model was trained for 25 epochs, with performance measured using accuracy, precision, recall, F1-score, and ROC AUC. Hyperparameters such as the learning rate and batch size were tuned for optimal performance. A training sketch follows.
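A hedged sketch of training with the `transformers` `Trainer` API. Only the 25 epochs come from the text above; the learning rate, batch size, and the use of the test split for validation are illustrative assumptions, since the project's exact tuned values and validation split are not specified:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import Trainer, TrainingArguments

def compute_metrics(eval_pred):
    # Convert logits to class predictions and report the metrics named above.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=25,             # as reported above
    learning_rate=2e-5,              # illustrative; tuned in practice
    per_device_train_batch_size=16,  # illustrative; tuned in practice
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,                        # DistilBERT model from the earlier sketch
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],     # stand-in for the project's validation set
    compute_metrics=compute_metrics,
)

trainer.train()
```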
## Results

The final evaluation yielded:

- Training Accuracy: 83.25%
- Validation Accuracy: 82%
- ROC AUC Score: 0.89

Classification report on the 100-review validation set:

| Sentiment | Precision | Recall | F1-score | Support |
|-----------|-----------|--------|----------|---------|
| Negative  | 0.85      | 0.79   | 0.82     | 52      |
| Positive  | 0.79      | 0.85   | 0.82     | 48      |
| Overall   | 0.82      | 0.82   | 0.82     | 100     |
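These metrics can be reproduced with scikit-learn; a sketch continuing from the `Trainer` above, with the 0.5 decision threshold as an assumption:

```python
import numpy as np
from scipy.special import softmax
from sklearn.metrics import classification_report, roc_auc_score

# Predict on the held-out split used for evaluation above.
output = trainer.predict(tokenized["test"])
probs = softmax(output.predictions, axis=-1)[:, 1]  # positive-class probability
preds = (probs >= 0.5).astype(int)                  # assumed 0.5 threshold

print(classification_report(output.label_ids, preds,
                            target_names=["Negative", "Positive"]))
print("ROC AUC:", roc_auc_score(output.label_ids, probs))
```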
## Conclusion

The sentiment analysis model built with DistilBERT achieved strong performance, demonstrating its capability to provide efficient and effective natural language understanding in resource-constrained environments.
## References

- Hugging Face Datasets: https://huggingface.co/docs/datasets
- DistilBERT Paper (Sanh et al., 2019): https://arxiv.org/abs/1910.01108