Topic Modelling in English using LDA

Overview

This project implements Latent Dirichlet Allocation (LDA) for topic modelling on English text documents. It extracts hidden topics from a collection of academic papers using unsupervised machine learning.

Features

Data preprocessing and text cleaning
Word cloud visualization
LDA model training with configurable topic numbers
Interactive topic visualization using pyLDAvis

Requirements

Python 3.x
pandas
gensim
nltk
pyLDAvis
wordcloud
matplotlib

Workflow

Data Loading: Loads academic papers from a CSV file
Text Preprocessing:
- Removes punctuation and converts text to lowercase
- Tokenizes text using Gensim's simple_preprocess
- Removes English stopwords
Visualization: Generates a word cloud for exploratory data analysis
Dictionary & Corpus Creation:
- Creates a word-to-id mapping using a Gensim Dictionary
- Converts documents to a bag-of-words representation
LDA Training: Trains a multicore LDA model with 10 topics
Results: Displays topic keywords and an interactive visualization
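
The text preprocessing step above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the notebook's code: the `preprocess` helper, the sample sentences, and the tiny `STOPWORDS` set are assumptions made for the example. The notebook itself tokenizes with Gensim's simple_preprocess and removes English stopwords using a much longer list.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch); the
# project removes English stopwords using a much longer list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with", "we"}


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase, strip punctuation, tokenize, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    tokens = text.split()                 # whitespace tokenization (stand-in for simple_preprocess)
    # Drop stopwords and single-character tokens.
    return [t for t in tokens if t not in stopwords and len(t) > 1]


# Hypothetical sample documents, not taken from the project's dataset.
papers = [
    "We propose a new model for topic discovery in text.",
    "Neural networks, trained on images, learn features!",
]
tokenized = [preprocess(p) for p in papers]
print(tokenized)
```

Each document ends up as a list of lowercase tokens, which is the input shape the dictionary and corpus steps expect.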

Output

Topic keywords for each discovered topic
Interactive pyLDAvis visualization for topic exploration
Word cloud showing frequent terms in the dataset

Usage

Run the Jupyter notebook Topic_Modelling_In_English_LDA.ipynb sequentially to execute the complete pipeline.

talhamasood0000/Topic_Modelling_In_English_LDA
