Web Scraper with Go for Cyberbullying Dataset Creation

🚀 Project Overview

This project builds a cyberbullying dataset by scraping web content and enriching it with semantic ontologies. The goal is to extract relevant text from the web, clean and process it, and annotate it with ontological concepts that capture modern cyberbullying patterns.

🛠 Technologies Used

Go: For performing web scraping and handling data extraction.
Python: For data processing (lemmatization), exposed through an API.
MySQL: To store raw and processed data.
MongoDB: For storing enriched data with semantic ontologies.
Semantic Ontologies: To classify and enhance data related to cyberbullying.

📑 Project Workflow

Web Scraping:
- Developed a custom web scraper in Go to gather relevant information about cyberbullying from various websites.
Data Cleaning:
- Removed unnecessary elements such as HTML tags, CSS, JavaScript, and advertisements.
- Applied tokenization and stop-word removal to the extracted text.
Lemmatization:
- Implemented lemmatization via an API developed in Python to process the cleaned data.
Ontology Creation:
- Built semantic ontologies to categorize and enhance the extracted data based on cyberbullying-related terms.
Data Enrichment:
- Enriched the cleaned and processed records using the ontologies, then stored them in a NoSQL MongoDB database for further analysis.

💡 Key Features

Real-time Web Data Extraction: Continuously extracts up-to-date cyberbullying information from the web.
Data Cleaning Pipeline: Automated pipeline to clean and process raw web data.
Semantic Enrichment: Annotates records with ontology concepts, adding context that makes the dataset more useful for research.

🚧 How to Run the Project

Prerequisites:

Go 1.16+ installed
Python 3.x installed
Running MySQL and MongoDB instances

Step-by-Step Instructions:

Clone the repository:

git clone https://github.com/EdwardMelendezM/Webscraper-with-go.git
cd Webscraper-with-go
