Currently this only works for Geeks for Geeks, but the goal is to scrape the sites I frequent for Python-specific articles and write them to their own HTML file, with each article's headline, summary, and a link to the full article.
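One way to produce that output file, as a minimal sketch: it assumes each scraped article is a dictionary with `headline`, `summary` and `link` keys. The names `render_articles` and `write_report`, the default filename, and the page layout are illustrative assumptions, not the project's actual code. Only the standard library is used.

```python
import html
from typing import Dict, List


def render_articles(articles: List[Dict[str, str]], title: str = "Python articles") -> str:
    """Build a standalone HTML page listing each article's headline, summary and link."""
    lines = [
        "<!DOCTYPE html>",
        "<html>",
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>',
        "<body>",
        "<ul>",
    ]
    for article in articles:
        # Escape scraped text so a stray <, > or & cannot break the markup.
        headline = html.escape(article["headline"])
        summary = html.escape(article["summary"])
        link = html.escape(article["link"], quote=True)
        lines.append(f"  <li><h2>{headline}</h2><p>{summary}</p>")
        lines.append(f'    <a href="{link}">Read the full article</a></li>')
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"


def write_report(articles: List[Dict[str, str]], path: str = "articles.html") -> str:
    """Write the rendered page to `path` (hypothetical default name) and return that path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_articles(articles))
    return path
```

Escaping every scraped field matters because headlines and summaries come from someone else's page and may contain characters such as `&` or `<`.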
Built with:
- BeautifulSoup 4 for web scraping.
- Requests for making HTTP requests.
- lxml as the parser for the BeautifulSoup object.
- os for opening the generated HTML file.
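A minimal sketch of how Requests, BeautifulSoup 4 and lxml fit together. It assumes each article on the listing page sits in an `<article>` element, with its headline link inside an `<h2>` and its summary in a `<p>`. Those selectors and the function names `fetch_html` and `parse_articles` are hypothetical: they are not taken from geeksScrape.py or from the site's real markup, so inspect the page and adjust them.

```python
from typing import Dict, List

import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML text, raising on HTTP error codes."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_articles(html: str, parser: str = "lxml") -> List[Dict[str, str]]:
    """Pull headline, summary and link out of each article block.

    The selectors ("article", "h2 a", "p") are placeholders, not the
    site's real markup.
    """
    soup = BeautifulSoup(html, parser)
    articles = []
    for block in soup.select("article"):
        link = block.select_one("h2 a")
        if link is None:
            continue  # skip blocks that have no headline link
        summary = block.select_one("p")
        articles.append({
            "headline": link.get_text(strip=True),
            "summary": summary.get_text(strip=True) if summary else "",
            "link": link.get("href", ""),
        })
    return articles
```

Calling something like `parse_articles(fetch_html(url))`, with `url` set to the listing page, would return a list of dictionaries ready to be written out. Keeping parsing separate from downloading means the selectors can be tested against a saved copy of the page without requesting the site each time.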
To do:
- Scrape more sites than just Geeks for Geeks.
- Improve the overall display of the HTML file; it is not well formatted yet.
- Test on Linux. The os.startfile() call will fail there, since that function only exists on Windows.
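A possible cross-platform replacement for the os.startfile() call, as a sketch: it keeps os.startfile() on Windows and falls back to the `open` command on macOS and `xdg-open` on Linux. It assumes `xdg-open` is installed, as it is on most desktop Linux distributions. The helper names `open_command` and `open_report` are hypothetical.

```python
import os
import subprocess
import sys
from typing import List, Optional


def open_command(path: str, platform: str = sys.platform) -> Optional[List[str]]:
    """Return the command that opens `path`, or None on Windows (use os.startfile there)."""
    if platform.startswith("win"):
        return None
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]  # assumes xdg-utils is present on Linux


def open_report(path: str) -> None:
    """Open the generated HTML file with the desktop's default application."""
    command = open_command(path)
    if command is None:
        os.startfile(path)  # only defined on Windows
    else:
        subprocess.run(command, check=False)
```

For an HTML file specifically, the standard library's `webbrowser.open(Path(path).resolve().as_uri())` (with `Path` from `pathlib`) is another option that avoids per-platform branches entirely.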
Download the files, install the dependencies (pip install requests beautifulsoup4 lxml), then open a terminal in the project directory and run:
python geeksScrape.py