11Python Reading HTML Pages

Uploaded by

David Osei

PYTHON READING HTML PAGES

https://www.tutorialspoint.com/python/python_reading_html_pages.htm Copyright © tutorialspoint.com

conda install beautifulsoup4

Reading the HTML file

In the example below, we request a URL and load the response into the Python environment. We then pass
'html.parser' as the parser argument to BeautifulSoup to parse the entire HTML document. Finally, we print
the first few characters of the formatted page.

from urllib.request import urlopen  # Python 3; in Python 2 this lived in urllib2
from bs4 import BeautifulSoup

# Fetch the html file

response = urlopen('http://tutorialspoint.com/python/python_overview.htm')
html_doc = response.read()

# Parse the html file

soup = BeautifulSoup(html_doc, 'html.parser')

# Format the parsed html file

strhtm = soup.prettify()

# Print the first few characters

print(strhtm[:225])

When we execute the above code, it produces the following result.

<!DOCTYPE html>
<!--[if IE 8]><html class="ie ie8"> <![endif]-->
<!--[if IE 9]><html class="ie ie9"> <![endif]-->
<!--[if gt IE 9]><!-->
<html>
<!--<![endif]-->
<head>
<!-- Basic -->
<meta charset="utf-8"/>
<title>
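
The fetch step above returns the page as raw bytes. As a minimal sketch, assuming Python 3, the helper below wraps that step so the response is closed after use and decoded to text. The name fetch_html, the 10-second timeout, and the UTF-8 fallback are illustrative assumptions, not part of the original tutorial. The data: URL in the demo is only there so the sketch can be tried without network access.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the body found at url as a decoded string.

    fetch_html is a hypothetical helper; the timeout value and the
    UTF-8 fallback are assumptions made for this sketch.
    """
    # The with-block closes the response once the body has been read.
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, if any.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Offline demo: urlopen also understands data: URLs, so no network is needed.
demo_url = "data:text/html;charset=utf-8,%3Ctitle%3EDemo%3C%2Ftitle%3E"
print(fetch_html(demo_url))
```

The same function can be called with the tutorial's URL in place of demo_url, and its return value passed to BeautifulSoup as html_doc.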

Extracting Tag Value

We can extract the value of the first instance of a tag using the following code.

from urllib.request import urlopen  # Python 3; in Python 2 this lived in urllib2
from bs4 import BeautifulSoup

response = urlopen('http://tutorialspoint.com/python/python_overview.htm')
html_doc = response.read()

soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.title)         # the whole <title> tag
print(soup.title.string)  # the text inside <title>
print(soup.a.string)      # the text inside the first <a> tag
print(soup.b.string)      # the text inside the first <b> tag

When we execute the above code, it produces the following result.

<title>Python Overview</title>
Python Overview
None
Python is Interpreted
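
The None in the output above comes from soup.a.string. In BeautifulSoup, .string returns None when a tag has more than one child, for example a link that wraps an image plus some text. The sketch below illustrates this on a small, made-up HTML snippet (not the tutorial page) and shows get_text() as one way to still retrieve the text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, not taken from the tutorial page.
html_doc = """
<p><a href="/home"><img src="logo.png"/> Home</a></p>
<p><b>Python is Interpreted</b></p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

link = soup.a  # first <a> tag: holds an <img> tag plus a text node
bold = soup.b  # first <b> tag: holds a single text node

link_string = link.string              # None, because <a> has two children
link_text = link.get_text(strip=True)  # joins the text found inside <a>
bold_string = bold.string              # the single text node inside <b>

print(link_string)
print(link_text)
print(bold_string)
```

When a tag may contain nested markup, get_text() is usually the safer choice; .string is convenient when the tag is known to hold exactly one piece of text.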

Extracting All Tags

We can extract the values of all instances of a tag using the following code.

from urllib.request import urlopen  # Python 3; in Python 2 this lived in urllib2
from bs4 import BeautifulSoup

response = urlopen('http://tutorialspoint.com/python/python_overview.htm')
html_doc = response.read()
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the text of every <b> tag in the page
for x in soup.find_all('b'):
    print(x.string)

When we execute the above code, it produces the following result.

Python is Interpreted
Python is Interactive
Python is Object-Oriented
Python is a Beginner's Language
Easy-to-learn
Easy-to-read
Easy-to-maintain
A broad standard library
Interactive Mode
Portable
Extendable
Databases
GUI Programming
Scalable
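
The loop above prints each value as it goes. As a small sketch on a made-up HTML snippet (an assumption standing in for the downloaded page), the values can also be collected into a list. It uses get_text() rather than .string, so a <b> tag that contains another tag still yields its full text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for the downloaded page.
html_doc = """
<ul>
  <li><b>Easy-to-learn</b></li>
  <li><b>Easy-to-read</b></li>
  <li><b>A <i>broad</i> standard library</b></li>
</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns every matching tag in document order.
bold_tags = soup.find_all("b")

# get_text() joins all nested text, so the third <b>, which wraps an
# <i> tag, still produces text where .string would give None.
labels = [tag.get_text() for tag in bold_tags]

for label in labels:
    print(label)
```

Keeping the results in a list makes it easy to count, filter, or save them later instead of only printing them.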
