Python Reading HTML Pages
Python can read HTML pages with a library known as BeautifulSoup. Using this library, we can search for the values of HTML tags and extract specific data,
such as the title of the page and the list of headers in the page.
Install BeautifulSoup
Use the Anaconda package manager to install the required package and its dependencies, for example with: conda install beautifulsoup4
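The program below fetches a page, parses it, and prints the beginning of the re-indented HTML.
# Python 3: urllib.request replaces the Python 2 urllib2 module
import urllib.request
from bs4 import BeautifulSoup
response = urllib.request.urlopen('http://tutorialspoint.com/python/python_overview.htm')
html_doc = response.read()
soup = BeautifulSoup(html_doc, 'html.parser')
# Print the start of the formatted HTML (slice length inferred from the output shown)
print(soup.prettify()[:225])
Running it prints:
<!DOCTYPE html>
<!--[if IE 8]><html class="ie ie8"> <![endif]-->
<!--[if IE 9]><html class="ie ie9"> <![endif]-->
<!--[if gt IE 9]><!-->
<html>
<!--<![endif]-->
<head>
<!-- Basic -->
<meta charset="utf-8"/>
<title>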
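The urlopen call used in the examples on this page raises an exception on network errors and returns raw bytes. A more defensive fetch might look like the sketch below. The helper name fetch_html and its timeout default are hypothetical choices made for illustration, not part of the original tutorial.

```python
import urllib.request


def fetch_html(url, timeout=10.0):
    """Return the decoded body of `url`, or None if the request fails.

    Hypothetical helper: the name and the timeout default are assumptions.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
            # Use the charset announced in the response headers, else UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
    except OSError as err:
        # URLError, HTTPError and timeouts are all OSError subclasses.
        print(f"Could not fetch {url}: {err}")
        return None
    return raw.decode(charset, errors="replace")
```

Calling fetch_html('http://tutorialspoint.com/python/python_overview.htm') would then return the page as text, or None on failure, and the result can be passed to BeautifulSoup in place of response.read().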
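Individual tag values can be read as attributes of the parsed object.
import urllib.request
from bs4 import BeautifulSoup
response = urllib.request.urlopen('http://tutorialspoint.com/python/python_overview.htm')
soup = BeautifulSoup(response.read(), 'html.parser')
print(soup.title)         # the whole <title> tag
print(soup.title.string)  # only the text inside it
print(soup.a.string)      # text of the first <a> tag, or None
print(soup.b.string)      # text of the first <b> tag
Running it prints: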
<title>Python Overview</title>
Python Overview
None
Python is Interpreted
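The None in the output above comes from soup.a.string. In Beautiful Soup, .string yields text only when a tag has a single text child, either directly or through a single nested tag; otherwise it is None. The sketch below illustrates this on a small inline document. The markup is made up for illustration and is not the tutorialspoint page, so it shows one plausible cause rather than the actual one. It also shows get_text() as an alternative that concatenates all nested text.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, standing in for a fetched page.
html_doc = """
<html><head><title>Python Overview</title></head>
<body>
<a href="/index.htm"><img src="logo.png" alt="logo"/></a>
<p><b>Python is Interpreted</b>: processed at runtime.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

title_tag = soup.title            # the Tag object itself
title_text = soup.title.string    # its single text child
link_text = soup.a.string         # None: the link holds only an image
bold_text = soup.b.string         # text of the first <b> tag

# <p> holds a <b> tag plus trailing text, so .string is None here,
# while get_text() joins the text of all descendants.
paragraph_string = soup.p.string
paragraph_text = soup.p.get_text()

print(title_tag)
print(title_text)
print(link_text)
print(bold_text)
print(paragraph_text)
```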
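To read every occurrence of a tag, loop over the result of find_all.
import urllib.request
from bs4 import BeautifulSoup
response = urllib.request.urlopen('http://tutorialspoint.com/python/python_overview.htm')
html_doc = response.read()
soup = BeautifulSoup(html_doc, 'html.parser')
# Print the text of every <b> tag (loop reconstructed from the output below)
for tag in soup.find_all('b'):
    print(tag.string)
Running it prints: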
Python is Interpreted
Python is Interactive
Python is Object-Oriented
Python is a Beginner's Language
Easy-to-learn
Easy-to-read
Easy-to-maintain
A broad standard library
Interactive Mode
Portable
Extendable
Databases
GUI Programming
Scalable
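The results of find_all can also be collected into lists, and the same call accepts a list of tag names, which is one way to gather the page headers mentioned at the start. The following is a minimal sketch on a made-up inline document; the markup and the variable names are assumptions, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, standing in for a fetched page.
html_doc = """
<html><body>
<h1>Python Overview</h1>
<ul>
  <li><b>Python is Interpreted</b></li>
  <li><b>Python is Interactive</b></li>
</ul>
<h2>Python Features</h2>
<ul>
  <li><b>Easy-to-learn</b></li>
  <li><b>Portable</b></li>
</ul>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Text of every <b> tag, in document order.
bold_items = [tag.get_text(strip=True) for tag in soup.find_all("b")]

# find_all also accepts a list of names; here it matches h1, h2 or h3.
headers = [(tag.name, tag.get_text(strip=True))
           for tag in soup.find_all(["h1", "h2", "h3"])]

print(bold_items)
print(headers)
```

Keeping the tag name next to each header text preserves the heading level, which helps if the list is later turned into an outline.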