WEB Programming

The document discusses web programming using Python. It covers creating simple web clients with Python by using modules like urllib to download and access information from the web. It also discusses web servers and how Python can be used for both client-side programming to access websites, as well as server-side programming to provide websites and applications.

UNIT-IV

WEB Programming
BY LOKESH JOEL
Unit 4
⚫ WEB Programming:
⚫ Introduction,
⚫ Web Surfing with Python,
⚫ Creating Simple Web Clients,
⚫ Advanced Web Clients,
⚫ CGI: Helping Servers Process Client Data,
⚫ Building CGI Applications,
⚫ Advanced CGI,
⚫ Web (HTTP) Servers
Introduction
⚫ Web Surfing: Client/Server Computing
⚫ The Internet

https://flylib.com/books/en/2.108.1.242/1/

Web client and Web server on the Internet. A client sends a request out over the Internet to the server, which then responds with the requested data back to the client.
Transmission Control Protocol (TCP)
• Built on top of IP (Internet Protocol)
• Assumes IP might lose some data: stores and retransmits data if it seems to be lost
• Handles "flow control" using a transmit window
• Provides a nice reliable pipe

Source: http://en.wikipedia.org/wiki/Internet_Protocol_Suite
TCP Connections / Sockets
TCP Port Numbers

http://en.wikipedia.org/wiki/TCP_and_UDP_port

A single server (e.g., www.mgit.ac.in at 74.208.28.177) listens on several ports at once, one per service:
• 23 - Login (Telnet)
• 25 - Incoming E-Mail
• 80 and 443 - Web Server
• 109 and 110 - Personal Mail Box
Common TCP Ports

http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
Sometimes we see the port number in the URL if the web server is running on a "non-standard" port.
Sockets in Python

Python has built-in support for TCP Sockets

http://docs.python.org/library/socket.html
Application Protocol

⚫ Since TCP (and Python) gives us a reliable socket, what do we want to do with the socket? What problem do we want to solve?

⚫ Application Protocols
- Mail
- World Wide Web

Source:
http://en.wikipedia.org/wiki/Internet_Protocol_Suite
HTTP - Hypertext Transfer Protocol

http://en.wikipedia.org/wiki/Http

HTTP

⚫ The HyperText Transfer Protocol is the set of rules that allows browsers to retrieve web documents from servers over the Internet.
What is a Protocol?
⚫ A set of rules that all parties follow so we can predict each other's behavior
⚫ And not bump into each other
- On two-way roads in the USA, drive on the right-hand side of the road
- On two-way roads in India, drive on the left-hand side of the road
Getting Data From The Server
⚫ Each time the user clicks on an anchor tag with an
href= value to switch to a new page, the browser makes
a connection to the web server and issues a “GET”
request - to GET the content of the page at the
specified URL
⚫ The server returns the HTML document to the browser,
which formats and displays the document to the user
The request/response cycle (built up step by step in the original slides as a Browser / Web Server diagram):

1. The user clicks a link; the browser connects to the web server on port 80.
2. The browser sends the request: GET http://www.dr-chuck.com/page2.htm
3. The server sends back the response:
<h1>The Second Page</h1><p>If you like, you can switch back to the <a href="page1.htm">First Page</a>.</p>
4. The browser parses and renders the document.
Let’s Write a Web Browser!
An HTTP Request in Python

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.send(cmd)

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(), end='')
mysock.close()
The response begins with an HTTP header, then a blank line, then the HTTP body:

HTTP/1.1 200 OK
Date: Sun, 14 Mar 2010 23:52:41 GMT
Server: Apache
Last-Modified: Tue, 29 Dec 2009 01:31:22 GMT
ETag: "143c1b33-a7-4b395bea"
Accept-Ranges: bytes
Content-Length: 167
Connection: close
Content-Type: text/plain

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
https://docs.python.org/3/library/stdtypes.html#bytes.decode
https://docs.python.org/3/library/stdtypes.html#str.encode
The Unicode/bytes boundary at the socket:
⚫ Outbound: encode() converts a Unicode string to UTF-8 bytes, which send() writes to the socket.
⚫ Inbound: recv() reads bytes from the socket, and decode() converts the UTF-8 bytes back to a Unicode string.
Figure 20-2. A grand view of the Internet. The left side illustrates where you would find Web clients while the right side hints as to where Web servers are typically located.
Important Internet Applications/Protocols
⚫ Telnet
⚫ allows users to log in to a remote host on the Internet; still in use today
⚫ File Transfer Protocol (FTP)
⚫ enables users to share files and data via uploading or downloading; also still in use today
⚫ Electronic Mail (Email)
⚫ SMTP, the Simple Mail Transfer Protocol: the protocol used for one of the oldest and most widely used Internet applications, electronic mail
⚫ Gopher
⚫ the precursor to Web search engines: a "gopher"-like piece of software that "tunneled the Internet" looking for the data you were interested in
⚫ Internet Relay Chat (IRC)
⚫ Usenet News
⚫ NNTP (Network News Transfer Protocol)
⚫ World Wide Web (WWW)
Web Surfing with Python
⚫ Creating Simple Web Clients
⚫ Uniform Resource Locators
⚫ urlparse Module
⚫ urllib Module
⚫ urllib2 Module
web programming
● Python can be used to support web programming that falls
into one of two general categories
– client programming – accesses web sites or web
applications
– server programming – script executed by the server to
provide a web site, perform server-side applications, etc.

What is a web client?
● Any program that retrieves data from a web server using the
HTTP protocol
● Examples:
– web browsers – contact web servers using HTTP protocol
and display HTTP responses
– web crawlers – traverse the web automatically to gather
information
– web service clients (service requester) – request and
process data from a web service provider; the web service
provider responds using some web service protocol such as
RSS (Rich Site Summary) or RPC (remote procedure call)
⚫ Applications that use the urllib module to download or access
information on the Web
⚫ [using either urllib.urlopen() or urllib.urlretrieve()] can be considered
a simple Web client.
⚫ All you need to do is provide a valid Web address.

Python web client programming
● modules that come standard with Python
– urllib – interface for fetching data across the web
– urllib2 – an interface for fetching data across the web that
allows you to specify HTTP request headers, handle
authentication and cookies
– httplib – makes http requests; is used by urllib and urllib2
– HTMLParser – for parsing HTML and XHTML files
– xmlrpclib – allows clients to call methods on a remote server
– cookielib (used to be ClientCookie) – provides classes for handling HTTP cookies
URL [Uniform Resource Locator]
⚫ Simple Web surfing involves using Web addresses called URLs
(Uniform Resource Locators).
⚫ Such addresses are used to locate a document on the Web or to call a CGI
program to generate a document for your client.
⚫ URLs are part of a larger set of identifiers known as URIs (Uniform
Resource Identifiers).
⚫ This superset was created in anticipation of other naming conventions that
have yet to be developed. A URL is simply a URI which uses an existing
protocol or scheme (i.e., http, ftp, etc.) as part of its addressing.
⚫ The remaining URIs are sometimes known as URNs (Uniform Resource Names), but because URLs are practically the only URIs in use today, the terms are often used interchangeably.
URL [Uniform Resource Locator]

Example URLs, each locating a resource:
⚫ http://www.food.com/ - a resource (the food.com home page)
⚫ http://food.com/images/logo.jpg - also a resource (an image)
⚫ http://www.geniuskitchen.com/recipe/broccoli-salad-10733
⚫ http://www.geniuskitchen.com/recipe/roasted-cauliflower-102194
⚫ ftp://server.com/download/mgit.pdf
⚫ mailto://[email protected]
⚫ http://hr-intranet/search?firstName=Lokesh&lastName=joel
⚫ https://en.wikipedia.org/wiki/Jabuticaba#Cultural_aspects
Final syntax for URL

http://host:8080/path?q=query#fragment

prot_sch://net_loc/path;parameters?query#fragment

⚫ https (port 443): HTTP with Secure Sockets Layer (SSL)
urlparse Module
⚫ The urlparse module provides basic functionality with which to
manipulate URL strings.
Or
⚫ The URL parsing functions focus on splitting a URL string into
its components, or on combining URL components into a URL
string.
⚫ These functions include urlparse(), urlunparse(), and
urljoin().

https://docs.python.org/3/library/urllib.parse.html
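As a quick sketch (using Python 3's urllib.parse, where urlparse() now lives; the URL itself is invented for illustration), here is a URL split into its six components:

```python
# urlparse() splits a URL string into a 6-part named tuple.
# (In Python 3 the old urlparse module became urllib.parse.)
from urllib.parse import urlparse

result = urlparse('http://www.python.org:80/doc/FAQ.html;extras?name=joel#intro')
print(result.scheme)    # 'http'
print(result.netloc)    # 'www.python.org:80'
print(result.path)      # '/doc/FAQ.html'
print(result.params)    # 'extras'
print(result.query)     # 'name=joel'
print(result.fragment)  # 'intro'
```

Note that the ;params component is split only from the last segment of the path.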
Note:
⚫ The module has been designed to match the Internet RFC on
Relative Uniform Resource Locators. It supports the following
URL schemes: file, ftp, gopher, hdl, http, https, imap, mailto,
mms, news, nntp, prospero, rsync, rtsp, rtspu, sftp, shttp, sip,
sips, snews, svn, svn+ssh, telnet, wais, ws, wss.
⚫ The urllib.parse module defines functions that fall into two
broad categories: URL parsing and URL quoting.

syntax

urlparse(urlstring, scheme='', allow_fragments=True)
urlunparse(parts)
⚫ This function constructs a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.
Or
⚫ urlunparse() does the exact opposite of urlparse(): it merges a 6-tuple (prot_sch, net_loc, path, params, query, frag) urltup, which could be the output of urlparse(), into a single URL string and returns it.
⚫ Accordingly, we state the following equivalence:
urlunparse(urlparse(urlstr)) ≡ urlstr
⚫ The syntax of urlunparse() is as follows:

urlunparse(urltup)
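A small sketch of the round trip (Python 3's urllib.parse; the URL is a made-up example):

```python
# urlunparse() reassembles the 6-tuple produced by urlparse(),
# demonstrating the equivalence urlunparse(urlparse(urlstr)) == urlstr.
from urllib.parse import urlparse, urlunparse

urlstr = 'http://www.python.org/doc/FAQ.html'
urltup = urlparse(urlstr)
rebuilt = urlunparse(urltup)
print(rebuilt)            # 'http://www.python.org/doc/FAQ.html'
print(rebuilt == urlstr)  # True
```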
urllib.parse.urlsplit
⚫ This is similar to urlparse(),
⚫ but does not split the params from the URL.
⚫ This should generally be used instead of urlparse() if the more recent
URL syntax allowing parameters to be applied to each segment of
the path portion of the URL is wanted.
⚫ A separate function is needed to separate the path segments and
parameters. This function returns a 5-item named tuple:
(scheme, netloc, path, query, fragment)
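The difference is easy to see side by side (a sketch with an invented URL):

```python
# urlsplit() is like urlparse() but leaves any ;parameters inside the path.
from urllib.parse import urlparse, urlsplit

url = 'http://www.example.com/path;param?q=1#frag'
print(urlsplit(url).path)   # '/path;param'  -- parameters kept in the path
print(urlparse(url).path)   # '/path'        -- parameters split off separately
```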
urlparse.urljoin()
⚫ The urljoin() function is useful in cases where many related
URLs are needed, for example, the URLs for a set of pages
to be generated for a Web site.
⚫ The syntax for urljoin() is:
urljoin(baseurl, newurl, allowFrag=True)

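For example (a sketch using Python 3's urllib.parse; the URLs are invented):

```python
# urljoin() builds an absolute URL from a base URL and a (possibly
# relative) new URL, replacing the last path component of the base.
from urllib.parse import urljoin

base = 'http://www.python.org/doc/FAQ.html'
print(urljoin(base, 'current/lib/lib.htm'))
# 'http://www.python.org/doc/current/lib/lib.htm'
print(urljoin(base, '/about/'))
# 'http://www.python.org/about/'  -- an absolute path replaces the whole path
```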
urllib Module
⚫ The Python language is used extensively for web programming. When we browse a website we use a web address, also known as a URL or Uniform Resource Locator. Python has built-in modules which can handle calls to a URL as well as process the result that comes back from visiting the URL.
Opening a URL
⚫ The request.urlopen method opens a Web connection to the given URL string and returns a file-like object.
⚫ Syntax:
urlopen(urlstr, postQueryData=None)
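A runnable sketch of urlopen() (Python 3's urllib.request). To avoid depending on Internet access, it fetches from a throwaway local http.server started in a background thread; the handler class and its response text are invented for the demo:

```python
# urlopen() returns a file-like object: read() the body, getcode() for status.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b'Hello, web client!'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

f = urlopen('http://127.0.0.1:%d/' % server.server_port)  # file-like object
status = f.getcode()   # 200
body = f.read()        # b'Hello, web client!'
f.close()
server.shutdown()
```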
urllib.urlretrieve()
⚫ urlretrieve() will do some quick and dirty work for you if you
are interested in working with a URL document as a whole.
⚫ Here is the syntax for urlretrieve():

urlretrieve(urlstr, localfile=None, downloadStatusHook=None)

urllib.urlretrieve()
⚫ copy a network object denoted by a URL to a local file. If the URL points to a
local file, the object will not be copied unless filename is supplied.
⚫ Return a tuple (filename, headers)
⚫ where filename is the local file name under which the object can be found, and
⚫ headers is whatever the info() method of the object returned by urlopen() returned (for a
remote object). Exceptions are the same as for urlopen().
⚫ The second argument, if present, specifies the file location to copy to (if
absent, the location will be a tempfile with a generated name).
⚫ The third argument, if present, is a callable that will be called once on
establishment of the network connection and once after each block read
thereafter. The callable will be passed three arguments; a count of blocks
transferred so far, a block size in bytes, and the total size of the file. The third
argument may be -1 on older FTP servers which do not return a file size in
response to a retrieval request.
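A sketch of urlretrieve() with a download-status hook (Python 3's urllib.request). To stay self-contained it "downloads" a local file through a file:// URL; the file name and contents are invented for the demo:

```python
# urlretrieve(url, filename, hook) copies the document at url to filename,
# calling hook(block_count, block_size, total_size) as blocks arrive.
import os
import tempfile
from urllib.request import pathname2url, urlretrieve

# Create a small source file to "download".
src = os.path.join(tempfile.mkdtemp(), 'romeo.txt')
with open(src, 'w') as f:
    f.write('But soft what light through yonder window breaks\n')

progress = []
def status_hook(block_count, block_size, total_size):
    # called once when the connection is made, then after each block read
    progress.append((block_count, block_size, total_size))

dest = src + '.copy'
filename, headers = urlretrieve('file:' + pathname2url(src), dest, status_hook)
print(filename == dest)    # True: copied to the requested local file
print(len(progress) >= 1)  # True: the hook fired
```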
urllib.quote() and urllib.quote_plus()
⚫ The URL quoting functions focus on taking program data and
making it safe for use as URL components by quoting special
characters and appropriately encoding non-ASCII text.
⚫ quote()
⚫ This function replaces special characters in string using the %xx escape (where "xx" is the hexadecimal representation of the character's ASCII value). Letters, digits, and the characters '_.-~' are never quoted.
⚫ The quote*() functions have the following syntax:

quote(string, safe='/')
• Note:
• quote_plus() is similar to quote() except that it also encodes
spaces to plus signs ( + ).
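For example (Python 3's urllib.parse; the strings are made up):

```python
# quote() vs quote_plus() on a string containing spaces and a reserved '@'.
from urllib.parse import quote, quote_plus

print(quote('Meet me @ 5'))           # 'Meet%20me%20%40%205'
print(quote_plus('Meet me @ 5'))      # 'Meet+me+%40+5'
print(quote('/path/file name.html'))  # '/path/file%20name.html' ('/' is safe by default)
```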

urllib.unquote() and urllib.unquote_plus()
⚫ Replace %xx escapes with their single-octet equivalent, and
return a bytes object.
⚫ string may be either a str or a bytes object.
⚫ If it is a str, unescaped non-ASCII characters in string are
encoded into UTF-8 bytes.
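These undo the quote*() functions above (a sketch with made-up strings):

```python
# unquote() reverses %xx escapes; unquote_plus() also turns '+' back into spaces.
from urllib.parse import unquote, unquote_plus

print(unquote('Meet%20me%20%40%205'))  # 'Meet me @ 5'
print(unquote_plus('Meet+me+%40+5'))   # 'Meet me @ 5'
print(unquote('Meet+me'))              # 'Meet+me'  -- plain unquote leaves '+' alone
```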

urllib.urlencode()
⚫ urlencode(), added to Python back in 1.5.2, takes a dictionary
of key-value pairs and encodes them to be included as part of
a query in a CGI request URL string.
⚫ The pairs are in "key=value" format and are delimited by
ampersands ( & ). Furthermore, the keys and their values are
sent to quote_plus() for proper encoding.
⚫ Syntax:

urlencode(dict)
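A sketch (Python 3's urllib.parse; the dictionary keys and values are invented, and the hr-intranet URL reuses the earlier example):

```python
# urlencode() turns a dict into a "key=value&key=value" query string,
# running each key and value through quote_plus().
from urllib.parse import urlencode

query = urlencode({'name': 'Lokesh Joel', 'dept': 'CSE'})
print(query)  # 'name=Lokesh+Joel&dept=CSE'
print('http://hr-intranet/search?' + query)
```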
Secure Socket Layer support
⚫ The urllib module was given support for opening HTTP
connections using the Secure Socket Layer (SSL) in 1.6.
⚫ The core change to add SSL is implemented in the socket
module.
⚫ Consequently, the urllib and httplib modules were updated to
support URLs using the "https" connection scheme.
⚫ In addition to those two modules, other protocol client
modules with SSL support include: imaplib, poplib, and
smtplib.
Summary urllib module

In brief, the urllib functions covered above:
⚫ urlopen() - opens a URL and returns a file-like object
⚫ urlretrieve() - downloads the document at a URL to a local file
⚫ quote() and quote_plus() - escape special characters so data is safe to use in a URL
⚫ unquote() and unquote_plus() - undo the %xx (and +) escaping
⚫ urlencode() - encode a dictionary of key-value pairs into a query string

http://www.voidspace.org.uk/python/articles/urllib2.shtml
urllib2 Module
• urllib2 is a Python module for fetching URLs.
• Offers a very simple interface, in the form of the urlopen function.
• Capable of fetching URLs using a variety of different protocols
(http, ftp, file, etc)
• Also offers a slightly more complex interface for handling common
situations:
– Basic authentication
– Cookies
– Proxies
– etc.
urllib2 Module
⚫ urllib2 can handle more complex URL opening. One example is for
Web sites with basic authentication (login and password)
requirements.
⚫ The most straightforward solution to "getting past security" is to use the extended net_loc URL component, i.e., http://user:password@hostname.

urllib2 Module
⚫ The problem with this solution is that it is not programmatic. Using
urllib2, however, we can tackle this problem in two different ways.
⚫ We can create a basic authentication handler (urllib2.HTTPBasicAuthHandler)
and "register" a login password given the base URL and perhaps a realm,
meaning a string defining the secure area of the Web site. Once this is done,
you can "install" a URL-opener with this handler so that all URLs opened will
use our handler.
⚫ The other alternative is to simulate typing the username and password when
prompted by a browser and that is to send an HTTP client request with the
appropriate authorization headers.

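A sketch of the first approach, the basic authentication handler (written against Python 3's urllib.request, which absorbed urllib2; the realm, URL, and credentials below are placeholders):

```python
# Register a username/password for a realm, then install an opener that
# applies them to every urlopen() call. All values here are placeholders.
import urllib.request

auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='Secure Area',
                          uri='http://localhost/protected/',
                          user='lokesh',
                          passwd='secret')
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)  # subsequent urlopen() calls use it
# urllib.request.urlopen('http://localhost/protected/page.html')  # would now authenticate
```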

https://python.readthedocs.io/en/v2.7.2/library/urllib2.html
urllib vs urllib2 (python 2)
• Both modules do URL request related stuff, but they have different
functionality.
• urllib2 can accept a Request object to set the headers for a URL
request, urllib accepts only a URL.
• urllib provides the urlencode method which is used for the generation
of GET query strings, urllib2 doesn't have such a function.
• Because of that urllib and urllib2 are often used together.

https://tinyurl.com/y2gkz3sd
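To illustrate the Request-object difference (shown here with Python 3's urllib.request, where urllib2's Request class now lives; the URL and header value are made up):

```python
# A Request object carries the URL plus request headers, which plain
# urllib-style urlopen(url) calls cannot set per-request.
import urllib.request

req = urllib.request.Request('http://www.example.com/',
                             headers={'User-Agent': 'MyClient/1.0'})
print(req.full_url)                  # 'http://www.example.com/'
print(req.get_header('User-agent'))  # 'MyClient/1.0' (keys are stored capitalized)
# urllib.request.urlopen(req)  # would send the request with these headers
```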
Advanced Web Clients
⚫ program explanation
Web Programming Part 2
By Lokesh Joel
