XML with Java
Advanced Java
Introduction to Java 2 / Session / 1 of 34
What is XML
• XML (eXtensible Markup Language) is a simple, text-based language
designed to store and transmit data in plain text. Some salient
features of XML:
• XML is a markup language.
• XML is a tag-based language like HTML.
• Unlike HTML, XML tags are not predefined. You define your
own tags, which is why the language is called extensible.
• XML tags are designed to be self-describing.
• XML is a W3C Recommendation for data storage and
transmission.
XML Parsers
• What is a parser?
• A program that analyses the grammatical structure of an
input, with respect to a given formal grammar
• The parser determines how a sentence can be constructed from
the grammar of the language by describing the atomic elements
of the input and the relationship among them
• How should an XML parser work?
Parser - Read XML in Java
• XML parsing refers to traversing an XML document to access or modify
data.
• An XML parser provides a way to access or modify data in an XML
document.
• Java provides many options for parsing XML documents:
• DOM Parser: parses an XML document by loading its entire contents
and building the full hierarchical tree in memory.
• SAX Parser: parses an XML document through event-based callbacks and does not
load the entire document into memory.
• JDOM Parser: a third-party library that parses an XML document much as a DOM
parser does, but with a simpler API.
• StAX Parser: a streaming parser like SAX, but pull-based: the application asks
for the next event instead of receiving callbacks.
• …
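DOM and SAX are shown later in these slides; StAX is not, so here is a minimal sketch of pull parsing with the standard javax.xml.stream API. The class name StaxSketch, the method elementNames, and the sample XML are assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxSketch {
    // Collects the names of all elements in document order.
    static List<String> elementNames(String xml) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Pull the next event; the parser does nothing until we ask.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample document.
        String xml = "<class><student id=\"1\"><name>An</name></student></class>";
        System.out.println(elementNames(xml)); // [class, student, name]
    }
}
```

The loop is driven by the application, which is the main difference from SAX, where the parser drives the handler.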
DOM – Document Object Model
When should DOM be used?
• You need to know a lot about the structure of an XML document.
• You need to manipulate parts of the XML document (for example,
sort certain elements).
• You need to use the information in an XML document more than
once.
• Advantages:
• The DOM is a generic interface for manipulating XML document
structures.
• One of its design goals is that Java code written for one DOM-compliant
parser should run on any other DOM-compliant parser without
modification.
DOM interfaces
• DOM defines several interfaces. The most common
interfaces:
• Node: The basic data type of the DOM.
• Element: Most of the objects you'll be dealing with are Elements.
An Element is a Node.
• Attr: Represents an attribute of an element.
• Text: The actual content of the Element or Attr.
• Document: Represents the entire XML document. A Document
object is often called a DOM tree.
XML example
[Figure: contents of books.xml]
XML tree structure
[Figure: tree structure of books.xml]
Common DOM methods
• Document.getDocumentElement(): Returns the root element of the
document.
• Node.getFirstChild(): Returns the first child of a given Node.
• Node.getLastChild(): Returns the last child of a given Node.
• Node.getNextSibling(): Returns the next sibling of a given Node.
• Node.getPreviousSibling(): Returns the previous sibling of a given
Node.
• Element.getAttribute(attrName): For a given Element, returns the
value of the attribute named attrName (an empty string if it is absent).
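A minimal sketch of these navigation methods, assuming a small made-up document with book elements that carry an id attribute. The class DomNavigationSketch and its helper methods are hypothetical. Note that whitespace between tags appears in the tree as Text nodes, so sibling walks usually filter on node type.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomNavigationSketch {
    // Parses an XML string into a DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Walks the children of parent with getFirstChild()/getNextSibling()
    // and collects the "id" attribute of every child element.
    static List<String> childIds(Element parent) {
        List<String> ids = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Whitespace between tags is a Text node, so keep elements only.
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                ids.add(((Element) n).getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document; element and attribute names are made up.
        Document doc = parse("<books>\n  <book id=\"b1\"/>\n  <book id=\"b2\"/>\n</books>");
        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName());                  // books
        System.out.println(childIds(root));                     // [b1, b2]
        System.out.println(root.getFirstChild().getNodeName()); // #text
    }
}
```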
Steps to use DOM
• Import XML related packages:
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
• Create a DocumentBuilder object: 2 substeps
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
• Create a Document object from a file or a stream:
// from a file
File inputFile = new File("data.xml");
Document doc = dBuilder.parse(inputFile);
// or from an in-memory string
String xml = "<?xml version=\"1.0\"?> <class> </class>";
ByteArrayInputStream input = new ByteArrayInputStream(xml.getBytes("UTF-8"));
Document doc2 = dBuilder.parse(input);
Steps to use DOM (cont)
• Extract the root element:
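Element root = doc.getDocumentElement();
// doc.getFirstChild() returns the root only when no comment or DOCTYPE precedes it
• Check attributes:
// return the value of a specific attribute (Element only)
String value = element.getAttribute("attributeName");
// return a NamedNodeMap of the node's attributes
NamedNodeMap attributes = node.getAttributes();
• Check child elements:
// return a list of all descendant elements with the given tag name (Element or Document)
NodeList matches = element.getElementsByTagName("subelementName");
// return a list of all child nodes
NodeList children = node.getChildNodes();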
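The steps above can be put together as the following runnable sketch. The class name DomStepsSketch and the sample XML (a class element with a room attribute and student children) are assumptions made for illustration; only standard JAXP and DOM calls are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

class DomStepsSketch {
    // Factory -> builder -> document, as in the steps on the slide.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        return dBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input; "class", "room", "student", "id" and "name" are assumed names.
        String xml = "<?xml version=\"1.0\"?>"
                + "<class room=\"A1\">"
                + "<student id=\"1\"><name>An</name></student>"
                + "<student id=\"2\"><name>Binh</name></student>"
                + "</class>";
        Document doc = parse(xml);

        Element root = doc.getDocumentElement();
        System.out.println(root.getTagName());         // class
        System.out.println(root.getAttribute("room")); // A1

        NamedNodeMap attrs = root.getAttributes();
        System.out.println(attrs.getLength());         // 1

        // All descendant <name> elements, not only direct children.
        NodeList names = root.getElementsByTagName("name");
        System.out.println(names.getLength());              // 2
        System.out.println(names.item(0).getTextContent()); // An

        // Direct children only: the two <student> elements.
        System.out.println(root.getChildNodes().getLength()); // 2
    }
}
```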
XML example
[Figure: contents of data.xml]
DOM - Read XML documents
• Create Student.java
public class Student {
    private String id;
    private String name;

    // Constructors
    // Getters and setters

    @Override
    public String toString() {
        return "Student{" + "id=" + id + ", name=" + name + '}';
    }
}
DOM - Read XML documents
Create a DOMReadXML.java file
[Figure: DOMReadXML.java listing and its output]
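A minimal sketch of reading students with DOM. This is not the original DOMReadXML.java: the layout of the input (a students root, student elements with an id attribute and a name child) is an assumption, and the nested Student class is a stand-in so that the example is self-contained. It parses from a string rather than from data.xml so that it runs without any file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomReadSketch {
    // Minimal stand-in for Student.java.
    static class Student {
        final String id;
        final String name;
        Student(String id, String name) { this.id = id; this.name = name; }
        @Override public String toString() {
            return "Student{" + "id=" + id + ", name=" + name + '}';
        }
    }

    // Assumed layout: <students><student id="..."><name>...</name></student>...</students>
    static List<Student> readStudents(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("student");
        List<Student> students = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            // Take the text of the first <name> child and strip surrounding whitespace.
            String name = e.getElementsByTagName("name").item(0).getTextContent().trim();
            students.add(new Student(e.getAttribute("id"), name));
        }
        return students;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students>"
                + "<student id=\"1\"><name>An</name></student>"
                + "<student id=\"2\"><name>Binh</name></student>"
                + "</students>";
        for (Student s : readStudents(xml)) {
            System.out.println(s);
        }
        // Student{id=1, name=An}
        // Student{id=2, name=Binh}
    }
}
```

To read from a file instead, the same DocumentBuilder accepts a java.io.File in its parse method.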
DOM - Create XML documents
• To create an XML document named example.xml, create a DOMCreateXML.java file
[Figure: example.xml, the DOMCreateXML.java listing, and the output when example.xml is read by DOMReadXML.java]
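A minimal sketch of creating a document with DOM and serializing it with an identity Transformer. It is not the original DOMCreateXML.java: the element names (students, student, name) and the class DomCreateSketch are assumptions. The result is written to a string so the sketch runs without touching the file system.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomCreateSketch {
    // Builds <students><student id="..."><name>...</name></student></students> in memory.
    static Document buildDocument(String id, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("students");
        doc.appendChild(root);

        Element student = doc.createElement("student");
        student.setAttribute("id", id);
        root.appendChild(student);

        Element nameElement = doc.createElement("name");
        nameElement.setTextContent(name);
        student.appendChild(nameElement);
        return doc;
    }

    // Serializes a DOM tree to a string with an identity Transformer.
    static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(buildDocument("1", "An")));
        // <students><student id="1"><name>An</name></student></students>
    }
}
```

To produce example.xml on disk, the StreamResult can be constructed from a java.io.File instead of a StringWriter.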
DOM - Modify XML documents
Modify the file example.xml above
[Figure: listing that modifies example.xml, and the output when the modified file is read by DOMReadXML.java]
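A minimal sketch of modifying a DOM tree: update a text value, add an attribute, remove an element, then serialize. The specific changes, the element names, and the class DomModifySketch are assumptions made for illustration, not the modifications shown in the original listing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomModifySketch {
    // Assumes the input contains at least one <student> with a <name> child.
    static String modify(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList students = doc.getElementsByTagName("student");
        Element first = (Element) students.item(0);
        Element last = (Element) students.item(students.getLength() - 1);

        // Update: change the text of the first student's <name>.
        first.getElementsByTagName("name").item(0).setTextContent("Updated");
        // Add: set a new attribute on the first student.
        first.setAttribute("status", "active");
        // Delete: remove the last student, unless it is also the first.
        if (last != first) {
            last.getParentNode().removeChild(last);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students>"
                + "<student id=\"1\"><name>An</name></student>"
                + "<student id=\"2\"><name>Binh</name></student>"
                + "</students>";
        System.out.println(modify(xml));
        // <students><student id="1" status="active"><name>Updated</name></student></students>
    }
}
```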
SAX – Simple API for XML
SAX Parser
• SAX = Simple API for XML
• XML is read sequentially
• When a parsing event happens, the parser invokes the
corresponding method of the corresponding handler
• The handlers are the programmer's implementations of
standard Java API interfaces and classes
• Like an I/O stream, parsing proceeds in one direction only
Implementing the Content Handler
• A SAX parser invokes methods such as startDocument,
endDocument, startElement and endElement of its
content handler as it runs
• In order to react to parsing events we must:
• implement the ContentHandler interface
• set the parser’s content handler with an instance of our ContentHandler
implementation
• An easy way to implement the ContentHandler interface is to
extend DefaultHandler
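A minimal sketch of a content handler that extends DefaultHandler and records the order in which the parser invokes its callbacks. The names SaxEventsSketch and TraceHandler and the sample XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventsSketch {
    // Records each callback in the order the parser fires it.
    static class TraceHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startDocument() { events.add("startDocument"); }
        @Override public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    static List<String> trace(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TraceHandler handler = new TraceHandler();
        // parse() installs the handler and then drives the callbacks.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace("<class><student/></class>"));
        // [startDocument, start:class, start:student, end:student, end:class, endDocument]
    }
}
```

Extending DefaultHandler means only the callbacks of interest need to be overridden; the rest keep their empty default implementations.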
When should SAX be used?
• A SAX parser does not load the complete XML document into memory.
• It is an event-based parser: it fires an event each time it encounters a
construct such as an opening tag, a closing tag, or character data.
• To read XML documents with a SAX parser, we create a class that
extends the DefaultHandler class.
• The DefaultHandler class provides, among others, the following methods:
• startElement(): called when an opening tag is encountered.
• endElement(): called when a closing tag is encountered.
• characters(): called when text data is encountered.
XML example
[Figure: data.xml annotated with the SAX events fired while it is read: start document, start and end of the root element, start and end of each child element, attributes, and characters]
SAX - Read XML documents
• Create Student.java
public class Student {
    private String id;
    private String name;

    // Constructors
    // Getters and setters

    @Override
    public String toString() {
        return "Student{" + "id=" + id + ", name=" + name + '}';
    }
}
SAX - Read XML documents
Create a MyHandler.java file
[Figure: MyHandler.java listing, the main method, and its output]
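A minimal sketch of reading students with SAX. It is not the original MyHandler.java: the input layout (student elements with an id attribute and a name child) is an assumption, and the nested Student class is a stand-in so that the example is self-contained. The handler buffers text in characters() because the parser may deliver one text run in several calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxReadSketch {
    // Minimal stand-in for Student.java.
    static class Student {
        final String id;
        final String name;
        Student(String id, String name) { this.id = id; this.name = name; }
        @Override public String toString() {
            return "Student{" + "id=" + id + ", name=" + name + '}';
        }
    }

    // Hypothetical handler; builds one Student per <student> element.
    static class MyHandler extends DefaultHandler {
        final List<Student> students = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String currentId;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // start collecting text for this element
            if ("student".equals(qName)) {
                currentId = attrs.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called more than once per text run, so append.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                students.add(new Student(currentId, text.toString().trim()));
            }
        }
    }

    static List<Student> readStudents(String xml) throws Exception {
        MyHandler handler = new MyHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.students;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students>"
                + "<student id=\"1\"><name>An</name></student>"
                + "<student id=\"2\"><name>Binh</name></student>"
                + "</students>";
        for (Student s : readStudents(xml)) {
            System.out.println(s);
        }
        // Student{id=1, name=An}
        // Student{id=2, name=Binh}
    }
}
```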
SAX vs. DOM
Parser Efficiency
• The DOM object built by DOM parsers is usually complicated
and requires more memory storage than the XML file itself
• A lot of time is spent on construction before use
• For some very large documents, this may be impractical
• SAX parsers store only local information that is encountered
during the serial traversal
• Hence, SAX parsers are, in general, more efficient in both time
and memory
Node Navigation
• SAX parsers do not provide access to elements other than the
one currently visited in the serial (DFS) traversal of the
document
• In particular,
• They do not read backwards
• They do not enable access to elements by ID or name
• DOM parsers enable any traversal method
• Hence, using DOM parsers is usually more convenient
More DOM Advantages
• A DOM object is, in effect, compiled XML
• You can save time and effort if you send and receive DOM
objects instead of XML files
• However, DOM objects are generally larger than the source
• DOM parsers provide a natural integration of XML reading
and manipulating
• e.g., “cut and paste” of XML fragments
Which should we use?
DOM vs. SAX
• If your document is very large and you only need a few
elements – use SAX
• If you need to manipulate (i.e., change) the XML – use
DOM
• If you need to access the XML many times – use DOM
(assuming the file is not too large)
Exercise (submit on elearning)
• Use an API to fetch weather data as an XML string
and display it in a GUI.
• Hint: String link = "http://api.openweathermap.org/data/2.5/weather?q=" + city +
"&appid=cabc9614649278ff314eb4f62e95942e&mode=xml";
Thank you!