XML
What is XML?
XML is a software- and hardware-independent tool for storing and transporting
data.
XML stands for eXtensible Markup Language
XML defines rules for encoding documents in a format that is both human-readable and
machine-readable.
XML is a markup language much like HTML
XML was designed to store and transport data
XML was designed to be self-descriptive (a document contains both data and metadata)
XML is a W3C Recommendation
XML documents are plain text files, which makes them easy to create and manipulate
with standard text editors and tools.
<?xml version="1.0" encoding="UTF-8"?> <!-- XML prolog -->
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body> <!-- text content is the actual data -->
</note>
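The XML above is quite self-descriptive: it names a recipient, a sender, a heading
and a message body.
XML Elements
An element is everything from its start tag to its end tag:
<price>29.99</price>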
An element can contain:
text
attributes
other elements
or a mix of the above
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J. K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
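As a rough illustration of how a program sees these elements, the sketch below parses a shortened copy of the bookstore document with Python's standard xml.etree.ElementTree module. The variable names are illustrative and not part of the original material.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore document above.
BOOKSTORE = """
<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE)

# Each <book> element carries an attribute and contains other elements,
# which in turn contain text.
books = [
    (book.get("category"), book.findtext("title"), float(book.findtext("price")))
    for book in root.findall("book")
]

for category, title, price in books:
    print(f"{category}: {title} costs {price}")
```

The attribute is read with get(), while the text of a child element is read with findtext(); this mirrors the three kinds of element content listed above.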
A matching DTD for the bookstore document:
<!ELEMENT bookstore (book+)>
<!ELEMENT book (title, author, year, price)>
<!ATTLIST book category CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT price (#PCDATA)>
XML Attributes
The same information can be stored either as an attribute or as a child element:
<person gender="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<gender>female</gender>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
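The two person variants above carry the same information, but a program reads them differently. A minimal sketch using Python's standard xml.etree.ElementTree module (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person gender="female">'
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)
as_element = ET.fromstring(
    "<person><gender>female</gender>"
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

# An attribute is read from the element itself ...
gender_from_attribute = as_attribute.get("gender")
# ... while a child element is looked up and its text content is read.
gender_from_element = as_element.findtext("gender")

print(gender_from_attribute, gender_from_element)
```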
Well-Formed XML Documents
A well-formed document must adhere to, among others, the
following rules:
• Every start tag has a matching end tag.
• Elements may nest, but must not overlap.
• There must be exactly one root element.
• Attribute values must be quoted.
• An element may not have two attributes with the same
name.
• Comments and processing instructions may not appear
inside tags.
• No unescaped < or & signs may occur inside
character data.
Only well-formed documents can be processed by XML parsers.
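Several of the rules above can be observed directly by handing small snippets to a parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "matching tags": "<note><to>Tove</to></note>",
    "overlapping elements": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<book category=web/>",
    "duplicate attribute": '<book category="a" category="b"/>',
    "unescaped ampersand": "<body>Tom & Jerry</body>",
    # escape() turns & into &amp; so the text becomes legal character data.
    "escaped ampersand": "<body>" + escape("Tom & Jerry") + "</body>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

Only the first and last samples are accepted; every other sample violates one of the rules listed above and makes the parser raise an error.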
Document Type Definitions
Sometimes XML is too flexible:
• Most programs can only process a subset of all possible XML
applications
• For exchanging data, the format (i.e., elements, attributes and
their semantics) must be fixed
Document Type Definitions (DTDs) establish the vocabulary for
one XML application (in some sense comparable to schemas in databases).
A document is valid with respect to a DTD if it conforms to the rules
specified in that DTD.
Most XML parsers can be configured to validate.
Components of a DTD
Element Declarations: Define the elements that can appear in the document and the
order they must appear in.
Attribute Declarations: Define attributes that elements can have and the types of those
attributes.
Entity Declarations: Define reusable content chunks.
Notation Declarations: Define how non-XML data can be referenced.
Element Declarations in DTDs
One element declaration for each element type:
<!ELEMENT element_name content_specification>
where content_specification can be
• (#PCDATA) parsed character data
• (child) one child element
• (c1,…,cn) a sequence of child elements c1…cn
• (c1|…|cn) one of the elements c1…cn
DTD Example: Elements
<!ELEMENT article (title,author+,text)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT text (abstract,section*,literature?)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT section (#PCDATA|index)*>
<!ELEMENT literature (#PCDATA)>
<!ELEMENT index (#PCDATA)>
• Content of the title element is parsed character data.
• Content of the text element may contain zero or more section elements in this position.
• Content of the article element is a title element, followed by one or more author elements, followed by a text element.
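The content models in the example above behave like regular expressions over child element names. The sketch below makes that analogy concrete for the article and text elements. It is not how a validating parser is implemented; the regular expressions are translated by hand from the DTD, and the names CONTENT_MODELS and children_match are assumptions of this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models from the DTD above (an assumption of this
# sketch): each child name is written with a trailing comma so that names
# cannot run together.
CONTENT_MODELS = {
    "article": r"title,(author,)+text,",
    "text": r"abstract,(section,)*(literature,)?",
}


def children_match(element: ET.Element) -> bool:
    """Check the direct children of element against its content model."""
    child_names = "".join(child.tag + "," for child in element)
    return re.fullmatch(CONTENT_MODELS[element.tag], child_names) is not None


valid = ET.fromstring(
    "<article><title>T</title><author>A</author><author>B</author>"
    "<text><abstract>S</abstract></text></article>"
)
missing_author = ET.fromstring(
    "<article><title>T</title><text><abstract>S</abstract></text></article>"
)

print(children_match(valid))               # article: title, author+, text
print(children_match(valid.find("text")))  # text: abstract, section*, literature?
print(children_match(missing_author))      # article without any author
```

The first two checks succeed, while the third fails because author+ requires at least one author element.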
Attribute Declarations in DTDs
Attributes are declared per element:
<!ATTLIST section number CDATA #REQUIRED
                  title  CDATA #REQUIRED>
declares two required attributes for element section.
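The declaration names the element (section) and, for every attribute, gives the
attribute name (number, title), the attribute type (CDATA) and the attribute
default (#REQUIRED).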
Possible attribute defaults:
• #REQUIRED: the attribute is required in each element instance
• #IMPLIED: the attribute is optional
• #FIXED default: the attribute always has this default value
• default: the attribute has this default value if it is
omitted from the element instance
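To make the four kinds of attribute default concrete, the following sketch mimics by hand what a validating parser does with them. It is an illustration only: the rule table SECTION_ATTRIBUTES, the extra attributes lang, version and note, and the function apply_attlist are all invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute rules for <section>; "lang", "version" and "note"
# are invented here to show the default kinds that the declaration above omits.
SECTION_ATTRIBUTES = {
    "number": ("#REQUIRED", None),
    "title": ("#REQUIRED", None),
    "lang": ("default", "en"),
    "version": ("#FIXED", "1.0"),
    "note": ("#IMPLIED", None),
}


def apply_attlist(element, rules):
    """Fill in defaults and return a list of violations for one element."""
    errors = []
    for name, (kind, value) in rules.items():
        present = name in element.attrib
        if kind == "#REQUIRED" and not present:
            errors.append(f"missing required attribute '{name}'")
        elif kind == "#FIXED" and present and element.get(name) != value:
            errors.append(f"attribute '{name}' must be '{value}'")
        elif kind in ("default", "#FIXED") and not present:
            # Omitted attributes with a declared default receive that value.
            element.set(name, value)
        # "#IMPLIED" attributes may simply be absent.
    return errors


section = ET.fromstring('<section number="1">Intro</section>')
errors = apply_attlist(section, SECTION_ATTRIBUTES)
print(errors)
print(section.attrib)
```

Here the missing title is reported as a violation, lang and version are filled in from their defaults, and the optional note attribute is left out.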
XML DTD
A DTD defines the structure and the legal elements and attributes of an XML
document
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Note.dtd (an external DTD file holds only the declarations, without a <!DOCTYPE ... [ ... ]> wrapper):
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Flaws of DTDs
• No support for basic data types like integers, doubles,
dates, times, …
• No structured, self-definable data types
• No type derivation
• ID/IDREF links are quite loose (the type of the target element is not specified)
Applications of XML
• Database applications
• Document mark-up (with HTML)
• Mathematical Markup Language (MathML)
• Messaging between different business platforms
• Channel Definition Format (CDF)
• Metacontent definition
• Platform for Internet Content Selection (PICS)
• Platform for Privacy Preferences Syntax Specification
(P3P)
• Resource Description Framework (RDF)
• Scalable Vector Graphics (SVG)
• Synchronized Multimedia Integration Language (SMIL)
XML Schema
XML Schema, often referred to as XSD (XML Schema Definition), is a language used to define
the structure and constraints of XML documents.
It is more powerful and expressive than DTD (Document Type Definition) and supports data
types, namespaces, and more complex content models.
Elements: Define the elements that can appear in the XML document.
Attributes: Define the attributes that elements can have.
Complex Types: Define elements that can contain other elements and/or attributes.
Simple Types: Define elements that contain only text.
Restrictions: Define constraints on elements and attributes (e.g., length, range, pattern).
Namespaces: Provide a way to avoid name conflicts by qualifying names.
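XML Schema Example
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>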
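Unlike a DTD, an XML Schema is itself an XML document, so an ordinary XML parser can read it. The sketch below lists the child elements declared in a complete version of the schema above and compares their order with a note document. It is not a schema validator: it only checks element order, and the wrapping xs:schema and xs:element name="note" in NOTE_XSD are assumptions made to complete the fragment.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Assumed complete form of the schema fragment above.
NOTE_XSD = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(NOTE_XSD)
ns = {"xs": XS}

# The top-level declaration and the children listed in its xs:sequence.
note_decl = schema.find("xs:element", ns)
child_decls = note_decl.findall("xs:complexType/xs:sequence/xs:element", ns)
declared = [(el.get("name"), el.get("type")) for el in child_decls]

note = ET.fromstring(
    "<note><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
)

# Order check only: the children must appear in the declared sequence.
follows_sequence = [child.tag for child in note] == [name for name, _ in declared]

print(declared)
print(follows_sequence)
```

Full validation against an XML Schema needs a dedicated validator, which Python's standard library does not provide.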