-
Paste the input XML into the input.xml file (the file name can be changed through the macro in main.cpp).
-
Compile and run main.cpp. (In a terminal, type:
g++ main.cpp
./a.out
On Windows the executable is typically named a.exe instead.)
-
Get the output in the output.json file.
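As a rough illustration of the file handling described in the steps above, the sketch below shows one way the file names could be set through macros and the input read into a string. The macro names `INPUT_FILE` and `OUTPUT_FILE` and the helper `readFile` are assumptions made for this example; check main.cpp for the names it actually uses.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical macro names; main.cpp may use different ones.
#define INPUT_FILE "input.xml"
#define OUTPUT_FILE "output.json"

// Reads the whole file at `path` into a string.
// Returns an empty string if the file cannot be opened.
std::string readFile(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}
```

Calling `readFile(INPUT_FILE)` would then return the XML text that the converter works on.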
The following assumptions are made; they define the "well-formedness" of XML data ( http://en.wikipedia.org/wiki/XML#Well-formedness_and_error-handling/ ):
-
The document contains only properly encoded legal Unicode characters
-
None of the special syntax characters such as < and & appear except when performing their markup-delineation roles
-
The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping
-
The element tags are case-sensitive; the beginning and end tags must match exactly. Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[]^`{|}~, nor a space character, and cannot start with -, ., or a numeric digit.
-
A single "root" element contains all the other elements (The code works just fine even without any such root tag)
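The tag-name rule in the list above can be expressed as a small check. This is only a sketch: `isValidTagName` is a hypothetical helper that is not taken from main.cpp, and it tests exactly the characters listed in this README.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: checks a tag name against the naming rule listed above.
bool isValidTagName(const std::string& name) {
    if (name.empty()) return false;

    // Characters that may not appear anywhere in a tag name, plus the space.
    const std::string forbidden = "!\"#$%&'()*+,/;<=>?@[]^`{|}~ ";

    // A name may not start with '-', '.', or a numeric digit.
    const unsigned char first = static_cast<unsigned char>(name[0]);
    if (first == '-' || first == '.' || std::isdigit(first)) return false;

    return name.find_first_of(forbidden) == std::string::npos;
}
```

For instance, `isValidTagName("book-title")` is accepted, while `isValidTagName("1abc")` and `isValidTagName("a b")` are rejected.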
The transformation patterns are shown below:
| Pattern | XML | JSON |
|---|---|---|
| 1 | `<e/>` | `"e": null` |
| 2 | `<e>text</e>` | `"e": "text"` |
| 3 | `<e name="value" />` | `"e": {"@name": "value"}` |
| 4 | `<e name="value">text</e>` | `"e": {"@name": "value", "#text": "text"}` |
| 5 | `<e> <a>text</a> <b>text</b> </e>` | `"e": {"a": "text", "b": "text"}` |
| 6 | `<e> <a>text</a> <a>text</a> </e>` | `"e": {"a": ["text", "text"]}` |
| 7 | `<e> text <a>text</a> </e>` | `"e": {"#text": "text", "a": "text"}` |
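To make patterns 1 to 4 concrete, here is a minimal sketch of how a childless element could be rendered. The `Attributes` alias and the `leafToJson` function are hypothetical names introduced for this example, not the code in main.cpp, and JSON escaping of special characters is deliberately left out.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (attribute name, attribute value) pairs, in document order.
using Attributes = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: renders one childless element following patterns 1-4.
// Simplification: no escaping of quotes or backslashes is performed.
std::string leafToJson(const std::string& name, const Attributes& attrs,
                       const std::string& text) {
    std::string out = "\"" + name + "\": ";
    if (attrs.empty()) {
        // Pattern 1 (no text) or pattern 2 (text only).
        return out + (text.empty() ? std::string("null") : "\"" + text + "\"");
    }
    // Patterns 3 and 4: attributes become "@name" members.
    out += "{";
    for (std::size_t i = 0; i < attrs.size(); ++i) {
        if (i > 0) out += ", ";
        out += "\"@" + attrs[i].first + "\": \"" + attrs[i].second + "\"";
    }
    // Pattern 4: text next to attributes is stored under "#text".
    if (!text.empty()) out += ", \"#text\": \"" + text + "\"";
    return out + "}";
}
```

The `@` and `#text` prefixes keep attribute and text members from colliding with child element names, which cannot contain `@` or `#`.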
It parses and supports:
-
XML declaration
-
Comments
-
Multi-level tags
-
Grouping of tags with the same name even when they do not occur in sequence (at the same level). This feature can easily be switched off if required. (If on GitHub, view this file in raw form to see the example markup.) For example, if three same-named tags containing "Harry Potter" are interleaved with a differently named tag containing "Mango", all 3 books are grouped together in the same array.
-
Empty tags, attributes, and all other fundamentals.
-
Takes care of indentation when writing the output.
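The grouping of same-named tags mentioned in the list above could look roughly like the sketch below. Everything here is an assumption for illustration: the `groupSiblings` function, the `groupAll` flag (one guess at what switching the feature off might mean), and the tag names `book` and `fruit` used in the usage note.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Sibling = std::pair<std::string, std::string>;             // (tag name, text)
using Group = std::pair<std::string, std::vector<std::string>>;  // (tag name, texts)

// Hypothetical helper. With groupAll == true, all siblings sharing a name are
// collected into one group, ordered by first appearance. With groupAll ==
// false, only consecutive siblings with the same name are merged.
std::vector<Group> groupSiblings(const std::vector<Sibling>& siblings,
                                 bool groupAll = true) {
    std::vector<Group> groups;
    std::map<std::string, std::size_t> indexOf;  // tag name -> index in groups
    for (const Sibling& s : siblings) {
        if (groupAll) {
            auto it = indexOf.find(s.first);
            if (it != indexOf.end()) {
                groups[it->second].second.push_back(s.second);
                continue;
            }
            indexOf[s.first] = groups.size();
        } else if (!groups.empty() && groups.back().first == s.first) {
            groups.back().second.push_back(s.second);
            continue;
        }
        groups.push_back(Group{s.first, std::vector<std::string>{s.second}});
    }
    return groups;
}
```

Given the siblings `book`, `book`, `fruit`, `book`, the default call yields two groups (three `book` entries, one `fruit` entry), while `groupAll = false` yields three groups because the last `book` is not adjacent to the first two.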
-
The XML file is read into a string, which is then wrapped in a root tag at both ends.
-
The string is then parsed to find each opening tag and its respective closing tag.
-
The data inside those tags is then parsed to differentiate it into child elements, attributes, text, and other characteristics.
-
A chart (graph) of nodes is created, in which each node links to other nodes as children (if at a lower level) or siblings (if at the same level).
-
The above chart is used to print JSON with proper indentation and brackets. The output can be easily validated on jsonlint.com .
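The last two steps above (a graph of child and sibling links, then indented printing) can be sketched as follows. The `Node` layout and the `writeMembers` and `toJson` functions are assumptions made for this example; the real main.cpp may be organised differently. Attributes and arrays are omitted to keep the sketch short.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical node layout: each node links to its first child (one level
// down) and to its next sibling (same level).
struct Node {
    std::string name;
    std::string text;
    std::unique_ptr<Node> firstChild;
    std::unique_ptr<Node> nextSibling;
};

// Appends `node` and all of its following siblings to `out` as JSON members,
// indenting two spaces per nesting level.
void writeMembers(const Node* node, int depth, std::string& out) {
    const std::string indent(static_cast<std::size_t>(depth) * 2, ' ');
    for (; node != nullptr; node = node->nextSibling.get()) {
        out += indent + "\"" + node->name + "\": ";
        if (node->firstChild) {
            // Element with children: open an object and recurse one level down.
            out += "{\n";
            writeMembers(node->firstChild.get(), depth + 1, out);
            out += indent + "}";
        } else if (node->text.empty()) {
            out += "null";  // empty element
        } else {
            out += "\"" + node->text + "\"";  // text-only element
        }
        // A comma separates members; the last member on a level has none.
        out += node->nextSibling ? ",\n" : "\n";
    }
}

// Wraps the members in the outermost JSON object.
std::string toJson(const Node& root) {
    std::string out = "{\n";
    writeMembers(&root, 1, out);
    return out + "}\n";
}
```

Iterating over siblings in a loop and recursing only into children keeps the recursion depth equal to the nesting depth of the XML, not the number of elements.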