Text.XML.HXT.Parser.MainFunctions
Description
Simple parse functions.
the main building blocks for an application. this module exports complex filters and functions for common tasks for input and parsing, output and option handling.
- getXmlDocument :: Attributes -> String -> IO (XmlTree, XmlTrees, Int)
- putXmlDocument :: Attributes -> String -> XmlTree -> IO (XmlTrees, Int)
- parseDocument :: Attributes -> XmlStateFilter state
- writeDocument :: Attributes -> XmlStateFilter state
Documentation
getXmlDocument :: Attributes -> String -> IO (XmlTree, XmlTrees, Int)Source
convenient function for reading a XML document without dealing with state monads, error messages collection and other details
getXmlDocument calls parseDocument
with the list of parsing options
and an url or filename as document source.
result is a triple
- the resulting document tree with a root node containing all meta info about the document (options, status info, http header, ...)
- the list of errors and warnings
- the error level: one of
c_ok
,c_warn
,c_err
,c_fatal
example for input (see also example in putXmlDocument
and example in writeDocument
)
main :: IO () main = do (res, errs, rc) <- getXmlDocument [] "test.xml" if rc >= c_err then issueErrors errs else processTree res issueErrors :: XmlTrees -> IO () processTree :: XmlTree -> IO ()
for options see parseDocument
, a_collect_errors
is set implicitly
putXmlDocument :: Attributes -> String -> XmlTree -> IO (XmlTrees, Int)Source
the inverse operation to getXmlDocument
writes a complete document tree to a file, writing can be
controlled by options, the real work is done with filter writeDocument
.
useful options are the options of writeDocument
.
result is a pair: 1.part is a list of error messages, 2. part is the return code, the status info of the write filter
this filter is useful, when processing XML in an arbitray context in the IO monad
an example main program for such an application is:
main :: IO () main = do (input, readErrs, rc) <- getXmlDocument [...] "test.xml" if rc >= c_err then issueErrors readErrs else processTree input processTree :: XmlTree -> IO () processTree t = let res = computeNewTree input in do (writeErrs, rc2) <- putXmlDocument [...] "out.xml" res if rc2 >= c_err then issueErrors writeErrs else return () issueErrors :: XmlTrees -> IO () computeNewTree :: XmlTree -> XmlTree
parseDocument :: Attributes -> XmlStateFilter stateSource
the main parsing filter
this filter can be configured by an option list, a list of option name, option value pairs. the input tree must be a possibly empty document root tree. all the options are stored as attributes in this root node to control processing.
available options:
-
a_parse_html
: use HTML parser, else use XML parser (default) -
a_validate
: validate document (default), else skip validation -
a_check_namespaces
: check namespaces, else skip namespace processing (default) -
a_canonicalize
: canonicalize document (default), else skip canonicalization -
a_preserve_comment
: preserve comments during canonicalization, else remove comments (default) -
a_remove_whitespace
: remove all whitespace, used for document indentation, else skip this step (default) -
a_indent
: indent document by inserting whitespace, else skip this step (default) -
a_issue_warnings
: issue warnings, when parsing HTML (default), else ignore HTML parser warnings -
a_issue_errors
: issue all error messages on stderr (default), or ignore all error messages -
a_collect_errors
: all error messages are collected during processing and appended to the result document for further processing within the calling modules -
a_trace
: trace level: values: 0 -4 -
a_proxy
: proxy for http access, e.g. www-cache:3128 -
a_use_curl
: for http access via external programm curl, default is native HTTP access -
a_options_curl
: more options for external program curl -
a_source
: the document source url -
a_encoding
: default document encoding (utf8
,isoLatin1
,usAscii
, ...)
examples:
parseDocument [ (a_source, "test.xml") , (a_validate, "0") , (a_encoding, isoLatin1) ] emptyRoot
reads document "test.xml" without validation and default encoding isoLatin1
parseDocument [ (a_source, "http://www.haskell.org/") , (a_parse_html, "1") , (a_proxy, "www-cache:3128") , (a_curl, "1") , (a_issue_warnings, "0") ] emptyRoot
reads Haskell homepage with HTML parser ignoring any warnings and with http access via external program curl and proxy "www-cache" at port 3128
parseDocument [ (a_source, "http://www.w3c.org/") , (a_parse_html, "0") -- default , (a_validate, "1") -- default , (a_check_namespace, "1") , (a_remove_whitespace, "1") , (a_trace, "2") ] emptyRoot
read w3c home page, validate and chech namespaces, remove whitespace between tags, trace activities with level 2
parseDocument [ (a_source, "test.xml") , (a_validate, "1") , (a_check_namespace, "1") , (a_collect_errors, "1") , (a_issue_errors, "0") ] emptyRoot
reads file "test.xml", validates it, checks namespaces, does not issue any erros but collects errors and appends the list of errors to the single element list for the document. this enables the calling application to define own error handlers.
writeDocument :: Attributes -> XmlStateFilter stateSource
the main filter for writing documents
this filter can be configured by an option list like getXmlDocument
available options are
-
a_indent
: indent document for readability, (default: no indentation) -
a_remove_whitespace
: remove all redundant whitespace for shorten text (default: no removal) -
a_output_file
: destination file for document, default is "-" for stdout -
a_output_encoding
: encoding of document, default isa_encoding
orutf8
-
a_output_xml
: (default) issue XML: quote special XML chars >,<,",',& add XML processing instruction and encode document with respect toa_output_encoding
, if explizitly switched of, the plain text is issued, this is useful for non XML output, e.g. generated Haskell code, LaTex, Java, ... -
a_show_tree
: show tree representation of document (for debugging) -
a_show_haskell
: show Haskell representaion of document (for debugging) -
a_issue_errors
,a_collect_errors
: seeparseDocument
a typical main program running in the XmlState monad has the following structure:
main :: IO () main = do argv <- getArgs -- get the commandline arguments (inp, outp, options) <- cmdlineOpts argv -- and evaluate them, return a key-value list -- and input and output res <- run' $ application inp outp options $ emptyRoot -- run the application exitWith (if null res then ExitFailure (-1) else exitSuccess ) application :: String -> String -> Attributes -> XmlStateFilter () application inp outp al = parseDocument (al ++ [(a_source, inp)]) -- set options and source .>> -- and parse document processDocument -- the hard work .>> writeDocument [(a_output_file, outp)] -- issue results .>> checkStatus -- check errors