A Microformats 2 parser in Haskell https://mf2.packett.cool/


microformats2-parser

Microformats 2 parser for Haskell! #IndieWeb

  • parses items, rels, rel-urls
  • resolves relative URLs (with support for the <base> tag), including those inside the html of e-* properties
  • parses the value-class-pattern, including date and time normalization
  • handles malformed HTML (the actual HTML parser is tagstream-conduit)
  • can also convert to JF2
  • high performance
  • extensively tested

Also check out http-link-header because you often need to read links from the Link header!

DEMO PAGE

Usage

See the API docs on Hackage for more info. Here's a quick overview:

{-# LANGUAGE OverloadedStrings #-}

import Data.Microformats2.Parser
import Data.Default
import Network.URI

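-- The expressions below are meant to be evaluated in GHCi (e.g. via `stack ghci`).

-- Parse a document with the default configuration:
parseMf2 def $ documentRoot $ parseLBS "<body><p class=h-entry><h1 class=p-name>Yay!</h1></p></body>"

-- Parse with baseUri set, so that the relative <base> and rel=micropub addresses can be resolved:
parseMf2 (def { baseUri = parseURI "https://where.i.got/that/page/from/" }) $ documentRoot $ parseLBS "<body><base href=\"base/\"><link rel=micropub href='micropub'><p class=h-entry><h1 class=p-name>Yay!</h1></p></body>"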

def is the default configuration.

The configuration includes:

  • htmlMode, an HTML parsing mode (Unsafe | Escape | Sanitize)
  • baseUri, the Maybe URI that represents the address you retrieved the HTML from, used for resolving relative addresses. You should set it.
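
To illustrate why baseUri matters, here is a minimal sketch of RFC 3986 reference resolution using the network-uri package directly. It is not the parser's internal code. It assumes that the <base> href is itself resolved against the page address, as in HTML, and that other relative addresses are then resolved against the result. The helper name resolve is hypothetical; the addresses are the ones from the overview above.

```haskell
import Network.URI (URI, parseURI, parseURIReference, relativeTo)

-- Hypothetical helper: resolve a (possibly relative) reference against a base URI.
resolve :: URI -> String -> Maybe URI
resolve base ref = (`relativeTo` base) <$> parseURIReference ref

-- The page address plays the role of baseUri, "base/" is the <base href>,
-- and "micropub" is the href of the rel=micropub link.
resolved :: Maybe URI
resolved = do
  page    <- parseURI "https://where.i.got/that/page/from/"
  docBase <- resolve page "base/"
  resolve docBase "micropub"

main :: IO ()
main = print resolved
-- prints: Just https://where.i.got/that/page/from/base/micropub
```

Without a page address, a reference like "micropub" has nothing to be resolved against, which is presumably why setting baseUri is recommended.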

parseMf2 will return an Aeson Value structured like canonical microformats2 JSON. lens-aeson is a good way to navigate it.
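
As a rough sketch of that navigation, the following uses lens-aeson on a hand-written Value shaped like canonical microformats2 JSON. The sample value stands in for the result of parseMf2 and is an assumption, as are the helper names entryNames and micropubEndpoint and the example.com address.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens ((^..), (^?))
import Data.Aeson (Value, object, (.=))
import Data.Aeson.Lens (key, nth, values, _String)
import Data.Text (Text)

-- Hand-written stand-in for a parse result (assumed shape: canonical mf2 JSON).
sample :: Value
sample = object
  [ "items" .=
      [ object
          [ "type"       .= ["h-entry" :: Text]
          , "properties" .= object [ "name" .= ["Yay!" :: Text] ]
          ]
      ]
  , "rels" .= object [ "micropub" .= ["https://example.com/micropub" :: Text] ]
  ]

-- Collect every string under properties.name across all top-level items.
entryNames :: Value -> [Text]
entryNames v =
  v ^.. key "items" . values . key "properties" . key "name" . values . _String

-- First URL listed under rels.micropub, if there is one.
micropubEndpoint :: Value -> Maybe Text
micropubEndpoint v = v ^? key "rels" . key "micropub" . nth 0 . _String

main :: IO ()
main = do
  print (entryNames sample)        -- ["Yay!"]
  print (micropubEndpoint sample)  -- Just "https://example.com/micropub"
```

Traversals like these simply return an empty list or Nothing when a key is missing, which suits microformats data where most properties are optional.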

Development

Use stack to build.
Use ghci to run tests quickly with :test (see the .ghci file).

$ stack build

$ stack test

$ stack ghci

License

This is free and unencumbered software released into the public domain.
For more information, please refer to the UNLICENSE file or unlicense.org.