php-mf2 is a generic microformats-2 parser. It doesn’t have a hard-coded list of all the different microformats, just a set of procedures to handle different property types (e.g. p- for plaintext, u- for URL, etc). This allows for a very small and maintainable parser.
Install with Composer by adding "mf2/mf2": "0.1.*" to the require object in your composer.json and running php composer.phar update.
mf2 is PSR-0 autoloadable, so all you have to do to load it is:
- Include Composer’s auto-generated autoload file (/vendor/autoload.php)
- Declare mf2\Parser in your use statement
- Make a new Parser($input), where $input can either be a string of HTML or a DOMDocument
<?php
// Resolve the autoloader relative to this file rather than the filesystem root
require __DIR__ . '/vendor/autoload.php';
use mf2\Parser;
$parser = new Parser('<div class="h-card"><p class="p-name">Barnaby Walters</p></div>');
$output = $parser->parse();
print_r($output);
// EOF
Parser::parse() returns an array structure mirroring the canonical JSON serialisation introduced with µf2. Printed with print_r, it looks something like this:
Array
(
    [items] => Array
        (
            [0] => Array
                (
                    [type] => Array
                        (
                            [0] => h-card
                        )
                    [properties] => Array
                        (
                            [name] => Barnaby Walters
                        )
                )
        )
)
If no microformats are found, items will be an empty array. rels and alternates are also included in the output.
Note that, whilst the property prefixes are stripped, the prefix of the h-* classname is left on.
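As a rough illustration of working with that structure, the sketch below filters items by type. The $output array is hand-written to match the print_r output above rather than produced by the parser, and itemsOfType is a hypothetical helper, not part of php-mf2.

```php
<?php
// Hand-written stand-in for the array returned by Parser::parse().
$output = array(
    'items' => array(
        array(
            'type' => array('h-card'),
            'properties' => array('name' => 'Barnaby Walters'),
        ),
    ),
);

// Hypothetical helper: return the items whose type list contains $type.
function itemsOfType(array $output, $type)
{
    $matches = array();
    foreach ($output['items'] as $item) {
        if (in_array($type, $item['type'], true)) {
            $matches[] = $item;
        }
    }
    return $matches;
}

// The h- prefix is kept on types, so match on 'h-card' rather than 'card'.
foreach (itemsOfType($output, 'h-card') as $card) {
    echo $card['properties']['name'], "\n";
}
```

Because items is an empty array when nothing is found, the loop simply does nothing in that case.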
A base URL can be provided as the second parameter of mf2\Parser::__construct(). It is prepended to any u- properties which are relative URLs.
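To make the prepending concrete, here is a minimal sketch of the idea in plain PHP. prependBaseUrl is a hypothetical helper and the URLs are made up; the parser's own handling may differ in detail.

```php
<?php
// Hypothetical helper, not php-mf2 code: prepend $baseUrl to $url
// when $url has no scheme, i.e. it looks relative.
// Protocol-relative URLs (//host/path) are not handled in this sketch.
function prependBaseUrl($url, $baseUrl)
{
    if (parse_url($url, PHP_URL_SCHEME) !== null) {
        return $url; // has a scheme (or could not be parsed): leave untouched
    }
    return rtrim($baseUrl, '/') . '/' . ltrim($url, '/');
}

echo prependBaseUrl('/photo.jpg', 'http://example.com'), "\n";
echo prependBaseUrl('http://other.example/me', 'http://example.com'), "\n";
```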
There are several ways to selectively parse microformats from a document. If you wish to only parse microformats from an element with a particular ID, Parser::parseFromId($id, $htmlSafe=null) is the easiest way.
If your needs are more complex, Parser::parse accepts an optional context DOMNode as its second parameter. Typically you’d use Parser::query to run XPath queries on the document to get the element you want to parse from under, then pass it to Parser::parse. Example usage:
$doc = 'More microformats, more microformats <div id="parse-from-here"><span class="h-card">This shows up</span></div> yet more ignored content';
$parser = new Parser($doc);
$parser->parseFromId('parse-from-here'); // returns a document with only the h-card descended from div#parse-from-here
$elementIWant = $parser->query('an xpath query')->item(0); // first matching node; item(0) avoids PHP 5.4-only dereferencing
$parser->parse(null, $elementIWant); // returns a document with only mfs under the selected element
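The 'an xpath query' placeholder above could be any XPath expression. The sketch below uses PHP's built-in DOMXPath directly, on the assumption that Parser::query accepts the same kind of expression; the markup and the sidebar class are made up for illustration.

```php
<?php
$html = '<div class="sidebar"><span class="h-card">Inside</span></div>'
      . '<span class="h-card">Outside</span>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select elements whose class attribute contains the token "sidebar".
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " sidebar ")]';
$context = $xpath->query($query)->item(0);

// $context is the kind of node you would pass as the parse context.
echo $context->nodeName, "\n";
```

Matching on a space-padded class string avoids false positives such as a class named sidebar-wide.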
php-mf2 has limited support for classic microformats. It doesn’t actually parse them but can convert legacy classnames into µf2 classnames (e.g. vcard => h-card, fn => p-name, etc.).
// Once your parser has been initialised:
$parser->convertLegacy(); // Converts classic microformats by default
$out = $parser->parse();
You can also define your own custom class mappings, to provide some support for popular sites which don’t use mf2 but do use semantic classnames. An experimental set for twitter.com is provided.
// Once you have $parser
$parser->convertTwitter(); // Adds twitter mapping
// Or, add your own mapping:
$parser->addClassMap(array( // array() rather than [] keeps this valid on PHP 5.3
    'oldclassname' => 'p-new-class-name'
));
$parser->convertLegacy();
// Then parse
$out = $parser->parse();
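Conceptually, a class map says "wherever this class appears, treat it as this mf2 class". The sketch below applies such a map to a class attribute string. It illustrates the idea only: applyClassMap is a hypothetical function, and the parser's actual conversion may work differently.

```php
<?php
// Hypothetical illustration, not php-mf2 code: add the mapped mf2
// class after each class name that appears in $map.
function applyClassMap($classAttr, array $map)
{
    $out = array();
    foreach (preg_split('/\s+/', trim($classAttr)) as $class) {
        if ($class === '') {
            continue; // empty attribute
        }
        $out[] = $class;
        if (isset($map[$class])) {
            $out[] = $map[$class];
        }
    }
    return implode(' ', array_unique($out));
}

$map = array('vcard' => 'h-card', 'fn' => 'p-name');
echo applyClassMap('vcard author', $map), "\n";
```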
Little to no filtering of content takes place in mf2\Parser, so treat its output as you would any untrusted data from the source of the parsed document.
There is an issue with the microformats2 parsing spec which can cause the level of HTML-encoding in the parser output to vary (e.g. some angle brackets are converted to &lt; &gt;, others are not) without the consumer being able to tell at what level any given string is.
To solve this, if you pass true to Parser::parse (or as the third parameter of Parser::__construct), the parser will HTML-encode angle brackets in any properties other than e-*, bringing everything up to the same level of encoding.
Note that this does not make content from untrusted sources secure; it merely makes the parser behave in a consistent manner. If you are outputting parsed microformats you must still take security precautions such as purifying the HTML.
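As one example of such a precaution, plaintext property values can be escaped with PHP's htmlspecialchars at the point of output. This is a minimal sketch using a hand-written value in place of real parser output; e-* properties containing markup need an HTML purifier rather than plain escaping.

```php
<?php
// Hand-written stand-in for a property value taken from parsed output.
$name = 'Barnaby <script>alert(1)</script>';

// Escape at output time so markup in the value is displayed, not executed.
$safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

echo '<p>', $safe, '</p>', "\n";
```

If the html-safe option is enabled, angle brackets in the value are already encoded, so escaping it again this way would double-encode them.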
php-mf2 follows the various µf2 parsing guidelines on the microformats wiki. Useful reference:
php-mf2 includes support for implied p-name, u-url and u-photo as per the µf2 parsing process, with the result that every microformat will have a name property whether or not it is explicitly declared. More info on what this is and why it exists in the µf2 FAQ.
It also includes an approximate implementation of the Value-Class Pattern, currently acting only on dt-* properties but soon to be rolled out to all property types.
When a DOMElement with a classname of e-* is found, the DOMNode::C14N() string values of its children are concatenated and returned.
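The concatenation described above can be sketched with PHP's DOM extension alone. The markup and the e-content class are made up for illustration, and this is not the parser's own code.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<div class="e-content">Some <b>bold</b> text</div>');

$xpath = new DOMXPath($doc);
$element = $xpath->query('//*[@class="e-content"]')->item(0);

// Concatenate the canonical XML string of each child node,
// which keeps the inner markup but drops the e-* element itself.
$html = '';
foreach ($element->childNodes as $child) {
    $html .= $child->C14N();
}

echo $html, "\n";
```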
Pull requests are very welcome. Please try to maintain stylistic, structural and naming consistency with the existing codebase, and don’t be too upset if I make naming changes :)
Please add tests which cover changes you plan to make or have made. I use PHPUnit, which is the de-facto standard for modern PHP development.
At the very least, run the test suite before and after making your changes to make sure you haven’t broken anything.
Issues/bug reports welcome. If you know how to write tests then please do so as code always expresses problems and intent much better than English, and gives me a way of measuring whether or not fixes have actually solved your problem. If you don’t know how to write tests, don’t worry :) Just include as much useful information in the issue as you can.
Currently php-mf2 is tested fairly thoroughly, but the code itself is not hugely testable (lots of repetition and redundancy). This is something I’m working on changing.
Tests are written in PHPUnit and are contained within /tests/. Running phpunit . from the root dir will run them all.
There are enough tests to warrant putting them into separate suites for maintenance. The different suites are:
- ParserTest.php: Tests for internal, e-* parsing and sanity checks.
- ParseImpliedTest.php: Tests of the implied property patterns
- CombinedMicroformatsTest.php: Tests of nested microformats
- MicroformatsWikiExamplesTest.php: Tests taken directly from the wiki pages about µf2
- Parse*Test.php for P, U and DT. Contains tests for a particular property type.
As of v0.1.6, the only property with any support for value-class is dt-*, so that currently contains the value-class tests. These should be moved elsewhere as value-class and value-title are abstracted and rolled out to all properties.
- Rewrote some PHP 5.4 array syntax which crept into 0.1.16 so php-mf2 still works on PHP 5.3
- Fixed a bug causing weird partial microformats to be added to parent microformats if they had doubly property-nested children
- Finally actually licensed this project under a real license (MIT, in composer.json)
- Suggested barnabywalters/mf-cleaner in composer.json
- Ability to parse from only an ID
- Context DOMElement can be passed to Parser::parse
- Parser::query runs XPath queries on the current document
- When parsing e-* properties, elements with @src, @data or @href have relative URLs resolved in the output
- Added html-safe options
- Added rel+rel-alternate parsing