A pure (generic parsing) microformats-2 parser for PHP

ashwinks/php-mf2

 
 

php-mf2

php-mf2 is a generic microformats-2 parser. It doesn’t have a hard-coded list of all the different microformats, just a set of procedures to handle different property types (e.g. p- for plaintext, u- for URL, etc). This allows for a very small and maintainable parser.

Installation

Install with Composer by adding "mf2/mf2": "0.1.*" to the require object in your composer.json and running php composer.phar update.

Usage

mf2 is PSR-0 autoloadable, so all you have to do to load it is:

  1. Include Composer’s auto-generated autoload file (/vendor/autoload.php)
  2. Declare mf2\Parser in your use statement
  3. Make a new Parser($input) where $input can either be a string of HTML or a DOMDocument

Example Code

<?php

include $_SERVER['DOCUMENT_ROOT'] . '/vendor/autoload.php';

use mf2\Parser;

$parser = new Parser('<div class="h-card"><p class="p-name">Barnaby Walters</p></div>');
$output = $parser->parse();

print_r($output);

// EOF

Parser::parse() should return an array structure mirroring the canonical JSON serialisation introduced with µf2. Passed through print_r(), it looks something like this:

Array
(
    [items] => Array
        (
            [0] => Array
                (
                    [type] => Array
                        (
                            [0] => h-card
                        )
                    [properties] => Array
                        (
                            [name] => Barnaby Walters
                        )

                )

        )

)

Note that, whilst the property prefixes are stripped, the prefix of the h-* classname is left on.

A base URL can be provided as the second parameter of mf2\Parser::__construct(). It is prepended to the value of any u-* property that is a relative URL.
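To illustrate what prepending a base URL means, here is a minimal sketch. prependBaseUrl() is a hypothetical helper written for this example and is not part of php-mf2. It assumes a URL counts as relative when parse_url() finds neither a scheme nor a host, and the parser's real logic may differ.

```php
<?php

/**
 * Hypothetical helper, not part of php-mf2: prepend $baseUrl to $url
 * when $url has neither a scheme nor a host (i.e. it looks relative).
 */
function prependBaseUrl(string $url, string $baseUrl): string
{
    $parts = parse_url($url);

    // parse_url() returns false for seriously malformed URLs; leave those
    // untouched, along with anything that already has a scheme or a host.
    if ($parts === false || isset($parts['scheme']) || isset($parts['host'])) {
        return $url;
    }

    // Join base and path with exactly one slash between them.
    return rtrim($baseUrl, '/') . '/' . ltrim($url, '/');
}

echo prependBaseUrl('/photo.jpg', 'http://example.com'), "\n";
echo prependBaseUrl('http://other.example/me', 'http://example.com'), "\n";
```

Absolute URLs pass through unchanged, so only relative u-* values are affected by the base URL.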

Output Types

Different µf2 property types are returned as different PHP types.

  • h-* are returned as associative arrays containing more properties
  • p-* and u-* are returned as whitespace-trimmed strings
  • dt-* are returned as \DateTime objects
  • e-* are returned as non-HTML-encoded strings of markup representing the innerHTML of the element classed as e-*
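Code that consumes the output may need to branch on these types. The sketch below uses a hand-written $item array as a stand-in for one entry of the parsed items. The property names, the values and the describeValue() helper are all assumptions made for illustration.

```php
<?php

// Hypothetical sample shaped like the documented output; a real array
// would come from Parser::parse().
$item = [
    'type' => ['h-event'],
    'properties' => [
        'name'    => 'Homebrew Website Club',
        'url'     => 'http://example.com/events/1',
        'start'   => new DateTime('2013-06-30 12:00:00'),
        'content' => '<p>Bring a <b>laptop</b>.</p>',
    ],
];

/** Hypothetical helper: describe a property value according to its PHP type. */
function describeValue($value): string
{
    if ($value instanceof DateTime) {
        return 'DateTime: ' . $value->format('Y-m-d H:i');
    }
    if (is_array($value)) {
        return 'nested microformat (array)';
    }
    return 'string: ' . $value;
}

foreach ($item['properties'] as $name => $value) {
    echo $name, ' => ', describeValue($value), "\n";
}
```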

Security

Little to no filtering of content takes place in mf2\Parser, so treat its output as you would any untrusted data from the source of the parsed document.
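For example, a plaintext property can be escaped before being echoed into a page. This is a minimal sketch in which $name is a hypothetical value standing in for a string taken from the parsed output.

```php
<?php

// Hypothetical property value, standing in for a string from parsed output.
$name = '<script>alert(1)</script>';

// Escape for an HTML text or attribute context before echoing.
$safeName = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

echo '<p>', $safeName, '</p>', "\n";
```

Escaping like this suits p-* and u-* strings. e-* values are markup by design, so they would need an HTML sanitiser rather than plain escaping if the markup is to be preserved.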

Parsing Behaviour

php-mf2 follows the various µf2 parsing guidelines on the microformats wiki. Useful reference:

php-mf2 includes support for implied p-name, u-url and u-photo as per the µf2 parsing process, with the result that every microformat will have a name property whether or not it is explicitly declared. More info on what this is and why it exists can be found in the µf2 FAQ.

It also includes an approximate implementation of the Value-Class Pattern, currently acting only on dt-* properties but soon to be rolled out to all property types.

When a DOMElement with a classname of e-* is found, the DOMNode::C14N() string values of its children are concatenated and returned.
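The e-* behaviour described above can be approximated with PHP's DOM extension alone. This is a sketch, not the parser's actual code: the sample markup, the e-content classname and the XPath query are assumptions chosen for the example.

```php
<?php

$doc = new DOMDocument();
$doc->loadHTML('<div class="e-content"><p>Hello <b>world</b></p><p>Bye</p></div>');

$xpath = new DOMXPath($doc);
// Match any element whose class attribute contains the token "e-content".
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " e-content ")]';
$element = $xpath->query($query)->item(0);

$markup = '';
foreach ($element->childNodes as $child) {
    // DOMNode::C14N() returns the canonical XML serialisation of the node.
    $markup .= $child->C14N();
}

echo $markup, "\n";
```

Because only the children are serialised, the resulting string does not include the wrapping e-* element itself.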

Testing

Currently php-mf2 is tested fairly thoroughly, but the code itself is not hugely testable (lots of repetition and redundancy). This is something I’m working on changing. Tests are written in phpunit and are contained within /tests/. Running phpunit . from the root dir will run them all.

There are enough tests to warrant putting them into separate suites for maintenance. The different suites are:

  • ParserTest.php: Tests for internal, e-* parsing and sanity checks.
  • ParseImpliedTest.php: Tests of the implied property patterns
  • CombinedMicroformatsTest.php: Tests of nested microformats
  • MicroformatsWikiExamplesTest.php: Tests taken directly from the wiki pages about µf2
  • Parse*Test.php for P, U and DT: each contains tests for a particular property type.

As of v0.1.6, the only property with any support for value-class is dt-*, so that currently contains the value-class tests. These should be moved elsewhere as value-class and value-title are abstracted and rolled out to all properties.
