1 | | -This implementation doesn't work yet. Bits and pieces are getting written |
2 | | -at a time. |
| 1 | +html5lib - php flavour |
3 | 2 |
4 | | -This is an implementation of the HTML5 specification for PHP. More friendly |
5 | | -details forthcoming, but here are some notes: |
| 3 | +This is an implementation of the tokenization and tree-building parts |
| 4 | +of the HTML5 specification in PHP. Potential uses of this library |
| 5 | +include web scrapers and HTML filters. |
| 6 | + |
| 7 | +Warning: This is a pre-alpha release, and as such, certain parts of |
| 8 | +this code are not up to snuff (e.g. error reporting and performance). |
| 9 | +However, the code is very close to spec and passes 100% of tests |
| 10 | +not related to parse errors. Nevertheless, expect to have to update |
| 11 | +your code on the next upgrade. |
| 12 | + |
| 13 | + |
| 14 | +Usage notes: |
| 15 | + |
| 16 | + <?php |
| 17 | + require_once '/path/to/HTML5/Parser.php'; |
| 18 | + $dom = HTML5_Parser::parse('<html><body>...'); |
| 19 | + $nodelist = HTML5_Parser::parseFragment('<b>Boo</b><br>'); |
| 20 | + $nodelist = HTML5_Parser::parseFragment('<td>Bar</td>', 'table'); |
| 21 | + |
| 22 | + |
| 23 | +Documentation: |
| 24 | + |
| 25 | +HTML5_Parser::parse($text) |
| 26 | + $text : HTML to parse |
| 27 | + return : DOMDocument of parsed document |
| 28 | + |
| 29 | +HTML5_Parser::parseFragment($text, $context) |
| 30 | + $text : HTML to parse |
| 31 | + $context : String name of context element (optional) |
| 32 | + return : DOMNodeList of parsed fragment |
| 33 | + |
| 34 | + |
| 35 | +Developer notes: |
6 | 36 |
7 | 37 | * To set up unit tests, you need to add a small stub file test-settings.php |
8 | 38 | that contains $simpletest_location = 'path/to/simpletest/'; This needs to |
9 | 39 | be version 1.1 (or, until that is released, SVN trunk) of SimpleTest. |
10 | 40 |
11 | 41 | * We don't want to ultimately use PHP's DOM because it is not tolerant |
12 | 42 | of certain types of errors that HTML 5 allows (for example, an element |
13 | | - "foo@bar"). But for now, we will, since it's much easier. |
| 43 | + "foo@bar"). But the current implementation uses it, since it's easy. |
| 44 | + Eventually, this html5lib implementation will get a version of SimpleTree |
| 45 | + and may start using it by default. |
14 | 46 |
15 | 47 | vim: et sw=4 sts=4 |
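
Since the README names web-scrapers as a potential use and documents parse() as returning a DOMDocument, here is a minimal sketch of what consuming that result might look like. The helper extract_links() is hypothetical and not part of html5lib. So that the sketch runs without the library installed, the input document is built with PHP's built-in DOMDocument::loadHTML; with the library available, the result of HTML5_Parser::parse($html) would presumably be passed instead.

```php
<?php
// Hypothetical helper (not part of html5lib): collect the href of every
// <a> element in a DOMDocument, such as the one the README says
// HTML5_Parser::parse() returns.
function extract_links(DOMDocument $dom)
{
    $links = array();
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // Skip anchors that carry no href (e.g. named anchors).
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Stand-in input so the sketch runs without the library installed.
// With html5lib available this would instead be, per the README:
//   $dom = HTML5_Parser::parse($html);
$html = '<html><body><a href="/one">1</a><a name="x">2</a>'
      . '<a href="/two">3</a></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);

$links = extract_links($dom);
print_r($links);
```

Because the helper only depends on the DOMDocument type, it should not need to change if the parser behind it is swapped.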
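
The developer note about PHP's DOM not tolerating an element named "foo@bar" can be checked with the built-in DOM extension alone. This is only an illustration of the limitation, not code from the library: PHP's DOM validates element names against XML naming rules, so creating such an element raises a DOMException.

```php
<?php
// Illustration only: PHP's DOM checks element names against XML naming
// rules, so a name containing "@" is rejected.
$doc = new DOMDocument();
$rejected = false;

try {
    $doc->createElement('foo@bar');
} catch (DOMException $e) {
    // DOM_INVALID_CHARACTER_ERR is the code the DOM extension uses
    // for names containing characters that are not allowed.
    $rejected = ($e->getCode() === DOM_INVALID_CHARACTER_ERR);
    echo 'DOM refused the element: ', $e->getMessage(), "\n";
}
```

An HTML5 tree builder that must keep such elements therefore needs either a workaround or a more permissive tree representation, which is the motivation the note gives for SimpleTree.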