KAF2Tiger2 Code
Status: Alpha
Brought to you by:
rentier
File | Date | Author | Commit |
---|---|---|---|
examples | 2011-03-16 | rentier | [r9] added two Italian examples |
src | 2011-03-01 | rentier | [r8] FIX bug in English example (wrong date format) |
validation | 2011-02-28 | rentier | [r6] Updated to a slightly modified kaf-0.6 version |
COPYING | 2011-02-15 | rentier | [r1] Initial import: kaf2tiger2-0.0.2 |
Changelog | 2011-03-16 | rentier | [r9] added two Italian examples |
README | 2011-02-15 | rentier | [r1] Initial import: kaf2tiger2-0.0.2 |
KAF to <Tiger2/> documentation
==============================

KAF2Tiger2 converts linguistically annotated texts in KYOTO annotation
format (KAF) into the <Tiger2/> XML format. KAF2Tiger2 is developed as
part of the KYOTO project (part of the EU-FP7 ICT work programme 2007).

Requirements
------------

KAF2Tiger2 is written in Python. You will need a Python installation from
the 2.5 branch or newer. Older versions might work as well but are not
tested. Python 3.0 is not actively supported.

You will need to have the lxml XML parsing library installed
(http://codespeak.net/lxml).

KAF in short
------------

* multi-layer stand-off linguistic annotation framework
* layers used in KYOTO (denoted by tag names):
  - text (token layer (word form layer), primary data, including
    sentence and page number)
  - terms (term/multiword layer)
  - deps (syntactic layer on top of terms)
  - chunks (syntactic layer on top of terms)
  - tunits (?)
  - locations (semantic layer)
  - dates (semantic layer)

In KAF, primary data representation is already abstracted into a token
sequence. Therefore I chose the inline representation for the <Tiger2/>
serialization. Every KAF term thus becomes a terminal node in <Tiger2/>.
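
The term-to-terminal mapping described above can be sketched roughly as
follows. This is an illustration only, not the converter's actual code. It
uses the standard library's xml.etree.ElementTree rather than lxml so that
it runs without extra dependencies, and it targets a current Python rather
than the 2.5 branch. The sample document, the function name
kaf_terms_to_terminals, and the output element and attribute names
(terminals, t, id, word, lemma, pos) are simplified assumptions, not the
real KAF or <Tiger2/> schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified KAF fragment: a token layer (<text>)
# and a term layer (<terms>) whose spans point back at token ids.
KAF_SAMPLE = """\
<KAF>
  <text>
    <wf wid="w1" sent="1">Big</wf>
    <wf wid="w2" sent="1">Ben</wf>
    <wf wid="w3" sent="1">chimes</wf>
  </text>
  <terms>
    <term tid="t1" lemma="Big Ben" pos="R">
      <span><target id="w1"/><target id="w2"/></span>
    </term>
    <term tid="t2" lemma="chime" pos="V">
      <span><target id="w3"/></span>
    </term>
  </terms>
</KAF>
"""


def kaf_terms_to_terminals(kaf_xml):
    """Build one terminal element per KAF term (illustrative names only)."""
    kaf = ET.fromstring(kaf_xml)

    # Index the token layer so that term spans can be resolved to word forms.
    words = {wf.get("wid"): wf.text for wf in kaf.iter("wf")}

    terminals = ET.Element("terminals")
    for term in kaf.iter("term"):
        # A term may span several tokens (multiword); join their word forms.
        form = " ".join(words[t.get("id")] for t in term.iter("target"))
        ET.SubElement(terminals, "t", {
            "id": term.get("tid"),
            "word": form,
            "lemma": term.get("lemma", ""),
            "pos": term.get("pos", ""),
        })
    return terminals


if __name__ == "__main__":
    print(ET.tostring(kaf_terms_to_terminals(KAF_SAMPLE), encoding="unicode"))
```

Because the terminals are built from terms rather than from individual
tokens, a multiword term such as "Big Ben" ends up as a single terminal
node, which is what an inline serialization on top of the term layer
implies.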