A new morphological analyser that considers the semantic plausibility of word sequences by using a recurrent neural network language model (RNNLM).
- OS: Linux (tested on CentOS 6.7, Ubuntu 16.04)
- gcc (4.9 or later)
- Boost C++ Libraries (1.57 or later)
http://www.boost.org/users/history/version_1_57_0.html (for unordered_map, interprocess (dynamic loading))
- gperftool (optional)
https://github.com/gperftools/gperftools
- libunwind (required by gperftool in 64-bit environments)
http://download.savannah.gnu.org/releases/libunwind/libunwind-0.99-beta.tar.gz
The notes at the end of this document contain instructions for installing these libraries.
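The compiler requirement above can be checked before building with a short sketch like the following. The `version_ge` helper is a hypothetical name introduced here for illustration, and `sort -V` assumes GNU coreutils. A Boost check is left out because header locations vary between systems.

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Minimum gcc version taken from the requirements list.
required_gcc="4.9"

if command -v gcc >/dev/null 2>&1; then
    found_gcc="$(gcc -dumpversion)"
    if version_ge "$found_gcc" "$required_gcc"; then
        echo "gcc $found_gcc: OK"
    else
        echo "gcc $found_gcc: older than $required_gcc"
    fi
else
    echo "gcc not found"
fi
```

Newer gcc releases may print only the major version from `-dumpversion`; the comparison still works in that case because the major number alone already exceeds 4.9.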
% git clone git@github.com:ku-nlp/jumanpp.git
% wget http://lotus.kuee.kyoto-u.ac.jp/nl-resource/jumanpp/jumanpp-1.01.tar.xz
(This repository does not include the resource files, so they must be copied
from the distributed package.)
% tar xJvf jumanpp-1.01.tar.xz
% mv jumanpp-1.01/jumanpp-resource jumanpp/.
% cd jumanpp
% ./autogen.sh
% ./configure
% make
% sudo make install
% wget http://lotus.kuee.kyoto-u.ac.jp/nl-resource/jumanpp/jumanpp-1.01.tar.xz
% tar xJvf jumanpp-1.01.tar.xz
% cd jumanpp-1.01
% ./configure
% make
% sudo make install
% echo "魅力がたっぷりと詰まっている" | jumanpp
魅力 みりょく 魅力 名詞 6 普通名詞 1 * 0 * 0 "代表表記:魅力/みりょく カテゴリ:抽象物"
が が が 助詞 9 格助詞 1 * 0 * 0 NIL
たっぷり たっぷり たっぷり 副詞 8 * 0 * 0 * 0 "自動認識"
と と と 助詞 9 格助詞 1 * 0 * 0 NIL
詰まって つまって 詰まる 動詞 2 * 0 子音動詞ラ行 10 タ系連用テ形 14 "代表表記:詰まる/つまる ドメイン:料理・食事 自他動詞:他:詰める/つめる"
いる いる いる 接尾辞 14 動詞性接尾辞 7 母音動詞 1 基本形 2 "代表表記:いる/いる"
EOS
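For downstream processing, the output above can be reduced to a segmented sentence. The sketch below assumes, based only on the sample output, that each morpheme line starts with the surface form as its first whitespace-delimited field and that each sentence ends with an `EOS` line. The function names `surface_forms` and `segment` are hypothetical.

```shell
# Print the surface form (first field) of every morpheme line, one per line.
# EOS lines are skipped.
surface_forms() {
    awk '$0 != "EOS" { print $1 }'
}

# Print one line per sentence, with surface forms joined by single spaces.
# A sentence is emitted when its EOS line is reached.
segment() {
    awk '$0 == "EOS" { print line; line = ""; next }
         { line = (line == "" ? $1 : line " " $1) }'
}
```

With these functions defined, the example sentence can be segmented as follows:

% echo "魅力がたっぷりと詰まっている" | jumanpp | segment
魅力 が たっぷり と 詰まって いる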
usage: jumanpp [options]
options:
-s, --specifics lattice format output (unsigned int [=5])
-B, --beam set beam width used in analysis (unsigned int [=5])
-v, --version print version
-h, --help print this message
Input must be UTF-8 encoded text.
Lines beginning with # are interpreted as comment lines.
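Because the input has to be UTF-8, it can help to validate a file before analysis. The following is a minimal sketch that uses `iconv` for the check. The `is_utf8` function name, the temporary input file, and the text of its comment line are assumptions made for illustration.

```shell
# Hypothetical helper: succeeds if the file named by $1 is valid UTF-8.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Illustrative input file: one comment line followed by one sentence.
input="$(mktemp)"
printf '%s\n' '# sentence 1' '魅力がたっぷりと詰まっている' > "$input"

if is_utf8 "$input"; then
    echo "input is valid UTF-8"
else
    echo "input is not valid UTF-8; convert it first, e.g. with iconv" >&2
fi
```

A file that passes the check can then be fed to the analyser on standard input, for example with `jumanpp < "$input"`.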
See "Morphological Analysis for Unsegmented Languages using Recurrent Neural Network Language Model", Hajime Morita, Daisuke Kawahara, Sadao Kurohashi, EMNLP 2015.
Hajime Morita
Daisuke Kawahara
Sadao Kurohashi
Juman++ uses the following open-source software:
- faster-rnnlm for training RNNLM and reading models.
- RNNLM-toolkit for reading RNNLMs.
- Darts for Double-Array.
- tinycdb for reading CDB.
- cmdline by Hideyuki Tanaka to parse command line options.
# libunwind
wget http://download.savannah.gnu.org/releases/libunwind/libunwind-0.99-beta.tar.gz
tar xzf libunwind-0.99-beta.tar.gz
cd libunwind-0.99-beta
./configure --prefix=/somewhere/local/
make
make install
# gperftool (run in the extracted gperftool source directory)
./configure --prefix=/somewhere/local/
make
# If ld tries to link libunwind.so and the build fails,
# pass the option "UNWIND_LIBS=-lunwind-x86_64" to make.
make install
# Boost (run in the extracted Boost source directory)
sh bootstrap.sh
./b2 install -j2 --prefix=/somewhere/local/
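When the libraries are installed under a non-standard prefix as above, the compiler and linker usually need to be told where to find them. The sketch below sets the conventional environment variables before running `./configure` for Juman++. It assumes the same `/somewhere/local` prefix and that the Juman++ configure script honours `CPPFLAGS` and `LDFLAGS`, as autoconf-generated scripts normally do. This has not been confirmed against the Juman++ build system.

```shell
# Assumed install prefix, matching the commands above.
PREFIX=/somewhere/local

# Header search path for the compiler (keeps any existing value).
export CPPFLAGS="-I$PREFIX/include${CPPFLAGS:+ $CPPFLAGS}"

# Library search path for the linker (keeps any existing value).
export LDFLAGS="-L$PREFIX/lib${LDFLAGS:+ $LDFLAGS}"

# Runtime search path for shared libraries (keeps any existing value).
export LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

After these exports, `./configure` and `make` can be run in the same shell session.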