@@ -109,8 +112,6 @@ Bayesian, statistics and Linguistics approaches for Natural Language Processing
 *[Pattern](http://www.clips.ua.ac.be/pattern) - A web mining module for the Python programming language. It has tools for natural language processing, machine learning, among others.
 *[TextBlob](http://textblob.readthedocs.org/) - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
 *[YAlign](https://github.com/machinalis/yalign) - A sentence aligner, a friendly tool for extracting parallel sentences from comparable corpora.
-*[jieba](https://github.com/fxsjy/jieba#jieba-1) - Chinese Words Segmentation Utilities.
-*[SnowNLP](https://github.com/isnowfy/snownlp) - A library for processing Chinese text.
 *[Rosetta](https://github.com/columbia-applied-data-science/rosetta) - Text processing tools and wrappers (e.g. Vowpal Wabbit)
 *[BLLIP Parser](https://pypi.python.org/pypi/bllipparser/) - Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
 *[PyNLPl](https://github.com/proycon/pynlpl) - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for [FoLiA](http://proycon.github.io/folia/), but also ARPA language models, Moses phrasetables, GIZA++ alignments.
@@ -416,6 +417,42 @@ Dodge et. al 2015. Tests Memory Networks on 4 tasks including reddit dialog task
 *[Naver Sentiment Movie Corpus in Korean](https://github.com/e9t/nsmc/)
 *[Chosun Ilbo archive](http://srchdb1.chosun.com/pdf/i_archive/) - dataset in Korean from one of the major newspapers in South Korea, the Chosun Ilbo.
 
+## NLP in Arabic
+
+[Back to Top](#contents)
+
+### Libraries
+
+*[goarabic](https://github.com/01walid/goarabic) - A Go package for dealing with Arabic text.
+*[jsastem](https://github.com/ejtaal/jsastem) - An Arabic stemmer package in JavaScript
+*[PyArabic](https://pypi.python.org/pypi/PyArabic/0.4) - Arabic text tools for Python
+
+### Datasets
+
+*[LABR](https://github.com/mohamedadaly/labr) - LArge Arabic Book Reviews dataset
+*[Arabic Stopwords](https://github.com/mohataher/arabic-stop-words) - A list of Arabic stopwords from various resources
+
+## NLP in Chinese
+
+[Back to Top](#contents)
+
+### Libraries
+
+*[jieba](https://github.com/fxsjy/jieba#jieba-1) - A Chinese word segmentation library in Python
+*[SnowNLP](https://github.com/isnowfy/snownlp) - A library for processing Chinese text in Python
+*[FudanNLP](https://github.com/FudanNLP/fnlp) - A Java library for Chinese text processing.
+
+## NLP in Spanish
+
+[Back to Top](#contents)
+
+### Corpora
+
+*[Colombian Political Speeches](https://github.com/dav009/LatinamericanTextResources)