Commit 3120f41

Merge pull request keon#111 from the-ethan-hunt/master
Added NLP sections for Arabic, Chinese and Spanish
2 parents 51461c4 + 4fbd303 commit 3120f41

1 file changed: +39 -2 lines changed

README.md

Lines changed: 39 additions & 2 deletions
@@ -41,6 +41,9 @@ Please feel free to create [pull requests](https://github.com/keonkim/awesome-nl
 - [Text Classification](#text-classification)
 - [Datasets](#datasets)
 - [NLP in Korean](#nlp-in-korean)
+- [NLP in Arabic](#nlp-in-arabic)
+- [NLP in Chinese](#nlp-in-chinese)
+- [NLP in Spanish](#nlp-in-spanish)
 - [Credits](#credits)
 
 
@@ -109,8 +112,6 @@ Bayesian, statistics and Linguistics approaches for Natural Language Processing
 * [Pattern](http://www.clips.ua.ac.be/pattern) - A web mining module for the Python programming language. It has tools for natural language processing, machine learning, among others.
 * [TextBlob](http://textblob.readthedocs.org/) - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
 * [YAlign](https://github.com/machinalis/yalign) - A sentence aligner, a friendly tool for extracting parallel sentences from comparable corpora.
-* [jieba](https://github.com/fxsjy/jieba#jieba-1) - Chinese Words Segmentation Utilities.
-* [SnowNLP](https://github.com/isnowfy/snownlp) - A library for processing Chinese text.
 * [Rosetta](https://github.com/columbia-applied-data-science/rosetta) - Text processing tools and wrappers (e.g. Vowpal Wabbit)
 * [BLLIP Parser](https://pypi.python.org/pypi/bllipparser/) - Python bindings for the BLLIP Natural Language Parser (also known as the Charniak-Johnson parser)
 * [PyNLPl](https://github.com/proycon/pynlpl) - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for [FoLiA](http://proycon.github.io/folia/), but also ARPA language models, Moses phrasetables, GIZA++ alignments.
@@ -416,6 +417,42 @@ Dodge et. al 2015. Tests Memory Networks on 4 tasks including reddit dialog task
 * [Naver Sentiment Movie Corpus in Korean](https://github.com/e9t/nsmc/)
 * [Chosun Ilbo archive](http://srchdb1.chosun.com/pdf/i_archive/) - dataset in Korean from one of the major newspapers in South Korea, the Chosun Ilbo.
 
+## NLP in Arabic
+
+[Back to Top](#contents)
+
+### Libraries
+
+* [goarabic](https://github.com/01walid/goarabic)- A Go package for dealing with Arabic text.
+* [jsastem](https://github.com/ejtaal/jsastem) - An Arabic stemmer package in Javascript
+* [PyArabic](https://pypi.python.org/pypi/PyArabic/0.4) - Arabic text tools for Python
+
+### Datasets
+
+* [LABR](https://github.com/mohamedadaly/labr) - LArge Arabic Book Reviews dataset
+* [Arabic Stopwords](https://github.com/mohataher/arabic-stop-words) - A list of Arabic stopwords from various resources
+
+## NLP in Chinese
+
+[Back to Top](#contents)
+
+### Libraries
+
+* [jieba](https://github.com/fxsjy/jieba#jieba-1) - A Chinese Words Segmentation Utilities library in Python
+* [SnowNLP](https://github.com/isnowfy/snownlp) - A library for processing Chinese text in Python
+* [FudanNLP](https://github.com/FudanNLP/fnlp)- A Java library for Chinese text processing.
+
+## NLP in Spanish
+
+[Back to Top](#contents)
+
+## Corpora
+
+* [Columbian Political Speeches](https://github.com/dav009/LatinamericanTextResources)
+* [Copenhagen Treebank](http://code.google.com/p/copenhagen-dependency-treebank/)
+* [Reuters Corpora RCV2](http://trec.nist.gov/data/reuters/reuters.html)
+* [Spanish Billion words corpus with Word2Vec embeddings](http://crscardellino.me/SBWCE/)
+
 ## Credits
 Awesome NLP was seeded with curated content from the lot of repositories, some of which are listed below | [Back to Top](#contents)
 * [ai-reading-list](https://github.com/m0nologuer/AI-reading-list)
