Commit 16862d3

Formatting fix in Python; edit Chinese
- Remove redundancy in Chinese links
- Formatting fixes in Python section
- Move languages to their own separate section
1 parent ad1e661 commit 16862d3

1 file changed, with 18 additions and 20 deletions.

README.md

Lines changed: 18 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,7 @@ Please feel free to create [pull requests](https://github.com/keonkim/awesome-nl
4444
- [NLP in Arabic](#nlp-in-arabic)
4545
- [NLP in Chinese](#nlp-in-chinese)
4646
- [NLP in Spanish](#nlp-in-spanish)
47+
- [Other Languages](#other-languages)
4748
- [Credits](#credits)
4849

4950

@@ -112,26 +113,17 @@ Note: :v: Recommended packages
112113

113114
Packages marked by :v: are popular and used in production grade systems by atleast one maintainer of this repository or people they respect
114115

115-
* [TextBlob](http://textblob.readthedocs.org/) - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of [Natural Language Toolkit (NLTK)](http://www.nltk.org/) and [Pattern](https://github.com/clips/pattern), and plays nicely with both :v:
116-
* [spaCy](https://github.com/spacy-io/spaCy) - Industrial strength NLP with Python and Cython :v:
117-
* [textacy](https://github.com/chartbeat-labs/textacy) - Higher level NLP built on spaCy :v:
118-
* [gensim](https://radimrehurek.com/gensim/index.html) - Python library to conduct unsupervised semantic modelling from plain text :v:
119-
* [scattertext](https://github.com/JasonKessler/scattertext) - Python library to produce d3 visualizations of how language differs between corpora :v:
120-
* [AllenNLP](https://github.com/allenai/allennlp) - An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
121-
* [Rosetta](https://github.com/columbia-applied-data-science/rosetta) - Text processing tools and wrappers (e.g. Vowpal Wabbit)
122-
* [PyNLPl](https://github.com/proycon/pynlpl) - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for [FoLiA](http://proycon.github.io/folia/), but also ARPA language models, Moses phrasetables, GIZA++ alignments.
123-
* [jPTDP](https://github.com/datquocnguyen/jPTDP) - A toolkit for joint part-of-speech (POS) tagging and dependency parsing. jPTDP provides pre-trained models for 40+ languages.
124-
* [BigARTM](https://github.com/bigartm/bigartm) - a fast library for topic modelling
125-
126-
* Language Specific Tools
127-
* Chinese: [YAlign](https://github.com/machinalis/yalign) - A sentence aligner, a friendly tool for extracting parallel sentences from comparable corpora
128-
* Chinese: [SnowNLP](https://github.com/isnowfy/snownlp) - A library for processing Chinese text
129-
* Chinese: [jieba](https://github.com/fxsjy/jieba#jieba-1) - Chinese Words Segmentation Utilities.
130-
* Russian: [pymorphy2](https://github.com/kmike/pymorphy2) - a good pos-tagger for Russian
131-
* Thai: [PyThaiNLP](https://github.com/wannaphongcom/pythainlp) - Thai NLP in Python Package
132-
* Ancient Languages: [CLTK](https://github.com/cltk/cltk): The Classical Language Toolkit is a Python library and collection of texts for doing NLP in ancient languages
133-
* Dutch: [python-frog](https://github.com/proycon/python-frog) - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER)
134-
116+
* [TextBlob](http://textblob.readthedocs.org/) - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of [Natural Language Toolkit (NLTK)](http://www.nltk.org/) and [Pattern](https://github.com/clips/pattern), and plays nicely with both :v:
117+
* [spaCy](https://github.com/spacy-io/spaCy) - Industrial strength NLP with Python and Cython :v:
118+
* [textacy](https://github.com/chartbeat-labs/textacy) - Higher level NLP built on spaCy :v:
119+
* [gensim](https://radimrehurek.com/gensim/index.html) - Python library to conduct unsupervised semantic modelling from plain text :v:
120+
* [scattertext](https://github.com/JasonKessler/scattertext) - Python library to produce d3 visualizations of how language differs between corpora :v:
121+
* [AllenNLP](https://github.com/allenai/allennlp) - An NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
122+
* [Rosetta](https://github.com/columbia-applied-data-science/rosetta) - Text processing tools and wrappers (e.g. Vowpal Wabbit)
123+
* [PyNLPl](https://github.com/proycon/pynlpl) - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for [FoLiA](http://proycon.github.io/folia/), but also ARPA language models, Moses phrasetables, GIZA++ alignments.
124+
* [jPTDP](https://github.com/datquocnguyen/jPTDP) - A toolkit for joint part-of-speech (POS) tagging and dependency parsing. jPTDP provides pre-trained models for 40+ languages.
125+
* [BigARTM](https://github.com/bigartm/bigartm) - a fast library for topic modelling
126+
135127

136128
* <a id="c++">**C++** - C++ Libraries</a> | [Back to Top](#contents)
137129
* [MIT Information Extraction Toolkit](https://github.com/mit-nlp/MITIE) - C, C++, and Python tools for named entity recognition and relation extraction
@@ -453,6 +445,12 @@ Dodge et. al 2015. Tests Memory Networks on 4 tasks including reddit dialog task
453445
* [Reuters Corpora RCV2](http://trec.nist.gov/data/reuters/reuters.html)
454446
* [Spanish Billion words corpus with Word2Vec embeddings](http://crscardellino.me/SBWCE/)
455447

448+
### Other Languages
449+
* Russian: [pymorphy2](https://github.com/kmike/pymorphy2) - a good pos-tagger for Russian
450+
* Thai: [PyThaiNLP](https://github.com/wannaphongcom/pythainlp) - Thai NLP in Python Package
451+
* Ancient Languages: [CLTK](https://github.com/cltk/cltk): The Classical Language Toolkit is a Python library and collection of texts for doing NLP in ancient languages
452+
* Dutch: [python-frog](https://github.com/proycon/python-frog) - Python binding to Frog, an NLP suite for Dutch. (pos tagging, lemmatisation, dependency parsing, NER)
453+
456454
## Credits
457455
Awesome NLP was seeded with curated content from the lot of repositories, some of which are listed below | [Back to Top](#contents)
458456
* [ai-reading-list](https://github.com/m0nologuer/AI-reading-list)
