pyscripter python 2 free download

WordCount

Count the frequency of single words and of 2-word and 3-word clusters in a text

The program can read a text file and count the occurrences of single words and clusters of 2 and 3 words. The resulting list will be sorted in descending order (highest frequency on top).
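
Below is a minimal Python sketch of this kind of counting. It is not the program's own code. It assumes that words are runs of letters, digits or apostrophes, compared case-insensitively, because the program's actual tokenization rules are not documented here. It slides a window of 1 to 3 words over the text, tallies the clusters with `collections.Counter`, and returns them in descending order of frequency. The function name `count_clusters` is hypothetical.

```python
import re
from collections import Counter


def count_clusters(text, max_n=3):
    """Count single words and 2- to max_n-word clusters (n-grams) in text.

    Assumption: a "word" is a lowercase run of letters, digits or
    apostrophes; the original program's tokenization may differ.
    """
    words = re.findall(r"[\w']+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n consecutive words across the text
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Highest frequency first; ties broken alphabetically (a design choice)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

To count the clusters in a file, pass its contents, for example `count_clusters(Path("input.txt").read_text(encoding="utf-8"))` with `pathlib.Path` imported. Here `input.txt` is a placeholder name.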

Downloads: 5 This Week

Last Update: 2025-02-01

Tokenized Text Aligner

Aligns tokens in two versions of a text with differing tokenization.

This tool performs token-by-token alignment of two versions of a text with differing tokenization by interpreting the results of a file diff (https://docs.python.org/3/library/difflib.html). It is intended for use in the preparation of annotated linguistic corpora, where differences in tokenization may arise (i) following corrections or modifications to the source text or (ii) through the creation of different layers of annotation (part-of-speech, treebank) requiring different tokenization...
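
Below is a sketch of the underlying idea using `difflib.SequenceMatcher` from the module linked above. The description does not say exactly how the tool interprets the diff. The grouping rule and the name `align_tokens` are therefore assumptions. Equal runs are aligned one-to-one, and every other opcode block is kept together as a single many-to-many group.

```python
from difflib import SequenceMatcher


def align_tokens(tokens_a, tokens_b):
    """Align two tokenizations of the same text using difflib opcodes.

    Returns a list of (indices_in_a, indices_in_b) tuples.
    Assumption: 'equal' runs align one-to-one; 'replace', 'insert' and
    'delete' blocks are kept together as one (possibly empty) group.
    """
    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    alignment = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical tokens: pair them one by one
            alignment.extend(
                ((i,), (j,)) for i, j in zip(range(i1, i2), range(j1, j2))
            )
        else:
            # Differing tokenization (e.g. one token split in two)
            alignment.append((tuple(range(i1, i2)), tuple(range(j1, j2))))
    return alignment
```

For example, aligning `["I", "can't", "go", "."]` with `["I", "ca", "n't", "go", "."]` pairs token 1 of the first version with tokens 1 and 2 of the second. All other tokens are matched one-to-one.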

Downloads: 0 This Week

Last Update: 2024-07-31

Safe Harbor Deidentification

Safe Harbor Deidentification for medical documents

Phalanx - Deidentify: the Safe Harbor Deidentification mode of Phalanx is an abridged pipeline of NLP annotators culminating in NER annotators, which output text offsets. It uses the Safe Harbor deidentification method.
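
The NER annotators are described as writing their output as text offsets. Phalanx's real output format is not given here. The sketch below assumes a hypothetical format of end-exclusive `(start, end, label)` character spans and shows how such offsets could be applied to mask identifiers. The function name `redact` is also hypothetical.

```python
def redact(text, spans):
    """Replace each (start, end, label) character span with "[label]".

    Assumption: spans are end-exclusive character offsets, a hypothetical
    stand-in for the annotators' real output. Overlapping spans are not
    handled.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(f"[{label}]")        # placeholder for the identifier
        cursor = end
    pieces.append(text[cursor:])           # remainder after the last span
    return "".join(pieces)
```

For example, `redact("Seen by Dr. Smith on 2019-09-10.", [(12, 17, "NAME"), (21, 31, "DATE")])` gives `"Seen by Dr. [NAME] on [DATE]."`.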

Downloads: 0 This Week

Last Update: 2019-09-10

Arabic Corpus

Text categorization, Arabic language processing, language modeling

The Arabic Corpus was compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora). The corpus Khaleej-2004 contains 5690 documents divided into 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora should cite the two main references: (1) for the Watan-2004 corpus: M. Abbas, K. Smaili, D. Berkani (2011), Evaluation of Topic Identification Methods...

Downloads: 3 This Week

Last Update: 2019-03-05

Automatic Compound Processing (AuCoPro)

Automatic compound splitting and semantic analysis of compounds

... analysis of compounds; as such, the project will be divided into two interrelated subprojects, to be executed simultaneously. The project will focus on Afrikaans (with Dutch as the closely related, well-resourced language), which will lay the groundwork for future work on other closely related language pairs.

Downloads: 3 This Week

Last Update: 2015-07-28

Corpus redundancy manager

Redundancy caused by cut-and-paste operations in text biases machine learning for NLP. This module takes a directory and returns a list containing a subset of its files, chosen so that the similarity between any two selected files does not exceed an upper bound.
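
Below is a minimal sketch of such a selection step. It is not the module's code. It assumes Jaccard similarity over lowercase word sets and a greedy pass over files sorted by name. The module's actual similarity measure and selection order are not stated, and all function names here are hypothetical.

```python
from pathlib import Path


def jaccard(a, b):
    """Jaccard similarity of two sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_non_redundant(docs, max_similarity=0.8):
    """Greedily keep documents whose similarity to every kept one is bounded.

    docs: iterable of (name, text) pairs.
    Assumption: similarity is Jaccard over lowercase whitespace-split words.
    """
    kept, kept_words = [], []
    for name, text in docs:
        words = set(text.lower().split())
        # Keep the document only if it is not too close to any kept one
        if all(jaccard(words, other) <= max_similarity for other in kept_words):
            kept.append(name)
            kept_words.append(words)
    return kept


def select_from_directory(directory, max_similarity=0.8, pattern="*.txt"):
    """Apply select_non_redundant to the matching files in a directory."""
    paths = sorted(Path(directory).glob(pattern))
    docs = ((p.name, p.read_text(encoding="utf-8")) for p in paths)
    return select_non_redundant(docs, max_similarity)
```

Each file is compared against every file already kept. Any two files in the returned list therefore have a similarity of at most `max_similarity`. The greedy pass is order-dependent, so a different file order can produce a different subset that also respects the bound.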

Downloads: 2 This Week

Last Update: 2014-06-30

pygermanet

Python API to the German wordnet GermaNet. In its current state, this can only be seen as a quick-start helper for accessing GermaNet; to be honest, it cannot really be called an API. To use it, you will need access to a licensed copy of GermaNet (version > 5).

Downloads: 0 This Week

Last Update: 2013-04-23

Search Results for "pyscripter python 2"

Showing 7 open source projects for "pyscripter python 2"

WordCount

Tokenized Text Aligner

Safe Harbor Deidentification

Arabic Corpus

Automatic Compound Processing (AuCoPro)

Corpus redundancy manager

pygermanet
