Open Source Python Natural Language Processing (NLP) Tools

Python Natural Language Processing (NLP) Tools

View 189 business solutions

Browse free open source Python Natural Language Processing (NLP) Tools and projects below. Use the toggles on the left to filter open source Python Natural Language Processing (NLP) Tools by OS, license, language, programming language, and project status.

  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • No-Nonsense Code-to-Cloud Security for Devs | Aikido Icon
    No-Nonsense Code-to-Cloud Security for Devs | Aikido

    Connect your GitHub, GitLab, Bitbucket, or Azure DevOps account to start scanning your repos for free.

    Aikido provides a unified security platform for developers, combining 12 powerful scans like SAST, DAST, and CSPM. AI-driven AutoFix and AutoTriage streamline vulnerability management, while runtime protection blocks attacks.
    Start for Free
  • 1
    funNLP

    funNLP

    Resources, corpora, and tools for Chinese natural language processing

    FunNLP is a large, curated collection of resources, corpora, and tools for Chinese natural language processing (NLP). It aggregates datasets, lexicons, wordlists, sentiment dictionaries, knowledge graphs, and pretrained model references, serving as a one-stop resource hub for Chinese NLP practitioners. The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and various lexicons (e.g. sensitive words, emotion dictionaries, stopwords). It also includes links to academic papers, open-source model implementations, and practical utilities like word segmentation or text cleaning scripts. The project is highly community-oriented, frequently updated with contributions and new resources, and it’s widely used in both academic and applied NLP research. Its value lies in providing not just tools but also curated, domain-specific data, which can be hard to find elsewhere.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    AllenNLP

    AllenNLP

    An open-source NLP research library, built on PyTorch

    AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop. AllenNLP includes reference implementations of high quality models for both core NLP problems (e.g. semantic role labeling) and NLP applications (e.g. textual entailment). AllenNLP supports loading "plugins" dynamically. A plugin is just a Python package that provides custom registered classes or additional allennlp subcommands. There is ecosystem of open source plugins, some of which are maintained by the AllenNLP team here at AI2, and some of which are maintained by the broader community. AllenNLP will automatically find any official AI2-maintained plugins that you have installed, but for AllenNLP to find personal or third-party plugins you've installed, you also have to create either a local plugins file named .allennlp_plugins in the directory where you run the allennlp command.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    VADER

    VADER

    Lexicon and rule-based sentiment analysis tool

    VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool designed for analyzing the sentiment of text, particularly in social media and short text formats. It is optimized for quick and accurate analysis of positive, negative, and neutral sentiments.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    GT NLP Class

    GT NLP Class

    Course materials for Georgia Tech CS 4650 and 7650

    This repository contains lecture notes, slides, assignments, and code for a university-level Natural Language Processing course. It spans core NLP topics such as language modeling, sequence tagging, parsing, semantics, and discourse, alongside modern machine learning methods used to solve them. Students work through programming exercises and problem sets that build intuition for both classical algorithms (like HMMs and CRFs) and neural approaches (like word embeddings and sequence models). The materials emphasize theory grounded in practical experimentation, often via Python notebooks or scripts that visualize results and encourage ablation studies. Clear organization and self-contained examples make it possible to follow along outside the classroom, using the repo as a self-study resource. For learners and instructors alike, the course provides a coherent path from foundational linguistics to current techniques, with reproducible code that makes concepts concrete.
    Downloads: 4 This Week
    Last Update:
    See Project
  • Deliver secure remote access with OpenVPN. Icon
    Deliver secure remote access with OpenVPN.

    Trusted by nearly 20,000 customers worldwide, and all major cloud providers.

    OpenVPN's products provide scalable, secure remote access — giving complete freedom to your employees to work outside the office while securely accessing SaaS, the internet, and company resources.
    Get started — no credit card required.
  • 5
    Machine Learning PyTorch Scikit-Learn

    Machine Learning PyTorch Scikit-Learn

    Code Repository for Machine Learning with PyTorch and Scikit-Learn

    Initially, this project started as the 4th edition of Python Machine Learning. However, after putting so much passion and hard work into the changes and new topics, we thought it deserved a new title. So, what’s new? There are many contents and additions, including the switch from TensorFlow to PyTorch, new chapters on graph neural networks and transformers, a new section on gradient boosting, and many more that I will detail in a separate blog post. For those who are interested in knowing what this book covers in general, I’d describe it as a comprehensive resource on the fundamental concepts of machine learning and deep learning. The first half of the book introduces readers to machine learning using scikit-learn, the defacto approach for working with tabular datasets. Then, the second half of this book focuses on deep learning, including applications to natural language processing and computer vision.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    Underthesea

    Underthesea

    Underthesea - Vietnamese NLP Toolkit

    Underthesea is a Vietnamese NLP toolkit providing various text processing capabilities, including word segmentation, part-of-speech tagging, and named entity recognition.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    spaCy

    spaCy

    Industrial-strength Natural Language Processing (NLP)

    spaCy is a library built on the very latest research for advanced Natural Language Processing (NLP) in Python and Cython. Since its inception it was designed to be used for real world applications-- for building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration and so much more. spaCy is the fastest syntactic parser in the world according to independent benchmarks, with an accuracy within 1% of the best available. It's blazing fast, easy to install and comes with a simple and productive API.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    Ciphey

    Ciphey

    Decrypt encryptions without knowing the key or cipher

    Fully automated decryption/decoding/cracking tool using natural language processing & artificial intelligence, along with some common sense. You don't know, you just know it's possibly encrypted. Ciphey will figure it out for you. Ciphey can solve most things in 3 seconds or less. Ciphey aims to be a tool to automate a lot of decryptions & decodings such as multiple base encodings, classical ciphers, hashes or more advanced cryptography. If you don't know much about cryptography, or you want to quickly check the ciphertext before working on it yourself, Ciphey is for you. The technical part. Ciphey uses a custom-built artificial intelligence module (AuSearch) with a Cipher Detection Interface to approximate what something is encrypted with. And then a custom-built, customizable natural language processing Language Checker Interface, which can detect when the given text becomes plaintext.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    Haystack

    Haystack

    Haystack is an open source NLP framework to interact with your data

    Apply the latest NLP technology to your own data with the use of Haystack's pipeline architecture. Implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve ranked documents according to meaning, not just keywords! Make use of and compare the latest pre-trained transformer-based languages models like OpenAI’s GPT-3, BERT, RoBERTa, DPR, and more. Pick any Transformer model from Hugging Face's Model Hub, experiment, find the one that works. Use Haystack NLP components on top of Elasticsearch, OpenSearch, or plain SQL. Boost search performance with Pinecone, Milvus, FAISS, or Weaviate vector databases, and dense passage retrieval.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Photo and Video Editing APIs and SDKs Icon
    Photo and Video Editing APIs and SDKs

    Trusted by 150 million+ creators and businesses globally

    Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.
    Learn More
  • 10
    SciSpaCy

    SciSpaCy

    A full spaCy pipeline and models for scientific/biomedical documents

    ScispaCy is a spaCy extension optimized for processing biomedical and scientific text, providing domain-specific NLP models for tasks like named entity recognition (NER) and dependency parsing.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    deepdoctection

    deepdoctection

    A Repo For Document AI

    DeepDoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. deepdoctection is a Python library that orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models but enables you to build pipelines using highly acknowledged libraries for object detection, OCR and selected NLP tasks and provides an integrated frameworks for fine-tuning, evaluating and running models. For more specific text processing tasks use one of the many other great NLP libraries.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Awesome Fraud Detection Research Papers

    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Datasets

    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. We also feature a deep integration with the Hugging Face Hub, allowing you to easily load and share a dataset with the wider NLP community. There are currently over 2658 datasets, and more than 34 metrics available. Datasets naturally frees the user from RAM memory limitation, all datasets are memory-mapped using an efficient zero-serialization cost backend (Apache Arrow). Smart caching: never wait for your data to process several times.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    DeepLearning

    DeepLearning

    Deep Learning (Flower Book) mathematical derivation

    " Deep Learning " is the only comprehensive book in the field of deep learning. The full name is also called the Deep Learning AI Bible (Deep Learning) . It is edited by three world-renowned experts, Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Includes linear algebra, probability theory, information theory, numerical optimization, and related content in machine learning. At the same time, it also introduces deep learning techniques used by practitioners in the industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling and practical methods, and investigates topics such as natural language processing, Applications in speech recognition, computer vision, online recommender systems, bioinformatics, and video games. Finally, the Deep Learning book provides research directions covering theoretical topics including linear factor models, autoencoders, representation learning, structured probabilistic models, etc.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    DeepPavlov

    DeepPavlov

    A library for deep learning end-to-end dialog systems and chatbots

    DeepPavlov makes it easy for beginners and experts to create dialogue systems. The best place to start is with user-friendly tutorials. They provide quick and convenient introduction on how to use DeepPavlov with complete, end-to-end examples. No installation needed. Guides explain the concepts and components of DeepPavlov. Follow step-by-step instructions to install, configure and extend DeepPavlov framework for your use case. DeepPavlov is an open-source framework for chatbots and virtual assistants development. It has comprehensive and flexible tools that let developers and NLP researchers create production-ready conversational skills and complex multi-skill conversational assistants. Use BERT and other state-of-the-art deep learning models to solve classification, NER, Q&A and other NLP tasks. DeepPavlov Agent allows building industrial solutions with multi-skill integration via API services.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality. Training Data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data; ready to be consumed by a machine learning model. Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Keras Hub

    Keras Hub

    Pretrained model hub for Keras 3

    Keras Hub is a repository of pre-trained models for Keras 3, offering a collection of ready-to-use models for various machine-learning tasks. KerasHub is an extension of the core Keras API; KerasHub components are provided as Layer and Model implementations. If you are familiar with Keras, congratulations. You already understand most of KerasHub.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    KoNLPy

    KoNLPy

    Python package for Korean natural language processing

    KoNLPy is a natural language processing (NLP) library for the Korean language, offering tokenization, morphological analysis, and named entity recognition.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    ModelScope

    ModelScope

    Bring the notion of Model-as-a-Service to life

    ModelScope is built upon the notion of “Model-as-a-Service” (MaaS). It seeks to bring together most advanced machine learning models from the AI community, and streamlines the process of leveraging AI models in real-world applications. The core ModelScope library open-sourced in this repository provides the interfaces and implementations that allow developers to perform model inference, training and evaluation. In particular, with rich layers of API abstraction, the ModelScope library offers unified experience to explore state-of-the-art models spanning across domains such as CV, NLP, Speech, Multi-Modality, and Scientific-computation. Model contributors of different areas can integrate models into the ModelScope ecosystem through the layered APIs, allowing easy and unified access to their models. Once integrated, model inference, fine-tuning, and evaluations can be done with only a few lines of code.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    NLG-Eval

    NLG-Eval

    Evaluation code for various unsupervised automated metrics

    NLG-Eval is a toolkit for evaluating the quality of natural language generation (NLG) outputs using multiple automated metrics such as BLEU, METEOR, and ROUGE.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    Seq2Seq Chatbot

    Seq2Seq Chatbot

    Chatbot in 200 lines of code using TensorLayer

    Seq2Seq Chatbot is an implementation of a sequence-to-sequence chatbot model using TensorLayer, demonstrating how to build conversational agents with minimal code.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    Stanza

    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of speech and morphological features, to give a syntactic structure dependency parse, and to recognize named entities. The toolkit is designed to be parallel among more than 70 languages, using the Universal Dependencies formalism. Stanza is built with highly accurate neural network components that also enable efficient training and evaluation with your own annotated data.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    txtai

    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid pace, models can understand concepts in documents, audio, images and more. Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction. Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extractions to generate structured databases. The following applications are powered by txtai.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24

    Arabic Corpus

    Text categorization, arabic language processing, language modeling

    The Arabic Corpus {compiled by Dr. Mourad Abbas ( http://sites.google.com/site/mouradabbas9/corpora ) The corpus Khaleej-2004 contains 5690 documents. It is divided to 4 topics (categories). The corpus Watan-2004 contains 20291 documents organized in 6 topics (categories). Researchers who use these two corpora would mention the two main references: (1) For Watan-2004 corpus ---------------------- M. Abbas, K. Smaili, D. Berkani, (2011) Evaluation of Topic Identification Methods on Arabic Corpora,JOURNAL OF DIGITAL INFORMATION MANAGEMENT,vol. 9, N. 5, pp.185-192. 2) For Khaleej-2004 corpus --------------------------------- M. Abbas, K. Smaili (2005) Comparison of Topic Identification Methods for Arabic Language, RANLP05 : Recent Advances in Natural Language Processing ,pp. 14-17, 21-23 september 2005, Borovets, Bulgary. More useful references to check: ------------------------------------------- https://sites.google.com/site/mouradabbas9/corpora
    Leader badge
    Downloads: 10 This Week
    Last Update:
    See Project
  • 25

    BioC

    We describe a simple XML format to share text documents and annotation

    A minimalist approach to share text documents and data annotations. Allows a large number of different annotations to be represented. Project files contain: - simple code to hold/read/write data and perform sample processing. - BioC-formatted corpora - BioC tools that work with BioC corpora BioC goals - simplicity - interoperability - broad use - reuse There should be little investment required to learn to use a format or a software module to process that format. We are interested in reuse, and we focus on common NLP tasks that are broadly useful for textmining.
    Leader badge
    Downloads: 5 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.