rsadaphule/nlp

NLP Best Practices

This repository contains examples and best practices for building natural language processing (NLP) systems, provided as Jupyter notebooks and utility functions. The focus of the repository is on state-of-the-art methods and common scenarios that are popular among researchers and practitioners working on problems involving text and language.

Overview

The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in NLP algorithms, neural architectures, and distributed machine learning systems. The content is based on our past and potential future engagements with customers as well as collaboration with partners, researchers, and the open source community.

We hope these tools will significantly reduce the time it takes to go from a business problem or research idea to a fully implemented system. The example notebooks also serve as guidelines that showcase best practices and how to use the tools.

In an era of transfer learning, transformers, and deep architectures, we believe that pretrained models provide a unified solution to many real-world problems and make it easy to handle different tasks and languages. We will therefore prioritize such models, as they achieve state-of-the-art results on several NLP benchmarks and can be used in applications ranging from simple text classification to sophisticated intelligent chatbots.

GLUE Leaderboard
SQuAD Leaderboard

Content

The following is a summary of the scenarios covered in the repository. Each scenario is demonstrated in one or more Jupyter notebook examples that make use of the core code base of models and utilities.

| Scenario | Applications | Models |
|---|---|---|
| Text Classification | Topic Classification | BERT |
| Named Entity Recognition | Wikipedia NER | BERT |
| Entailment | XNLI Natural Language Inference | BERT |
| Question Answering | SQuAD | BiDAF |
| Sentence Similarity | STS Benchmark | Representation: TF-IDF, Word Embeddings, Doc Embeddings<br>Metrics: Cosine Similarity, Word Mover's Distance |
| Embeddings | Custom Embeddings Training | Word2Vec<br>fastText<br>GloVe |
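For the Sentence Similarity scenario, pairing a TF-IDF representation with cosine similarity can be illustrated with a minimal, dependency-free sketch. This is not the repository's utility code: the functions `tokenize`, `tfidf_vectors`, and `cosine_similarity` are hypothetical names, the tokenizer is a naive lowercase whitespace split, and the smoothed idf formula is an assumption (it mirrors scikit-learn's default `smooth_idf=True` weighting).

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, weight = tf * idf."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed smoothed idf: log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


sentences = [
    "a man is playing a guitar",
    "a man plays the guitar",
    "a woman is slicing an onion",
]
vecs = tfidf_vectors(sentences)
sim_01 = cosine_similarity(vecs[0], vecs[1])  # related pair (shares "man", "guitar")
sim_02 = cosine_similarity(vecs[0], vecs[2])  # unrelated pair
print(f"{sim_01:.3f} {sim_02:.3f}")
```

Cosine similarity ignores vector length, so a sentence does not score higher simply for being longer. Very common words such as "a" still contribute to the score, which is why stopword removal or the embedding-based representations listed in the table are common refinements.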

Getting Started

To get started, navigate to the Setup Guide, where you'll find instructions on how to set up your environment and dependencies.

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

Build Status

| Build Type | Branch | Status |
|---|---|---|
| Linux CPU | master | Build Status |
| Linux GPU | master | Build Status |

About

Natural Language Processing Best Practices & Examples

Languages

  • Python 63.5%
  • Jupyter Notebook 24.9%
  • C 10.1%
  • Other 1.5%