Starred repositories
The official home of the Presto distributed SQL query engine for big data
OpenRefine is a free, open source power tool for working with messy data and improving it
Apache Beam is a unified programming model for Batch and Streaming data processing.
Upserts, Deletes And Incremental Processing on Big Data.
Machine learning software for extracting information from scholarly documents
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Anthelion is a plugin for Apache Nutch to crawl semantic annotations within HTML pages.
Apache Drill is a distributed MPP query layer for self-describing data
Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch
MacroBase: A Search Engine for Fast Data
A Question Answering system built on top of the Apache UIMA framework.
Generating Vectors for DBpedia Entities via Word2Vec and Wikipedia Dumps. Questions? https://gitter.im/idio-opensource/Lobby
CogComp's Natural Language Processing Libraries and Demos: Modules include lemmatizer, ner, pos, prep-srl, quantifier, question type, relation-extraction, similarity, temporal normalizer, tokenizer…
Fast Entity Linker Toolkit for training models to link entities to KnowledgeBase (Wikipedia) in documents and queries.
Java 8 Recommender Systems framework for novelty, diversity and much more
A machine learning tool for fishing entities
Dexter is a framework that implements some popular algorithms and provides all the tools needed to develop any entity linking technique.
Improving topic models LDA and DMM (one-topic-per-document model for short texts) with word embeddings (TACL 2015)
A text tagger based on Lucene / Solr, using FST technology
A probabilistic approach from an Improbabilistic company



