Data Science Tools

  • 2
    Amazon SageMaker Examples

    Jupyter notebooks that demonstrate how to build models using SageMaker

    Welcome to Amazon SageMaker. This project highlights example Jupyter notebooks for a variety of machine learning use cases that you can run in SageMaker. If you're new to SageMaker, we recommend starting with the more feature-rich SageMaker Studio. It uses the familiar JupyterLab interface, integrates seamlessly with a variety of deep learning and data science environments, and provides scalable compute resources for training, inference, and other ML operations. Studio offers teams and companies easy onboarding for their members, freeing them from complex systems administration and security processes. Administrators control data access and resource provisioning for their users. Notebook Instances are another option; they provide the familiar Jupyter and JupyterLab interfaces and work well for single users or small teams where users are also administrators. Advanced users can also work with SageMaker solely through the AWS CLI and Python scripts using boto3 and/or the SageMaker Python SDK, as in the sketch below.
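
    A minimal sketch of the SDK-driven workflow mentioned above, assuming the sagemaker Python SDK is installed; the image URI, IAM role ARN, and S3 paths are placeholders, not values from this project:

        # Sketch: launching a managed training job with the SageMaker Python SDK.
        import sagemaker
        from sagemaker.estimator import Estimator

        session = sagemaker.Session()
        estimator = Estimator(
            image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",  # placeholder
            role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
            instance_count=1,
            instance_type="ml.m5.xlarge",
            sagemaker_session=session,
        )
        estimator.fit({"training": "s3://my-bucket/train/"})  # placeholder S3 channel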
  • 3
    ClearML

    Streamline your ML workflow

    ClearML is an open source platform that automates and simplifies developing and managing machine learning solutions for thousands of data science teams all over the world. It is designed as an end-to-end MLOps suite that lets you focus on developing your ML code and automation while ClearML ensures your work is reproducible and scalable. The ClearML Python package integrates ClearML into your existing scripts by adding just two lines of code (see the sketch below), and optionally extends your experiments and other workflows with ClearML's powerful and versatile set of classes and methods. The ClearML Server stores experiment, model, and workflow data, and supports the Web UI experiment manager and MLOps automation for reproducibility and tuning; it is available as a hosted service and as open source so you can deploy your own ClearML Server. The ClearML Agent provides MLOps orchestration, experiment and workflow reproducibility, and scalability.
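
    The "two lines of code" mentioned above are an import and a Task.init call; a minimal sketch (the project and task names are placeholders):

        # Sketch: hooking ClearML into an existing training script.
        from clearml import Task

        task = Task.init(project_name="examples", task_name="my experiment")  # placeholder names
        # ...the rest of the existing script runs unchanged; ClearML logs arguments,
        # console output, and framework models/metrics automatically.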
  • 4
    Cookiecutter Data Science

    Project structure for doing and sharing data science work

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. While these end products are generally the main event, it's easy to focus on making the products look nice and ignore the quality of the code that generates them. Because these end products are created programmatically, code quality is still important! And we're not talking about bikeshedding indentation aesthetics or pedantic formatting standards; ultimately, data science code quality is about correctness and reproducibility. It's no secret that good analyses are often the result of very scattershot and serendipitous explorations. Tentative experiments and rapidly testing approaches that might not work out are all part of the process for getting to the good stuff, and there is no magic bullet to turn data exploration into a simple, linear progression.
  • 5

    DEPRECATED - KVFinder

    Cavity Detection PyMOL plugin

    The KVFinder software, originally published in 2014, is deprecated. We published more recent software: parKVFinder and pyKVFinder. [parKVFinder] A Linux/macOS version is available in this GitHub repository, https://github.com/LBC-LNBio/parKVFinder, while a Windows version is in this GitHub repository, https://github.com/LBC-LNBio/parKVFinder-win. Please read and cite the original paper ParKVFinder: A thread-level parallel approach in biomolecular cavity detection (10.1016/j.softx.2020.100606). [pyKVFinder] pyKVFinder is available in this Python Package Index (PyPI) repository, https://pypi.org/project/pyKVFinder and this GitHub repository, https://github.com/LBC-LNBio/pyKVFinder. Please read and cite the original paper pyKVFinder: an efficient and integrable Python package for biomolecular cavity detection and characterization in data science (10.1186/s12859-021-04519-4).
  • 6
    DSTK - Data Science TooKit 3

    Data and Text Mining Software for Everyone

    DSTK - Data Science Toolkit 3 is a set of data and text mining software tools, following the CRISP-DM model. DSTK offers data understanding using statistical and text analysis, data preparation using normalization and text processing, and modeling and evaluation for machine learning algorithms. It is based on the older version of DSTK (https://sourceforge.net/projects/dstk2/). DSTK Engine is similar to R. DSTK ScriptWriter offers a GUI for writing DSTK scripts. DSTK Studio offers an SPSS Statistics-like GUI for data mining, and DSTK Text Explorer offers a GUI for text mining. DSTK Engine and DSTK ScriptWriter are open source, but DSTK Studio and Text Explorer require a small payment; they are free to use 10 times.
  • 7
    Dask

    Parallel computing with task scheduling

    Dask is a Python library for parallel and distributed computing, designed to scale analytics workloads from single machines to large clusters. It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.
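
    A minimal sketch of the Pandas-like dask.dataframe API on a larger-than-memory dataset (the CSV path and column names are placeholders):

        # Sketch: lazy, partitioned computation with dask.dataframe.
        import dask.dataframe as dd

        df = dd.read_csv("data/2024-*.csv")                # placeholder path; read lazily in partitions
        result = df.groupby("category")["amount"].mean()   # builds a task graph, no work yet
        print(result.compute())                            # executes in parallel across cores/workers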
  • 8

    Data Science

    A learning library for Data Science

    This project is a collection of sub-projects that contain various experiments in various languages for exploring the machine learning and data science fields. Notable languages are Scala and Python.
  • 9
    Data Science Notes

    Curated collection of data science learning materials

    Data Science Notes is a large, curated collection of data science learning materials, with explanations, code snippets, and structured notes across the typical end-to-end workflow. It spans foundational math and statistics through data wrangling, visualization, machine learning, and practical project organization. The content emphasizes hands-on understanding by pairing narrative notes with runnable examples, making it useful for both self-study and classroom settings. Because it aggregates topics in one place, learners can move linearly or jump into specific areas as needed during projects. The notes also highlight common pitfalls and good practices, which helps beginners adopt professional habits early. It’s a living resource that many students consult when revising fundamentals or exploring adjacent tools in the ecosystem.
  • 10
    Deep Learning course

    Slides and Jupyter notebooks for the Deep Learning lectures

    Slides and Jupyter notebooks for the Deep Learning lectures of Master Year 2 Data Science at Institut Polytechnique de Paris. This course is taught as part of the Master Year 2 Data Science program at IP Paris. Note: press "P" to display the presenter's notes, which include some comments and additional references. These lectures are built and maintained by Olivier Grisel and Charles Ollion.
  • 11
    Deep Learning with PyTorch

    Latest techniques in deep learning and representation learning

    This course concerns the latest techniques in deep learning and representation learning, focusing on supervised and unsupervised deep learning, embedding methods, metric learning, and convolutional and recurrent nets, with applications to computer vision, natural language understanding, and speech recognition. The prerequisites include DS-GA 1001 Intro to Data Science or a graduate-level machine learning course. To follow the exercises, you will need a laptop with Miniconda (a minimal version of Anaconda) and several Python packages installed. The following instructions work as-is for Mac or Ubuntu Linux users; Windows users need to install and work in the Git BASH terminal. JupyterLab has a built-in selectable dark theme, so you only need to install something if you want to use the classic notebook interface.
  • 12
    DeepLearningProject

    An in-depth machine learning tutorial

    This tutorial tries to do what most machine learning tutorials available online do not. It is not a 30-minute tutorial that teaches you how to "Train your own neural network" or "Learn deep learning in under 30 minutes". It covers the full pipeline you would need if you actually work with machine learning, introducing you to all the parts and all the implementation decisions and details that need to be made. The dataset is not one of the standard sets like MNIST or CIFAR; you will make your very own dataset. Then you will go through a couple of conventional machine learning algorithms before finally getting to deep learning! In the fall of 2016, I was a Teaching Fellow (Harvard's version of a TA) for the graduate class on "Advanced Topics in Data Science (CS209/109)" at Harvard University. I was in charge of designing the class project given to the students, and this tutorial has been built on top of the project I designed for the class.
  • 13
    FlexiList.

    FlexiList is a Java data structure that combines the benefits of arrays and linked lists

    FlexiList is a Java data structure that combines the benefits of arrays and linked lists. Like an array, it allows for efficient access to elements by index. Like a linked list, it allows for efficient insertion and deletion of elements at any position in the list. Benefits over arrays and ArrayList: efficient insertion and deletion (FlexiList can insert or delete nodes at any position in the list in O(1) time, whereas arrays and ArrayList require shifting all elements after the insertion or deletion point); dynamic size (FlexiList can grow or shrink dynamically as elements are added or removed, whereas arrays have a fixed size); and good memory locality (FlexiList nodes are stored in contiguous blocks of memory, making it more cache-friendly than an ordinary linked list).
  • 14
    Forecasting Best Practices

    Time Series Forecasting Best Practices & Examples

    Time series forecasting is one of the most important topics in data science. Almost every business needs to predict the future in order to make better decisions and allocate resources more effectively. This repository provides examples and best practice guidelines for building forecasting solutions. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in forecasting algorithms to build solutions and operationalize them. Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utilities around processing and featurizing the data, optimizing and evaluating models, and scaling up to the cloud. The examples and best practices are provided as Python Jupyter notebooks and R Markdown files, along with a library of utility functions.
  • 15
    ML workspace

    All-in-one web-based IDE specialized for machine learning

    All-in-one web-based development environment for machine learning. The ML workspace is an all-in-one web-based IDE specialized for machine learning and data science. It is simple to deploy and gets you started within minutes productively building ML solutions on your own machines. This workspace is the ultimate tool for developers, preloaded with a variety of popular data science libraries (e.g., TensorFlow, PyTorch, Keras, scikit-learn) and dev tools (e.g., Jupyter, VS Code, TensorBoard), perfectly configured, optimized, and integrated. It is usable as a remote kernel (Jupyter) or remote machine (VS Code) via SSH, and easy to deploy on Mac, Linux, and Windows via Docker. Jupyter, JupyterLab, and Visual Studio Code web-based IDEs are included. By default, the workspace container has no resource constraints and can use as much of a given resource as the host's kernel scheduler allows.
  • 16
    NannyML

    Detecting silent model failure. NannyML estimates performance

    NannyML is an open-source Python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases. NannyML closes the loop with performance monitoring and post-deployment data science, empowering data scientists to quickly understand and automatically detect silent model failure. By using NannyML, data scientists can finally maintain complete visibility and trust in their deployed machine learning models. When the actual outcomes of your deployed prediction models are delayed, or even when post-deployment target labels are completely absent, you can use NannyML's CBPE algorithm to estimate model performance, as sketched below.
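
    A hedged sketch of the CBPE workflow described above; exact constructor arguments vary between NannyML releases, and the reference/analysis DataFrames and column names are placeholders:

        # Sketch: estimating post-deployment performance without targets via CBPE.
        import nannyml as nml

        estimator = nml.CBPE(
            y_pred_proba="predicted_probability",  # placeholder column names
            y_pred="prediction",
            y_true="label",
            timestamp_column_name="timestamp",
            metrics=["roc_auc"],
            chunk_size=5000,
        )
        estimator.fit(reference_df)                # reference period with known targets (placeholder DataFrame)
        results = estimator.estimate(analysis_df)  # analysis period without targets (placeholder DataFrame)
        print(results.to_df())                     # estimated performance per chunk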
  • 17
    NuzeBot

    Finds interesting news headlines.

    This is a bot that finds the news you want to see. It can be made to find the news that interests you and reject everything else. View the most interesting headlines from many websites on one page.
  • 18

    OGLDataScienceTool

    OpenGL tool for data science visualization

    Data visualization tool written in LWJGL, compatible with libGDX and other OpenGL wrappers. The project depends on Apache POI and Apache Commons for Office file support. Planned features for the next release: reading JSON and other NoSQL data structures; JDBC connections for creating dataframes; data heatmaps and additional plots. For questions, contact me at kumar.santhi1982@hotmail.com. More details: http://www.java-gaming.org/topics/ds/41920/view.html http://datascienceforindia.com/
  • 20
    PySyft

    Data science on data without acquiring a copy

    Most software libraries let you compute over the information you own and see inside of machines you control. However, this means that you cannot compute on information without first obtaining (at least partial) ownership of that information. It also means that you cannot compute using machines without first obtaining control over those machines. This is very limiting to human collaboration and systematically drives the centralization of data, because you cannot work with a bunch of data without first putting it all in one (central) place. The Syft ecosystem seeks to change this system, allowing you to write software which can compute over information you do not own on machines you do not have (total) control over. This not only includes servers in the cloud, but also personal desktops, laptops, mobile phones, websites, and edge devices. Wherever your data wants to live in your ownership, the Syft ecosystem exists to help keep it there while allowing it to be used privately.
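
    A heavily hedged sketch of the idea using the classic PySyft 0.2-era hook API (newer releases restructure this around Domains/Datasites, so names and calls differ by version); the virtual worker stands in for a remote machine:

        # Sketch (PySyft 0.2-era API): compute on a tensor that lives on a remote worker.
        import torch
        import syft as sy

        hook = sy.TorchHook(torch)                   # extends torch tensors with .send()/.get()
        bob = sy.VirtualWorker(hook, id="bob")       # placeholder stand-in for a remote machine

        x = torch.tensor([1, 2, 3, 4, 5]).send(bob)  # the data now lives with bob; we hold a pointer
        y = (x * 2).sum()                            # operations execute on bob's side
        print(y.get())                               # only the result is retrieved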
  • 21

    Raku-DSL-Shared

    Raku package for DSL shared utilities and grammar roles.

    This repository provides a Raku package for shared utilities and (grammar) roles in the package context "DSL::" ("DSL" stands for "Domain Specific Language"). The initial versions of the code in this repository can be found in the GitHub repository [AAr1]. Utilities: one of the reasons for making this package is to encapsulate and easily share utilities for making DSL translators. The "first wave" utilities modify token patterns to include fuzzy matching and merge two or more roles into one. Roles: another reason for making this package is to encapsulate and easily share grammar roles for making DSL translators. The "first wave" roles are an error-handling role, a role with common English terms and phrases used in workflows from Machine Learning, Data Science, or Scientific Computing, and a predicate specification role.
  • 22
    Recommenders

    Best practices on recommendation systems

    The Recommenders repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The module reco_utils contains functions to simplify common tasks used when developing and evaluating recommender systems. Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks. Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.
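
    A hedged sketch of the reco_utils helpers described above; module paths reflect older releases of the repository (it was later reorganized), so treat the imports as version-dependent:

        # Sketch: loading a benchmark dataset and splitting it with reco_utils helpers.
        from reco_utils.dataset import movielens
        from reco_utils.dataset.python_splitters import python_random_split

        data = movielens.load_pandas_df(size="100k")          # userID/itemID/rating DataFrame
        train, test = python_random_split(data, ratio=0.75)   # random train/test split
        print(len(train), len(test))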
  • 23
    SageMaker Containers

    Create SageMaker-compatible Docker containers

    Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and reliable training process. The SageMaker Training Toolkit can be easily added to any Docker container, making it compatible with SageMaker for training models. If you use a prebuilt SageMaker Docker image for training, this library may already be included. Very often, an entry point needs additional information from the container that is not available in hyperparameters. SageMaker Containers writes this information as environment variables that are available inside the script.
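
    A minimal sketch of a training entry point reading that environment; the SM_* variable names are the conventional ones the toolkit writes, and the fallback paths below are just the usual container defaults:

        # Sketch: reading the environment variables written into the training container.
        import json
        import os

        model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")                       # where to save artifacts
        train_dir = os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training")  # input data channel
        hyperparams = json.loads(os.environ.get("SM_HPS", "{}"))                          # hyperparameters as JSON

        print("training data in:", train_dir)
        print("hyperparameters:", hyperparams)
        # ...train here, then write the model under model_dir so SageMaker uploads it.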
  • 24
    SageMaker Inference Toolkit

    Serve machine learning models within a Docker container

    Serve machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. Once you have a trained model, you can include it in a Docker container that runs your inference code. A container provides an effectively isolated environment, ensuring a consistent runtime regardless of where the container is deployed. Containerizing your model and code enables fast and reliable deployment of your model. The SageMaker Inference Toolkit implements a model serving stack and can be easily added to any Docker container, making it deployable to SageMaker. This library's serving stack is built on Multi Model Server, and it can serve your own models or those you trained on SageMaker using machine learning frameworks with native SageMaker support.
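
    A hedged sketch of the conventional handler functions (model_fn / input_fn / predict_fn / output_fn) that SageMaker framework containers built on this toolkit look for in an inference script; the joblib artifact name and JSON handling are illustrative assumptions:

        # Sketch: an inference script using the conventional SageMaker handler functions.
        import json
        import os

        import joblib

        def model_fn(model_dir):
            # Load the artifact that training wrote to model_dir (file name is an assumption).
            return joblib.load(os.path.join(model_dir, "model.joblib"))

        def input_fn(request_body, content_type):
            # Deserialize the request payload into model features (JSON assumed).
            return json.loads(request_body)["instances"]

        def predict_fn(instances, model):
            # Run inference with the loaded model.
            return model.predict(instances).tolist()

        def output_fn(prediction, accept):
            # Serialize predictions back to the client.
            return json.dumps({"predictions": prediction})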
  • 25
    Seldon Server

    Machine learning platform and recommendation engine on Kubernetes

    Seldon Server is a machine learning platform and recommendation engine built on Kubernetes. Seldon reduces time-to-value so models can get to work faster. Scale with confidence and minimize risk through interpretable results and transparent model performance. Seldon Core focuses purely on deploying a wide range of ML models on Kubernetes, allowing complex runtime serving graphs to be managed in production. Seldon Core is a progression of the goals of the Seldon-Server project but also a more restricted focus to solving the final step in a machine learning project which is serving models in production. Seldon Server is a machine learning platform that helps your data science team deploy models into production. It provides an open-source data science stack that runs within a Kubernetes Cluster. You can use Seldon to deploy machine learning and deep learning models into production on-premise or in the cloud (e.g. GCP, AWS, Azure).