ParallelCV is a meta-node for KNIME, similar in functionality to the X-Partitioner/X-Aggregator pair, except that it runs each fold on a separate thread in parallel.
For single-threaded workloads (such as training a Multi-Layer Perceptron) on a CPU with more than 4 threads, this gives a 5x speed increase over traditional cross-validation. Workloads that are already multi-threaded (e.g. Random Forests) see no performance benefit.
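Conceptually, the meta-node behaves like the Python sketch below (an illustration only; the actual node is assembled from standard KNIME nodes, not Python code): each fold is handed to its own worker thread, which is why single-threaded learners benefit while already-parallel ones do not.

```python
from concurrent.futures import ThreadPoolExecutor

def run_fold(fold_index, train_rows, test_rows):
    """Stand-in for one single-threaded Learner/Predictor pair."""
    # A real fold would train on train_rows and score test_rows here.
    return f"fold {fold_index}: {len(train_rows)} train rows, {len(test_rows)} test rows"

rows = list(range(100))  # toy dataset of 100 row indices
folds = [
    (rows[:i * 20] + rows[(i + 1) * 20:],  # train = everything outside the fold
     rows[i * 20:(i + 1) * 20])            # test  = the fold itself
    for i in range(5)
]

# Each fold is submitted to its own worker thread, so five single-threaded
# learners run concurrently instead of one after another.
with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(run_fold, i, train, test)
               for i, (train, test) in enumerate(folds)]
    results = [f.result() for f in futures]

print("\n".join(results))
```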
- 5x faster than the built-in X-Partitioner/X-Aggregator
- Equal-sized folds regardless of dataset size
- Output rows are in the same order as the input
- Robust to datasets that don't split evenly (remainder rows are placed in the 5th fold)
- Drop-in replacement for the X-Partitioner/X-Aggregator
- 5-fold cross-validation only
- No performance benefit for workloads that are already multi-threaded
- The Learner/Predictor pair must be duplicated for each fold in the workflow due to KNIME limitations
- No bugs, just features
Head over to the releases page to download the latest zip file. In KNIME, choose File -> Import Workflow and select the downloaded zip file. It contains a workflow with the ParallelCV Partitioner and Aggregator set up in an example, ready to copy into another workflow.
Alternatively, you can clone the repository with git and place the 'Workflow' folder in your KNIME workspace directory.
- Copy the Partitioner and Aggregator nodes to your workflow
- Connect the input table to the input port of the ParallelCV-Partitioner
- Duplicate your Learner/Predictor pair for each fold and connect each pair to a train/test port pair on the node, as shown in the layout below
- Take the labeled output table from each Predictor and connect it to the Aggregator
- Connect the output of the Aggregator to the rest of your workflow
Unfortunately, due to limitations in the way KNIME works, this requires a separate Learner/Predictor copy for each fold. If you want to control the parameters for all folds at once, use a Table Creator with a Table Row to Variable node to pass the parameters as flow variables to your Learners.
The partitioner is essentially a set of Math Formula nodes which calculate the start and end points of each fold. These are converted into flow variables, merged, and passed to a Row Filter which creates the train/test sets for each fold. The aggregator takes the output tables from the predictors and concatenates them back into one table.
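As an illustration of the arithmetic those Math Formula nodes perform (a sketch of the underlying calculation, not the nodes' actual expressions), the start and end row of each fold can be derived from the table's row count like this:

```python
def fold_boundaries(row_count, k=5):
    """Return (start, end) row indices for each of k folds.

    Folds are equal-sized; any remainder rows end up in the last fold,
    matching the behaviour described above.
    """
    size = row_count // k
    bounds = []
    for i in range(k):
        start = i * size
        end = (i + 1) * size if i < k - 1 else row_count  # last fold absorbs the remainder
        bounds.append((start, end))
    return bounds

# Example: 103 rows -> four folds of 20 rows and a 5th fold of 23 rows.
print(fold_boundaries(103))
# [(0, 20), (20, 40), (40, 60), (60, 80), (80, 103)]
```

In the workflow itself these bounds come from the Math Formula nodes and are consumed by the Row Filters as flow variables; the sketch above only shows the arithmetic.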