FlowGrid

FlowGrid is a density-based clustering algorithm that can perform fast and accurate clustering on very large scRNA-seq data sets. It can be used together with Scanpy to cluster Scanpy AnnData objects quickly.

Installation

FlowGrid supports pip installation.

pip install FlowGrid / pip3 install FlowGrid

Example 1:

Running FlowGrid within Scanpy for scRNA-seq analysis

requirement	location
Package: Scanpy	https://scanpy.readthedocs.io/en/stable/
Data: Mouse Brain data set [https://www.nature.com/articles/s41593-017-0029-5?WT.feed_name=subjects_molecular-biology]	https://storage.googleapis.com/h5ad/2017-12-Hrvatin-et-al-NNeuroscience/GSE102827_merged_all_raw.h5ad

Reminder

The results of the steps below and the detailed workflow can be found in FlowGrid_Example.ipynb.

Install the packages

pip install FlowGrid
pip install scanpy

Import the packages and do the basic setting

import FlowGrid
import scanpy as sc

Load the data

#You can change your file location here
adata = sc.read('~/GSE102827_merged_all_raw.h5ad')

Preprocess

#Normalization
sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e4)
sc.pp.log1p(adata)
adata.raw = adata
#Highly variable genes selection
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
adata = adata[:, adata.var['highly_variable']]
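To make the normalization step concrete, the following is a minimal pure-Python sketch of what per-cell scaling to 1e4 counts followed by log1p does to a count matrix. The function name normalize_and_log and the toy data are illustrative assumptions; this is not Scanpy's implementation, only the arithmetic it is understood to perform.

```python
import math

def normalize_and_log(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    result = []
    for row in counts:
        total = sum(row)
        # Leave all-zero cells at zero to avoid dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        result.append([math.log1p(value * scale) for value in row])
    return result

# Two toy cells with different sequencing depth but the same gene proportions.
toy_counts = [[1, 3, 6], [10, 30, 60]]
normalized = normalize_and_log(toy_counts)
```

After this step the two toy cells have identical expression profiles, which is the point of depth normalization: cells are compared by composition rather than by total read count.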

PCA for dimensionality reduction

#PCA to 5 dimensions
sc.tl.pca(adata, n_comps=5)

Cluster using FlowGrid

You can use autoFlowGrid to do clustering for the data automatically.

#recomm_parameters = FlowGrid.autoFlowGrid(adata, int(set_n), list(binN_range), list(eps_range), list(MinDenB_range), list(MinDenC_range))

FlowGrid scales well, so autoFlowGrid can search a wide parameter space by default: eps = [1,2,3,4,5] and bin_n = [6,7,...,25]. autoFlowGrid iterates over the promising combinations of bin_n and eps and uses a pruning strategy to skip the rest. Users can also specify binN_range and eps_range to reduce computation time.

Sample usage is as follows:

recomm_parameters, CHI_reports = FlowGrid.autoFlowGrid(adata, 5)
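As a rough illustration of the size of the default search space described above, the sketch below enumerates every (bin_n, eps) pair before any pruning. The helper label_for is hypothetical; it only mimics the label style that appears later in this README (for example 'binN_10_eps_1.0_FlowGrid') and is not part of the FlowGrid API.

```python
from itertools import product

# Default search ranges quoted in the text above.
eps_range = [1, 2, 3, 4, 5]
bin_n_range = list(range(6, 26))  # 6, 7, ..., 25

def label_for(bin_n, eps):
    """Hypothetical helper: build a label in the style 'binN_10_eps_1.0_FlowGrid'."""
    return f"binN_{bin_n}_eps_{float(eps)}_FlowGrid"

# Full (unpruned) grid of candidate parameter pairs.
candidates = list(product(bin_n_range, eps_range))
labels = [label_for(bin_n, eps) for bin_n, eps in candidates]
```

The unpruned grid holds 100 combinations, which is why narrowing binN_range or eps_range can shorten the run noticeably.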

Visualize the result

#neighbor graph
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=5)
#umap
sc.tl.umap(adata)

#results of recommended parameters
for i in range(len(recomm_parameters)):
    sc.pl.umap(adata, color=recomm_parameters[i], frameon=False)

NOTE

Run FlowGrid with specified parameters

You can also specify the parameters yourself to do the clustering.

#FlowGrid.cluster(adata, int(binN), float(eps), int(MinDenB), int(MinDenC))

binN is the number of bins for the grid. The recommended range for binN is 10 to 25; a larger binN should result in more clusters.
eps is the maximum distance between two bins. The recommended range for eps is 1.0 to 2.5; a larger eps should result in fewer clusters.
Sample usage is as follows:

FlowGrid.cluster(adata, 10, 1.2)
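The roles of binN and eps can be illustrated with a small grid-binning sketch. It assumes a generic grid-based density scheme: each dimension is rescaled and cut into bin_n equal bins, and two bins count as neighbors when the Euclidean distance between their integer indices is at most eps. The function names and the toy points are assumptions made for illustration; this is not FlowGrid's actual implementation.

```python
import math
from collections import Counter

def assign_bins(points, bin_n):
    """Map each point to a tuple of integer bin indices in [0, bin_n - 1] per dimension."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    binned = []
    for p in points:
        index = []
        for d in range(dims):
            span = highs[d] - lows[d]
            # Scale to [0, 1]; a constant dimension falls into bin 0.
            unit = (p[d] - lows[d]) / span if span > 0 else 0.0
            # The maximum value would land on bin_n, so clamp it into the last bin.
            index.append(min(int(unit * bin_n), bin_n - 1))
        binned.append(tuple(index))
    return binned

def bins_connected(bin_a, bin_b, eps):
    """Treat two bins as neighbors when their index distance is at most eps."""
    return math.dist(bin_a, bin_b) <= eps

points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.1), (5.0, 5.0), (10.0, 10.0)]
bins = assign_bins(points, bin_n=10)
density = Counter(bins)  # number of points per occupied bin
```

In this sketch a larger bin_n makes the grid finer, so nearby points are split across more bins, while a larger eps lets more bins count as neighbors and merge. With eps = 1.2, bins that share a face are connected but diagonal bins (distance about 1.41) are not.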

Compute adjusted Rand index when there are reference labels

The adjusted Rand index can be calculated when reference labels are available. It can also be used to compare FlowGrid results with Louvain results, or to compare results obtained with different parameters.

#FlowGrid.AdjustedRandScore(adata, list[predlabel_list], list[reflabel_list])

predlabel_list is the list of cluster labels to evaluate.
reflabel_list is the list of labels to use as the reference.
Sample usage is as follows:

FlowGrid.AdjustedRandScore(adata, ['binN_10_eps_1.0_FlowGrid', 'louvain'], ['maintype', 'celltype'])
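For readers unfamiliar with the metric, the following is a minimal pure-Python sketch of the standard adjusted Rand index formula, computed from pair counts in the contingency table of two labelings. The function name adjusted_rand_index is an assumption for illustration and is independent of how FlowGrid.AdjustedRandScore is implemented.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        # No pairs to compare; treat the labelings as trivially identical.
        return 1.0
    pair_counts = Counter(zip(labels_a, labels_b))  # contingency table cells
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, for example both labelings use a single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

A score of 1.0 means the two labelings agree up to a renaming of the clusters, values near 0 are what random labelings would give, and negative values indicate less agreement than expected by chance.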

Keep only valuable results

Unnecessary results can be removed to keep Anndata.obs clean.

#FlowGrid.keep_labels(adata, list[remain_list])

remain_list is the list of FlowGrid clustering results you want to keep.
Sample usage is as follows:

FlowGrid.keep_labels(adata,  ['binN_9_eps_1.1_FlowGrid', 'binN_10_eps_1.0_FlowGrid'])
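The effect of this cleanup can be sketched on a plain dictionary standing in for the columns of Anndata.obs. The sketch assumes that FlowGrid result columns are the ones whose names end in "_FlowGrid" (as in the labels shown in this README) and that all other columns are left alone; both the function keep_flowgrid_labels and that rule are assumptions, not the documented behavior of FlowGrid.keep_labels.

```python
def keep_flowgrid_labels(columns, remain_list, suffix="_FlowGrid"):
    """Return a copy of `columns` without FlowGrid results that are not in `remain_list`."""
    keep = set(remain_list)
    return {
        name: values
        for name, values in columns.items()
        # Non-FlowGrid columns always survive; FlowGrid columns only if requested.
        if not name.endswith(suffix) or name in keep
    }

# Toy stand-in for Anndata.obs: column name -> per-cell labels.
obs_columns = {
    "celltype": ["a", "b", "a"],
    "binN_9_eps_1.1_FlowGrid": [0, 1, 0],
    "binN_10_eps_1.0_FlowGrid": [0, 0, 1],
    "binN_12_eps_2.0_FlowGrid": [1, 1, 1],
}
cleaned = keep_flowgrid_labels(obs_columns, ["binN_9_eps_1.1_FlowGrid"])
```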

consensusFlowGrid

ConsensusFlowGrid can be used for high-dimensional data.

Sample usage is as follows:

#PCA to 20 dimensions, then cluster on the same AnnData object
sc.tl.pca(adata, n_comps=20)
consensusResult = consensusFlowGrid(adata, nDims=20)

License

MIT

xiayuan-huang/FlowGrid