Shellcat-Zero/skmca

Use https://github.com/MaxHalford/Prince instead

skmca

A scikit-learn pipeline API compatible implementation of Multiple Correspondence Analysis (MCA).

Usage

import pandas as pd
from skmca import MCA

df = pd.read_csv('http://www.statoek.wiso.uni-goettingen.de/'
                 'CARME-N/download/wg93.txt',
                 sep='\t', dtype='category')
mca = MCA()
mca.fit(df)

Crucially, the input to MCA.fit must be a pandas.DataFrame where all the columns have a category dtype. This is necessary to ensure that the dummy encoding of the columns is consistent across training and test datasets.
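To illustrate why the category dtype matters, here is a minimal sketch using only pandas. It does not call skmca; the column name `answer`, the toy data, and the variable names are hypothetical. It shows that a shared categorical dtype gives the same dummy columns for two splits even when one split is missing a level.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical toy data: the test split never contains the level "c".
train = pd.DataFrame({"answer": ["a", "b", "c", "a"]})
test = pd.DataFrame({"answer": ["b", "a"]})

# Without a category dtype, the dummy columns depend on the values
# that happen to be present in each split.
naive_test_dummies = pd.get_dummies(test)

# Declare the full set of levels once and reuse it for both splits.
answer_dtype = CategoricalDtype(categories=["a", "b", "c"])
train["answer"] = train["answer"].astype(answer_dtype)
test["answer"] = test["answer"].astype(answer_dtype)

train_dummies = pd.get_dummies(train)
test_dummies = pd.get_dummies(test)

print(list(naive_test_dummies.columns))
print(list(train_dummies.columns))
print(list(test_dummies.columns))
```

With the shared dtype, both splits produce one column per declared level, so the encoded matrices line up column by column.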

Background

MCA is like PCA, but for categorical data. You can use it to visualize high-dimensional datasets. It can also be useful as a pre-processing step before clustering, where reducing the number of dimensions helps avoid the curse of dimensionality.
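The analogy with PCA can be made concrete with a small sketch. The code below is an assumption-laden illustration of one textbook formulation of MCA (correspondence analysis of the one-hot indicator matrix, computed with an SVD), not skmca's actual implementation. The function name `mca_indicator_sketch` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def mca_indicator_sketch(df, n_components=2):
    """Row coordinates from correspondence analysis of the indicator matrix."""
    # One-hot encode every categorical column into an indicator matrix.
    Z = pd.get_dummies(df).to_numpy(dtype=float)
    # Drop levels that never occur, since their column mass would be zero.
    Z = Z[:, Z.sum(axis=0) > 0]

    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column masses

    # Standardized residuals: centering plus scaling, the analogue of
    # standardizing the columns before PCA.
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)

    U, sigma, _ = np.linalg.svd(S, full_matrices=False)

    # Principal coordinates of the rows and the principal inertias.
    row_coords = (U * sigma) / np.sqrt(r)[:, None]
    return row_coords[:, :n_components], sigma[:n_components] ** 2


# Hypothetical toy data with two categorical variables.
toy = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size": ["S", "L", "L", "S", "S", "L"],
}).astype("category")

coords, inertias = mca_indicator_sketch(toy)
```

Each row of `coords` places one observation in a two-dimensional space that can be plotted or passed to a clustering algorithm, much like the leading principal components in PCA.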

skmca requires pandas and scikit-learn.

References

This library follows the setup in Nenadic and Greenacre (2005).
