A Python wrapper for MADlib - an open source library for scalable in-database machine learning algorithms
PyMADlib currently has wrappers for the following MADlib algorithms:
- Linear regression
- Logistic Regression
- SVM (regression & classification)
- K-Means
- LDA
Refer to the MADlib User Docs for MADlib's user documentation.
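For context, MADlib algorithms run as SQL functions inside the database, so a wrapper mostly has to assemble and send those calls. The sketch below is not PyMADlib's API. It only builds a parameterized call to MADlib's `madlib.linregr_train` SQL function (as documented for MADlib 1.x), whose first four arguments are the source table, the output table, the dependent variable, and the independent-variable expression, all passed as text. The helper name and the table and column names are hypothetical.

```python
def linregr_train_call(source_table, out_table, dependent_var, independent_vars):
    """Build (query, params) for MADlib's linregr_train SQL function.

    All four arguments are passed to MADlib as text, so they can be bound
    as ordinary query parameters by a DB-API driver such as psycopg2.
    """
    # An explicit leading 1 gives the model an intercept term.
    features = "ARRAY[1, " + ", ".join(independent_vars) + "]"
    query = "SELECT madlib.linregr_train(%s, %s, %s, %s)"
    return query, (source_table, out_table, dependent_var, features)


# Hypothetical table and column names, for illustration only.
query, params = linregr_train_call(
    "wine_quality", "wine_quality_model", "quality", ["alcohol", "ph"]
)
# With an open psycopg2 connection `conn`, this could be run as:
#     conn.cursor().execute(query, params)
```

Binding the names as parameters rather than formatting them into the SQL string leaves the quoting to the driver.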
- You'll need the Python extension psycopg2 to use PyMADlib.
- If you have matplotlib installed, you'll see Matplotlib visualizations for the Linear Regression demo.
- If you have networkx installed, you'll see a visualization of the K-Means demo.
- PyROC is included in the source of this distribution with permission from its developer. You'll see a visualization of the ROC curves for Logistic Regression.
To configure your DB connection parameters, create the file ~/.pymadlib.config in your home directory. It should look like this:
    [db_connection]
    user = gpadmin
    password = XXXXX
    hostname = 127.0.0.1 (or the IP of your DB server)
    port = 5432 (the port# of your DB)
    database = vatsandb (the database you wish to connect to)
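As a rough illustration of how a file in this format could be consumed, the sketch below parses the `[db_connection]` section with Python 3's standard `configparser` and maps it to the keyword names accepted by `psycopg2.connect`. This is not PyMADlib's own loader: the function names and the key mapping are assumptions, and it expects the explanatory notes in parentheses to have been removed so that each value is bare.

```python
import configparser
import os

# Assumed mapping from the config keys above to psycopg2.connect() keywords.
KEY_MAP = {
    "user": "user",
    "password": "password",
    "hostname": "host",
    "port": "port",
    "database": "dbname",
}


def parse_db_params(text):
    """Parse config text and return psycopg2-style connection keywords."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["db_connection"]
    params = {KEY_MAP[key]: section[key] for key in KEY_MAP}
    # configparser returns strings; the port is more useful as an integer.
    params["port"] = int(params["port"])
    return params


def read_db_params(path="~/.pymadlib.config"):
    """Read the config file from disk (default: the path used above)."""
    with open(os.path.expanduser(path)) as handle:
        return parse_db_params(handle.read())
```

With psycopg2 installed, the result could be passed straight through, for example `psycopg2.connect(**read_db_params())`.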
PyMADlib depends on psycopg2 and Pandas. It is easiest to work with PyMADlib if you have Anaconda Python.
- Download and install [Anaconda-1.9.0-MacOSX-x86_64.pkg](http://repo.continuum.io/archive/Anaconda-1.9.0-MacOSX-x86_64.pkg)
- Open a terminal and check that you have Anaconda Python and the package manager conda:
    vatsan-mac$ which python
    /Users/vatsan/anaconda/bin/python
    vatsan-mac$ which conda
    /Users/vatsan/anaconda/bin/conda
- If you haven't already installed PostgreSQL on your Mac, download and install PostgreSQL for Mac. This provides the libraries required to compile the SQL engine, psycopg2. The easiest way to install PostgreSQL on a Mac is via http://postgresapp.com/. Once you've downloaded and installed PostgreSQL, it should typically be found under /Library/PostgreSQL:
    vatsan-mac$ ls /Library/PostgreSQL/9.2/
    Library    include       pg_env.sh         uninstall-postgresql.app
    bin        installer     scripts
    data       lib           share
    doc        pgAdmin3.app  stackbuilder.app
I don't think the PostgreSQL version matters (9.1 or above is fine).
- You may need to create some symlinks to libpq & libssl so that psycopg2 is able to find them:
    vatsan-mac$ sudo ln -s /Users/vatsan/anaconda/lib/libssl.1.0.0.dylib /usr/lib
    vatsan-mac$ sudo ln -s /Users/vatsan/anaconda/lib/libcrypto.1.0.0.dylib /usr/lib
- Install psycopg2:
    vatsan-mac$ conda install distribute
    vatsan-mac$ pip install psycopg2
- Now we're ready to test whether the required libraries were installed successfully:
    vatsan-mac$ python -c 'import psycopg2'
If the above command did not error out, the installation was successful.
- You may install PyMADlib by downloading the source (from PyPI) and then running the following:
    sudo python setup.py build
    sudo python setup.py install
- If you use easy_install or pip, simply run:
    sudo easy_install pymadlib
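The shell checks in the steps above (`which python`, `which conda`, and the psycopg2 import test) can also be approximated from Python. The following is a minimal sketch using only the standard library, assuming Python 3. The function names are made up for illustration, and the module lists simply mirror the dependencies mentioned in this document.

```python
import importlib.util
import shutil

REQUIRED_MODULES = ("psycopg2", "pandas")
OPTIONAL_MODULES = ("matplotlib", "networkx")
TOOLS = ("python", "conda")


def find_tools(names):
    """Map each command name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def find_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report():
    """Print one status line per tool and module."""
    for name, path in find_tools(TOOLS).items():
        print("%-12s %s" % (name, path or "NOT FOUND"))
    for name, ok in find_modules(REQUIRED_MODULES).items():
        print("%-12s %s" % (name, "ok" if ok else "MISSING (required)"))
    for name, ok in find_modules(OPTIONAL_MODULES).items():
        print("%-12s %s" % (name, "ok" if ok else "missing (optional)"))
```

Calling `report()` prints the status of each item. Note that `find_spec` only locates a module without importing it, so a psycopg2 build that cannot load its shared libraries would still show as found; the `python -c 'import psycopg2'` check remains the more reliable test.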
Visit the PyMADlib Tutorial for a tutorial on using PyMADlib. Also visit PyMADlib IPython NB to download the IPython notebook tutorial.
You may run the demo from the extracted pymadlib directory like so:
    python example.py
If you installed PyMADlib using the instructions in the previous section, simply run:
    python -c 'from pymadlib.example import runDemos; runDemos()'
Remember to close the Matplotlib windows that pop up to continue with the rest of the demo.
PyMADlib packages publicly available datasets from the UCI machine learning repository and other sources.
- Wine quality dataset from UCI Machine Learning repository
- Auto MPG dataset from UCI Machine Learning repository
- Obama-Romney second presidential debate (2012) transcripts