
namkoong-lab/dro


dro: A Python Package for Distributionally Robust Optimization in Machine Learning

Jiashuo Liu, Tianyu Wang, Henry Lam, Hongseok Namkoong, Jose Blanchet
† equal contributions (α-β order)

dro is a Python package that implements typical DRO methods with linear losses (SVM, logistic regression, and linear regression) for supervised learning tasks. It is built on the convex optimization solver cvxpy. The dro package supports various distance metrics $d(\cdot,\cdot)$ as well as various base models (e.g., linear regression, logistic regression, SVM, neural networks). Furthermore, it integrates synthetic data-generating mechanisms from recent research papers.

Unless otherwise specified, our DRO models solve the following optimization problem: $$\min_{\theta} \max_{P \in U} \mathbb{E}_{(X,Y) \sim P}[\ell(\theta;(X, Y))],$$ where $U$ is the so-called ambiguity set, typically of the form $U = \{P: d(P, \hat P) \leq \epsilon\}$, and $\hat P := \frac{1}{n}\sum_{i = 1}^n \delta_{(X_i, Y_i)}$ is the empirical distribution of the training samples $\{(X_i, Y_i)\}_{i = 1}^n$. The radius $\epsilon$ is a hyperparameter.
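For concreteness, one common instance is the chi-square divergence ambiguity set (the one used by the Chi2DRO model below), obtained by taking $d$ to be the $\chi^2$-divergence:

```latex
% chi^2-divergence instance of the general ambiguity set above
d_{\chi^2}(P, \hat P) = \int \left( \frac{dP}{d\hat P} - 1 \right)^2 d\hat P,
\qquad
U = \left\{ P \,:\, d_{\chi^2}(P, \hat P) \le \epsilon \right\}.
```

Other choices of $d$ (KL, total variation, Wasserstein, MMD, ...) yield the other models listed in the tables below.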

Installation

(1) Prepare Mosek license

Our package is built upon cvxpy and MOSEK (the default solver), and MOSEK requires a license file. The steps are as follows:

  • Request a license at the official MOSEK website; the license file mosek.lic will then be emailed to you.

  • Put your license in your home directory as follows:

    ```shell
    cd
    mkdir mosek
    mv /path_to_license/mosek.lic mosek/
    ```
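After copying the file, you can sanity-check its location from Python. This is a minimal sketch using only the standard library; it assumes the `~/mosek/mosek.lic` path from the steps above:

```python
from pathlib import Path

# MOSEK looks for the license at ~/mosek/mosek.lic by default
lic = Path.home() / "mosek" / "mosek.lic"
print("license present:", lic.exists())
```

If this prints `False`, re-check the directory name and file placement before installing the package.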

(2) Install dro package

To install the dro package, simply run:

```shell
pip install dro
```

This installs all required dependencies.

Quick Start

A simple usage example is as follows:

```python
from dro.src.data.dataloader_classification import classification_basic
from dro.src.data.draw_utils import draw_classification
from dro.src.linear_model.chi2_dro import Chi2DRO

# Generate synthetic classification data
X, y = classification_basic(d=2, num_samples=100, radius=2, visualize=True)

# Fit a chi-square DRO logistic regression model
clf_model = Chi2DRO(input_dim=2, model_type='logistic')
clf_model.update({'eps': 0.1})
print(clf_model.fit(X, y))
```

For more examples, please refer to the examples in the repository.

Documentation & APIs

As of the latest version (v0.2.2), dro supports:

(1) Synthetic data generation

| Python Module | Function Name | Description |
|---|---|---|
| `dro.src.data.dataloader_classification` | `classification_basic` | Basic classification task |
| | `classification_DN21` | Following Section 3.1.1 of "Learning Models with Uniform Performance via Distributionally Robust Optimization" |
| | `classification_SNVD20` | Following Section 5.1 of "Certifying Some Distributional Robustness with Principled Adversarial Training" |
| | `classification_LWLC` | Following Section 4.1 (Classification) of "Distributionally Robust Optimization with Data Geometry" |
| `dro.src.data.dataloader_regression` | `regression_basic` | Basic regression task |
| | `regression_DN20_1` | Following Section 3.1.2 of "Learning Models with Uniform Performance via Distributionally Robust Optimization" |
| | `regression_DN20_2` | Following Section 3.1.3 of "Learning Models with Uniform Performance via Distributionally Robust Optimization" |
| | `regression_DN20_3` | Following Section 3.3 of "Learning Models with Uniform Performance via Distributionally Robust Optimization" |
| | `regression_LWLC` | Following Section 4.1 (Regression) of "Distributionally Robust Optimization with Data Geometry" |
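To give a feel for what these loaders return, here is a hypothetical sketch (not the package implementation; the function name `classification_basic_sketch` and its internals are illustrative only) of a basic two-class generator: Gaussian features whose first coordinate is shifted by ±radius depending on the class.

```python
import random

# Hypothetical sketch, NOT the package's classification_basic:
# two Gaussian clusters separated along the first coordinate.
def classification_basic_sketch(d, num_samples, radius=2.0, seed=0):
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(num_samples):
        label = rng.randint(0, 1)
        center = radius if label == 1 else -radius
        x = [rng.gauss(center, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(d - 1)]
        X.append(x)
        y.append(label)
    return X, y

X, y = classification_basic_sketch(d=2, num_samples=100)
print(len(X), len(X[0]), len(y))  # → 100 2 100
```

The actual loaders additionally support visualization and the distribution-shift mechanisms described in the cited papers.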

(2) Linear DRO models

The models listed below are solved by exact solvers from cvxpy.

| Python Module | Class Name | Description |
|---|---|---|
| `dro.src.linear_dro.base` | `BaseLinearDRO` | Base class for linear DRO methods |
| `dro.src.linear_dro.chi2_dro` | `Chi2DRO` | Linear chi-square divergence-based DRO |
| `dro.src.linear_dro.kl_dro` | `KLDRO` | Kullback-Leibler divergence-based DRO |
| `dro.src.linear_dro.cvar_dro` | `CVaRDRO` | CVaR DRO |
| `dro.src.linear_dro.tv_dro` | `TVDRO` | Total Variation DRO |
| `dro.src.linear_dro.marginal_dro` | `MarginalCVaRDRO` | Marginal-X CVaR DRO |
| `dro.src.linear_dro.mmd_dro` | `MMD_DRO` | Maximum Mean Discrepancy DRO |
| `dro.src.linear_dro.conditional_dro` | `ConditionalCVaRDRO` | Y&#124;X (conditional-shift-based) CVaR DRO |
| `dro.src.linear_dro.hr_dro` | `HR_DRO_LR` | Holistic Robust DRO on linear models |
| `dro.src.linear_dro.wasserstein_dro` | `WassersteinDRO` | Wasserstein DRO |
| | `WassersteinDROsatisficing` | Robust satisficing version of Wasserstein DRO |
| `dro.src.linear_dro.sinkhorn_dro` | `SinkhornLinearDRO` | Sinkhorn DRO on linear models |
| `dro.src.linear_dro.mot_dro` | `MOTDRO` | Optimal Transport DRO with Conditional Moment Constraints |
| `dro.src.linear_dro.or_wasserstein_dro` | `ORWDRO` | Outlier-Robust Wasserstein DRO |
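As an illustration of one of these objectives: for CVaR DRO at level $\alpha$, the inner maximization over the ambiguity set (likelihood ratios bounded by $1/\alpha$) has a closed form, namely the average of the worst $\alpha$-fraction of sample losses. A minimal standalone sketch (the function name is illustrative, not a package API):

```python
# Worst-case expected loss under the CVaR ambiguity set
# {P : dP/dP_hat <= 1/alpha}: average of the top alpha-fraction of losses.
def cvar_worst_case(losses, alpha):
    n = len(losses)
    k = max(1, int(alpha * n))          # number of worst samples kept
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k

losses = [0.1, 0.5, 2.0, 0.3, 1.2, 0.2, 0.4, 0.9, 0.05, 0.7]
print(cvar_worst_case(losses, alpha=0.2))  # → 1.6 (average of 2.0 and 1.2)
```

The `CVaRDRO` class minimizes this worst-case quantity over the model parameters $\theta$ via a cvxpy reformulation rather than this direct sort.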

(3) NN DRO models

The models listed below are solved by gradient descent (PyTorch).

| Python Module | Class Name | Description |
|---|---|---|
| `dro.src.neural_model.base_nn` | `BaseNNDRO` | Base model for neural-network-based DRO |
| `dro.src.neural_model.fdro_nn` | `Chi2NNDRO` | Chi-square divergence-based neural DRO model |
| `dro.src.neural_model.wdro_nn` | `WNNDRO` | Wasserstein neural DRO with adversarial robustness |
| `dro.src.neural_model.hrdro_nn` | `HRNNDRO` | Holistic Robust NN DRO |

(4) Tree-based Ensemble DRO models

The models listed below are solved by function approximation (XGBoost, LightGBM).

| Python Module | Class Name | Description |
|---|---|---|
| `dro.src.tree_model.lgbm` | `KLDRO_LGBM` | KL divergence-based robust LightGBM |
| | `Chi2DRO_LGBM` | Chi2 divergence-based robust LightGBM |
| | `CVaRDRO_LGBM` | CVaR robust LightGBM |
| `dro.src.tree_model.xgb` | `KLDRO_XGB` | KL divergence-based robust XGBoost |
| | `Chi2DRO_XGB` | Chi2 divergence-based robust XGBoost |
| | `CVaRDRO_XGB` | CVaR robust XGBoost |

(5) Model-based Diagnostics

In linear DRO models, we provide additional interfaces for understanding the worst-case model performance and evaluating the true model performance.

| Method | Description |
|---|---|
| `.worst_distribution` | Worst-case distribution induced by the DRO model |
| `.evaluate` | True out-of-sample performance of the DRO model |
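To convey what a worst-case distribution looks like: for KL-DRO, the adversary reweights samples exponentially in their losses, with a temperature $\eta$ coming from the dual problem. A hedged standalone sketch (the function name and the fixed $\eta$ are illustrative; the package's `.worst_distribution` solves for the dual variables itself):

```python
import math

# For KL-DRO, the worst-case distribution puts weight on sample i
# proportional to exp(loss_i / eta); higher-loss samples get upweighted.
def kl_worst_distribution(losses, eta):
    raw = [math.exp(l / eta) for l in losses]
    total = sum(raw)
    return [w / total for w in raw]

weights = kl_worst_distribution([0.1, 0.5, 2.0, 0.3], eta=0.5)
print(weights)  # most mass lands on the highest-loss sample
```

Inspecting these weights is a useful diagnostic: it shows which training samples the robust objective is effectively focusing on.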

For more details, please refer to https://python-dro.org.

P.S.: our logo was generated via GPT :)
