Merge pull request rasbt#342 from jrbourbeau/add_decision_regions_kwargs

rasbt · web-flow · commit eef3bb9af6f2 · 2018-03-14T21:40:24.000-04:00
Adds style related dictionaries to plot_decision_regions
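
The merge behaviour behind the three new parameters can be sketched in isolation. The snippet below is a standalone approximation of the `format_kwarg_dictionaries` helper added in this PR, not the library code itself. The name `merge_kwargs` and the sample user dictionary are illustrative assumptions. It shows that user-supplied keys override the defaults, and that protected keys (those `plot_decision_regions` sets itself) are dropped from the result.

```python
def merge_kwargs(default_kwargs=None, user_kwargs=None, protected_keys=None):
    """Hypothetical stand-in for format_kwarg_dictionaries (sketch only)."""
    merged = {}
    # Defaults are applied first so that user values overwrite them.
    for d in (default_kwargs, user_kwargs):
        if d is not None and not isinstance(d, dict):
            raise TypeError('d must be of type dict or None, but '
                            'got {} instead'.format(type(d)))
        if d is not None:
            merged.update(d)
    # Drop keys that the calling function controls itself.
    for key in (protected_keys or ()):
        merged.pop(key, None)
    return merged


# Defaults mirror the contourf defaults in the diff; the user dict is made up.
contourf_defaults = {'alpha': 0.3, 'antialiased': True}
user = {'alpha': 0.5, 'colors': 'black'}

contourf_kwargs = merge_kwargs(default_kwargs=contourf_defaults,
                               user_kwargs=user,
                               protected_keys=['colors', 'levels'])
print(contourf_kwargs)  # {'alpha': 0.5, 'antialiased': True}
```

Here the user's `alpha` wins over the default, while `colors` is silently discarded because it is protected. A caller who passes a protected key gets no warning under this scheme.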
diff --git a/docs/sources/CHANGELOG.md b/docs/sources/CHANGELOG.md
@@ -35,6 +35,8 @@ The CHANGELOG for the current development version is available at
 - Apriori code is faster due to optimization in `onehot transformation` and the amount of candidates generated by the `apriori` algorithm. ([#327](https://github.com/rasbt/mlxtend/pull/327) by [Jakub Smid](https://github.com/jaksmid))
 - The `OnehotTransactions` class (which is often used in combination with the `apriori` function for association rule mining) is now more memory efficient as it uses boolean arrays instead of integer arrays. In addition, the `OnehotTransactions` class can now be provided with a `sparse` argument to generate sparse representations of the `onehot` matrix to further improve memory efficiency. ([#328](https://github.com/rasbt/mlxtend/pull/328) by [Jakub Smid](https://github.com/jaksmid))
 - The `OnehotTransactions` class has been deprecated and replaced by the `TransactionEncoder`. ([#332](https://github.com/rasbt/mlxtend/pull/332))
+- The `plot_decision_regions` function now has three new parameters, `scatter_kwargs`, `contourf_kwargs`, and `scatter_highlight_kwargs`, that can be used to modify the plotting style. ([#342](https://github.com/rasbt/mlxtend/pull/342) by [James Bourbeau](https://github.com/jrbourbeau))
+
 
 ##### Bug Fixes
 
@@ -55,7 +57,7 @@ The CHANGELOG for the current development version is available at
 
 - New `store_train_meta_features` parameter for `fit` in StackingCVRegressor. if True, train meta-features are stored in `self.train_meta_features_`.
     New `pred_meta_features` method for `StackingCVRegressor`. People can get test meta-features using this method. ([#294](https://github.com/rasbt/mlxtend/pull/294) via [takashioya](https://github.com/takashioya))
-- The new `store_train_meta_features` attribute and `pred_meta_features` method for the `StackingCVRegressor` were also added to the `StackingRegressor`, `StackingClassifier`, and `StackingCVClassifier` ([#299](https://github.com/rasbt/mlxtend/pull/299) & [#300](https://github.com/rasbt/mlxtend/pull/300)) 
+- The new `store_train_meta_features` attribute and `pred_meta_features` method for the `StackingCVRegressor` were also added to the `StackingRegressor`, `StackingClassifier`, and `StackingCVClassifier` ([#299](https://github.com/rasbt/mlxtend/pull/299) & [#300](https://github.com/rasbt/mlxtend/pull/300))
 - New function (`evaluate.mcnemar_tables`) for creating multiple 2x2 contingency tables from model prediction arrays that can be used in multiple McNemar (post-hoc) tests or Cochran's Q or F tests, etc. ([#307](https://github.com/rasbt/mlxtend/issues/307))
 - New function (`evaluate.cochrans_q`) for performing Cochran's Q test to compare the accuracy of multiple classifiers. ([#310](https://github.com/rasbt/mlxtend/issues/310))
 
@@ -84,8 +86,8 @@ The CHANGELOG for the current development version is available at
 ##### Changes
 
 - All feature index tuples in `SequentialFeatureSelector` are now in sorted order. ([#262](https://github.com/rasbt/mlxtend/pull/262))
-- The `SequentialFeatureSelector` now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). 
-Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. 
+- The `SequentialFeatureSelector` now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994).
+Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases.
 ([#262](https://github.com/rasbt/mlxtend/pull/262))
 - `utils.Counter` now accepts a name variable to help distinguish between multiple counters, time precision can be set with the 'precision' kwarg and the new attribute end_time holds the time the last iteration completed. ([#278](https://github.com/rasbt/mlxtend/pull/278) via [Mathew Savage](https://github.com/matsavage))
 
diff --git a/docs/sources/user_guide/plotting/plot_decision_regions.ipynb b/docs/sources/user_guide/plotting/plot_decision_regions.ipynb
diff --git a/mlxtend/plotting/decision_regions.py b/mlxtend/plotting/decision_regions.py
@@ -9,7 +9,7 @@
 from itertools import cycle
 import matplotlib.pyplot as plt
 import numpy as np
-from ..utils import check_Xy
+from ..utils import check_Xy, format_kwarg_dictionaries
 import warnings
 
 
@@ -49,7 +49,10 @@ def plot_decision_regions(X, y, clf,
                           legend=1,
                           hide_spines=True,
                           markers='s^oxv<>',
-                          colors='red,blue,limegreen,gray,cyan'):
+                          colors='red,blue,limegreen,gray,cyan',
+                          scatter_kwargs=None,
+                          contourf_kwargs=None,
+                          scatter_highlight_kwargs=None):
     """Plot decision regions of a classifier.
 
     Please note that this function assumes that class labels are
@@ -95,10 +98,16 @@ def plot_decision_regions(X, y, clf,
     legend : int (default: 1)
         Integer to specify the legend location.
         No legend if legend is 0.
-    markers : str (default 's^oxv<>')
+    markers : str (default: 's^oxv<>')
         Scatterplot markers.
-    colors : str (default 'red,blue,limegreen,gray,cyan')
+    colors : str (default: 'red,blue,limegreen,gray,cyan')
         Comma separated list of colors.
+    scatter_kwargs : dict (default: None)
+        Keyword arguments for underlying matplotlib scatter function.
+    contourf_kwargs : dict (default: None)
+        Keyword arguments for underlying matplotlib contourf function.
+    scatter_highlight_kwargs : dict (default: None)
+        Keyword arguments for matplotlib scatter when plotting X_highlight.
 
     Returns
     ---------
@@ -204,15 +213,26 @@ def plot_decision_regions(X, y, clf,
     Z = clf.predict(X_predict.astype(X.dtype))
     Z = Z.reshape(xx.shape)
     # Plot decision region
+    # Make sure contourf_kwargs has backwards compatible defaults
+    contourf_kwargs_default = {'alpha': 0.3, 'antialiased': True}
+    contourf_kwargs = format_kwarg_dictionaries(
+                        default_kwargs=contourf_kwargs_default,
+                        user_kwargs=contourf_kwargs,
+                        protected_keys=['colors', 'levels'])
     ax.contourf(xx, yy, Z,
-                alpha=0.3,
                 colors=colors,
                 levels=np.arange(Z.max() + 2) - 0.5,
-                antialiased=True)
+                **contourf_kwargs)
 
     ax.axis(xmin=xx.min(), xmax=xx.max(), y_min=yy.min(), y_max=yy.max())
 
     # Scatter training data samples
+    # Make sure scatter_kwargs has backwards compatible defaults
+    scatter_kwargs_default = {'alpha': 0.8, 'edgecolor': 'black'}
+    scatter_kwargs = format_kwarg_dictionaries(
+                        default_kwargs=scatter_kwargs_default,
+                        user_kwargs=scatter_kwargs,
+                        protected_keys=['c', 'marker', 'label'])
     for idx, c in enumerate(np.unique(y)):
         if dim == 1:
             y_data = [0 for i in X[y == c]]
@@ -232,11 +252,10 @@ def plot_decision_regions(X, y, clf,
 
         ax.scatter(x=x_data,
                    y=y_data,
-                   alpha=0.8,
                    c=colors[idx],
                    marker=next(marker_gen),
-                   edgecolor='black',
-                   label=c)
+                   label=c,
+                   **scatter_kwargs)
 
     if hide_spines:
         ax.spines['right'].set_visible(False)
@@ -248,14 +267,6 @@ def plot_decision_regions(X, y, clf,
     if dim == 1:
         ax.axes.get_yaxis().set_ticks([])
 
-    if legend:
-        if dim > 2 and filler_feature_ranges is None:
-            pass
-        else:
-            handles, labels = ax.get_legend_handles_labels()
-            ax.legend(handles, labels,
-                      framealpha=0.3, scatterpoints=1, loc=legend)
-
     if plot_testdata:
         if dim == 1:
             x_data = X_highlight
@@ -270,13 +281,26 @@ def plot_decision_regions(X, y, clf,
             y_data = X_highlight[feature_range_mask, y_index]
             x_data = X_highlight[feature_range_mask, x_index]
 
+        # Make sure scatter_highlight_kwargs has backwards compatible defaults
+        scatter_highlight_defaults = {'c': '',
+                                      'edgecolor': 'black',
+                                      'alpha': 1.0,
+                                      'linewidths': 1,
+                                      'marker': 'o',
+                                      's': 80}
+        scatter_highlight_kwargs = format_kwarg_dictionaries(
+                                    default_kwargs=scatter_highlight_defaults,
+                                    user_kwargs=scatter_highlight_kwargs)
         ax.scatter(x_data,
                    y_data,
-                   c='',
-                   edgecolor='black',
-                   alpha=1.0,
-                   linewidths=1,
-                   marker='o',
-                   s=80)
+                   **scatter_highlight_kwargs)
+
+    if legend:
+        if dim > 2 and filler_feature_ranges is None:
+            pass
+        else:
+            handles, labels = ax.get_legend_handles_labels()
+            ax.legend(handles, labels,
+                      framealpha=0.3, scatterpoints=1, loc=legend)
 
     return ax
diff --git a/mlxtend/plotting/tests/test_decision_regions.py b/mlxtend/plotting/tests/test_decision_regions.py
@@ -79,3 +79,37 @@ def test_y_ary_dim():
                   'y must be a 1D array',
                   plot_decision_regions,
                   X[:, :2], y[:, np.newaxis], sr)
+
+
+def test_scatter_kwargs_type():
+    kwargs = 'not a dictionary'
+    sr.fit(X[:, :2], y)
+    message = ('d must be of type dict or None, but got '
+               '{} instead'.format(type(kwargs)))
+    assert_raises(TypeError,
+                  message,
+                  plot_decision_regions,
+                  X[:, :2], y, sr, scatter_kwargs=kwargs)
+
+
+def test_contourf_kwargs_type():
+    kwargs = 'not a dictionary'
+    sr.fit(X[:, :2], y)
+    message = ('d must be of type dict or None, but got '
+               '{} instead'.format(type(kwargs)))
+    assert_raises(TypeError,
+                  message,
+                  plot_decision_regions,
+                  X[:, :2], y, sr, contourf_kwargs=kwargs)
+
+
+def test_scatter_highlight_kwargs_type():
+    kwargs = 'not a dictionary'
+    sr.fit(X[:, :2], y)
+    message = ('d must be of type dict or None, but got '
+               '{} instead'.format(type(kwargs)))
+    assert_raises(TypeError,
+                  message,
+                  plot_decision_regions,
+                  X[:, :2], y, sr, X_highlight=X[:, :2],
+                  scatter_highlight_kwargs=kwargs)
diff --git a/mlxtend/utils/__init__.py b/mlxtend/utils/__init__.py
@@ -6,6 +6,7 @@
 
 from .counter import Counter
 from .testing import assert_raises
-from .checking import check_Xy
+from .checking import check_Xy, format_kwarg_dictionaries
 
-__all__ = ["Counter", "assert_raises", "check_Xy"]
+__all__ = ["Counter", "assert_raises", "check_Xy",
+           "format_kwarg_dictionaries"]
diff --git a/mlxtend/utils/checking.py b/mlxtend/utils/checking.py
@@ -36,3 +36,36 @@ def check_Xy(X, y, y_int=True):
     if y.shape[0] != X.shape[0]:
         raise ValueError('y and X must contain the same number of samples. '
                          'Got y: %d, X: %d' % (y.shape[0], X.shape[0]))
+
+
+def format_kwarg_dictionaries(default_kwargs=None, user_kwargs=None,
+                              protected_keys=None):
+    """Function to combine default and user specified kwargs dictionaries
+
+    Parameters
+    ----------
+    default_kwargs : dict, optional
+        Default kwargs (default is None).
+    user_kwargs : dict, optional
+        User specified kwargs (default is None).
+    protected_keys : array_like, optional
+        Sequence of keys to be removed from the returned dictionary
+        (default is None).
+
+    Returns
+    -------
+    formatted_kwargs : dict
+        Formatted kwargs dictionary.
+    """
+    formatted_kwargs = {}
+    for d in [default_kwargs, user_kwargs]:
+        if not isinstance(d, (dict, type(None))):
+            raise TypeError('d must be of type dict or None, but '
+                            'got {} instead'.format(type(d)))
+        if d is not None:
+            formatted_kwargs.update(d)
+    if protected_keys is not None:
+        for key in protected_keys:
+            formatted_kwargs.pop(key, None)
+
+    return formatted_kwargs
diff --git a/mlxtend/utils/tests/test_checking_inputs.py b/mlxtend/utils/tests/test_checking_inputs.py
@@ -5,20 +5,24 @@
 # License: BSD 3 clause
 
 from mlxtend.utils import assert_raises
-from mlxtend.utils import check_Xy
+from mlxtend.utils import check_Xy, format_kwarg_dictionaries
 import numpy as np
 import sys
 import os
 
 y = np.array([1, 2, 3, 4])
 X = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.]])
 
+d_default = {'key1': 1, 'key2': 2}
+d_user = {'key3': 3, 'key4': 4}
+protected_keys = ['key1', 'key4']
 
-def test_ok():
+
+def test_check_Xy_ok():
     check_Xy(X, y)
 
 
-def test_invalid_type_X():
+def test_check_Xy_invalid_type_X():
     expect = "X must be a NumPy array. Found <class 'list'>"
     if (sys.version_info < (3, 0)):
         expect = expect.replace('class', 'type')
@@ -29,15 +33,15 @@ def test_invalid_type_X():
                   y)
 
 
-def test_float16_X():
+def test_check_Xy_float16_X():
     check_Xy(X.astype(np.float16), y)
 
 
-def test_float16_y():
+def test_check_Xy_float16_y():
     check_Xy(X, y.astype(np.int16))
 
 
-def test_invalid_type_y():
+def test_check_Xy_invalid_type_y():
     expect = "y must be a NumPy array. Found <class 'list'>"
     if (sys.version_info < (3, 0)):
         expect = expect.replace('class', 'type')
@@ -48,15 +52,15 @@ def test_invalid_type_y():
                   [1, 2, 3, 4])
 
 
-def test_invalid_dtype_X():
+def test_check_Xy_invalid_dtype_X():
     assert_raises(ValueError,
                   'X must be an integer or float array. Found object.',
                   check_Xy,
                   X.astype('object'),
                   y)
 
 
-def test_invalid_dtype_y():
+def test_check_Xy_invalid_dtype_y():
 
     if (sys.version_info > (3, 0)):
         expect = ('y must be an integer array. Found <U1. '
@@ -71,7 +75,7 @@ def test_invalid_dtype_y():
                   np.array(['a', 'b', 'c', 'd']))
 
 
-def test_invalid_dim_y():
+def test_check_Xy_invalid_dim_y():
     if sys.version_info[:2] == (2, 7) and os.name == 'nt':
         s = 'y must be a 1D array. Found (4L, 2L)'
     else:
@@ -83,7 +87,7 @@ def test_invalid_dim_y():
                   X.astype(np.integer))
 
 
-def test_invalid_dim_X():
+def test_check_Xy_invalid_dim_X():
     if sys.version_info[:2] == (2, 7) and os.name == 'nt':
         s = 'X must be a 2D array. Found (4L,)'
     else:
@@ -95,7 +99,7 @@ def test_invalid_dim_X():
                   y)
 
 
-def test_unequal_length_X():
+def test_check_Xy_unequal_length_X():
     assert_raises(ValueError,
                   ('y and X must contain the same number of samples. '
                    'Got y: 4, X: 3'),
@@ -104,10 +108,56 @@ def test_unequal_length_X():
                   y)
 
 
-def test_unequal_length_y():
+def test_check_Xy_unequal_length_y():
     assert_raises(ValueError,
                   ('y and X must contain the same number of samples. '
                    'Got y: 3, X: 4'),
                   check_Xy,
                   X,
                   y[1:])
+
+
+def test_format_kwarg_dictionaries_defaults_empty():
+    empty = format_kwarg_dictionaries()
+    assert isinstance(empty, dict)
+    assert len(empty) == 0
+
+
+def test_format_kwarg_dictionaries_protected_keys():
+    formatted_kwargs = format_kwarg_dictionaries(
+                            default_kwargs=d_default,
+                            user_kwargs=d_user,
+                            protected_keys=protected_keys)
+
+    for key in protected_keys:
+        assert key not in formatted_kwargs
+
+
+def test_format_kwarg_dictionaries_no_default_kwargs():
+    formatted_kwargs = format_kwarg_dictionaries(user_kwargs=d_user)
+    assert formatted_kwargs == d_user
+
+
+def test_format_kwarg_dictionaries_no_user_kwargs():
+    formatted_kwargs = format_kwarg_dictionaries(default_kwargs=d_default)
+    assert formatted_kwargs == d_default
+
+
+def test_format_kwarg_dictionaries_default_kwargs_invalid_type():
+    invalid_kwargs = 'not a dictionary'
+    message = ('d must be of type dict or None, but got '
+               '{} instead'.format(type(invalid_kwargs)))
+    assert_raises(TypeError,
+                  message,
+                  format_kwarg_dictionaries,
+                  default_kwargs=invalid_kwargs)
+
+
+def test_format_kwarg_dictionaries_user_kwargs_invalid_type():
+    invalid_kwargs = 'not a dictionary'
+    message = ('d must be of type dict or None, but got '
+               '{} instead'.format(type(invalid_kwargs)))
+    assert_raises(TypeError,
+                  message,
+                  format_kwarg_dictionaries,
+                  user_kwargs=invalid_kwargs)