Hyperparameter tuning with search methods
Hyperparameter tuning is crucial for optimizing candidate ML models, and scikit-learn makes this process easier with a variety of built-in search methods. The library provides two popular search utilities, GridSearchCV() and RandomizedSearchCV(), through easy-to-use APIs, along with their successive halving counterparts, HalvingGridSearchCV() and HalvingRandomSearchCV(), which narrow the candidate set over progressively larger resource budgets.
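As a minimal sketch of the grid search API, the following example (the parameter values and the iris dataset are illustrative choices, not prescribed by the library) exhaustively evaluates a small grid of candidates with cross-validation:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
X, y = load_iris(return_X_y=True)
# Define a small grid of candidate hyperparameter values (illustrative only)
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10],
}
# Evaluate every combination with 5-fold cross-validation
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
# Best combination found by the search
print(search.best_params_)
RandomizedSearchCV() follows the same pattern but samples a fixed number of candidates from the grid (or from distributions) instead of trying every combination.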
scikit-learn also supports a manual approach to setting hyperparameters if you wish to adjust default values for your own training purposes, through the set_params() and get_params() estimator methods. The set_params() method lets you adjust model hyperparameters programmatically, while get_params() retrieves the current hyperparameter settings. This provides flexibility when experimenting with different model configurations and can be paired with the search techniques mentioned earlier for efficient tuning:
from sklearn.ensemble import RandomForestClassifier
# Create a RandomForestClassifier model
model = RandomForestClassifier()
# Set hyperparameters prior to training using set_params()
model.set_params(n_estimators=100, max_depth=10, random_state=42)
# Check the updated parameters
print(model.get_params())
# Output:
{'bootstrap': True, 'ccp_alpha': 0.0, 'class_weight': None, 'criterion': 'gini', 'max_depth': 10, 'max_features': 'sqrt', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'monotonic_cst': None, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 42, 'verbose': 0, 'warm_start': False}
As you can see, get_params() returns the complete set of the model's hyperparameters, including the values we just set alongside the remaining defaults. This configured model is now ready to be used for training.
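As a brief illustrative sketch (using the iris dataset purely as an example), the configured estimator can then be trained and evaluated like any other scikit-learn model:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Apply the same hyperparameters via set_params() and train as usual
model = RandomForestClassifier()
model.set_params(n_estimators=100, max_depth=10, random_state=42)
model.fit(X_train, y_train)
# Accuracy on the held-out split
print(model.score(X_test, y_test))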