Scaling techniques
In real-world datasets, features often have vastly different scales. For instance, a feature representing age may range from 0 to 100, while another feature representing income could range from 0 to 100,000. Many ML algorithms, such as KNN and gradient descent-based methods (e.g., linear regression), are sensitive to these differences in scale, so scaling helps ensure that no single feature dominates the learning process. This recipe covers the three most commonly used scaling techniques in ML.
The following are the key concepts. Note that these two terms are sometimes used interchangeably, but they are not the same and should not be implemented as such! A short sketch contrasting them follows the list.
- Standardization (Z-score transformation) changes the data to have a mean of 0 and a standard deviation of 1
- Normalization changes the range of the data distribution so values fall between 0 and 1
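A minimal sketch of the difference, assuming scikit-learn's `StandardScaler` and `MinMaxScaler` (the toy DataFrame and its column names are illustrative, not the recipe's dataset):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical toy DataFrame with features on very different scales
df = pd.DataFrame({
    "age": [22, 35, 58, 71],
    "income": [18_000, 54_000, 92_000, 120_000],
})

# Standardization (Z-score): each column gets mean 0, standard deviation 1
standardized = pd.DataFrame(
    StandardScaler().fit_transform(df), columns=df.columns
)

# Normalization (min-max): each column is rescaled to the [0, 1] range
normalized = pd.DataFrame(
    MinMaxScaler().fit_transform(df), columns=df.columns
)

print(standardized.round(2))
print(normalized.round(2))
```

Printing both outputs makes the distinction concrete: the standardized columns contain negative and positive values centered on 0, while the normalized columns are bounded between 0 and 1.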
Getting ready
We will use the previously defined iterative_imputed_df DataFrame...