A Gentle Introduction To Min-Max Data Normalization
And How To Apply It To Pandas Data Frames With Scikit-Learn
If you’d like to support my writing, consider buying a copy of my E-Book, JetPack SQL. It’s a comprehensive, 70+ page PDF for absolute beginners. Click to learn more: https://stan.store/datawithjon/p/elevate-your-sql-skills-elevate-your-career
Min-maxing is a statistical technique for rescaling numerical values into the [0, 1] range. For example, a series of album ratings ranging from 70 to 150 could be min-maxed so that every rating falls between 0 and 1 inclusive, while the relative distances between data points are preserved. Min-maxing is a form of data normalization.
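The rescaling described above uses the formula x' = (x - min) / (max - min), where min and max are the smallest and largest values in the series. Here is a minimal plain-Python sketch of that formula. The function name min_max and the sample ratings are illustrative and not from the original. Mapping a constant series to all zeros is a choice made here to avoid dividing by zero.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1] using x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so max - min is 0; map everything to 0.0
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical album ratings spanning the 70-150 range from the example
ratings = [70, 90, 110, 150]
print(min_max(ratings))  # [0.0, 0.25, 0.5, 1.0]
```

A rating of 90 sits a quarter of the way from 70 to 150, and it still sits a quarter of the way along after rescaling, at 0.25. That is what preserving the relative distances means.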
Data scientists often use min-maxing to put features on the same scale before using those features to train machine learning models, such as those used for clustering and linear regression.
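One common way to scale features before training is to chain the scaler and the model in a scikit-learn pipeline. This is a sketch under assumptions: the training data and target are synthetic, and the choice of LinearRegression simply mirrors the example models named above. During fit, the pipeline learns each column's min and max from the training features only. At predict time, it reuses those same values on new rows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: two features on very different scales
X_train = np.array([
    [1.0, 1000.0],
    [2.0, 3000.0],
    [3.0, 2000.0],
    [4.0, 5000.0],
])
# Synthetic target that depends linearly on both features
y_train = 2.0 * X_train[:, 0] + 0.001 * X_train[:, 1]

# MinMaxScaler is fit on X_train, then LinearRegression is trained on the scaled features
model = make_pipeline(MinMaxScaler(), LinearRegression())
model.fit(X_train, y_train)

# New rows are scaled with the min and max learned from X_train before prediction
X_new = np.array([[2.5, 2500.0]])
print(model.predict(X_new))
```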
Generally speaking, min-maxing features before training an ML model prevents features with large scales from overshadowing features with small scales. Putting the features on a common scale often produces a better model, especially for algorithms that rely on distances between data points, such as clustering.
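To see how a large-scale feature can overshadow a small-scale one, consider Euclidean distance, which many clustering algorithms rely on. The small NumPy sketch below uses three hypothetical rows: column 0 is a rating on a 70-150 scale and column 1 is a feature on a 0-1 scale. The data and helper names are illustrative only.

```python
import numpy as np

# Hypothetical data: column 0 is a rating (70-150), column 1 is a 0-1 feature
X = np.array([
    [70.0, 0.0],   # row a
    [80.0, 1.0],   # row b
    [150.0, 0.0],  # row c
])

def min_max_columns(X):
    """Min-max each column independently into the [0, 1] range."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

def distance(X, i, j):
    """Euclidean distance between rows i and j."""
    return float(np.linalg.norm(X[i] - X[j]))

X_scaled = min_max_columns(X)

# Unscaled: the 10-point rating gap adds 100 to the squared distance between a and b,
# while column 1's full-range difference adds only 1.
print(distance(X, 0, 1), distance(X, 0, 2))                # about 10.05 and 80.0
# Scaled: both columns now contribute on comparable terms.
print(distance(X_scaled, 0, 1), distance(X_scaled, 0, 2))  # about 1.008 and 1.0
```

Before scaling, row b looks eight times closer to row a than row c does, almost entirely because of the rating column. After scaling, b and c are about equally far from a, because a full-range difference in either column now counts the same.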
In this article, I will walk through the fundamentals of the min-max formula and then demonstrate how to apply the technique to Pandas data frames using scikit-learn’s MinMaxScaler…
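Here is a minimal sketch of MinMaxScaler applied to a Pandas data frame. The column names and values are hypothetical. Note that fit_transform returns a NumPy array rather than a data frame, so the sketch wraps the result back into a DataFrame to keep the original column names and index.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data frame; the column names are illustrative only
df = pd.DataFrame({
    "rating": [70, 90, 110, 150],
    "num_reviews": [10, 400, 250, 1000],
})

scaler = MinMaxScaler()  # default feature_range=(0, 1)
# Learn each column's min and max, then rescale every column independently
scaled = scaler.fit_transform(df)

# fit_transform returns a NumPy array, so restore the column names and index
df_scaled = pd.DataFrame(scaled, columns=df.columns, index=df.index)
print(df_scaled)

# The per-column minimums and maximums the scaler learned
print(scaler.data_min_, scaler.data_max_)
```

Each column is scaled using its own min and max, so "rating" and "num_reviews" both end up in [0, 1] even though their raw ranges differ widely. In scikit-learn 1.2 and later, calling scaler.set_output(transform="pandas") before fitting makes the scaler return a data frame directly.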