Normalization
Normalization, also known as feature scaling, is the process of adjusting data values to fit within a prescribed range. This is done to make the machine learning process more efficient and accurate.
Numeric Values Normalization
Common forms of numeric value normalization include the following (min-max and the standard score are sketched in code after this list):
Coefficient of variation: calculates the ratio of the standard deviation to the mean
Min-Max: calculates relative values within a range, often [0, 1] or [-1, 1]
Standard score: calculates the number of standard deviations a value lies from the group mean
Standardized moment: divides a moment of a probability distribution by a power of the standard deviation
Studentized residual: divides a residual by an estimate of its standard deviation
t-statistic: calculates the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error
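As a concrete illustration, here is a minimal sketch of two of these forms, min-max scaling and the standard score (z-score), in plain Python; the feature values are hypothetical.

```python
from statistics import mean, stdev

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def standard_score(values):
    """Express each value as its number of standard deviations from the group mean."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

ages = [18, 25, 33, 47, 62]  # hypothetical feature values
print(min_max_scale(ages))   # values mapped into [0, 1]
print(standard_score(ages))  # values as z-scores
```

Min-max scaling guarantees a bounded output range, while the standard score expresses how unusual a value is relative to the rest of the group.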
Calendar Date Normalization
Single calendar dates (year/month/day) often have no meaning for training a machine learning model. One way to give them meaning is to calculate the time difference between a date and some reference date. The difference is meaningful because it can be compared to other time differences during model training and inference.
For example, a model that predicts product returns by customers might use customer care calls as a data feature. By itself, the June 15th date of a call from a customer who returns a product on July 15th is not comparable to a care call tied to a product return on August 15th. But if the time interval between the call and the return is used instead, the 30-day interval is comparable to other returns.
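Here is a minimal sketch of this interval calculation using Python's standard datetime module; the specific dates below are hypothetical stand-ins for the call/return example above.

```python
from datetime import date

def days_between(start: date, end: date) -> int:
    """Return the number of days from start to end."""
    return (end - start).days

# Hypothetical (call date, return date) pairs for two customers.
events = [
    (date(2023, 6, 15), date(2023, 7, 15)),
    (date(2023, 7, 16), date(2023, 8, 15)),
]

# The raw dates are not comparable across customers, but the intervals are.
intervals = [days_between(call, ret) for call, ret in events]
print(intervals)  # [30, 30]
```

During training and inference, the interval in days would replace the raw dates as the model feature.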