Normalization
Normalization, also known as feature scaling, is the process of adjusting data values to fit within a prescribed range. This is done to make the machine learning process more efficient and accurate.
Numeric Values Normalization
Common forms of numeric value normalization include the following (min-max and the standard score are sketched in code after this list):
Coefficient of variation: calculates the ratio of the standard deviation to the mean
Min-Max: calculates relative values within a range, often [0, 1] or [-1, 1]
Standard score: calculates the number of standard deviations a value lies from the group mean
Standardized moment: divides a moment of a probability distribution by a power of the standard deviation
Studentized residual: divides a residual by an estimate of its standard deviation
t-statistic: calculates the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error
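As a concrete illustration, here is a minimal sketch of two of these forms, min-max scaling and the standard score (z-score), in plain Python; the feature values are hypothetical.

```python
from statistics import mean, stdev

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]; assumes the values are not all identical."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def standard_score(values):
    """Express each value as its number of standard deviations from the group mean."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

ages = [18, 25, 33, 47, 62]  # hypothetical feature values
print(min_max_scale(ages))   # values mapped into [0, 1]
print(standard_score(ages))  # values as z-scores
```

Min-max scaling guarantees a bounded output range, while the standard score expresses how unusual a value is relative to the rest of the group.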
Calendar Date Normalization
Single calendar dates (year/month/day) often have no meaning for training a machine learning model. One way to give them meaning is to calculate the time difference between a date and some reference date. The difference is meaningful because it can be compared to other time differences during model training and inference.
For example, a model that predicts product returns by customers might use customer care calls as a data feature. By itself, the June 15th date of a call from a customer who returns a product on July 15th is not comparable to a care call tied to a product return on August 15th. But if the time interval between the call and the return is used instead, the 30-day interval is comparable to other returns.
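Here is a minimal sketch of this interval calculation using Python's standard datetime module; the specific dates below are hypothetical stand-ins for the call/return example above.

```python
from datetime import date

def days_between(start: date, end: date) -> int:
    """Return the number of days from start to end."""
    return (end - start).days

# Hypothetical (call date, return date) pairs for two customers.
events = [
    (date(2023, 6, 15), date(2023, 7, 15)),
    (date(2023, 7, 16), date(2023, 8, 15)),
]

# The raw dates are not comparable across customers, but the intervals are.
intervals = [days_between(call, ret) for call, ret in events]
print(intervals)  # [30, 30]
```

During training and inference, the interval in days would replace the raw dates as the model feature.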