Cross Decomposition
Cross decomposition algorithms are useful for finding relationships between two multivariate datasets. A prominent example is Partial Least Squares (PLS).
PLS is used to find the fundamental relations between two matrices (X and Y). PLS regression is particularly suited when the matrix of predictors has more variables than observations, and when there is multicollinearity among X values. By contrast, standard regression will fail in these cases (unless it is regularized).
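To illustrate the case of more predictors than observations, here is a minimal sketch (not part of the original article) that fits PLSRegression to a small dataset with more, partly collinear, predictors than samples; the data, random seed, and component count are illustrative assumptions:

# Illustrative sketch: fit PLS when there are more (collinear) predictors than samples.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))                             # 5 observations, 10 predictors
X[:, 5:] = X[:, :5] + 0.01 * rng.normal(size=(5, 5))     # second half nearly duplicates the first
y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=5)   # response driven by a few directions in X

pls = PLSRegression(n_components=2)  # project onto two latent components
pls.fit(X, y)
print(pls.score(X, y))               # R^2 of the fit on the training data

Ordinary least squares would be ill-posed here (10 coefficients, 5 observations), whereas PLS fits a low-dimensional latent model without additional regularization.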
In the Partial Least Squares example below, the correlation between two training datasets (X and Y) is learned and then used to make predictions on a separate test set:
Mathematical Model
The decompositions of X and Y are made so as to maximize the covariance between the projected scores (latent variables) of the two datasets.
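As a sketch of this objective (using u and v for the weight vectors of the first pair of components, a notation introduced here for illustration), the first PLS direction solves

\[
\max_{u,\, v} \; \operatorname{Cov}(Xu,\, Yv) \quad \text{subject to} \quad \|u\| = \|v\| = 1 .
\]

Subsequent components are obtained by repeating this maximization on the deflated (residual) matrices.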
Python Example
""" partial_least_squares_with_scikit_learn.py correlates and makes predictions on data in different dimension spaces """ # Import the scikit learn PLS module. from sklearn.cross_decomposition import PLSRegression # Define X and Y data. X = [[0., 0., 1.], [1., 0., 0.], [2., 2., 2.], [2., 5., 4.]] Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]] # Instantiate a PLSRegression model. pls2 = PLSRegression(n_components=2) # Fit the model to the data. pls2.fit(X, Y) # Make a prediction based on new input data. X_new = [[2., 1., 1.], [2., 1., 0.], [5., 3., 2.], [1., 4., 2.]] Y_pred = pls2.predict(X_new) # Display the result. print(Y_pred)
Results are shown below:
[[ 4.3062782 4.24098373]
[ 3.11475114 3.00799304]
[12.05128289 12.11358741]
[ 6.87275405 6.98368783]]
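As a hedged follow-up (not part of the original script), one way to examine the model further is to score it on its training data and inspect the learned projections; this assumes the X, Y, and pls2 variables from the script above are still in scope:

# Continuation of the script above (assumes X, Y, and pls2 are defined).
print(pls2.score(X, Y))    # R^2 of the model on the training data
print(pls2.x_weights_)     # weight vectors defining the latent X directions
print(pls2.transform(X))   # X projected onto the two latent components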