Linear Regression

Linear regression is a linear approach to modeling the relationship between a dependent variable and one or more independent variables.

In the sample equation y = 5 + 4x:

x is the predictor (independent variable)
y is the predicted value (dependent variable)
5 is the constant (intercept)
4 is the coefficient (slope) that multiplies the predictor x
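
As a small illustration, the sample equation can be evaluated for a few predictor values. The function name predict and the parameter names intercept and slope are chosen here for illustration only.

```python
# Evaluate the sample equation y = 5 + 4x for a few predictor values.
# The names predict, intercept and slope are illustrative, not from a library.
def predict(x, intercept=5.0, slope=4.0):
    """Return the predicted value y for a predictor value x."""
    return intercept + slope * x


for x_value in [0, 1, 2.5]:
    print(x_value, predict(x_value))
```

Each unit increase in x raises the predicted y by the coefficient, 4, and the constant, 5, is the prediction when x is 0.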

Assumptions for linear regression include:

there is a linear relationship between the dependent variable and the independent variables (regressors)
the errors (residuals) are normally distributed and independent of each other
there is minimal multicollinearity between the independent variables
the variance around the regression line is the same for all values of the independent (predictor) variable (homoscedasticity)
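
Some of these assumptions can be inspected informally from the residuals of a fitted line. The sketch below is a rough check rather than a formal statistical test. It reuses the data from the Python example later in this article and assumes np.polyfit is an acceptable way to obtain the fit; the variable names are illustrative.

```python
import numpy as np

# Data taken from the NumPy example in this article.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 5, 1, 6, 8, 10, 9, 8, 11, 13])

# Fit a straight line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, 1)

# Residuals are the observed values minus the fitted values.
residuals = y - (intercept + slope * x)

# With an intercept in the model, least-squares residuals average to zero.
print("mean residual:", residuals.mean())

# Rough constant-variance check: compare the residual spread
# in the lower and upper halves of the x range.
half = x.size // 2
print("std (low x): ", residuals[:half].std(ddof=1))
print("std (high x):", residuals[half:].std(ddof=1))
```

Very different spreads in the two halves would hint that the constant-variance assumption does not hold. With only ten points, such a comparison is indicative at best.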

Mathematical Model

Linear regression fits a linear function to a set of data points using the least-squares approach: it chooses the coefficients that minimize the sum of squared differences between the observed and predicted values.
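
For the single-predictor case, the least-squares quantities can be written as follows. This is a sketch that matches what the Python example below computes (x_y_deviation, x_deviation, b_1 and b_0). The symbols SS_xy and SS_xx are notation chosen here for illustration.

```latex
\begin{aligned}
\hat{y}_i &= b_0 + b_1 x_i \\
SS_{xy} &= \sum_{i=1}^{n} x_i y_i - n \, \bar{x} \, \bar{y} \\
SS_{xx} &= \sum_{i=1}^{n} x_i^2 - n \, \bar{x}^2 \\
b_1 &= \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}
\end{aligned}
```

Here n is the number of observations, and \bar{x} and \bar{y} are the means of the x and y values.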

Python Example


"""
linear_regression_with_numpy.py
uses numpy built-in functions to perform linear regression
"""

# Import needed libraries.
import numpy as np
import matplotlib.pyplot as plotlib

# Define an input vector x (one-dimensional array) of independent-variable values.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Define an input vector y (one-dimensional array) of dependent-variable values.
y = np.array([2, 5, 1, 6, 8, 10, 9, 8, 11, 13])

# Get the number of observation points.
number_of_observations = np.size(x)

# Calculate mean values of the x and y vectors.
mean_x = np.mean(x)
mean_y = np.mean(y)

# Calculate the cross-deviation between the x and y vectors.
x_y_deviation = np.sum(y * x) - (number_of_observations * mean_y * mean_x)

# Calculate the sum of squared deviations of the x vector.
x_deviation = np.sum(x * x) - (number_of_observations * mean_x * mean_x)

# Calculate least-squares regression coefficients.
b_1 = x_y_deviation / x_deviation
b_0 = mean_y - (b_1 * mean_x)

# Print the regression coefficients as (intercept b_0, slope b_1).
print(f"Regression Coefficients: ({b_0}, {b_1})")

# Plot the x,y data points using the x and y vectors.
plotlib.scatter(x, y, color="r", marker="o", s=30)

# Calculate a y values vector for the regression line.
y_values_for_regression_line = b_0 + (b_1 * x)

# Plot the regression line using the vectors x and y_values_for_regression_line.
plotlib.plot(x, y_values_for_regression_line, color="b")

# Set the graph axis labels.
plotlib.xlabel('x')
plotlib.ylabel('y')

# Display the graph.
plotlib.show()

Results are shown below:

Regression Coefficients: (2.2, 1.1333333333333333)

[Figure: scatter plot of the x, y data points (red) with the fitted regression line (blue).]
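
As a cross-check, the same coefficients can be obtained from NumPy's built-in least-squares routines. This is a minimal sketch using np.polyfit and np.linalg.lstsq on the same data. The variable names design and coeffs are illustrative.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 5, 1, 6, 8, 10, 9, 8, 11, 13])

# np.polyfit with degree 1 fits a straight line and returns [slope, intercept].
b_1, b_0 = np.polyfit(x, y, 1)

# np.linalg.lstsq solves the same problem from a design matrix
# whose columns are a constant 1 and the predictor x.
design = np.column_stack([np.ones_like(x, dtype=float), x])
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)

print("polyfit:", round(b_0, 4), round(b_1, 4))
print("lstsq:  ", np.round(coeffs, 4))
```

Both routines should agree with the manually computed coefficients up to floating-point rounding.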

Python Example using scikit-learn