Least Squares Regression is a fundamental technique in statistical modeling and data analysis used for fitting a model to observed data. The primary goal is to find a set of parameters that minimize the discrepancies (residuals) between the model’s predictions and the actual observed data. The "least squares" criterion is chosen because it leads to convenient mathematical properties and closed-form solutions, particularly for linear models.
In its simplest form, least squares regression is applied to linear regression, where we assume a linear relationship between a set of input variables (features) and an output variable (response).
Given a matrix of features $X \in \mathbb{R}^{m \times n}$ (one row per observation, one column per feature) and a vector of observed responses $Y \in \mathbb{R}^m$, we seek a coefficient vector $\beta \in \mathbb{R}^n$.
We model: $$Y = X\beta + \varepsilon,$$ where $\varepsilon$ is a vector of errors.
Objective: Minimize the Residual Sum of Squares (RSS): $$\text{RSS}(\beta) = \|Y - X\beta\|^2 = (Y - X\beta)^\top (Y - X\beta).$$
The goal is to find the coefficient vector $\beta$ that makes this quantity as small as possible.
By setting the gradient of this objective with respect to $\beta$ to zero, we obtain the normal equation: $$X^\top X \beta = X^\top Y.$$
Provided $X^\top X$ is invertible, the solution is $$\hat{\beta} = (X^\top X)^{-1} X^\top Y.$$
This $\hat{\beta}$ is the ordinary least squares (OLS) estimator.
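As a quick illustration, the closed-form estimator can be computed with a few lines of NumPy; the synthetic data, sizes, and variable names in this sketch are assumptions made for the example, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data: m observations, n features (purely illustrative).
m, n = 100, 3
X = rng.normal(size=(m, n))
true_beta = np.array([2.0, -1.0, 0.5])
Y = X @ true_beta + 0.1 * rng.normal(size=m)

# Closed-form OLS: beta_hat = (X^T X)^{-1} X^T Y.
# Solving the linear system is numerically safer than forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # should be close to true_beta
```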
I. Set up the Problem:
Suppose we have $m$ observations, where observation $i$ has feature values $x_{i1}, \dots, x_{in}$ and an observed response $y_i$. For each observation we model the prediction as:
$$\hat{y}_i = \sum_{j=1}^n \beta_j x_{ij} = x_i^\top \beta,$$ or in matrix form: $$\hat{Y} = X\beta.$$
II. Defining the Error to Minimize:
We define the residuals as: $$e_i = y_i - \hat{y}_i, \quad \text{or in vector form,} \quad e = Y - X\beta.$$
The objective is to minimize: $$\text{RSS}(\beta) = \sum_{i=1}^m e_i^2 = (Y - X\beta)^\top (Y - X\beta).$$
III. Finding the Minimum:
To minimize with respect to $\beta$, set the gradient of the RSS to zero: $$\nabla_\beta \, \text{RSS}(\beta) = -2 X^\top (Y - X\beta) = 0.$$
This implies the normal equation: $$X^\top X \beta = X^\top Y.$$
IV. Solving the Normal Equation:
If $X^\top X$ is invertible, we can solve for $\beta$: $$\hat{\beta} = (X^\top X)^{-1} X^\top Y.$$
This formula provides a closed-form solution for the ordinary least squares estimator $\hat{\beta}$.
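A minimal numerical sanity check of this derivation, assuming randomly generated data, is to confirm that the normal-equation solution matches NumPy's general least-squares solver and that the gradient vanishes at the minimizer:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 4))   # illustrative design matrix
Y = rng.normal(size=50)        # illustrative responses

# Solve the normal equation X^T X beta = X^T Y directly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

# Compare with NumPy's general least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# At the minimizer, the gradient -2 X^T (Y - X beta) should vanish.
gradient = -2 * X.T @ (Y - X @ beta_normal)

print(np.allclose(beta_normal, beta_lstsq))  # expect True
print(np.allclose(gradient, np.zeros(4)))    # expect True (up to floating point)
```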
I. Data Preparation:
- Construct your design matrix $X$ by stacking the observations row-wise.
- Each row corresponds to one observation and each column corresponds to one feature.
- Often, a column of ones is added to incorporate the intercept term.
- Construct the response vector $Y$ from the observed target values.
II. Compute Matrices:
Compute $X^\top X$ and $X^\top Y$.
III. Check Invertibility:
- Ensure $X^\top X$ is invertible (or use a pseudo-inverse if not).
- If $X^\top X$ is not invertible, it may be due to multicollinearity. Consider removing or combining features, or use regularization methods (like Ridge or Lasso).
IV. Solve for $\beta$: compute $\hat{\beta} = (X^\top X)^{-1} X^\top Y$.
V. Use the Model for Prediction:
For a new input $x_{\text{new}}$, the prediction is:
$$\hat{y}_{\text{new}} = x_{\text{new}}^\top \hat{\beta}.$$
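Put together, the five steps above might look like the following sketch; the synthetic feature values, noise level, and the new input are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# I. Data preparation: stack observations row-wise; add a column of ones for
#    the intercept (all values here are synthetic, for illustration only).
features = rng.normal(size=(20, 2))
Y = 1.5 + features @ np.array([0.8, -0.3]) + 0.05 * rng.normal(size=20)
X = np.hstack([np.ones((features.shape[0], 1)), features])

# II. Compute the matrices that appear in the normal equation.
XtX = X.T @ X
XtY = X.T @ Y

# III. Check invertibility; fall back to the pseudo-inverse if X^T X is
#      singular (e.g. under multicollinearity).
if np.linalg.matrix_rank(XtX) == XtX.shape[0]:
    beta_hat = np.linalg.solve(XtX, XtY)   # IV. Solve the normal equation.
else:
    beta_hat = np.linalg.pinv(X) @ Y       # Minimum-norm least-squares solution.

# V. Predict for a new input (prepend 1 for the intercept).
x_new = np.array([1.0, 0.5, -0.2])
y_new = x_new @ beta_hat
print(beta_hat, y_new)
```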
Given Data Points:
Step-by-step:
I. Add an intercept term:
II. Compute $X^\top X$ and $X^\top Y$.
III. Invert $X^\top X$ (or solve the corresponding linear system).
IV. Compute $\hat{\beta} = (X^\top X)^{-1} X^\top Y$.
Thus, the fitted line is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, with the intercept and slope obtained in step IV.
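Carried out in code, the same steps look like the sketch below; the sample points are assumed purely for illustration and do not refer to any particular dataset.

```python
import numpy as np

# Assumed sample points (x, y), chosen only to illustrate the steps.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.0, 8.1, 9.9])

# I.   Add an intercept term: design matrix with a leading column of ones.
X = np.column_stack([np.ones_like(x), x])

# II.  Compute X^T X and X^T y.
XtX = X.T @ X
Xty = X.T @ y

# III./IV. Invert (here: solve the system) and compute beta_hat.
b0, b1 = np.linalg.solve(XtX, Xty)

print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} * x")
```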
- Closed-Form Solution: Provides an explicit formula for the optimal parameters, enabling direct interpretation.
- Efficient for Small Problems: Works well with relatively small datasets and few features.
- Foundational Method: Forms the basis for many advanced regression techniques and regularized models.
- Assumes Linearity: The method presupposes a linear relationship between features and output.
- Sensitive to Outliers: Squared errors emphasize large errors more heavily, making the model sensitive to outliers.
- Invertibility Issues: If $X^\top X$ is not invertible, the standard formula fails. Issues like multicollinearity require either dropping features, transformations, or using regularized regression variants.
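As a sketch of the regularized fix mentioned above, ridge regression replaces $(X^\top X)^{-1}$ with $(X^\top X + \lambda I)^{-1}$, which is invertible for any $\lambda > 0$; the collinear synthetic data and the penalty value below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

# Two perfectly collinear features make X^T X singular (synthetic data).
x1 = rng.normal(size=30)
X = np.column_stack([x1, 2.0 * x1])
Y = 3.0 * x1 + 0.1 * rng.normal(size=30)

lam = 1e-2  # arbitrary regularization strength chosen for this sketch

# Ridge closed form: beta_hat = (X^T X + lambda * I)^{-1} X^T Y.
# The added lambda * I makes the matrix invertible even under multicollinearity.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)
print(beta_ridge)
```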