Reimplement polynomial_regression.py #8889
cclauss merged 10 commits into TheAlgorithms:master from tianyizheng02:polynomial-regression on Jul 28, 2023
Commits
9771eb4 Reimplement polynomial_regression.py (tianyizheng02)
be948a3 updating DIRECTORY.md
88c4579 Merge branch 'TheAlgorithms:master' into polynomial-regression (tianyizheng02)
bf38eab Fix matrix formatting in docstrings (tianyizheng02)
8aedc3f Try to fix failing doctest (tianyizheng02)
8af9d16 Debugging failing doctest (tianyizheng02)
018deeb Fix failing doctest attempt 2 (tianyizheng02)
b463d88 Remove unnecessary return value descriptions in docstrings (tianyizheng02)
80fb8c6 Readd placeholder doctest for main function (tianyizheng02)
1d72837 Fix typo in algorithm description (tianyizheng02)
@@ -0,0 +1,213 @@
"""
Polynomial regression is a type of regression analysis that models the relationship
between a predictor x and the response y as an mth-degree polynomial:

    y = β₀ + β₁x + β₂x² + ... + βₘxᵐ + ε

By treating x, x², ..., xᵐ as distinct variables, we see that polynomial regression is a
special case of multiple linear regression. Therefore, we can use ordinary least squares
(OLS) estimation to estimate the vector of model parameters β = (β₀, β₁, β₂, ..., βₘ)
for polynomial regression:

    β = (XᵀX)⁻¹Xᵀy = X⁺y

where X is the design matrix, y is the response vector, and X⁺ denotes the Moore–Penrose
pseudoinverse of X. In the case of polynomial regression, the design matrix is

        |1  x₁  x₁² ⋯  x₁ᵐ|
    X = |1  x₂  x₂² ⋯  x₂ᵐ|
        |⋮  ⋮   ⋮   ⋱  ⋮  |
        |1  xₙ  xₙ² ⋯  xₙᵐ|

In OLS estimation, inverting XᵀX to compute X⁺ can be very numerically unstable. This
implementation sidesteps this need to invert XᵀX by computing X⁺ using singular value
decomposition (SVD):

    β = VΣ⁺Uᵀy

where UΣVᵀ is an SVD of X.

References:
    - https://en.wikipedia.org/wiki/Polynomial_regression
    - https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse
    - https://en.wikipedia.org/wiki/Numerical_methods_for_linear_least_squares
    - https://en.wikipedia.org/wiki/Singular_value_decomposition
"""

import matplotlib.pyplot as plt
import numpy as np


class PolynomialRegression:
    __slots__ = "degree", "params"

    def __init__(self, degree: int) -> None:
        """
        @raises ValueError: if the polynomial degree is negative
        """
        if degree < 0:
            raise ValueError("Polynomial degree must be non-negative")

        self.degree = degree
        self.params = None

    @staticmethod
    def _design_matrix(data: np.ndarray, degree: int) -> np.ndarray:
        """
        Constructs a polynomial regression design matrix for the given input data. For
        input data x = (x₁, x₂, ..., xₙ) and polynomial degree m, the design matrix is
        the Vandermonde matrix

                |1  x₁  x₁² ⋯  x₁ᵐ|
            X = |1  x₂  x₂² ⋯  x₂ᵐ|
                |⋮  ⋮   ⋮   ⋱  ⋮  |
                |1  xₙ  xₙ² ⋯  xₙᵐ|

        Reference: https://en.wikipedia.org/wiki/Vandermonde_matrix

        @param data:    the input predictor values x, either for model fitting or for
                        prediction
        @param degree:  the polynomial degree m
        @returns:       the Vandermonde matrix X (see above)
        @raises ValueError: if input data is not N x 1

        >>> x = np.array([0, 1, 2])
        >>> PolynomialRegression._design_matrix(x, degree=0)
        array([[1],
               [1],
               [1]])
        >>> PolynomialRegression._design_matrix(x, degree=1)
        array([[1, 0],
               [1, 1],
               [1, 2]])
        >>> PolynomialRegression._design_matrix(x, degree=2)
        array([[1, 0, 0],
               [1, 1, 1],
               [1, 2, 4]])
        >>> PolynomialRegression._design_matrix(x, degree=3)
        array([[1, 0, 0, 0],
               [1, 1, 1, 1],
               [1, 2, 4, 8]])
        >>> PolynomialRegression._design_matrix(np.array([[0, 0], [0, 0]]), degree=3)
        Traceback (most recent call last):
        ...
        ValueError: Data must have dimensions N x 1
        """
        rows, *remaining = data.shape
        if remaining:
            raise ValueError("Data must have dimensions N x 1")

        return np.vander(data, N=degree + 1, increasing=True)

    def fit(self, x_train: np.ndarray, y_train: np.ndarray) -> None:
        """
        Computes the polynomial regression model parameters using ordinary least squares
        (OLS) estimation:

            β = (XᵀX)⁻¹Xᵀy = X⁺y

        where X⁺ denotes the Moore–Penrose pseudoinverse of the design matrix X. This
        function computes X⁺ using singular value decomposition (SVD).

        References:
            - https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse
            - https://en.wikipedia.org/wiki/Singular_value_decomposition
            - https://en.wikipedia.org/wiki/Multicollinearity

        @param x_train: the predictor values x for model fitting
        @param y_train: the response values y for model fitting
        @raises ArithmeticError:    if X isn't full rank, then XᵀX is singular and β
                                    doesn't exist

        >>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
        >>> y = x**3 - 2 * x**2 + 3 * x - 5
        >>> poly_reg = PolynomialRegression(degree=3)
        >>> poly_reg.fit(x, y)
        >>> poly_reg.params
        array([-5.,  3., -2.,  1.])
        >>> poly_reg = PolynomialRegression(degree=20)
        >>> poly_reg.fit(x, y)
        Traceback (most recent call last):
        ...
        ArithmeticError: Design matrix is not full rank, can't compute coefficients

        Make sure errors don't grow too large:
        >>> coefs = np.array([-250, 50, -2, 36, 20, -12, 10, 2, -1, -15, 1])
        >>> y = PolynomialRegression._design_matrix(x, len(coefs) - 1) @ coefs
        >>> poly_reg = PolynomialRegression(degree=len(coefs) - 1)
        >>> poly_reg.fit(x, y)
        >>> np.allclose(poly_reg.params, coefs, atol=10e-3)
        True
        """
        X = PolynomialRegression._design_matrix(x_train, self.degree)  # noqa: N806
        _, cols = X.shape
        if np.linalg.matrix_rank(X) < cols:
            raise ArithmeticError(
                "Design matrix is not full rank, can't compute coefficients"
            )

        # np.linalg.pinv() computes the Moore–Penrose pseudoinverse using SVD
        self.params = np.linalg.pinv(X) @ y_train

    def predict(self, data: np.ndarray) -> np.ndarray:
        """
        Computes the predicted response values y for the given input data by
        constructing the design matrix X and evaluating y = Xβ.

        @param data:    the predictor values x for prediction
        @returns:       the predicted response values y = Xβ
        @raises ArithmeticError:    if this function is called before the model
                                    parameters are fit

        >>> x = np.array([0, 1, 2, 3, 4])
        >>> y = x**3 - 2 * x**2 + 3 * x - 5
        >>> poly_reg = PolynomialRegression(degree=3)
        >>> poly_reg.fit(x, y)
        >>> poly_reg.predict(np.array([-1]))
        array([-11.])
        >>> poly_reg.predict(np.array([-2]))
        array([-27.])
        >>> poly_reg.predict(np.array([6]))
        array([157.])
        >>> PolynomialRegression(degree=3).predict(x)
        Traceback (most recent call last):
        ...
        ArithmeticError: Predictor hasn't been fit yet
        """
        if self.params is None:
            raise ArithmeticError("Predictor hasn't been fit yet")

        return PolynomialRegression._design_matrix(data, self.degree) @ self.params


def main() -> None:
    """
    Fit a polynomial regression model to predict fuel efficiency using seaborn's mpg
    dataset

    >>> pass    # Placeholder, function is only for demo purposes
    """
    import seaborn as sns

    mpg_data = sns.load_dataset("mpg")

    poly_reg = PolynomialRegression(degree=2)
    poly_reg.fit(mpg_data.weight, mpg_data.mpg)

    weight_sorted = np.sort(mpg_data.weight)
    predictions = poly_reg.predict(weight_sorted)

    plt.scatter(mpg_data.weight, mpg_data.mpg, color="gray", alpha=0.5)
    plt.plot(weight_sorted, predictions, color="red", linewidth=3)
    plt.title("Predicting Fuel Efficiency Using Polynomial Regression")
    plt.xlabel("Weight (lbs)")
    plt.ylabel("Fuel Efficiency (mpg)")
    plt.show()


if __name__ == "__main__":
    import doctest

    doctest.testmod()

    main()
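
For readers who want to see the β = VΣ⁺Uᵀy step from the module docstring spelled out, the following is a minimal sketch that sits outside the diff. The helper name `fit_via_svd` and its singular-value cutoff are illustrative assumptions and are not part of the merged file, which delegates this work to `np.linalg.pinv()`.

```python
import numpy as np


def fit_via_svd(x: np.ndarray, y: np.ndarray, degree: int) -> np.ndarray:
    """Hypothetical helper: OLS coefficients computed from an explicit thin SVD."""
    # Vandermonde design matrix with columns 1, x, x², ..., x^degree
    design = np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)

    # Thin SVD: design == u @ np.diag(s) @ vt
    u, s, vt = np.linalg.svd(design, full_matrices=False)

    # Σ⁺ inverts singular values above a cutoff and zeroes the rest.
    # The cutoff rule below is an assumption chosen for this sketch.
    cutoff = np.finfo(float).eps * max(design.shape) * s.max()
    s_inv = np.zeros_like(s)
    keep = s > cutoff
    s_inv[keep] = 1.0 / s[keep]

    # β = V Σ⁺ Uᵀ y
    return vt.T @ (s_inv * (u.T @ np.asarray(y, dtype=float)))


# Same cubic as the fit() doctest in the diff
x = np.arange(11)
y = x**3 - 2 * x**2 + 3 * x - 5
beta_svd = fit_via_svd(x, y, degree=3)

# Compare against the pinv-based route used by the PR
design = np.vander(x.astype(float), N=4, increasing=True)
beta_pinv = np.linalg.pinv(design) @ y
print("max abs difference:", np.max(np.abs(beta_svd - beta_pinv)))
```

On well-conditioned inputs such as this one, the two routes should agree to within floating-point rounding; they can differ on rank-deficient matrices because the cutoff used here need not match the default tolerance in `np.linalg.pinv()`.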