Multiple linear regression via least squares is a core method in statistics and should be considered for stdlib. A Fortran program will often need to compute many related regressions, for example a sequence in which predictor variables are successively added or removed, or in which observations are added or removed. (For a single regression it will always be easier to use R or some other statistical software than Fortran.)
Some codes and references are:
- Alan Miller's regression and subset selection codes. Miller was an expert on this subject, having authored the book Subset Selection in Regression (2002). His regression codes are meant for interactive use, and I have had some trouble modifying them for batch use. His lsq.f90 has many public SAVEd variables, which I would like to avoid.
- John Burkardt's qr_solve. This is GPL licensed and can be used for ideas but not code.
- John Monahan's Fortran codes from his book Numerical Methods of Statistics (2nd ed.)
- "Compare computational methods for least squares regression" is an article by a SAS researcher comparing the speeds of various methods. Ideally a Fortran code would include the "sweep" algorithm he mentions, which is also in Monahan's codes.
- Algorithm 686 of ACM TOMS is a Fortran code for updating the QR decomposition of a matrix.
- LAPACK has linear least squares codes using the QR decomposition and the SVD.
- Numerical Linear Algebra in Statistical Computing (1987) by Nicholas J. Higham and G. W. Stewart discusses why using the normal equations to fit regression coefficients is often acceptable.
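To make the "sweep" idea concrete, here is a minimal sketch of one common variant of the sweep operator applied to the augmented cross-product matrix [[X'X, X'y], [y'X, y'y]]. It is written in Python only for brevity and is not taken from Monahan's or Miller's codes. The names `cross_products` and `sweep` and the toy data are made up for illustration, and there is no pivoting or collinearity check, which a real implementation would need. Sweeping on a predictor's pivot brings it into the model; in this variant, sweeping on the same pivot again takes it back out, which is what makes a sequence of related regressions cheap.

```python
def cross_products(X, y):
    """Build the augmented matrix [[X'X, X'y], [y'X, y'y]] as a list of lists."""
    cols = [list(c) for c in zip(*X)] + [list(y)]
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def sweep(A, k):
    """Sweep the square matrix A in place on pivot k.

    Hypothetical helper: assumes A[k][k] is nonzero (no collinearity check).
    """
    d = A[k][k]
    A[k] = [v / d for v in A[k]]              # scale the pivot row
    for i in range(len(A)):
        if i != k:
            b = A[i][k]
            A[i] = [vi - b * vk for vi, vk in zip(A[i], A[k])]
            A[i][k] = -b / d                  # overwrite the pivot column entry
    A[k][k] = 1.0 / d
    return A


# Toy data (made up): y = 1 + 2*x1 + 3*x2 exactly; column 0 is the intercept.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

A = cross_products(X, y)
p = len(X[0])                                 # index of the y row/column

sweep(A, 0)
sweep(A, 1)                                   # model: intercept + x1
print("coefficients (1, x1):", A[0][p], A[1][p], " RSS:", A[p][p])

sweep(A, 2)                                   # add x2 to the model
print("coefficients (1, x1, x2):", [A[i][p] for i in range(p)], " RSS:", A[p][p])

sweep(A, 2)                                   # sweep again to remove x2
print("coefficients (1, x1):", A[0][p], A[1][p], " RSS:", A[p][p])
```

After sweeping on a set of predictors, the last column holds their coefficients and the bottom-right entry holds the residual sum of squares. Because the toy response is an exact linear function of x1 and x2, the full fit should recover the coefficients 1, 2, 3 with a residual sum of squares of zero up to rounding. Note that this works on the normal equations, so the conditioning caveats discussed by Higham and Stewart apply.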