Multiple linear regression via least squares #522

Open
@Beliavsky

Multiple linear regression via least squares is a core method in statistics and should be considered for stdlib. A Fortran program will often need to compute many related regressions, for example a set of regressions in which predictor variables are successively added or removed, or in which observations are added or removed. (For a single regression it will always be easier to use R or some other statistical software than Fortran.)
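To make the "observations are added or removed" case concrete, here is a minimal sketch, written in Python for brevity rather than Fortran. It keeps the cross-products X'X and X'y and updates them one row at a time, so a refit does not need the full data set again. The class name `RunningLeastSquares`, its methods, and the data are hypothetical and not taken from any of the codes listed below. Solving the normal equations this way can lose accuracy when the predictors are nearly collinear.

```python
class RunningLeastSquares:
    """Hypothetical helper: keeps X'X and X'y so rows can be added or removed."""

    def __init__(self, n_predictors):
        p = n_predictors
        self.xtx = [[0.0] * p for _ in range(p)]
        self.xty = [0.0] * p

    def update(self, x, y, weight=1.0):
        """Add one observation (weight=+1.0) or remove it (weight=-1.0)."""
        for i, xi in enumerate(x):
            self.xty[i] += weight * xi * y
            for j, xj in enumerate(x):
                self.xtx[i][j] += weight * xi * xj

    def coefficients(self):
        """Solve (X'X) b = X'y by Gaussian elimination with partial pivoting."""
        p = len(self.xty)
        # Augmented matrix [X'X | X'y], copied so the stored sums stay intact.
        a = [row[:] + [rhs] for row, rhs in zip(self.xtx, self.xty)]
        for k in range(p):
            piv = max(range(k, p), key=lambda r: abs(a[r][k]))
            a[k], a[piv] = a[piv], a[k]
            for r in range(k + 1, p):
                f = a[r][k] / a[k][k]
                for c in range(k, p + 1):
                    a[r][c] -= f * a[k][c]
        # Back substitution on the upper-triangular system.
        b = [0.0] * p
        for k in reversed(range(p)):
            s = sum(a[k][c] * b[c] for c in range(k + 1, p))
            b[k] = (a[k][p] - s) / a[k][k]
        return b


fit = RunningLeastSquares(2)
for x1, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 10.0)]:
    fit.update([1.0, x1], y)                 # leading 1.0 is the intercept column
all_rows = fit.coefficients()                # approximately [0.4, 2.9]
fit.update([1.0, 3.0], 10.0, weight=-1.0)    # drop the last observation
without_last = fit.coefficients()            # approximately [1.0, 2.0]
```

Updating a QR factorization instead of the cross-products would be the more numerically careful choice; the cross-product form is used here only because it is short.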

Some codes and references are:

  1. Alan Miller's regression and subset selection codes. Miller was an expert on this subject, having authored the book Subset Selection in Regression (2002). His regression codes are meant for interactive use, and I have had some trouble modifying them for batch use. His lsq.f90 has many public SAVEd variables, which I would like to avoid.
  2. John Burkardt's qr_solve. This is GPL licensed and can be used for ideas but not code.
  3. John Monahan's Fortran codes from his book Numerical Methods of Statistics (2nd ed.).
  4. "Compare computational methods for least squares regression" is an article by a SAS researcher comparing the speeds of various methods. Ideally a Fortran code would include the "sweep" algorithm he mentions, which is also in Monahan's codes.
  5. Algorithm 686 of ACM TOMS is a Fortran code for updating the QR decomposition of a matrix.
  6. LAPACK has linear least squares codes using the QR decomposition and SVD.
  7. Numerical Linear Algebra in Statistical Computing (1987) by Nicholas J. Higham and G. W. Stewart discusses why using the normal equations to fit regression coefficients is often acceptable.
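
For readers unfamiliar with the "sweep" algorithm in item 4, here is a minimal sketch in Python, chosen for brevity rather than Fortran. It uses one common formulation of the sweep operator, applied to the augmented cross-product matrix of [X y]; it is not taken from Monahan's codes or the SAS article. The function name `sweep` and the data are made up for illustration. In this variant, sweeping a pivot brings that predictor into the model and sweeping the same pivot again removes it, which suits the add/remove use case described above.

```python
def sweep(a, k):
    """Sweep the square matrix `a` in place on pivot k (one common variant)."""
    d = a[k][k]
    if d == 0.0:
        raise ZeroDivisionError("zero pivot: predictor is collinear with those swept in")
    a[k] = [v / d for v in a[k]]
    for i in range(len(a)):
        if i != k:
            b = a[i][k]
            a[i] = [v - b * w for v, w in zip(a[i], a[k])]
            a[i][k] = -b / d
    a[k][k] = 1.0 / d


# Made-up data; columns are intercept, x1, x2, y, with y = 1 + 2*x1 + 3*x2 exactly.
rows = [
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 3.0],
    [1.0, 0.0, 1.0, 4.0],
    [1.0, 1.0, 1.0, 6.0],
    [1.0, 2.0, 1.0, 8.0],
]
m = len(rows[0])
# Augmented cross-product matrix [[X'X, X'y], [y'X, y'y]].
cross = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]

for k in (0, 1, 2):          # bring in the intercept, x1 and x2
    sweep(cross, k)
full_beta = [cross[i][3] for i in range(3)]   # last column holds the coefficients
full_rss = cross[3][3]                        # corner holds the residual sum of squares

sweep(cross, 2)              # sweeping the same pivot again removes x2
reduced_beta = [cross[0][3], cross[1][3]]     # fit of y on intercept and x1 only
reduced_rss = cross[3][3]
```

After the first three sweeps the upper-left block holds the inverse of X'X, so standard errors are available without further factorization. The sketch does no pivot tolerance checking beyond an exact zero test, which a real routine would need.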
