Fits a multiple linear regression model using least squares.
Y Vector of length NOBS containing the dependent (response) variable. (Input)
X NOBS by NIND matrix containing the independent (explanatory) variables. (Input)
B Vector of
length INTCEP
+ NIND
containing a least-squares solution for the regression coefficients. (Output)
For INTCEP
= 0, the fitted value for observation I is
B(1) * X(I, 1) + B(2) * X(I, 2) +
+ B(NIND)
* X(I, NIND).
For INTCEP
= 1, the fitted value for observation I is
B(1) + B(2) * X(I, 1) +
+ B(NIND
+ 1) * X(I, NIND).
NOBS Number of
observations. (Input)
Default: NOBS = size (Y,1).
NIND Number of
independent (explanatory) variables. (Input)
Default: NIND = size (X,2).
LDX Leading
dimension of X exactly as specified in the dimension
statement in the calling program. (Input)
Default: LDX = size (X,1).
INTCEP
Intercept option. (Input)
Default: INTCEP = 1.
INTCEP Action
0 An intercept is not in the model.
1 An intercept is in the model.
SST Total sum
of squares. (Output)
If INTCEP
= 1, the total sum of squares is corrected for the mean.
SSE Sum of squares for error. (Output)
Generic: CALL RLSE (Y, X, B [, ])
Specific: The specific interface names are S_RLSE and D_RLSE.
Single: CALL RLSE (NOBS, Y, NIND, X, LDX, INTCEP, B, SST, SSE)
Double: The double precision name is DRLSE.
Routine RLSE fits a multiple linear regression model with or without an intercept. If INTCEP = 1, the multiple linear regression model is
where the observed values of the yis (input in Y)
constitute the responses or values of the dependent variable, the xi1s, xi2s,
, xiks (input in X)
are the settings of the k (input in NIND)
independent variables, β0, β1,
, βk are the regression
coefficients whose estimated values are output in B,
and the ɛ
is are independently distributed normal errors each with mean
zero and variance σ2. Here, n is the number of valid
rows in the augmented matrix (X,
Y),
i.e. n equals
NOBS
− NRMISS
(the number of rows that do not contain NaN). If INTCEP
= 0, β0 is not included in the
model.
Routine RLSE computes estimates of the regression coefficients by minimizing the sum of squares of the deviations of the observed response yi from the fitted response
for the n observations. This minimum sum of squares (the error sum of squares) is output and denoted by
In addition, the total sum of squares is output. For the case, INTCEP = 1, the total sum of squares is the sum of squares of the deviations of yi from its mean
the so-called corrected total sum of squares; it is denoted by
For the case INTCEP = 0, the total sum of squares is the sum of squares of yithe so-called uncorrected total sum of squares; it is denoted by
See Draper and Smith (1981) for a good general treatment of the multiple linear regression model, its analysis, and many examples.
In order to compute a least-squares solution, RLSE performs an orthogonal reduction of the matrix of regressors to upper triangular form. If the user needs the upper triangular matrix output for subsequent computing, the routine R2SE can be invoked in place of RLSE. (See the description of R in Comment 1). The reduction is based on one pass through the rows of the augmented matrix (X, Y) using fast Givens transformations. (See routines SROTMG and SROTM Golub and Van Loan, 1983, pages 156-162, Gentleman, 1974.) This method has the advantage that the loss of accuracy resulting from forming the crossproduct matrix used in the normal equations is avoided.
With INTCEP
= 1, the current means of the dependent and independent variables are used to
internally center the data for improved accuracy. Let xj be a column vector
containing the j-th row of data for the independent variables. Let represent the mean vector for the independent variables
given the data for rows 1, 2,
, i. The
current mean vector is defined to be
The i-th row of data has subtracted from it and is then weighted by i/(i − 1). Although a
crossproduct matrix is not computed, the validity of this centering operation
can be seen from the following formula for the sum of squares and crossproducts
matrix:
An orthogonal reduction on the centered matrix is computed. When the final computations are performed, the first row of R and the first element of B are updated so that they reflect the statistics for the original (uncentered) data. This means that the estimate of the intercept and the R matrix are for the uncentered data.
As part of the final computations, RLSE checks for linearly dependent regressors. If the i-th regressor is a linear combination of the first i − 1 regressors, the i-th diagonal element of R is close to zero (exactly zero if infinite precision arithmetic could be used) prior to the final computations. In particular, linear dependence of the regressors is declared if any of the following three conditions is satisfied:
A regressor equals zero.
Two or more regressors are constant.
The result of
is less than or equal to 100 Χ ɛ where ɛ is the machine epsilon. (For RLSE, ɛ = AMACH(4) and for DRLSE, ɛ = DMACH(4). See routines AMACH and DMACH in Reference Material). Here, Ri 1, 2, ,i−1 is the multiple correlation coefficient of the i-th independent variable with the first i − 1 independent variables. If no intercept is in the model (INTCEP = 0), the multiple correlation coefficient is computed without adjusting for the mean.
On completion of the final computations, if the i-th regressor is declared
to be linearly dependent upon the previous i − 1 regressors,
then the i-th
element of B
and all elements in the i-th row of R
are set to zero. Finally, if a linear dependence is declared, an informational
(error) message,
code 1, is issued indicating the model is not full
rank.
1. Workspace may be explicitly provided, if desired, by use of R2SE/DR2SE. The reference is:
CALL R2SE (NOBS, Y, NIND, X, LDX, INTCEP, B, SST, SSE, R, LDR, DFE, NRMISS, WK)
The additional arguments are as follows:
R INTCEP + NIND by INTCEP + NIND upper triangular matrix containing the
R matrix from a QR decomposition of the matrix of
regressors. (Output)
All of the diagonal element of R are taken to be nonnegative. The
rank of the matrix of regressors is the number of positive diagonal elements,
which equals NOBS − NRMISS − DFE.
LDR Leading dimension of R exactly as specified in the dimension statement in the calling program. (Input)
DFE Degrees of freedom for error. (Output)
NRMISS Number of rows in the
augmented matrix (X, Y) containing NaN
(not a number). (Output)
If a row contains NaN, that row is
excluded from all other computations.
WK Work vector of length 5 * NIND + 4 * INTCEP + 2.
2. Informational error
Type Code
3 1 The model is not full rank. There is not a unique least-squares solution. If the I-th diagonal element of R is zero, B(I) is set to zero in order to compute a solution.
A regression model
yi= β0 + β1xi1 + β2xi2 + β3xi3 + ɛ i i = 1,2, , 9
is fitted to data taken from Maindonald (1984, pages 203−204).
USE RLSE_INT
USE WRRRN_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER INTCEP, LDX, NCOEF, NIND, NOBS, J
PARAMETER (INTCEP=1, NIND=3, NOBS=9, LDX=NOBS, &
NCOEF=INTCEP+NIND)
!
INTEGER NOUT
REAL B(NCOEF), SSE, SST, X(LDX,NIND), Y(NOBS)
!
DATA (X(1,J),J=1,NIND)/ 7.0, 5.0, 6.0/, Y(1)/ 7.0/
DATA (X(2,J),J=1,NIND)/ 2.0, -1.0, 6.0/, Y(2)/-5.0/
DATA (X(3,J),J=1,NIND)/ 7.0, 3.0, 5.0/, Y(3)/ 6.0/
DATA (X(4,J),J=1,NIND)/-3.0, 1.0, 4.0/, Y(4)/ 5.0/
DATA (X(5,J),J=1,NIND)/ 2.0, -1.0, 0.0/, Y(5)/ 5.0/
DATA (X(6,J),J=1,NIND)/ 2.0, 1.0, 7.0/, Y(6)/-2.0/
DATA (X(7,J),J=1,NIND)/-3.0, -1.0, 3.0/, Y(7)/ 0.0/
DATA (X(8,J),J=1,NIND)/ 2.0, 1.0, 1.0/, Y(8)/ 8.0/
DATA (X(9,J),J=1,NIND)/ 2.0, 1.0, 4.0/, Y(9)/ 3.0/
!
CALL RLSE (Y, X, B, SST=SST, SSE=SSE)
CALL WRRRN ('B', B)
CALL UMACH (2, NOUT)
WRITE (NOUT,*)
WRITE (NOUT,99999) 'SST = ', SST, ' SSE = ', SSE
99999 FORMAT (A7, F7.2, A7, F7.2)
END
B
1 7.733
2
-0.200
3 2.333
4 -1.667
SST = 156.00
SSE = 4.00
A weighted least-squares fit is computed using the model
yi= β0 + β1xi1 + β2xi2 + ɛ i i = 1, 2, , 4
and weights 1/i2 discussed by Maindonald (1984, pages 67 - 68). In order to compute the weighted least-squares fit, using an ordinary least squares routine (RLSE), the regressors (including the column of ones for the intercept term as well as the independent variables) and the responses must be transformed prior to invocation of RLSE. The transformed regressors and responses can be computed by using routine SHPROD (IMSL MATH/LIBRARY). For the i-th case the corresponding response and regressors are multiplied by a square root of the i-th weight. Because the column of ones corresponding to the intercept term in the untransformed model, is transformed by the weights, this transformed column of ones must be input to the least squares subroutine as an additional independent variable along with the option INTCEP = 0.
In terms of the original, untransformed regressors and responses, the minimum sum of squares for error output in SSE is
where here the weight wi = 1/i2. Also, since INTCEP = 0, the uncorrected total sum of squares is output in SST. In terms of the original untransformed responses,
USE RLSE_INT
USE SHPROD_INT
USE WRRRN_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER INTCEP, LDX, NCOEF, NIND, NOBS, J
PARAMETER (INTCEP=0, NIND=3, NOBS=4, LDX=NOBS, &
NCOEF=INTCEP+NIND)
!
INTEGER I, NOUT
REAL B(NCOEF), SQRT, SSE, SST, W(NOBS), X(LDX,NIND), &
Y(NOBS)
INTRINSIC SQRT
!
DATA (X(1,J),J=1,NIND)/1.0, -2.0, 0.0/, Y(1)/-3.0/
DATA (X(2,J),J=1,NIND)/1.0, -1.0, 2.0/, Y(2)/ 1.0/
DATA (X(3,J),J=1,NIND)/1.0, 2.0, 5.0/, Y(3)/ 2.0/
DATA (X(4,J),J=1,NIND)/1.0, 7.0, 3.0/, Y(4)/ 6.0/
!
DO 10 I=1, NOBS
! Assign weights
W(I) = 1.0/I**2
! Store square roots of weights
W(I) = SQRT(W(I))
10 CONTINUE
! Transform regressors
DO 20 J=1, NIND
CALL SHPROD (NOBS, W, 1, X(:,J), 1, X(:,J), 1)
20 CONTINUE
! Transform response
CALL SHPROD (NOBS, W, 1, Y, 1, Y, 1)
!
CALL RLSE (Y, X, B, INTCEP=INTCEP, SST=SST, SSE=SSE)
!
CALL WRRRN ('B', B)
CALL UMACH (2, NOUT)
WRITE (NOUT,*)
WRITE (NOUT,99999) 'SST = ', SST, ' SSE = ', SSE
99999 FORMAT (A7, F7.2, A7, F7.2)
END
B
1 -1.431
2
0.658
3 0.748
SST = 11.94 SSE
= 1.01
PHONE: 713.784.3131 FAX:713.781.9260 |