Fits a multivariate linear regression model via fast Givens transformations.
X |NROW| by NCOL matrix containing the data. (Input)
IIND Independent variable option. (Input)
IIND Meaning
< 0 The first −IIND columns of X contain the independent (explanatory) variables.
> 0 The IIND independent variables are specified by the column numbers in INDIND.
= 0 There are no independent variables.
The regressors are the intercept (if INTCEP = 1) and the independent variables. There are INTCEP + |IIND| regression coefficients for each dependent variable.
INDIND Index
vector of length IIND
containing the column numbers of X that are the
independent variables. (Input, if IIND
is positive)
If IIND
is nonpositive, INDIND is not
referenced and can be a vector of length one.
IDEP Dependent variable option. (Input)
IDEP Meaning
< 0 The last −IDEP columns of X contain the dependent (response) variables. That is, columns NCOL + IDEP + 1, NCOL + IDEP + 2, , NCOL contain the dependent variables.
> 0 The IDEP dependent (response) variables are specified by the column numbers in INDDEP.
= 0 There are no dependent variables. (Generally, this option is not used. The R matrix from a QR decomposition of a matrix of regressors is computed.)
INDDEP Index
vector of length IDEP
containing the column numbers of X that are the
dependent variables. (Input, if IDEP
is positive)
If IDEP
is nonpositive, INDDEP is not
referenced and can be a vector of length one.
B INTCEP + |IIND| by |IDEP| matrix containing a least-squares solution
for the regression coefficients on return from the final invocation of this
routine. (Output, if IDO
= 0 or 1; input/output, if IDO
= 2 or 3)
If INTCEP
= 1, row 1 is for the intercept. Row INTCEP
+ I is for the
I-th independent
variable. Column j is for the j-th dependent variable.
IDO |
Action |
1 or 2 |
A current least-squares solution is given by a solution x to the equation Rx = B. |
0 or 3 |
A least-squares solution for the regression coefficients is returned in B. Elements of the appropriate row(s) of B are set to 0.0 if linear dependence of the regressors is declared. |
If IDEP = 0, B is not referenced and can be a vector of length 1.
IDO Processing
option. (Input)
Default: IDO
= 0.
IDO Action
0 This is the only invocation of RGIVN for this data set, and all the data are input at once.
1 This is the first invocation, and additional calls to RGIVN will be made. Initialization and updating for the data in X are performed.
2 This is an intermediate invocation of RGIVN, and updating for the data in X is performed.
3 This is the final invocation of this routine. Updating for the data in X and wrap-up computations are performed.
NROW The
absolute value of NROW
is the number of rows of data currently input in X.
(Input)
NROW
may be positive, zero, or negative. Negative NROW
means that the −NROW
rows of data are to be deleted from some aspects of the analysis, and this
should be done only if IDO is 2 or 3 and the
wrap-up computations have not been performed. When a negative value is input for
NROW,
it is assumed that each of the −NROW
rows of X
has been input (with positive NROW)
in previous invocations of RGIVN. Use of negative
values of NROW
should be made with care and with the understanding that XMIN and XMAX cannot be updated
properly in this case. It is also possible that a constant variable in the
remaining data will not be recognized as such.
Default: NROW
= size (X,1).
NCOL Number of
columns in X.
(Input)
Default: NCOL
= size (X,2).
LDX Leading
dimension of X
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDX
= size (X,1).
INTCEP
Intercept option. (Input)
Default: INTCEP
= 1.
INTCEP Action
0 An intercept is not in the model.
1 An intercept is in the model.
IFRQ Frequency
option. (Input)
IFRQ = 0 means that
all frequencies are 1.0. For positive IFRQ, column number
IFRQ of X
contains the frequencies. If X(I, IFRQ) = 0.0, none of
the remaining elements of row I of X
are referenced, and updating of statistics is skipped for row I.
Default: IFRQ
= 0.
IWT Weighting
option. (Input)
IWT = 0 means that all
weights are 1.0. For positive IWT, column number
IWT of X contains the
weights.
Default: IWT
= 0.
ICEN Data
centering option. (Input)
If INTCEP
= 0, ICEN must
equal 0.
Default: ICEN
= 1.
ICEN Action
0 No centering. This option should be used when (1) the data are already centered; (2) there is no intercept in the model; or (3) the independent variables for a large percentage of the data are zero, and sparsity of the problem needs to be preserved in order that the Givens rotations are performed quickly.
1 Variables are centered using the method of provisional means for improved accuracy of the computations. The final estimate for the intercept and the R matrix are given for the uncentered data. This option is generally recommended.
TOL Tolerance
used in determining linear dependence. (Input)
For RGIVN, TOL
= 100 * AMACH(4)
is a common choice. See the documentation for routine AMACH
in Reference Material.
Default: TOL
= 1.e-5 for single precision and 2.D-14 for double precision.
LDB Leading
dimension of B
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDB
= size (B,1).
R INTCEP + |IIND| by INTCEP + |IIND| upper triangular matrix containing the R matrix from a QR decomposition of the matrix of regressors on return from the final invocation of this routine. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
IDO |
Action |
1 or 2 |
The current matrix of raw sums of squares and crossproducts for the regressors can be found as RT · diag(D) · R where diag(D) is the diagonal matrix whose diagonal elements are the elements of the vector D. |
0 or 3 |
The matrix of raw sums of squares and crossproducts for the regressors can be found as RT R. Elements of the appropriate row(s) of R are set to 0.0 if linear dependence of the regressors is declared. |
LDR Leading
dimension of R
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDR
= size (R,1).
D Vector of length INTCEP + |IIND| containing scale factors for fast Givens transformations. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
IDO |
Action |
1 or 2 |
D contains the current scale factors associated with the fast Givens transformations. |
0 or 3 |
Each element of D is set to 1.0. |
IRANK Rank of
R. (Output, if
IDO
= 0 or 3)
IRANK less than INTCEP
+ |IIND|
indicates linear dependence of the regressors was declared.
DFE Degrees of
freedom for error on return from the final invocation of this
routine. (Output, if IDO = 0 or 1;
input/output, if IDO = 2 or 3)
Prior to the final invocation of RGIVN, DFE is the sum of the
frequencies.
SCPE |IDEP|
by |IDEP|
matrix containing error (residual) sums of squares and
crossproducts. (Output, if IDO = 0 or 1;
input/output, if IDO = 2 or 3)
SCPE(m,
n) contains the current sum of
crossproducts of residuals for the m-th and n-th dependent variables. If IDEP
= 0, SCPE is not
referenced and can be a 1 by 1 array.
LDSCPE Leading
dimension of SCPE exactly as
specified in the dimension statement in the calling program.
(Input)
Default: LDSCPE
= size (SCPE,1).
NRMISS Number
of rows of data encountered in calls to RGIVN that contain any
missing values for the independent, dependent, weight, or frequency
variables. (Output, if
IDO = 0 or 1;
input/output, if IDO = 2 or 3)
NaN
(not a number) is used as the missing value code. Any row of X containing NaN as a
value of the independent, dependent, weight, or frequency variables is omitted
from the analysis.
XMIN Vector of length INTCEP + |IIND| containing the minimum values for each of the regressors. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
XMAX Vector of length INTCEP + |IIND| containing the maximum values for each of the regressors. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
Generic: CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B [, ])
Specific: The specific interface names are S_RGIVN and D_RGIVN.
Single: CALL RGIVN (IDO, NROW, NCOL, X, LDX, INTCEP, IIND, INDIND, IDEP, INDDEP, IFRQ, IWT, ICEN, TOL, B, LDB, R, LDR, D, IRANK, DFE, SCPE, LDSCPE, NRMISS, XMIN, XMAX)
Double: The double precision name is DRGIVN.
Routine RGIVN fits a multivariate linear regression model. (See the chapter introduction for a description of the multivariate linear regression model.) The routine RGIVN is designed so that multiple invocations can be made. In this case, zero, one, or several rows of the data set can be input for each invocation of RGIVN (with IDO = 1, 2, 2, , 2, 3). Alternatively, one invocation of RGIVN (with IDO = 0) can be made with the entire data set contained in X. Routine RSTAT can be invoked after the wrap-up computations are performed by RGIVN to compute and print summary statistics related to the fitted regression.
Routine RGIVN performs an orthogonal reduction of the matrix of regressors to upper triangular form. The reduction is based on fast Givens transformations. (See routines SROTMG and SROTM, Golub and Van Loan 1983, pages 156-162, Gentleman 1974.) This method has two main advantages: (1) the loss of accuracy resulting from forming the crossproduct matrix used in the normal equations is avoided, (2) data can be conveniently added or deleted to take advantage of the previous computations performed.
With ICEN
= 1, the current means of the independent and dependent variables are used to
center the data for improved accuracy. Let xi be a column vector
containing the i-th row of data for the
independent variables. Let represent the mean
vector for the independent variables given the data for observations 1, 2,
, i. The mean vector is
defined to be
where the wjs and fjs are the weights and
frequencies, respectively. The i-th row of data has subtracted from it, and then wifi is multiplied by the
factor ai/ai-1 where
Although a crossproduct matrix is not computed, the validity of this centering operation can be seen from the following formula for the sum of squares and crossproducts matrix:
An orthogonal reduction on the centered matrix is computed.
When wrap-up computations
(IDO
= 3 or IDO=
0) are performed, the first rows of R and B are updated so that
they reflect the statistics for the original (uncentered) data. This means that
the estimate of the intercept and the R matrix are for the uncentered
data.
If the i-th regressor is a linear combination of the first i − 1 regressors, the i-th diagonal element of R will be close to zero (exactly zero if infinite precision arithmetic could be used) prior to the wrap-up computations. When performing the wrap-up computations, RGIVN checks sequentially for linear dependent regressors. Linear dependence of the regressors is declared if any of the following three conditions is satisfied:
A regressor equals zero, as determined from XMIN and XMAX.
Two
or more regressors are constant, as determined from
XMIN
and XMAX.
is less than or equal to TOL. Here, Ri 1,2, ,i−1 is the multiple correlation coefficient of the i-th independent variable with the first i − 1 independent variables. If no intercept is in the model (INTCEP = 0) the multiple correlation coefficient is computed without adjusting for the mean.
When a dependence is declared, R is changed in the wrap-up
computations so as to reflect the deletion of the i-th regressor from the
model. On completion of the wrap-up computations, if the
i-th regressor is declared
to be dependent upon the previous i − 1 regressors,
then the R and matrices will have all elements in their i-th rows set to zero.
1. Workspace may be explicitly provided, if desired, by use of R2IVN/DR2IVN. The reference is:
CALL R2IVN (IDO, NROW, NCOL, X, LDX, INTCEP, IIND, INDIND, IDEP, INDDEP, IFRQ, IWT, ICEN, TOL, B, LDB, R, LDR, D, IRANK, DFE, SCPE, LDSCPE, NRMISS, XMIN, XMAX, WK)
The additional argument is:
WK Work vector of length INTCEP + |IIND| + |IDEP|
2. Informational errors
Type Code
4 1 Negative weight encountered.
4 2 Negative frequency encountered.
The first example uses a data set from Draper and Smith (1981, pages 629-630). This data set is put into the matrix X by routine GDATA in Chapter 19, Utilities. There is 1 dependent variable and 4 independent variables. RGIVN is invoked to fit the regression model with the IDO = 0 option, so all computations are performed in one call.
USE RGIVN_INT
USE GDATA_INT
USE WRRRN_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER LDB, LDCOEF, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, NRX
PARAMETER (LDSCPE=1, NCOEF=5, NCOL=5, NDEP=1, NRX=13, &
LDB=NCOEF, LDCOEF=NCOEF, LDR=NCOEF, LDX=NRX)
!
INTEGER I, IDEP, IIND, INDDEP(1), INDIND(1), &
IRANK, NOBS, NOUT, NRMISS, NVAR
REAL B(LDB,NDEP), D(NCOEF), DFE, R(LDR,NCOEF), &
SCPE(LDSCPE,NDEP), X(LDX,NCOL), XMAX(NCOEF), &
XMIN(NCOEF)
!
CALL GDATA (5, X, NOBS, NVAR)
!
IIND = -4
IDEP = -1
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, r=r, d=d, &
irank=irank, dfe=dfe, scpe=scpe, nrmiss=nrmiss, &
xmin=xmin, xmax=xmax)
!
CALL WRRRN ('B', B)
CALL WRRRN ('R', R)
CALL UMACH (2, NOUT)
WRITE (NOUT,*)
WRITE (NOUT,*) 'Regressor XMIN XMAX'
DO 10 I=1, NCOEF
WRITE (NOUT,'(1X,I5,2X,2F9.1)') I, XMIN(I), XMAX(I)
10 CONTINUE
WRITE (NOUT,*) ' '
WRITE (NOUT,*) 'IRANK = ', IRANK
WRITE (NOUT,*) 'DFE = ', DFE, ' SCPE(1,1) = ', SCPE(1,1)
WRITE (NOUT,*) 'NRMISS = ', NRMISS
END
B
1
62.41
2 1.55
3
0.51
4 0.10
5
-0.14
R
1 2
3 4
5
1 3.6 26.9
173.6 42.4 108.2
2
0.0 20.4 12.3 -18.3
-14.2
3 0.0
0.0 52.5 1.1
-54.6
4 0.0
0.0 0.0 12.5
-12.9
5 0.0
0.0 0.0
0.0
3.4
Regressor
XMIN XMAX
1
1.0
1.0
2 1.0
21.0
3 26.0
71.0
4 4.0
23.0
5 6.0
60.0
IRANK = 5
DFE = 8.00000
SCPE(1,1) = 47.8637
NRMISS = 0
The data for the second example are taken from Maindonald (1984, pages 203−204). The data are saved in the matrix X. Here, the data are input into RGIVN a row at a time. The data set is small for clarity. However, the approach is generally useful when the data set is large and the entire data set cannot be stored in X. A multivariate regression model containing two dependent variables and three independent variables is fit.
USE RGIVN_INT
USE WRRRN_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER INTCEP, LDB, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, &
NIND, NOBS, J
PARAMETER (INTCEP=1, NCOL=5, NDEP=2, NIND=3, NOBS=9, &
LDSCPE=NDEP, LDX=NOBS, NCOEF=INTCEP+NIND, LDB=NCOEF, &
LDR=NCOEF)
!
INTEGER I, IDEP, IDO, IIND, INDDEP(1), INDIND(1), IRANK, &
NOUT, NRMISS, NROW
REAL B(LDB,NDEP), D(NCOEF), DFE, R(LDR,NCOEF), &
SCPE(LDSCPE,NDEP), TOL, X(LDX,NCOL), XMAX(NCOEF), &
XMIN(NCOEF)
!
DATA (X(1,J),J=1,NCOL)/7.0, 5.0, 6.0, 7.0, 1.0/
DATA (X(2,J),J=1,NCOL)/2.0, -1.0, 6.0, -5.0, 4.0/
DATA (X(3,J),J=1,NCOL)/7.0, 3.0, 5.0, 6.0, 10.0/
DATA (X(4,J),J=1,NCOL)/-3.0, 1.0, 4.0, 5.0, 5.0/
DATA (X(5,J),J=1,NCOL)/2.0, -1.0, 0.0, 5.0, -2.0/
DATA (X(6,J),J=1,NCOL)/2.0, 1.0, 7.0, -2.0, 4.0/
DATA (X(7,J),J=1,NCOL)/-3.0, -1.0, 3.0, 0.0, -6.0/
DATA (X(8,J),J=1,NCOL)/2.0, 1.0, 1.0, 8.0, 2.0/
DATA (X(9,J),J=1,NCOL)/2.0, 1.0, 4.0, 3.0, 0.0/
!
NROW = 1
IIND = -NIND
IDEP = -NDEP
DO 10 I=1, 9
IF (I .EQ. 1) THEN
IDO = 1
ELSE IF (I .EQ. 9) THEN
IDO = 3
ELSE
IDO = 2
END IF
CALL RGIVN (X(I:I, 1:NCOL), IIND, INDIND, IDEP, INDDEP, &
B, IDO=IDO, R=R, D=D,IRANK=IRANK, DFE=DFE,&
SCPE=SCPE,NRMISS=NRMISS, xmin=xmin, xmax=xmax)
10 CONTINUE
!
CALL WRRRN ('B', B)
CALL WRRRN ('R', R)
CALL WRRRN ('SCPE', SCPE)
CALL UMACH (2, NOUT)
WRITE (NOUT,*)
WRITE (NOUT,*) 'Regressor XMIN XMAX'
DO 20 I=1, NCOEF
WRITE (NOUT,'(1X,I5,2X,2F9.1)') I, XMIN(I), XMAX(I)
20 CONTINUE
WRITE (NOUT,*)
WRITE (NOUT,*) 'IRANK = ', IRANK
WRITE (NOUT,*) 'DFE = ', DFE
WRITE (NOUT,*) 'NRMISS = ', NRMISS
END
B
1 2
1 7.733
-1.633
2 -0.200 0.400
3 2.333
0.167
4 -1.667 0.667
R
1 2
3 4
1
3.00 6.00 3.00
12.00
2 0.00 10.00
4.00 2.00
3 0.00
0.00 4.00 2.00
4
0.00 0.00 0.00
6.00
SCPE
1 2
1
4.0 20.0
2 20.0
110.0
Regressor XMIN
XMAX
1
1.0 1.0
2 -3.0
7.0
3
-1.0 5.0
4 0.0
7.0
IRANK = 4
DFE =
5.00000
NRMISS = 0
The data for the third example are taken from Maindonald (1984, pages 104−106). The constant regressor and the independent variables X1, X2, and X3 are linearly dependent
USE RGIVN_INT
USE WRRRN_INT
USE UMACH_INT
INTEGER INTCEP, LDB, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, &
NIND, NOBS
PARAMETER (INTCEP=1, NCOL=5, NDEP=1, NIND=4, NOBS=9, &
LDSCPE=NDEP, LDX=NOBS, NCOEF=INTCEP+NIND, LDB=NCOEF,&
LDR=NCOEF)
!
INTEGER I, IDEP, IIND, INDDEP(1), INDIND(1), &
IRANK, NOUT, NRMISS, NROW
REAL B(LDB,NDEP), D(NCOEF), DFE, R(LDR,NCOEF), &
SCPE(LDSCPE,NDEP), TOL, X(LDX,NCOL), XMAX(NCOEF), &
XMIN(NCOEF)
!
DATA (X(1,J),J=1,NCOL)/-1.0, 0.0, -0.5, 1.0, 0.0/
DATA (X(2,J),J=1,NCOL)/3.0, 0.0, 3.5, 1.0, 0.0/
DATA (X(3,J),J=1,NCOL)/2.0, -2.0, 3.5, -2.0, -2.0/
DATA (X(4,J),J=1,NCOL)/-2.0, -1.0, -1.0, 1.0, 1.0/
DATA (X(5,J),J=1,NCOL)/-1.0, 1.0, -1.0, -1.0, -1.0/
DATA (X(6,J),J=1,NCOL)/3.0, 3.0, 2.0, 1.0, 3.0/
DATA (X(7,J),J=1,NCOL)/2.0, 2.0, 1.5, 2.0, 4.0/
DATA (X(8,J),J=1,NCOL)/-2.0, -1.0, -1.0, -1.0, -2.0/
DATA (X(9,J),J=1,NCOL)/2.0, 1.0, 2.0, 1.0, 3.0/
!
NROW = NOBS
IIND = -NIND
IDEP = -NDEP
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, r=r, d=d, &
irank=irank, dfe=dfe, scpe=scpe, nrmiss=nrmiss, &
xmin=xmin, xmax=xmax)
!
CALL WRRRN ('B', B)
CALL WRRRN ('R', R)
CALL UMACH (2, NOUT)
WRITE (NOUT,*)
WRITE (NOUT,*) 'Regressor Minimum Maximum'
DO 10 I=1, NCOEF
WRITE (NOUT,'(1X,I5,2X,2F9.1)') I, XMIN(I), XMAX(I)
10 CONTINUE
WRITE (NOUT,*)
WRITE (NOUT,*) 'IRANK = ', IRANK
WRITE (NOUT,*) 'DFE = ', DFE, ' SCPE(1,1) = ', SCPE(1,1)
WRITE (NOUT,*) 'NRMISS = ', NRMISS
END
B
1
0.056
2 0.167
3 0.500
4
0.000
5
1.000
R
1 2
3 4
5
1 3.000 2.000 1.000
3.000 1.000
2 0.000 6.000
2.000 5.000 1.000
3 0.000
0.000 4.000 -2.000 2.000
4
0.000 0.000 0.000 0.000
0.000
5 0.000 0.000 0.000
0.000 3.000
Regressor Minimum
Maximum
1 1.0
1.0
2
-2.0 3.0
3 -2.0
3.0
4
-1.0 3.5
5 -2.0
2.0
IRANK = 4
DFE = 5.00000
SCPE(1,1) = 6.00000
NMISS = 0
PHONE: 713.784.3131 FAX:713.781.9260 |