Builds multiple linear regression models using forward selection, backward selection, or stepwise selection.
COV NVAR by NVAR matrix containing the variance-covariance matrix or sum of squares and crossproducts matrix. (Input)
Only the upper triangle of COV is referenced.
NOBS Number of observations. (Input)
AOV Vector of length 13 containing statistics relating to the analysis of variance for the final model in this invocation. (Output)
I | AOV(I)
1 | Degrees of freedom for regression
2 | Degrees of freedom for error
3 | Total degrees of freedom
4 | Sum of squares for regression
5 | Sum of squares for error
6 | Total sum of squares
7 | Regression mean square
8 | Error mean square
9 | F-statistic
10 | p-value
11 | R² (in percent)
12 | Adjusted R² (in percent)
13 | Estimated standard deviation of the model error
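The entries of AOV are tied together by the usual analysis-of-variance identities, which gives a quick consistency check on the output. The relations below are the standard textbook definitions, not taken from the RSTEP source. The numbers are the rounded values printed for the final (STEP 2) model of the backward-elimination example, so the last digit can differ slightly from the printed statistics.

```latex
\begin{align*}
\mathrm{AOV}(7)  &= \mathrm{AOV}(4)/\mathrm{AOV}(1) = 2657.9/2 \approx 1328.9 \\
\mathrm{AOV}(8)  &= \mathrm{AOV}(5)/\mathrm{AOV}(2) = 57.9/10 = 5.79 \\
\mathrm{AOV}(9)  &= \mathrm{AOV}(7)/\mathrm{AOV}(8) \approx 229.5 \\
\mathrm{AOV}(11) &= 100\,\mathrm{AOV}(4)/\mathrm{AOV}(6) = 100 \times 2657.9/2715.8 \approx 97.87 \\
\mathrm{AOV}(12) &= 100\left[1 - \frac{\mathrm{AOV}(8)}{\mathrm{AOV}(6)/\mathrm{AOV}(3)}\right]
                  = 100\left[1 - \frac{5.79}{2715.8/12}\right] \approx 97.44 \\
\mathrm{AOV}(13) &= \sqrt{\mathrm{AOV}(8)} = \sqrt{5.79} \approx 2.406
\end{align*}
```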
COEF NVAR − 1 by 5 matrix containing statistics relating to the regression coefficients for the final model in this invocation. (Output)
The rows correspond to the NVAR − 1 variables with LEVEL(I) nonnegative, i.e., all variables but the dependent variable. The rows are in the same order as the variables in COV except that the dependent variable is excluded. Each row corresponding to a variable not in the model contains the statistics for the model that would result if that variable were added.
Col. | Description
1 | Coefficient estimate
2 | Estimated standard error of the coefficient estimate
3 | t-statistic for the test that the coefficient is zero
4 | p-value for the two-sided t test
5 | Variance inflation factor. The square of the multiple correlation coefficient for the I-th regressor after all others can be obtained from COEF(I, 5) by the formula 1.0 − 1.0/COEF(I, 5).
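As a worked illustration of the formula in column 5, the variance inflation factors printed in the backward-elimination example convert to squared multiple correlations as follows (the symbols VIF and R² with subscript I are shorthand introduced here for COEF(I, 5) and the squared multiple correlation of regressor I on the others):

```latex
\begin{align*}
R_I^2 &= 1 - \frac{1}{\mathrm{VIF}_I} \\
\text{STEP 0, variable 1:}\quad R_1^2 &= 1 - 1/38.5 \approx 0.974 \\
\text{STEP 0, variable 4:}\quad R_4^2 &= 1 - 1/282.5 \approx 0.9965 \\
\text{STEP 2, variable 1:}\quad R_1^2 &= 1 - 1/1.06 \approx 0.057
\end{align*}
```

In the full model each regressor is almost a linear combination of the others, while in the final two-variable model the remaining regressors are nearly uncorrelated.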
COVS NVAR by NVAR matrix that results after COV has been swept on the columns corresponding to the variables in the model. (Output, if INVOKE = 0 or 1; input/output, if INVOKE = 2 or 3)
The estimated variance-covariance matrix of the estimated regression coefficients in the final model can be obtained by extracting the rows and columns of COVS corresponding to the independent variables in the final model and multiplying the elements of this matrix by AOV(8). If COV is not needed, COV and COVS can occupy the same storage locations.
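The AOV(8) scaling can be checked by hand against the final model of the backward-elimination example, which keeps variables 1 and 2. The sketch assumes that the swept block of COVS for the variables in the model equals the inverse of the corresponding block of COV (the usual outcome of sweeping, not stated explicitly in this description), and it uses the rounded error mean square 57.9/10 = 5.79 from the printed output. The symbol C is introduced here for that block of COV.

```latex
\begin{align*}
C &= \begin{pmatrix} 415.231 & 251.077 \\ 251.077 & 2905.69 \end{pmatrix},
\qquad \det C \approx 1.1435 \times 10^{6} \\
C^{-1} &\approx \begin{pmatrix} 2.5411 \times 10^{-3} & -2.1957 \times 10^{-4} \\
                               -2.1957 \times 10^{-4} & 3.6313 \times 10^{-4} \end{pmatrix} \\
\widehat{\mathrm{Var}}(\hat\beta) &= \mathrm{AOV}(8)\, C^{-1}
  \approx 5.79\, C^{-1}
  \approx \begin{pmatrix} 1.4713 \times 10^{-2} & -1.2713 \times 10^{-3} \\
                          -1.2713 \times 10^{-3} & 2.1025 \times 10^{-3} \end{pmatrix} \\
\mathrm{s.e.}(\hat\beta_1) &\approx \sqrt{1.4713 \times 10^{-2}} \approx 0.1213,
\qquad
\mathrm{s.e.}(\hat\beta_2) \approx \sqrt{2.1025 \times 10^{-3}} \approx 0.0459
\end{align*}
```

Both standard errors agree with the values printed for variables 1 and 2 at STEP 2 of that example.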
INVOKE Invocation option. (Input)
Default: INVOKE = 0.
INVOKE | Action
0 | This is the only invocation of RSTEP for this variance-covariance matrix. Initialization, stepping, and wrap-up computations are performed.
1 | This is the first invocation of RSTEP, and additional calls to RSTEP will be made. Initialization and stepping are performed.
2 | This is an intermediate invocation of RSTEP and stepping is performed.
3 | This is the final invocation of RSTEP and stepping is performed.
NVAR Number of variables. (Input)
Default: NVAR = size (COV,2).
LDCOV Leading dimension of COV exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOV = size (COV,1).
LEVEL Vector of length NVAR containing levels of priority for variables entering and leaving the regression. (Input)
LEVEL(I) = −1 means the I-th variable is the dependent variable. LEVEL(I) = 0 means the I-th variable is never to enter into the model. Other variables must be assigned a positive value to indicate their level of entry into the model. A variable can enter the model only after all variables with smaller nonzero levels of entry have entered. Similarly, a variable can only leave the model after all variables with higher levels of entry have left. Variables with the same level of entry compete for entry (deletion) at each step.
NFORCE Variables with levels 1, 2, ..., NFORCE are forced into the model as the independent variables. (Input)
Default: NFORCE = 0.
NSTEP Step length option. (Input)
For nonnegative NSTEP, NSTEP steps are taken. NSTEP = −1 means stepping continues until completion.
Default: NSTEP = −1.
ISTEP Stepping option. (Input)
Default: ISTEP = −1.
ISTEP | Action
−1 | An attempt is made to remove a variable from the model (backward step). A variable is removed if its p-value exceeds POUT. During initialization, all candidate independent variables enter the model.
1 | An attempt is made to add a variable to the model (forward step). A variable is added if its p-value is less than PIN. During initialization, only the forced variables enter the model.
0 | A backward step is attempted. If a variable is not removed, a forward step is attempted. This is a stepwise step. Only the forced variables enter the model during initialization.
PIN Largest p-value for entering variables. (Input)
Variables with p-values less than PIN may enter the model. A common choice is PIN = 0.05.
Default: PIN = .05.
POUT Smallest p-value for removing variables. (Input)
Variables with p-values greater than POUT may leave the model. POUT must be greater than or equal to PIN. A common choice is POUT = 0.10 (or 2 * PIN).
Default: POUT = .10.
TOL Tolerance used in determining linear dependence. (Input)
TOL = 100 * AMACH(4) is a common choice. See documentation for AMACH in the Reference Material.
Default: TOL = 1.e-5 for single precision and 2.d-14 for double precision.
IPRINT Printing option. (Input)
Default: IPRINT = 0.
IPRINT | Action
0 | No printing is performed.
1 | Printing is performed on the final invocation.
2 | Printing is performed after each step and on the final invocation.
SCALE Vector of length NVAR containing the initial diagonal entries in COV. (Output, if INVOKE = 0 or 1; input, if INVOKE = 2 or 3)
HIST Vector of length NVAR containing the recent history of variables. (Output, if INVOKE = 0 or 1; input/output, otherwise)
HIST(I) Meaning
k > 0 I-th variable was added to the model during the k-th step.
k < 0 I-th variable was deleted from the model during the |k|-th step.
0 I-th variable has never been in the model.
0.5 I-th variable was added into the model during initialization.
IEND Completion indicator. (Output)
IEND Meaning
0 Additional steps may be possible.
1 No additional steps are possible.
LDCOEF Leading dimension of COEF exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOEF = size (COEF,1).
LDCOVS Leading dimension of COVS exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOVS = size (COVS,1).
Generic: CALL RSTEP (COV, NOBS, AOV, COEF, COVS [, ...])
Specific: The specific interface names are S_RSTEP and D_RSTEP.
Single: CALL RSTEP (INVOKE, NVAR, COV, LDCOV, LEVEL, NFORCE, NSTEP, ISTEP, NOBS, PIN, POUT, TOL, IPRINT, SCALE, HIST, IEND, AOV, COEF, LDCOEF, COVS, LDCOVS)
Double: The double precision name is DRSTEP.
Routine RSTEP builds a multiple linear regression model using forward selection, backward selection, or forward stepwise (with a backward glance) selection. The routine RSTEP is designed so that the user can monitor, and perhaps change, the variables added (deleted) to (from) the model after each step. In this case, multiple calls to RSTEP (with INVOKE = 1, 2, 2, ..., 3) are made. Alternatively, RSTEP can be invoked once (with INVOKE = 0) in order to perform the stepping until a final model is selected.
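The multiple-call pattern (INVOKE = 1, 2, 2, ..., 3) might be sketched as follows. This is an illustrative outline rather than one of the library's examples. It assumes that the optional arguments can be passed by keyword under the names used in the argument descriptions (the examples only show ISTEP= and IPRINT=), it passes LEVEL explicitly, and it uses NSTEP = 0 on the last call so that no further steps are taken. The data are those of the Draper and Smith examples, and the IMSL library is needed to build it.

```fortran
      USE RSTEP_INT
      IMPLICIT NONE
      INTEGER    LDCOEF, LDCOV, LDCOVS, NVAR
      PARAMETER  (NVAR=5, LDCOEF=NVAR, LDCOV=NVAR, LDCOVS=NVAR)
!
      INTEGER    IEND, INVOKE, LEVEL(NVAR), NOBS
      REAL       AOV(13), COEF(LDCOEF,5), COV(LDCOV,NVAR), &
                 COVS(LDCOVS,NVAR), HIST(NVAR), SCALE(NVAR)
!                                 Corrected SSCP matrix from the examples
      DATA COV/415.231, 251.077, -372.615, -290.000, 775.962, 251.077, &
           2905.69, -166.538, -3041.00, 2292.95, -372.615, -166.538, &
           492.308, 38.0000, -618.231, -290.000, -3041.00, 38.0000, &
           3362.00, -2481.70, 775.962, 2292.95, -618.231, -2481.70, &
           2715.76/
      DATA LEVEL/4*1, -1/
!
      NOBS   = 13
!                                 First call initializes (INVOKE = 1)
      INVOKE = 1
      DO
!                                 One forward step per call (assumed
!                                 keyword names for optional arguments)
         CALL RSTEP (COV, NOBS, AOV, COEF, COVS, INVOKE=INVOKE, &
                     LEVEL=LEVEL, NSTEP=1, ISTEP=1, SCALE=SCALE, &
                     HIST=HIST, IEND=IEND)
!                                 Monitor the model after each step
         WRITE (*,*) 'HIST = ', HIST
         IF (IEND .EQ. 1) EXIT
!                                 Later calls are intermediate (INVOKE = 2)
         INVOKE = 2
      END DO
!                                 Final call; NSTEP = 0 takes no more steps
      CALL RSTEP (COV, NOBS, AOV, COEF, COVS, INVOKE=3, LEVEL=LEVEL, &
                  NSTEP=0, ISTEP=1, SCALE=SCALE, HIST=HIST, IEND=IEND)
      WRITE (*,*) 'R-squared (percent) = ', AOV(11)
      END
```

SCALE, HIST, and COVS are kept in the caller between calls because they are inputs for INVOKE = 2 or 3.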
Levels of priority can be assigned to the candidate independent variables. All variables with a priority level of 1 must enter the model before any variable with a priority level of 2. Similarly, variables with a level of 2 must enter before variables with a level of 3, etc.
Variables can also be forced into the model. If equal levels of priority are to be assumed, the levels of priority can all be set to 1.
Typically, the intercept is forced into all models and is not a candidate variable. In this case, a sum of squares and crossproducts matrix for the independent and dependent variables corrected for the mean is input for COV. Routine CORVC in Chapter 3, Correlation, can be used to compute the corrected sum of squares and crossproducts. Routine RORDM in Chapter 19, Utilities, can be used to reorder this matrix, if required. Other possibilities are:
1. The intercept is not in the model. A raw (uncorrected) sum of squares and crossproducts matrix for the independent and dependent variables is required for COV. NOBS must be set to one greater than the number of observations. IMSL routine MXTXF (IMSL MATH/LIBRARY) can be used to compute the raw sum of squares and crossproducts matrix.
2. An intercept is to be a candidate variable. A raw (uncorrected) sum of squares and crossproducts matrix for the constant regressor (= 1), independent and dependent variables is required for COV. In this case, COV contains one additional row and column corresponding to the constant regressor. This row/column contains the sum of squares and crossproducts of the constant regressor with the independent and dependent variables. The remaining elements in COV are the same as in the previous case. NOBS must be set to one greater than the number of observations.
The stepwise regression algorithm is due to Efroymson (1960). Routine RSTEP uses sweeps of COV to move variables in and out of the model (Hemmerle 1967, Chapter 3). The SWEEP operator discussed by Goodnight (1979) is used. A description of the stepwise algorithm is given also by Kennedy and Gentle (1980, pages 335−340). The advantage of stepwise model building over all possible regressions (see routine RBEST) is that it is less demanding computationally when the number of candidate independent variables is very large. However, there is no guarantee that the model selected will be the best model (highest R²) for any subset size of independent variables.
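To see why sweeping moves variables in and out of the model, it helps to write the effect of a sweep in block form. The sketch below uses one common sign convention for the SWEEP operator. Whether RSTEP stores its swept matrix with exactly these signs and scalings is an assumption, and the partition symbols A11, A12, A21, A22, X, and y are notation introduced here.

```latex
\begin{align*}
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
&\;\xrightarrow{\ \text{sweep on the pivots of } A_{11}\ }\;
\begin{pmatrix} A_{11}^{-1} & A_{11}^{-1} A_{12} \\
                -A_{21} A_{11}^{-1} & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{pmatrix} \\[4pt]
\begin{pmatrix} X^{T}X & X^{T}y \\ y^{T}X & y^{T}y \end{pmatrix}
&\;\longrightarrow\;
\begin{pmatrix} (X^{T}X)^{-1} & \hat\beta \\
                -\hat\beta^{T} & \mathrm{SSE} \end{pmatrix}
\end{align*}
```

With the columns of X taken as the variables currently in the model, one sweep per variable therefore yields the coefficient estimates, the error sum of squares, and the matrix needed for standard errors. In this convention, sweeping a second time on the same pivot undoes the first sweep, which is how a variable is removed again.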
1. Workspace may be explicitly provided, if desired, by use of R2TEP/DR2TEP. The reference is:
CALL R2TEP (INVOKE, NVAR, COV, LDCOV, LEVEL, NFORCE, NSTEP, ISTEP, NOBS, PIN, POUT, TOL, IPRINT, SCALE, HIST, IEND, AOV, COEF, LDCOEF, COVS, LDCOVS, SWEPT, IWK)
The additional arguments are as follows:
SWEPT Work vector of length NVAR with information to indicate the independent variables in the model. (Output)
SWEPT(I) = 1.0 indicates that independent variable I is in the model. Otherwise, SWEPT(I) = −1.0. Routine RSUBM can be called with the arguments COVS and SWEPT to obtain the part of COVS pertaining to the current model.
IWK Integer work vector of length 2 * NVAR.
2. Informational errors
Type Code
3 1 Based on TOL, there are linear dependencies among the variables to be forced.
4 2 No variables entered the model. Elements of AOV are set to NaN.
Both examples use a data set from Draper and Smith (1981, pages 629−630). A corrected sum of squares and crossproducts matrix for this data is given in the DATA statement and can be computed using routine CORVC in Chapter 3, Correlation. The first four columns are for the independent variables and the last column is for the dependent variable. Here, RSTEP is invoked using the backward stepping option.
USE RSTEP_INT
IMPLICIT NONE
INTEGER LDCOEF, LDCOV, LDCOVS, NVAR
PARAMETER (NVAR=5, LDCOEF=NVAR, LDCOV=NVAR, LDCOVS=NVAR)
!
INTEGER IEND, IPRINT, LEVEL(NVAR), NOBS
REAL AOV(13), COEF(LDCOEF,5), COV(LDCOV,NVAR), &
COVS(LDCOVS,NVAR), HIST(NVAR), SCALE(NVAR)
!
DATA COV/415.231, 251.077, -372.615, -290.000, 775.962, 251.077, &
2905.69, -166.538, -3041.00, 2292.95, -372.615, -166.538, &
492.308, 38.0000, -618.231, -290.000, -3041.00, 38.0000, &
3362.00, -2481.70, 775.962, 2292.95, -618.231, -2481.70, &
2715.76/
DATA LEVEL/4*1, -1/
!
NOBS = 13
IPRINT = 2
CALL RSTEP (COV, NOBS, AOV, COEF, COVS, IPRINT=IPRINT)
!
END
BACKWARD ELIMINATION

STEP 0: 4 variable(s) entered.

Dependent   R-squared    Adjusted   Est. Std. Dev.
Variable    (percent)   R-squared   of Model Error
        5      98.238      97.356            2.446

* * * Analysis of Variance * * *
                     Sum of       Mean               Prob. of
Source        DF    Squares     Square   Overall F   Larger F
Regression     4     2667.9      667.0     111.480     0.0000
Error          8       47.9        6.0
Total         12     2715.8

* * * Inference on Coefficients * * *
(Conditional on the Selected Model)
              Coef.   Standard                 Prob. of    Variance
Variable   Estimate      Error   t-statistic   Larger t   Inflation
       1      1.551     0.7448         2.082     0.0709        38.5
       2      0.510     0.7238         0.704     0.5012       254.4
       3      0.102     0.7547         0.135     0.8963        46.9
       4     -0.144     0.7091        -0.204     0.8437       282.5

STEP 1 : Variable 3 removed.

Dependent   R-squared    Adjusted   Est. Std. Dev.
Variable    (percent)   R-squared   of Model Error
        5      98.234      97.645            2.309

* * * Analysis of Variance * * *
                     Sum of       Mean               Prob. of
Source        DF    Squares     Square   Overall F   Larger F
Regression     3     2667.8      889.3     166.835     0.0000
Error          9       48.0        5.3
Total         12     2715.8

* * * Inference on Coefficients * * *
(Conditional on the Selected Model)
              Coef.   Standard                 Prob. of    Variance
Variable   Estimate      Error   t-statistic   Larger t   Inflation
       1      1.452     0.1170        12.410     0.0000        1.07
       2      0.416     0.1856         2.242     0.0517       18.78
       4     -0.237     0.1733        -1.365     0.2054       18.94

* * * Statistics for Variables Not in the Model * * *
              Coef.   Standard   t-statistic   Prob. of    Variance
Variable   Estimate      Error      to enter   Larger t   Inflation
       3      0.102     0.7547         0.135     0.8963       46.87

STEP 2 : Variable 4 removed.

Dependent   R-squared    Adjusted   Est. Std. Dev.
Variable    (percent)   R-squared   of Model Error
        5      97.868      97.441            2.406

* * * Analysis of Variance * * *
                     Sum of       Mean               Prob. of
Source        DF    Squares     Square   Overall F   Larger F
Regression     2     2657.9     1328.9     229.502     0.0000
Error         10       57.9        5.8
Total         12     2715.8

* * * Inference on Coefficients * * *
(Conditional on the Selected Model)
              Coef.   Standard                 Prob. of    Variance
Variable   Estimate      Error   t-statistic   Larger t   Inflation
       1      1.468     0.1213        12.105     0.0000        1.06
       2      0.662     0.0459        14.442     0.0000        1.06

* * * Statistics for Variables Not in the Model * * *
              Coef.   Standard   t-statistic   Prob. of    Variance
Variable   Estimate      Error      to enter   Larger t   Inflation
       3      0.250     0.1847         1.354     0.2089        3.14
       4     -0.237     0.1733        -1.365     0.2054       18.94

* * * Backward Elimination Summary * * *
Variable   Step Removed
       3              1
       4              2
This example uses the data set in Example 1. Here, RSTEP is invoked using the forward stepping option (ISTEP = 1).
USE RSTEP_INT
IMPLICIT NONE
INTEGER LDCOEF, LDCOV, LDCOVS, NVAR
PARAMETER (NVAR=5, LDCOEF=NVAR, LDCOV=NVAR, LDCOVS=NVAR)
!
INTEGER IEND, IPRINT, ISTEP, LEVEL(NVAR), NOBS
REAL AOV(13), COEF(LDCOEF,5), COV(LDCOV,NVAR), &
COVS(LDCOVS,NVAR), HIST(NVAR), SCALE(NVAR)
!
DATA COV/415.231, 251.077, -372.615, -290.000, 775.962, 251.077, &
2905.69, -166.538, -3041.00, 2292.95, -372.615, -166.538, &
492.308, 38.0000, -618.231, -290.000, -3041.00, 38.0000, &
3362.00, -2481.70, 775.962, 2292.95, -618.231, -2481.70, &
2715.76/
DATA LEVEL/4*1, -1/
!
ISTEP = 1
NOBS = 13
IPRINT = 2
CALL RSTEP (COV, NOBS, AOV, COEF, COVS, ISTEP=ISTEP, IPRINT=IPRINT)
!
END
FORWARD SELECTION

STEP 0: No variables entered.

* * * Statistics for Variables Not in the Model * * *
              Coef.   Standard   t-statistic   Prob. of    Variance
Variable   Estimate      Error      to enter   Larger t   Inflation
       1      1.869     0.5264         3.550     0.0046           1
       2      0.789     0.1684         4.686     0.0007           1
       3     -1.256     0.5984        -2.098     0.0598           1
       4     -0.738     0.1546        -4.775     0.0006           1

STEP 1 : Variable 4 entered.

Dependent   R-squared    Adjusted   Est. Std. Dev.
Variable    (percent)   R-squared   of Model Error
        5      67.454      64.496            8.964

* * * Analysis of Variance * * *
                     Sum of       Mean               Prob. of
Source        DF    Squares     Square   Overall F   Larger F
Regression     1     1831.9     1831.9      22.799     0.0006
Error         11      883.9       80.4
Total         12     2715.8

* * * Inference on Coefficients * * *
(Conditional on the Selected Model)
              Coef.   Standard                 Prob. of    Variance
Variable   Estimate      Error   t-statistic   Larger t   Inflation
       4     -0.738     0.1546        -4.775     0.0006        1.00

* * * Statistics for Variables Not in the Model * * *
              Coef.   Standard   t-statistic   Prob. of    Variance
Variable   Estimate      Error      to enter   Larger t   Inflation
       1      1.440     0.1384        10.403     0.0000        1.06
       2      0.311     0.7486         0.415     0.6867       18.74
       3     -1.200     0.1890        -6.348     0.0001        1.00

STEP 2 : Variable 1 entered.

Dependent   R-squared    Adjusted   Est. Std. Dev.
Variable    (percent)   R-squared   of Model Error
        5      97.247      96.697            2.734

* * * Analysis of Variance * * *
                     Sum of       Mean               Prob. of
Source        DF    Squares     Square   Overall F   Larger F
Regression     2     2641.0     1320.5     176.636     0.0000
Error         10       74.8        7.5
Total         12     2715.8

* * * Inference on Coefficients * * *
(Conditional on the Selected Model)
              Coef.   Standard                 Prob. of    Variance
Variable   Estimate      Error   t-statistic   Larger t   Inflation
       1      1.440     0.1384        10.403     0.0000        1.06
       4     -0.614     0.0486       -12.622     0.0000        1.06

* * * Statistics for Variables Not in the Model * * *
              Coef.   Standard   t-statistic   Prob. of    Variance
Variable   Estimate      Error      to enter   Larger t   Inflation
       2      0.416     0.1856         2.242     0.0517       18.78
       3     -0.410     0.1992        -2.058     0.0697        3.46

* * * Forward Selection Summary * * *
Variable   Step Entered
       1              2
       4              1
For an extended version of Example 2 that in addition computes the intercept and standard error for the final model from RSTEP, see Example 2 for routine RSUBM.