Analyzes a simple linear regression model.
X — NOBS by NCOL matrix containing the data. (Input)
IRSP — Column number IRSP of X contains the data for the response (dependent) variable. (Input)
IND — Column number IND of X contains the data for the independent (explanatory) variable. (Input)
AOV — Vector of length 15 containing statistics relating to the analysis of variance. (Output)
I     AOV(I)
1     Degrees of freedom for regression
2     Degrees of freedom for error
3     Total degrees of freedom
4     Sum of squares for regression
5     Sum of squares for error
6     Total sum of squares
7     Regression mean square
8     Error mean square
9     F-statistic
10    p-value
11    R² (in percent)
12    Adjusted R² (in percent)
13    Estimated standard deviation of the model error
14    Mean of the response (dependent) variable
15    Coefficient of variation (in percent)
If INTCEP = 1, the regression and total are corrected for the mean. If INTCEP = 0, the regression and total are not corrected for the mean, and AOV(14) and AOV(15) are set to NaN (not a number).
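The entries of AOV are linked by the usual analysis-of-variance identities. The sketch below does not call RONE. It fills a local array A, laid out like AOV, with the rounded values printed in the output of the first example further down, and recomputes the derived entries from them. The formulas are the standard textbook definitions and are assumed, not confirmed by this document, to match what RONE computes internally. The program name AOVCHK and the array A are hypothetical, and the p-value (element 10) is left out because it requires the F distribution.

```fortran
PROGRAM AOVCHK
  IMPLICIT NONE
  ! A mimics the layout of AOV. Entries 1, 2, 4, 5, 6 and 14 are
  ! copied (rounded) from the printed output of the first example;
  ! the remaining entries are recomputed with textbook formulas.
  REAL A(15)
  A = 0.0
  A(1)  = 1.0                    ! DF for regression
  A(2)  = 23.0                   ! DF for error
  A(3)  = A(1) + A(2)            ! total DF
  A(4)  = 45.59                  ! SS for regression
  A(5)  = 18.22                  ! SS for error
  A(6)  = 63.82                  ! total SS (corrected)
  A(7)  = A(4)/A(1)              ! regression mean square
  A(8)  = A(5)/A(2)              ! error mean square
  A(9)  = A(7)/A(8)              ! F-statistic
  A(11) = 100.0*A(4)/A(6)        ! R-squared, in percent
  A(12) = 100.0*(1.0 - A(8)/(A(6)/A(3)))   ! adjusted R-squared
  A(13) = SQRT(A(8))             ! est. std. dev. of model error
  A(14) = 9.424                  ! mean of the response
  A(15) = 100.0*A(13)/A(14)      ! coefficient of variation
  PRINT '(A,F9.3)', 'F-statistic          ', A(9)
  PRINT '(A,F9.3)', 'R-squared (percent)  ', A(11)
  PRINT '(A,F9.3)', 'Adj. R-sq. (percent) ', A(12)
  PRINT '(A,F9.4)', 'Est. std. dev.       ', A(13)
  PRINT '(A,F9.3)', 'Coef. of var. (pct.) ', A(15)
END PROGRAM AOVCHK
```

Because the inputs are rounded to two decimals, the recomputed values agree with the printed ones only to about three significant digits.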
COEF — INTCEP + 1 by 5 matrix containing statistics relating to the regression coefficients. (Output)
If INTCEP = 1, the first row corresponds to the intercept. Row INTCEP + 1 corresponds to the coefficient for the slope. The statistics in the columns are:
Col.  Description
1     Coefficient estimate
2     Estimated standard error of the coefficient estimate
3     t-statistic for the test that the coefficient is zero
4     p-value for the two-sided t test
5     Variance inflation factor
COVB — INTCEP + 1 by INTCEP + 1 matrix that is the estimated variance-covariance matrix of the estimated regression coefficients. (Output)
TESTLF — Vector of length 10 containing statistics relating to the test for lack of fit of the model. (Output)
Elem  Description
1     Degrees of freedom for lack of fit
2     Degrees of freedom for pure error
3     Degrees of freedom for error (TESTLF(1) + TESTLF(2))
4     Sum of squares for lack of fit
5     Sum of squares for pure error
6     Sum of squares for error
7     Mean square for lack of fit
8     Mean square for pure error
9     F statistic
10    p-value
If there are no replicates in the data set, a test for lack of fit cannot be performed. In this case, elements 7, 8, 9, and 10 of TESTLF are set to NaN (not a number).
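The relationships among the TESTLF elements can be sketched as follows. The program does not call RONE. It uses the rounded degrees of freedom and sums of squares printed in the output of the second example further down, forms the mean squares and the F statistic with the standard definitions (assumed here to be what RONE uses), and mirrors the no-replicates rule stated above by storing NaN in elements 7 through 10 when there are no pure-error degrees of freedom. The program name LOFCHK and the array T are hypothetical, and the p-value (element 10) is not computed.

```fortran
PROGRAM LOFCHK
  USE, INTRINSIC :: IEEE_ARITHMETIC, ONLY: IEEE_VALUE, IEEE_QUIET_NAN
  IMPLICIT NONE
  ! T mimics the layout of TESTLF. T(10), the p-value, would need
  ! the F distribution and is deliberately left unset here.
  REAL T(10)
  T = 0.0
  T(1) = 11.0               ! DF for lack of fit
  T(2) = 11.0               ! DF for pure error
  T(3) = T(1) + T(2)        ! DF for error
  T(4) = 8.72               ! SS for lack of fit
  T(5) = 12.47              ! SS for pure error
  T(6) = T(4) + T(5)        ! SS for error
  IF (T(2) .GT. 0.0) THEN
    T(7) = T(4)/T(1)        ! mean square for lack of fit
    T(8) = T(5)/T(2)        ! mean square for pure error
    T(9) = T(7)/T(8)        ! F statistic
    PRINT '(A,F8.3)', 'MS lack of fit ', T(7)
    PRINT '(A,F8.3)', 'MS pure error  ', T(8)
    PRINT '(A,F8.3)', 'F statistic    ', T(9)
  ELSE
    ! No replicates: the test cannot be performed.
    T(7:10) = IEEE_VALUE(T(1), IEEE_QUIET_NAN)
    PRINT '(A)', 'No replicates: elements 7-10 set to NaN'
  END IF
END PROGRAM LOFCHK
```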
CASE — NOBS by 12 matrix containing case statistics. (Output)
Columns 1 through 12 contain the following:
Col.    Description
1       Observed response
2       Predicted response
3       Residual
4       Leverage
5       Standardized residual
6       Jackknife residual
7       Cook’s distance
8       DFFITS
9, 10   Confidence interval on the mean
11, 12  Prediction interval
NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).
NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
INTCEP — Intercept option. (Input)
Default: INTCEP = 1.
INTCEP Action
0 An intercept is not in the model.
1 An intercept is in the model.
IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies. If X(I, IFRQ) = 0.0, none of the remaining elements of row I of X are referenced, and updating of statistics is skipped for row I.
Default: IFRQ = 0.
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights.
Default: IWT = 0.
IPRED — Prediction interval option. (Input)
IPRED = 0 means that prediction intervals are computed for a single future response. For positive IPRED, a prediction interval is computed on the average of future responses, and column number IPRED of X contains the number of future responses in each average.
Default: IPRED = 0.
CONPCM — Confidence level for two-sided interval estimates on the mean, in percent. (Input)
CONPCM percent confidence intervals are computed; hence, CONPCM must be greater than or equal to 0.0 and less than 100.0. CONPCM often will be 90.0, 95.0, or 99.0. For one-sided intervals with confidence level ONECL, where ONECL is greater than or equal to 50.0 and less than 100.0, set CONPCM = 100.0 − 2.0 * (100.0 − ONECL).
Default: CONPCM = 95.0.
CONPCP — Confidence level for two-sided prediction intervals, in percent. (Input)
CONPCP percent prediction intervals are computed; hence, CONPCP must be greater than or equal to 0.0 and less than 100.0. CONPCP often will be 90.0, 95.0, or 99.0. For one-sided intervals with confidence level ONECL, where ONECL is greater than or equal to 50.0 and less than 100.0, set CONPCP = 100.0 − 2.0 * (100.0 − ONECL).
Default: CONPCP = 95.0.
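The one-sided conversion given for CONPCM and CONPCP is simple arithmetic. The minimal sketch below applies it to ONECL = 95.0; the program name ONESID and the variable CONPC are hypothetical. The resulting value would then be passed to RONE through the optional CONPCM or CONPCP argument.

```fortran
PROGRAM ONESID
  IMPLICIT NONE
  REAL ONECL, CONPC
  ! Desired one-sided confidence level, in percent.
  ONECL = 95.0
  ! The documented range is 50.0 <= ONECL < 100.0.
  IF (ONECL .LT. 50.0 .OR. ONECL .GE. 100.0) STOP 'ONECL out of range'
  ! Two-sided level whose limits serve as one-sided ONECL bounds.
  CONPC = 100.0 - 2.0*(100.0 - ONECL)
  PRINT '(A,F6.1)', 'Two-sided level to pass: ', CONPC   ! 90.0
  ! With the generic interface one would then write, for example,
  !   CALL RONE (X, IRSP, IND, AOV, COEF, COVB, TESTLF, CASE, CONPCM=CONPC)
END PROGRAM ONESID
```

Each limit of the 90.0 percent two-sided interval is then a 95.0 percent one-sided bound.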
IPRINT — Printing option. (Input)
Default: IPRINT = 0.
IPRINT  Action
0       No printing is performed.
1       AOV, COEF, TESTLF, and unusual rows of CASE are printed.
2       AOV, COEF, TESTLF, and unusual rows of CASE are printed. A plot of the data with the regression line is printed.
3       All printing is performed. A plot of the data with the regression line, a plot of the standardized residuals versus the independent variable, and a half-normal probability plot of the standardized residuals are printed.
LDCOEF — Leading dimension of COEF exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOEF = size (COEF,1).
LDCOVB — Leading dimension of COVB exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOVB = size (COVB,1).
LDCASE — Leading dimension of CASE exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCASE = size (CASE,1).
NRMISS — Number of rows of data encountered containing missing values for the independent, dependent, weight, or frequency variables. (Output)
NaN (not a number) is used as the missing value code. Any row of X containing NaN as a value of the independent, dependent, weight, or frequency variables is omitted from the computations for fitting the model.
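To illustrate the missing-value convention, the sketch below marks one response value as NaN and counts the rows that would be excluded under the rule above. It uses only the standard IEEE_ARITHMETIC intrinsic module and does not call RONE. The program name NANROW and the counter NMISS are hypothetical, and the count is only assumed to correspond to what RONE would report in NRMISS for the same data.

```fortran
PROGRAM NANROW
  USE, INTRINSIC :: IEEE_ARITHMETIC, ONLY: IEEE_VALUE, IEEE_IS_NAN, IEEE_QUIET_NAN
  IMPLICIT NONE
  INTEGER, PARAMETER :: NOBS = 4, NCOL = 2, IRSP = 2, IND = 1
  REAL X(NOBS,NCOL)
  INTEGER I, NMISS
  ! First four rows of the data from the first example.
  X(:,1) = (/ 35.3, 29.7, 30.8, 58.8 /)
  X(:,2) = (/ 10.98, 11.13, 12.51, 8.40 /)
  ! Flag the response in row 3 as missing.
  X(3,IRSP) = IEEE_VALUE(X(3,IRSP), IEEE_QUIET_NAN)
  ! Count rows with NaN in the response or independent variable
  ! (no weight or frequency column is used in this sketch).
  NMISS = 0
  DO I = 1, NOBS
    IF (IEEE_IS_NAN(X(I,IRSP)) .OR. IEEE_IS_NAN(X(I,IND))) NMISS = NMISS + 1
  END DO
  PRINT '(A,I3)', 'Rows with missing values: ', NMISS
END PROGRAM NANROW
```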
Generic: CALL RONE (X, IRSP, IND, AOV, COEF, COVB, TESTLF, CASE [,…])
Specific: The specific interface names are S_RONE and D_RONE.
Single: CALL RONE (NOBS, NCOL, X, LDX, INTCEP, IRSP, IND, IFRQ, IWT, IPRED, CONPCM, CONPCP, IPRINT, AOV, COEF, LDCOEF, COVB, LDCOVB, TESTLF, CASE, LDCASE, NRMISS)
Double: The double precision name is DRONE.
Routine RONE performs an analysis for the simple linear regression model. In addition to the fit, it computes summary statistics (analysis of variance, t tests, lack-of-fit test) as well as confidence intervals and diagnostics for individual cases. With the printing option, diagnostic plots can also be produced. Draper and Smith (1981, chapter 1) give formulas for many of the statistics computed by RONE. For definitions of the case diagnostics (stored in CASE), see the Usage Notes of this chapter.
1. Workspace may be explicitly provided, if desired, by use of R2NE/DR2NE. The reference is:
CALL R2NE (NOBS, NCOL, X, LDX, INTCEP, IRSP, IND, IFRQ, IWT, IPRED, CONPCM, CONPCP, IPRINT, AOV, COEF, LDCOEF, COVB, LDCOVB, TESTLF, CASE, LDCASE, NRMISS, IWK, WK)
The additional arguments are as follows:
IWK — Work vector of length NOBS.
WK — Work vector of length 3 * NOBS.
2. Informational errors
Type Code
3 5 CONPCM is less than 50.0. Confidence percentages commonly used are 90.0, 95.0, and 99.0.
3 6 CONPCP is less than 50.0. Confidence percentages commonly used are 90.0, 95.0, and 99.0.
4 1 Negative weight encountered.
4 2 Negative frequency encountered.
4 7 Each row of X contains NaN.
This example fits a line to a set of data discussed by Draper and Smith (1981, pages 9−33). The response y is the amount of steam used per month (in pounds), and the independent variable x is the average atmospheric temperature (in degrees Fahrenheit). The IPRINT = 1 option is selected, so plots are not produced and only unusual cases are printed. CONPCP is set to 99.0, so the prediction intervals are 99 percent intervals, while the confidence intervals on the mean use the default of 95 percent. Note that in the case analysis, with the default page width, the observation number and the associated 12 statistics require two lines of output. (Routine PGOPT, Chapter 19, Utilities, can be invoked to increase the page width to put all 12 statistics on the same line.) Also note that observation 11 is labeled with a “Y” to indicate an unusual y (response). The residual for this case is about 2 standard deviations from zero.
USE RONE_INT
IMPLICIT NONE
INTEGER INTCEP, LDCASE, LDCOEF, LDCOVB, LDX, NCOEF, NCOL, NOBS
INTEGER J
PARAMETER (NOBS=25, LDX=25, LDCASE=25, INTCEP=1, NCOEF=INTCEP+1, &
LDCOEF=NCOEF, LDCOVB=NCOEF, NCOL=2)
!
INTEGER IND, IPRINT, IRSP, NRMISS
REAL AOV(15), CASE(LDCASE,12), COEF(LDCOEF,5), CONPCP, &
COVB(LDCOVB,NCOEF), TESTLF(10), X(LDX,NCOL)
!
DATA (X(1,J),J=1,2) /35.3, 10.98/
DATA (X(2,J),J=1,2) /29.7, 11.13/
DATA (X(3,J),J=1,2) /30.8, 12.51/
DATA (X(4,J),J=1,2) /58.8, 8.40/
DATA (X(5,J),J=1,2) /61.4, 9.27/
DATA (X(6,J),J=1,2) /71.3, 8.73/
DATA (X(7,J),J=1,2) /74.4, 6.36/
DATA (X(8,J),J=1,2) /76.7, 8.50/
DATA (X(9,J),J=1,2) /70.7, 7.82/
DATA (X(10,J),J=1,2) /57.5, 9.14/
DATA (X(11,J),J=1,2) /46.4, 8.24/
DATA (X(12,J),J=1,2) /28.9, 12.19/
DATA (X(13,J),J=1,2) /28.1, 11.88/
DATA (X(14,J),J=1,2) /39.1, 9.57/
DATA (X(15,J),J=1,2) /46.8, 10.94/
DATA (X(16,J),J=1,2) /48.5, 9.58/
DATA (X(17,J),J=1,2) /59.3, 10.09/
DATA (X(18,J),J=1,2) /70.0, 8.11/
DATA (X(19,J),J=1,2) /70.0, 6.83/
DATA (X(20,J),J=1,2) /74.5, 8.88/
DATA (X(21,J),J=1,2) /72.1, 7.68/
DATA (X(22,J),J=1,2) /58.1, 8.47/
DATA (X(23,J),J=1,2) /44.6, 8.86/
DATA (X(24,J),J=1,2) /33.4, 10.36/
DATA (X(25,J),J=1,2) /28.6, 11.08/
!
IRSP = 2
IND = 1
CONPCP = 99.0
IPRINT = 1
CALL RONE (X, IRSP, IND, AOV, COEF, COVB, TESTLF, CASE, &
CONPCP=CONPCP, IPRINT=IPRINT, NRMISS=NRMISS)
!
END
R-squared  Adjusted   Est. Std. Dev.            Coefficient of
(percent)  R-squared  of Model Error      Mean  Var. (percent)
   71.444     70.202          0.8901     9.424           9.445
              * * * Analysis of Variance * * *
                        Sum of      Mean               Prob. of
Source            DF   Squares    Square   Overall F   Larger F
Regression         1     45.59     45.59      57.543     0.0000
Residual          23     18.22      0.79
Corrected Total   24     63.82

             * * * Inference on Coefficients * * *
                 Standard                 Prob. of   Variance
Coef.  Estimate     Error  t-statistic  Larger |t|  Inflation
    1     13.62    0.5815        23.43      0.0000      10.67
    2     -0.08    0.0105        -7.59      0.0000       1.00

              * * * Test for Lack of Fit * * *
                        Sum of      Mean               Prob. of
Source            DF   Squares    Square   Overall F   Larger F
Lack of fit       22     17.40    0.7911       0.966     0.6801
Pure error         1      0.82    0.8192
Residual          23     18.22

                       * * * Case Analysis * * *
   Obs.   Observed  Predicted   Residual   Leverage  Std. Res.  Jack Res.
          Cook’s D     DFFITS   95.0% CI   95.0% CI   99.0% PI   99.0% PI
Y    11     8.2400     9.9189    -1.6789     0.0454    -1.9305    -2.0625
            0.0886    -0.4497     9.5267    10.3112     7.3640    12.4739
Figure 2-2 Plot of Line and 99% One-at-a-Time Prediction Intervals
This example fits a line to a data set discussed by Draper and Smith (1981, pages 38−40). The data set contains several repeated x values in order to assess lack of fit of the straight line. The IPRINT = 1 option is selected, so plots are not produced and only unusual cases are printed. Note in the case analysis that observations 1 and 2 are labeled with an “X” to indicate an unusual x value. Each has leverage 0.1944, which is more than twice the average leverage of p/n = 2/24.
USE RONE_INT
IMPLICIT NONE
INTEGER LDCASE, LDCOEF, LDCOVB, LDX, NCOEF, NCOL, NOBS,J
INTEGER INTCEP, NRMISS
PARAMETER (INTCEP=1, NCOL=2, NOBS=24, LDCASE=NOBS, LDX=NOBS, &
NCOEF=INTCEP+1, LDCOEF=NCOEF, LDCOVB=NCOEF)
!
INTEGER IFRQ, IND, IPRED, IPRINT, IRSP
REAL AOV(15), CASE(LDCASE,12),COEF(LDCOEF,5), &
COVB(LDCOVB,NCOEF), TESTLF(10), X(LDX,NCOL)
!
DATA (X(1,J),J=1,2) /2.3, 1.3/
DATA (X(2,J),J=1,2) /1.8, 1.3/
DATA (X(3,J),J=1,2) /2.8, 2.0/
DATA (X(4,J),J=1,2) /1.5, 2.0/
DATA (X(5,J),J=1,2) /2.2, 2.7/
DATA (X(6,J),J=1,2) /3.8, 3.3/
DATA (X(7,J),J=1,2) /1.8, 3.3/
DATA (X(8,J),J=1,2) /3.7, 3.7/
DATA (X(9,J),J=1,2) /1.7, 3.7/
DATA (X(10,J),J=1,2) /2.8, 4.0/
DATA (X(11,J),J=1,2) /2.8, 4.0/
DATA (X(12,J),J=1,2) /2.2, 4.0/
DATA (X(13,J),J=1,2) /5.4, 4.7/
DATA (X(14,J),J=1,2) /3.2, 4.7/
DATA (X(15,J),J=1,2) /1.9, 4.7/
DATA (X(16,J),J=1,2) /1.8, 5.0/
DATA (X(17,J),J=1,2) /3.5, 5.3/
DATA (X(18,J),J=1,2) /2.8, 5.3/
DATA (X(19,J),J=1,2) /2.1, 5.3/
DATA (X(20,J),J=1,2) /3.4, 5.7/
DATA (X(21,J),J=1,2) /3.2, 6.0/
DATA (X(22,J),J=1,2) /3.0, 6.0/
DATA (X(23,J),J=1,2) /3.0, 6.3/
DATA (X(24,J),J=1,2) /5.9, 6.7/
!
IRSP = 1
IND = 2
IPRINT = 1
CALL RONE (X, IRSP, IND, AOV, COEF, COVB, TESTLF, CASE, &
IPRINT=IPRINT, NRMISS=NRMISS)
END
R-squared  Adjusted   Est. Std. Dev.            Coefficient of
(percent)  R-squared  of Model Error      Mean  Var. (percent)
   22.983     19.483          0.9815     2.858           34.34
              * * * Analysis of Variance * * *
                        Sum of      Mean               Prob. of
Source            DF   Squares    Square   Overall F   Larger F
Regression         1      6.32     6.325       6.565     0.0178
Residual          22     21.19     0.963
Corrected Total   23     27.52

             * * * Inference on Coefficients * * *
                 Standard                 Prob. of   Variance
Coef.  Estimate     Error  t-statistic  Larger |t|  Inflation
    1     1.436    0.5900        2.435      0.0235      8.672
    2     0.338    0.1319        2.562      0.0178      1.000

              * * * Test for Lack of Fit * * *
                        Sum of      Mean               Prob. of
Source            DF   Squares    Square   Overall F   Larger F
Lack of fit       11      8.72     0.793       0.700     0.7183
Pure error        11     12.47     1.134
Residual          22     21.19

                       * * * Case Analysis * * *
   Obs.   Observed  Predicted   Residual   Leverage  Std. Res.  Jack Res.
          Cook’s D     DFFITS   95.0% CI   95.0% CI   95.0% PI   95.0% PI
X     1     2.3000     1.8756     0.4244     0.1944     0.4817     0.4731
            0.0280     0.2324     0.9783     2.7730    -0.3489     4.1002
X     2     1.8000     1.8756    -0.0756     0.1944    -0.0859    -0.0839
            0.0009    -0.0412     0.9783     2.7730    -0.3489     4.1002
Y    13     5.4000     3.0245     2.3755     0.0460     2.4780     2.8515
            0.1481     0.6264     2.5877     3.4612     0.9426     5.1063
Y    24     5.9000     3.7002     2.1998     0.1537     2.4363     2.7855
            0.5391     1.1873     2.9021     4.4983     1.5138     5.8866
Figure 2-3 Plot of Leverages h_i and the Average (p/n = 2/24)
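The leverages in this example can be reproduced without RONE. The sketch below uses the textbook formula for an unweighted simple linear regression with an intercept, h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², applied to the 24 values of the independent variable (column 2 of X), and lists the observations whose leverage exceeds twice the average p/n. That this is the flagging rule RONE applies for the “X” label is an assumption based on the example text. The names LEVCHK, XV, H, and HAVG are hypothetical.

```fortran
PROGRAM LEVCHK
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 24, P = 2
  REAL XV(N), H(N), XBAR, SXX, HAVG
  INTEGER I
  ! Independent variable of the second example (column 2 of X).
  DATA XV /1.3, 1.3, 2.0, 2.0, 2.7, 3.3, 3.3, 3.7, 3.7, 4.0, 4.0, 4.0, &
           4.7, 4.7, 4.7, 5.0, 5.3, 5.3, 5.3, 5.7, 6.0, 6.0, 6.3, 6.7/
  XBAR = SUM(XV)/REAL(N)                  ! mean of x
  SXX  = SUM((XV - XBAR)**2)              ! corrected sum of squares of x
  H    = 1.0/REAL(N) + (XV - XBAR)**2/SXX ! leverage of each case
  HAVG = REAL(P)/REAL(N)                  ! average leverage p/n
  PRINT '(A,F7.4)', 'Average leverage p/n =', HAVG
  DO I = 1, N
    ! Assumed rule: flag cases with leverage above twice the average.
    IF (H(I) .GT. 2.0*HAVG) THEN
      PRINT '(A,I3,A,F7.4)', 'Obs.', I, '  leverage =', H(I)
    END IF
  END DO
END PROGRAM LEVCHK
```

With these data, only observations 1 and 2 are listed, each with a leverage of about 0.1944, in agreement with the Leverage column of the case analysis.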