Computes case statistics and diagnostics given data points, coefficient estimates
and the R matrix for a fitted general linear model.
X NRX by NCOL matrix containing the data. (Input)
IRSP Column number IRSP of X contains the data for the response (dependent) variable. (Input)
B Vector of length NCOEF containing a least-squares solution
for the regression coefficients. (Input)
R NCOEF
by NCOEF
upper triangular matrix containing the R matrix. (Input)
The R matrix can come
from a regression fit based on a QR decomposition of the matrix of
regressors or based on a Cholesky factorization RTR of the
matrix of sums of squares and crossproducts of the regressors. Elements to the
right of a diagonal element of R that is zero must also be zero. A
zero row indicates a nonfull rank model. For an
R matrix that comes from a
regression fit with linear equality restrictions on the parameters, each row of
R corresponding to a
restriction must have a corresponding diagonal element that is negative. The
remaining rows of R must
have positive diagonal elements. Only the upper triangle of R is referenced.
DFE Degrees of freedom for error. (Input)
SSE Sum of squares for error. (Input)
CASE NRX by 12 matrix
containing the case statistics. (Output)
Columns 1 through 12
contain the following:
Col. |
Description |
1 |
Observed response |
2 |
Predicted response |
3 |
Residual |
4 |
Leverage |
5 |
Standardized residual |
6 |
Jackknife residual |
7 |
Cooks distance |
8 |
DFFITS |
9, 10 |
Confidence interval on the mean |
11, 12 |
Prediction interval |
IDO Processing
option. (Input)
Default: IDO
= 0.
IDO |
Action |
0 |
This is the only invocation of RCASE for this data set, and all the data are input at once. |
1 |
This is the first invocation, and additional calls to RCASE will be made. Case statistics are computed for the data in X. |
2 |
This is an intermediate or final invocation of RCASE. Case statistics are computed for the data in X. |
NRX Number of
rows in X.
(Input)
Default: NRX
= size (X,1).
NCOL Number of
columns in X.
(Input)
Default: NCOL
= size (X,2).
LDX Leading
dimension of X
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDX
= size (X,1).
INTCEP
Intercept option. (Input)
Default: INTCEP
= 1.
INTCEP |
Action |
0 |
An intercept is not in the model. |
1 |
An intercept is in the model. |
IEF Effect
option. (Input)
Default: IEF
= -1.
The absolute value of IEF
is the number of effects (sources of variation) due to the model. The sign of
IEF
specifies the following options.
IEF Meaning
< 0 Each effect corresponds to a single regressor (coefficient) in the model. In this case, arguments NCLVAR, INDCL, NCLVAL, CLVAL, NVEF, INDEF, and IDUMMY are not referenced.
> 0 Each effect corresponds to one or more regressors. A general linear model is specified through the arguments NCLVAR, INDCL, NCLVAL, CLVAL, NVEF, INDEF, and IDUMMY.
0 There are no effects in the model. INTCEP must equal 1.
NCLVAR Number of classification variables. (Input, if IEF is positive)
INDCL Index vector of length NCLVAR containing the column numbers of X that are the classification variables. (Input, if IEF is positive)
NCLVAL Vector
of length NCLVAR
containing the number of values taken on by each classification
variable. (Input, if IEF
is positive)
NCLVAL(I) is the number of
distinct values for the I-th classification
variable.
CLVAL Vector of
length NCLVAL(1)
+ NCLVAL(2) +
+ NCLVAL(NCLVAR)
containing the values of the classification variables. (Input, if
IEF
is positive)
The first NCLVAL(1) variables
contain the values of the first classification variable. The next NCLVAL(2) variables
contain the values of the second classification variable.
The last NCLVAL(NCLVAR)
variables contain the values of the last classification variable.
NVEF Vector of length IEF containing the number of variables associated with each effect in the model. (Input, if IEF is positive)
INDEF Index
vector of length NVEF(1)
+ NVEF(2)
+
+ NVEF(IEF).
(Input, if IEF
is positive)
The first NVEF(1)
elements give the column numbers of X for each variable in
the first effect. The next NVEF(2)
elements give the column numbers for each variable in the second effect.
The last NVEF(NEF) elements give the
column numbers for each variable in the last effect.
IDUMMY Dummy
variable option. (Input, if IEF
is positive)
Some indicator variables are defined for the I-th class variable as
follows: Let
J = NCLVAL(1) + NCLVAL(2) +
+ NCLVAL(I − 1). NCLVAL(I) indicator variables
are defined such that for K = 1, 2,
, NCLVAL(I) the K-th indicator
variable for row
M of X takes the value 1.0
if X(M, INDCL(I)) = CLVAL(J + K) and equals 0.0
otherwise. Dummy variables are generated from these indicator variables in one
of the three following ways:
IDUMMY Method
0, 1 The NCLVAL(I) indicator variables are the dummy variables (In RCASE, the computations for IDUMMY = 0 and IDUMMY = 1 are the same. The two values 0 and 1 are provided so that RCASE can be called after routine RGLM with no change in IDUMMY.)
2 The first NCLVAL(I) − 1 indicator variables are the dummy variables. The last indicator variable is omitted.
3
The K-th indicator
variable minus the NCLVAL (I)-th indicator
variable is the
K-th dummy variable
(K = 1, 2,
, NCLVAL(I) − 1).
IWT Weighting
option. (Input)
IWT = 0 means that all
weights are 1.0. For positive IWT, column number
IWT of X contains the
weights, and the computed prediction interval uses SSE/(DFE * X(I, IWT)) for the
estimated variance of a future response.
Default: IWT
= 0.
IPRED
Prediction interval option. (Input)
IPRED
= 0 means that prediction intervals are desired for a single future response.
For positive IPRED,
column number IPRED
of X contains
the number of future responses for which a prediction interval is desired on the
average of the future responses.
Default: IPRED
= 0.
CONPCM
Confidence level for two-sided interval estimates on the mean, in
percent. (Input)
CONPCM
percent confidence intervals are computed, hence, CONPCM
must be greater than or equal to 0.0 and less than 100.0. CONPCM
often will be 90.0, 95.0, or 99.0. For one-sided intervals with confidence level
ONECL, where
ONECL is greater
than or equal to 50.0 and less than 100.0, set CONPCM
= 100.0 −
2.0 * (100.0
− ONECL).
Default:
CONPCM
= 95.0.
CONPCP
Confidence level for two-sided prediction intervals, in percent.
(Input)
CONPCP
percent prediction intervals are computed, hence, CONPCP
must be greater than or equal to 0.0 and less than 100.0. CONPCP
often will be 90.0, 95.0, or 99.0. For one-sided intervals with confidence level
ONECL, where
ONECL is greater
than or equal to 50.0 and less than 100.0, set CONPCP
= 100.0 −
2.0 * (100.0
− ONECL).
Default:
CONPCP
= 95.0.
PRINT Printing
option. (Input)
Default: PRINT
= N.
PRINT is a character
string indicating what is to be printed. The PRINT string is
composed of one-character print codes to control printing. These print codes are
given as follows:
PRINT(I : I) Printing that Occurs
A All
N None
1 Observed response
2 Predicted response
3 Residual
4 Leverage
5 Standardized residual
6 Jackknife residual
7 Cooks distance
8 DFFITS
M Confidence interval on the mean
P Prediction interval
X Influential cases (unusual x-value)
Y Outlier cases (unusual y-value)
The concatenated print codes A, N, 1, , P that comprise the PRINT string give the combination of statistics to be printed. Concatenation of these codes with print codes X or Y restricts printing to cases determined to be influential or outliers. Here are a few examples.
|
Printing Action |
A |
All. |
N |
None. |
46 |
Leverage and jackknife residual for all cases. |
AXY |
All statistics are printed for cases that are highly influential or are outliers. |
46XY |
Leverage and jackknife residual are printed forcases that are highly influential or are outliers. |
IOBS Number of
the observation corresponding to the first row of X. (Input)
This observation number is used only for printing the row labels for the
individual case statistics.
Default: IOBS
= size (X,1).
NCOEF Number of
regression coefficients in the model. (Input)
Default: NCOEF
= size (B,1).
LDR Leading
dimension of R exactly as
specified in the dimension statement in the calling program.
(Input)
Default: LDR
= size (R,1).
LDCASE Leading
dimension of CASE exactly as
specified in the dimension statement in the calling program.
(Input)
Default: LDCASE
= size (CASE,1).
NRMISS Number
of rows of CASE
containing NaN (not a number). (Output)
If any row of data
contains NaN as a value of a variable other than the response variable, columns
3 through 12 of the corresponding row in CASE
are set to NaN. If the response is missing, columns 1, 3, and 5 through 8 are
set to NaN.
Generic: CALL RCASE (X, IRSP, B, R, DFE, SSE, CASE [, ])
Specific: The specific interface names are S_RCASE and D_RCASE.
Single: CALL RCASE (IDO, NRX, NCOL, X, LDX, INTCEP, IEF, NCLVAR, INDCL, NCLVAL, CLVAL, NVEF, INDEF, IDUMMY, IRSP, IWT, IPRED, CONPCM, CONPCP, PRINT, IOBS, NCOEF, B, R, LDR, DFE, SSE, CASE, LDCASE, NRMISS)
Double: The double precision name is DRCASE.
The general linear model used by routine RCASE is
y = X β + ɛ
where y is the n Χ 1 vector of
responses, X is the n Χ p matrix
of regressors, β is the p
Χ 1 vector
of regression coefficients, and ɛ is the n Χ 1 vector of
errors whose elements are independently normally distributed with mean 0 and
variance σ2/wi. The model used by
RCASE
also permits linear equality restrictions on β. From a
general linear model fitted using the wis as the weights,
routine RCASE
computes confidence intervals and statistics for the individual cases that
constitute the data set. Let xi be a column vector
containing elements of the i-th row of X. Let
W = diag(w1, w2,
, wn). The leverage is
defined as
(In the case of linear equality restrictions on β, the leverage is defined in terms of the reduced model.) Put D = diag(d1, d2, , dp) with dj = 1 if the j-th diagonal element of R is positive and 0 otherwise. The leverage is computed as hi = (aTDa)wi where a is a solution to RTa = xi. The estimated variance of
is given by his2/wi, where s2 = SSE/DFE. The computation of the remainder of the case statistics follows easily from their definitions. See the Diagnostics for Individual Cases section in the chapter introduction for definitions of the case diagnostics.
Often predicted values and confidence intervals are desired for combinations of settings of the effect variables not used in computing the regression fit. This can be accomplished using a single data matrix by including these settings of the variables as part of the data matrix and by setting the response equal to NaN (not a number). NaN can be retrieved by the invocation of the function AMACH(6). The regression routine performing the fit will omit the case, and RCASE will compute a predicted value and confidence interval for the missing response from the given settings of the effect variables.
The type 3 informational errors can occur if the input variables X, R, B and SSE are not consistent with each other or if excessive rounding has occurred in their computation. The type 3 error message with error code 2 arises when X contains a row not in the space spanned by the rows of R. An examination of the model that was fitted and the X for which diagnostics are to be computed is required in order to insure that only linear combinations of the regression coefficients that can be estimated from the fitted model are specified in X. For further details, see the discussion of estimable functions given by Maindonald (1984, pages 166−168) and Searle (1971, pages 180−188).
1. Workspace may be explicitly provided, if desired, by use of R2ASE/DR2ASE. The reference is:
CALL R2ASE (IDO, NRX, NCOL, X, LDX, INTCEP, IEF, NCLVAR, INDCL, NCLVAL, CLVAL, NVEF, INDEF, IDUMMY, IRSP, IWT, IPRED, CONPCM, CONPCP, PRINT, IOBS, NCOEF, B, R, LDR, DFE, SSE, CASE, LDCASE, NRMISS, WK)
The additional argument is:
WK Work vector of length NCOEF + 1.
2. Informational errors
Type Code
4 1 A weight is negative. Weights must be nonnegative.
3 2 The linear combination of the regression coefficients specified is not estimable within the preset tolerance.
3 3 A leverage much greater than 1.0 was computed. It is set to 1.0.
3 4 A deleted residual mean square much less than 0.0 was computed. It is set to 0.0.
4 5 A number of future observations for the prediction interval is nonpositive. It must be positive.
A multiple linear regression model is fitted and case statistics computed for data discussed by Cook and Weisberg (1982, page 103). The fitted model is
Some of the statistics in row 6 of the output matrix CASE are undefined (0.0/0.0) and are set to NaN (not a number). Some statistics in row 4 of CASE are set to Inf (positive machine infinity). The values of NaN and positive machine infinity can be retrieved by routine AMACH (or DMACH when using double precision), which is documented in the section Machine-Dependent Constants in Reference Material.
USE RCASE_INT
USE RGIVN_INT
IMPLICIT NONE
INTEGER INTCEP, LDB, LDCASE, LDR, LDSCPE, LDX, NCOEF, NCOL, &
NDEP, NIND, NROW, J, NRMISS
PARAMETER (INTCEP=1, NCOL=3, NDEP=1, NIND=2, NROW=7, &
LDCASE=NROW, LDSCPE=NDEP, LDX=NROW, &
NCOEF=INTCEP+NIND, LDB=NCOEF, LDR=NCOEF)
!
INTEGER IDEP, IEF, IIND, INDDEP(1), INDIND(1), IOBS, IRSP
REAL B(LDB,NDEP), CASE(LDCASE,12), DFE, R(LDR,NCOEF), &
SCPE(LDSCPE,NDEP), SSE, X(LDX,NCOL)
CHARACTER PRINT*1
!
DATA (X(1,J),J=1,NIND+NDEP) /1.0, 1.0, 3.0/
DATA (X(2,J),J=1,NIND+NDEP) /1.0, 2.0, 4.0/
DATA (X(3,J),J=1,NIND+NDEP) /1.0, 3.0, 5.0/
DATA (X(4,J),J=1,NIND+NDEP) /1.0, 4.0, 7.0/
DATA (X(5,J),J=1,NIND+NDEP) /1.0, 5.0, 7.0/
DATA (X(6,J),J=1,NIND+NDEP) /0.0, 6.0, 8.0/
DATA (X(7,J),J=1,NIND+NDEP) /1.0, 7.0, 9.0/
!
IIND = -NIND
IDEP = -NDEP
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, R=R, DFE=DFE, SCPE=SCPE)
IEF = -NIND
IRSP = NCOL
PRINT = 'A'
IOBS = 1
SSE = SCPE(1,1)
CALL RCASE (X, IRSP, B(1:,1), R, DFE, SSE, CASE, ief=ief, &
PRINT=PRINT, iobs=iobs, ncoef=ncoef, nrmiss=nrmiss)
!
END
* * * Case Analysis * * *
Obs.
Observed Predicted Residual Leverage Std.
Res. Jack
Res.
Cooks D DFFITS 95.0% CI 95.0%
CI 95.0% PI 95.0%
PI
1
3.0000 3.1286
-0.1286 0.4714
-0.3886
-0.3430
0.0449 -0.3240
2.2609 3.9962
1.5957
4.6614
2
4.0000 4.1429
-0.1429 0.2857
-0.3714 -0.3273
0.0184
-0.2070 3.4674
4.8183 2.7100
5.5757
3
5.0000 5.1571
-0.1571 0.1857
-0.3826
-0.3376
0.0111 -0.1612
4.6126 5.7017
3.7812 6.5331
Y
4 7.0000
6.1714 0.8286
0.1714 2.0000
Inf
0.2759 Inf
5.6482 6.6946
4.8038
7.5391
5
7.0000 7.1857
-0.1857 0.2429
-0.4689
-0.4178
0.0235 -0.2366
6.5630 7.8084
5.7770 8.5945
X
6 8.0000
8.0000 0.0000
1.0000
NaN
NaN
NaN NaN
6.7364 9.2636
6.2129
9.7871
7
9.0000 9.2143
-0.2143 0.6429
-0.7878
-0.7423
0.3724 -0.9959
8.2011 10.2275
7.5946 10.8339
Figure 2- 6 Plot of Leverages hi and the Average (p/n = 3/7)
A one-way analysis of covariance model is fitted to the turkey data discussed by Draper and Smith (1981, pages 243−249). The response variable is turkey weight y (in pounds). There are three groups of turkeys corresponding to the three states where they were reared. The age of a turkey (in weeks) is the covariate. The explanatory variables are group, age, and interaction. The model is
where α3 = 0 and β3 = 0. Routine RGLM is used to fit the model. The option IDUMMY = 2 is used. The fitted model gives three separate lines, one for each state where the turkeys were reared. Then, RCASE is used to compute case statistics from the fitted model.
USE RCASE_INT
USE RGLM_INT
USE AMACH_INT
INTEGER IDEP, IEF, INTCEP, LDB, LDCASE, LDR, LDSCPE, LDX, &
MAXB, MAXCL, NCLVAR, NCOL, NROW
PARAMETER (IDEP=1, IEF=3, INTCEP=1, MAXB=6, MAXCL=3, NCLVAR=1, &
NCOL=3, NROW=13, LDB=MAXB, LDCASE=NROW, LDR=MAXB, &
LDSCPE=IDEP, LDX=NROW)
!
INTEGER IDUMMY, INDCL(NCLVAR), INDDEP(IDEP), &
INDEF(4), IOBS, IRANK, IRBEF(IEF+1), IRSP, &
NCLVAL(NCLVAR), NCOEF, NRMISS, NVEF(IEF)
REAL B(LDB,IDEP), CASE(LDCASE,12), CLVAL(MAXCL), &
DFE, R(LDR,MAXB),SCPE(LDSCPE,IDEP), SSE, X(LDX,NCOL)
CHARACTER PRINT
!
DATA (X(1,J),J=1,3) /25.0, 13.8, 3.0/
DATA (X(2,J),J=1,3) /28.0, 13.3, 1.0/
DATA (X(3,J),J=1,3) /20.0, 8.9, 1.0/
DATA (X(4,J),J=1,3) /32.0, 15.1, 1.0/
DATA (X(5,J),J=1,3) /22.0, 10.4, 1.0/
DATA (X(6,J),J=1,3) /29.0, 13.1, 2.0/
DATA (X(7,J),J=1,3) /27.0, 12.4, 2.0/
DATA (X(8,J),J=1,3) /28.0, 13.2, 2.0/
DATA (X(9,J),J=1,3) /26.0, 11.8, 2.0/
DATA (X(10,J),J=1,3) /21.0, 11.5, 3.0/
DATA (X(11,J),J=1,3) /27.0, 14.2, 3.0/
DATA (X(12,J),J=1,3) /29.0, 15.4, 3.0/
DATA (X(13,J),J=1,3) /23.0, 13.1, 3.0/
DATA INDCL/3/, NVEF/1, 1, 2/, INDEF/3, 1, 1, 3/, INDDEP/2/
!
IDUMMY = 2
CALL RGLM (X, INDCL, NVEF, INDEF, IDEP, INDDEP, MAXCL, B, &
IDUMMY=IDUMMY, NCLVAL=NCLVAL, CLVAL=CLVAL, IRBEF=IRBEF, &
R=R, DFE=DFE, SCPE=SCPE)
!
PRINT = 'A'
IRSP = INDDEP(1)
IPRED = 0
PRINT = 'A'
IOBS = 1
NCOEF = IRBEF(IEF+1) - 1
SSE = SCPE(1,1)
CALL RCASE (X, IRSP, B(1:, 1), R, DFE, SSE, CASE, IEF=IEF, &
NCLVAR=NCLVAR, INDCL=INDCL, NCLVAL=NCLVAL, CLVAL=CLVAL, &
NVEF=NVEF, INDEF=INDEF, IDUMMY=IDUMMY, IOBS=IOBS, &
PRINT=PRINT, NCOEF=NCOEF)
!
END
* * * Case Analysis * * *
Obs. Observed
Predicted Residual Leverage Std. Res. Jack
Res.
Cooks
D DFFITS 95.0% CI 95.0%
CI 95.0% PI 95.0% PI
1
13.8000 13.6000
0.2000 0.2000
0.7040
0.6762
0.0207 0.3381
13.2641 13.9359 12.7773
14.4227
2 13.3000
13.1901 0.1099
0.3187 0.4192
0.3930
0.0137 0.2688
12.7661 13.6141 12.3276
14.0526
3 8.9000
9.1418 -0.2418
0.5824 -1.1779
-1.2178
0.3225
-1.4383 8.5686
9.7149 8.1970
10.0865
4 15.1000
15.2143 -0.1143
0.7143 -0.6732
-0.6444
0.1888
-1.0189 14.5795 15.8490
14.2309 16.1976
5
10.4000 10.1538
0.2462 0.3846
0.9879
0.9860
0.1017 0.7795
9.6881 10.6196
9.2701 11.0376
6
13.1000 13.3300
-0.2300 0.7000
-1.3221
-1.4131
0.6797
-2.1585 12.7016
13.9584 12.3507
14.3093
7 12.4000
12.3900 0.0100
0.3000 0.0376
0.0348
0.0001 0.0228
11.9786 12.8014 11.5337
13.2463
8 13.2000
12.8600 0.3400
0.3000 1.2795
1.3533
0.1169 0.8859
12.4486 13.2714 12.0037
13.7163
9 11.8000
11.9200 -0.1200
0.7000 -0.6898
-0.6615
0.1850
-1.0104 11.2916 12.5484
10.9407 12.8993
10
11.5000 11.8200
-0.3200 0.6000
-1.5930 -1.8472
0.6344 -2.2623 11.2382
12.4018 10.8700
12.7700
11 14.2000
14.4900 -0.2900
0.3000 -1.0913
-1.1091
0.0851
-0.7261 14.0786 14.9014
13.6337 15.3463
12
15.4000 15.3800
0.0200 0.6000
0.0996
0.0922
0.0025 0.1130
14.7982 15.9618 14.4300
16.3300
13 13.1000
12.7100 0.3900
0.3000 1.4676
1.6330
0.1538 1.0691
12.2986 13.1214 11.8537
13.5663
PHONE: 713.784.3131 FAX:713.781.9260 |