Computes diagnostics for detection of outliers and influential data points given residuals and the R matrix for a fitted general linear model.
X NRX by NCOL matrix containing the data. (Input)
IIND
Independent variable option. (Input)
The absolute value of IIND is the number of
independent (explanatory) variables. The sign of IIND specifies the
following options:
IIND |
Meaning |
< 0 |
The data for the −IIND independent variables are given in the first −IIND columns of X. |
> 0 |
The data for the IIND independent variables are in the columns of X whose column numbers are given by the elements of INDIND. |
= 0 |
There are no independent variables. |
The regressors are the constant regressor (if INTCEP = 1) and the independent variables.
INDIND Index
vector of length IIND containing the
column numbers of X that are the
independent (explanatory) variables. (Input, if IIND is positive)
If IIND is
nonpositive, INDIND is not
referenced and can be a vector of length one.
R INTCEP
+ |IIND| by
INTCEP
+ |IIND| upper
triangular matrix containing the R matrix. (Input)
The R matrix can come
from a regression fit based on a QR decomposition of the matrix of
regressors or based on a Cholesky factorization RTR of the
matrix of sums of squares and crossproducts of the regressors. Elements to the
right of a diagonal element of R that is zero must also be zero. A
zero row indicates a nonfull rank model. For an R matrix that comes from a
regression fit with linear equality restrictions on the parameters, each row of
R corresponding to a
restriction must have a corresponding diagonal element that is negative. The
remaining rows of R must
have positive diagonal elements.
DFE Degrees of freedom for error. (Input)
SSE Sum of squares for error. (Input)
E Vector of
length NRX with
the residuals. (Input)
If a residual is not known, e.g., the
value for the dependent (response) variable was missing, the input value of the
corresponding element of E should equal NaN
(not a number).
OTIN NRX by 6 matrix
containing diagnostics for detection of outliers and influential
cases. (Output)
The columns of OTIN
contain the following:
Col. |
Description |
1 |
Residual |
2 |
Leverage (diagonal element of the Hat matrix) |
3 |
Standardized residual |
4 |
Jackknife (deleted) residual |
5 |
Cooks Distance |
6 |
DFFITS |
NRX Number of
rows of data. (Input)
Default: NRX = size (X,1).
NCOL Number of
columns in X.
(Input)
Default: NCOL = size (X,2).
LDX Leading
dimension of X
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDX = size (X,1).
INTCEP
Intercept option. (Input)
Default: INTCEP = 1.
INTCEP |
Action |
0 |
An intercept is not in the model. |
1 |
An intercept is in the model. |
IWT Weighting
option. (Input)
IWT = 0 means that all
weights are 1.0. For positive IWT, column number
IWT of X contains the
weights.
Default: IWT = 0.
LDR Leading
dimension of R
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDR = size (R,1).
LDOTIN Leading
dimension of OTIN exactly as
specified in the dimension statement in the calling program.
(Input)
Default: LDOTIN = size (OTIN,1).
NRMISS Number
of rows of OTIN
containing NaN (not a number). (Output)
If any row of data
contains NaN as a value of the independent variable or weight, elements in
columns 2 thru 6 of the corresponding row in OTIN are set to NaN.
If the residual is missing, elements in columns 3 thru 6 are set to NaN.
Generic: CALL ROTIN (X, IIND, INDIND, R, DFE, SSE, E, OTIN [, ])
Specific: The specific interface names are S_ROTIN and D_ROTIN.
Single: CALL ROTIN (NRX, NCOL, X, LDX, INTCEP, IIND, INDIND, IWT, R, LDR, DFE, SSE, E, OTIN, LDOTIN, NRMISS)
Double: The double precision name is DROTIN.
The multiple regression model used by routine ROTIN is
y = X β + ɛ
where y is the n Χ 1 vector of responses, X is the n Χ p matrix of regressors, β is the p Χ 1 vector of regression coefficients, and ɛ is the n Χ 1 vector of errors whose elements are independently normally distributed with mean 0 and variance σ2/wi. The model used by ROTIN also permits linear equality restrictions on β. From a multiple regression model fit using the wis as the weights, routine ROTIN computes diagnostics for outliers and influential cases. Let xi be a column vector containing elements of the i-th row of X. Let W = diag(w1, w2, , wn). The leverage is defined as
(In the case of linear equality restrictions on β, the leverage is defined in terms of the reduced model.) Put D = diag(d1, d2, , dp) with dj = 1 if the j-th diagonal element of R is positive and 0 otherwise. The leverage is computed as hi = (aTDa)wi where a is a solution to RTa = xi. The computation of the remainder of the case diagnostics follows easily from their definitions. See the Diagnostics for Individual Cases section in the chapter introduction for definitions of the case diagnostics .
The type 3 informational errors can occur if the input variables X, R, E and SSE are not consistent with each other or if excessive rounding has occurred in their computation.The type 3 error message with error code 2 arises when X contains a row not in the space spanned by the rows of R. An examination of the model that was fitted and the X for which diagnostics are to be computed is required in order to insure that only linear combinations of the regression coefficients that can be estimated from the fitted model are specified in X. For further details, see the discussion of estimable functions given by Maindonald (1984, pages 166−168) and Searle (1971, pages 180−188).
1. Workspace may be explicitly provided, if desired, by use of R2TIN/DR2TIN. The reference is:
CALL R2TIN (NRX, NCOL, X, LDX, INTCEP, IIND, INDIND, IWT, R, LDR, DFE, SSE, E, OTIN, LDOTIN, NRMISS, WK)
The additional argument is:
WK Work vector of length INTCEP + |IIND|.
2. Informational errors
Type Code
3 2 The linear combination of the regression coefficients specified is not estimable within the preset tolerance.
3 3 A leverage much greater than 1.0 was computed. It is set to 1.0.
3 4 A deleted residual mean square much less than 0.0 was computed. It is set to 0.0.
4 1 A weight is negative. Weights must be nonnegative.
A multiple linear regression model is fit and case statistics computed for data discussed by Cook and Weisberg (1982, page 103). The fitted model is
Some of the statistics in row 6 of the output matrix OTIN are undefined (0.0/0.0) and are set to NaN (not a number). Some statistics in row 4 of OTIN are infinite and are set to machine infinity. The values of NaN and machine infinity can be retrieved by routine AMACH (or DMACH when using double precision), which is documented in Reference Material.
! SPECIFICATIONS FOR LOCAL VARIABLES
! SPECIFICATIONS FOR LOCAL VARIABLES
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER INTCEP, LDB, LDOTIN, LDR, LDSCPE, LDX, NCOEF, NCOL, &
NDEP, NIND, NROW, J
PARAMETER (INTCEP=1, NCOL=3, NDEP=1, NIND=2, NROW=7, &
LDOTIN=NROW, LDSCPE=NDEP, LDX=NROW, &
NCOEF=INTCEP+NIND, LDB=NCOEF, LDR=NCOEF)
!
INTEGER I, IDEP, IIND, INDDEP(1), INDIND(1), NOUT, NRMISS
REAL B(LDB,NDEP), D(NCOEF), DFE, E(NROW), &
OTIN(LDOTIN,6), R(LDR,NCOEF), SCPE(LDSCPE,NDEP), &
SSE, X(LDX,NCOL), XMAX(NCOEF), XMIN(NCOEF)
CHARACTER CLABEL(7)*10, RLABEL(1)*6
!
DATA CLABEL/'Obs.', 'Residual', 'Leverage', 'Std. Res.', &
'Jack. Res.', 'Cook''s D', 'DFFITS'/
DATA RLABEL/'NUMBER'/
!
DATA (X(1,J),J=1,NIND+NDEP) /1.0, 1.0, 3.0/
DATA (X(2,J),J=1,NIND+NDEP) /1.0, 2.0, 4.0/
DATA (X(3,J),J=1,NIND+NDEP) /1.0, 3.0, 5.0/
DATA (X(4,J),J=1,NIND+NDEP) /1.0, 4.0, 7.0/
DATA (X(5,J),J=1,NIND+NDEP) /1.0, 5.0, 7.0/
DATA (X(6,J),J=1,NIND+NDEP) /0.0, 6.0, 8.0/
DATA (X(7,J),J=1,NIND+NDEP) /1.0, 7.0, 9.0/
!
IIND = -NIND
IDEP = -NDEP
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, R=R, DFE=DFE, SCPE=SCPE)
SSE = SCPE(1,1)
! Compute residuals.
DO 10 I=1, NROW
E(I) = X(I,NCOL) - B(1,1) - SDOT(NIND, B((INTCEP+1): ,1), &
1, X(I:, 1), LDX)
10 CONTINUE
!
CALL ROTIN (X, IIND, INDIND, R, DFE, SSE, E, OTIN, &
NRMISS=NRMISS)
!
CALL WRRRL ('OTIN', OTIN, RLABEL, CLABEL, FMT='(F10.3)')
CALL UMACH (2, NOUT)
WRITE (NOUT,*) 'NRMISS = ', NRMISS
!
END
OTIN
Obs.
Residual Leverage Std. Res. Jack.
Res. Cooks D DFFITS
1 -0.129
0.471 -0.389
-0.343 0.045
-0.324
2
-0.143 0.286
-0.371 -0.327
0.018 -0.207
3 -0.157
0.186 -0.383
-0.338 0.011
-0.161
4
0.829
0.171
2.000
Inf
0.276 Inf
5 -0.186
0.243 -0.469
-0.418 0.024
-0.237
6
0.000
1.000
NaN
NaN
NaN NaN
7 -0.214
0.643 -0.788
-0.742 0.372
-0.996
NRMISS = 1
In this example, routine RNLIN is first invoked to fit the following nonlinear regression model discussed by Neter, Wasserman, and Kutner (1983, pages 475−478):
Then, ROTIN is used to compute case diagnostics. In addition, the leverage output by ROTIN is used to construct asymptotic confidence intervals on the mean of the nonlinear regression function evaluated at xi. The asymptotic 95% confidence intervals are computed using the formula:
where hi is the computed leverage, t.975,DFE is the 97.5 percentile of the t distribution with DFE degrees of freedom as computed by routine TIN (see Chapter 17, Probability Distribution Funtions and Inverses;), and s2 equals SSE/DFE.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER LDOTIN, LDR, NOBS, NPARM, NRX
PARAMETER (NOBS=15, NPARM=2, NRX=1, LDOTIN=NRX, LDR=NPARM)
!
INTEGER IDERIV, IDUMMY(1), IEND, IOBS, IRANK, J, NOUT, NRMISS
REAL A, DE(NPARM, 1), DFE, E(1), FRQ, OTIN(LDOTIN,6), &
R(LDR,NPARM), SQRT, SSE, THETA(NPARM), WT, Y, &
YHAT
INTRINSIC SQRT
EXTERNAL EXAMPL
!
DATA THETA/60.0, -0.03/
!
CALL UMACH (2, NOUT)
!
IDERIV = 1
CALL RNLIN (EXAMPL, THETA, R=R, DFE=DFE, SSE=SSE)
!
WRITE (NOUT,*) ' Obs. Pred. Res. Lev. St Res Del Res Cook '// &
'D DFFIT Conf Interval'
DO 10 IOBS=1, NOBS
CALL EXAMPL (NPARM, THETA, 0, IOBS, FRQ, WT, E, DE, IEND)
CALL EXAMPL (NPARM, THETA, 1, IOBS, FRQ, WT, E, DE, IEND)
CALL EXAMPL (NPARM, THETA, 2, IOBS, FRQ, WT, Y, DE, IEND)
YHAT = Y - E(1)
CALL ROTIN (DE, -NPARM, IDUMMY, R, DFE, SSE, E, OTIN, &
NRX=NRX, LDX=1, INTCEP=0)
A = TIN(0.975,DFE)*SQRT((SSE/DFE)*OTIN(1,2))
WRITE (NOUT,'(F5.1,10F7.2)') Y, YHAT, (OTIN(1,J),J=1,6), &
YHAT - A, YHAT + A
10 CONTINUE
END
!
SUBROUTINE EXAMPL (NPARM, THETA, IOPT, IOBS, FRQ, WT, E, DE, &
IEND)
INTEGER NPARM, IOPT, IOBS, IEND
REAL THETA(NPARM), FRQ, WT, E(1), DE(NPARM, 1)
!
INTEGER NOBS
PARAMETER (NOBS=15)
!
REAL EXP, XDATA(NOBS), YDATA(NOBS)
INTRINSIC EXP
!
DATA YDATA/54.0, 50.0, 45.0, 37.0, 35.0, 25.0, 20.0, 16.0, 18.0, &
13.0, 8.0, 11.0, 8.0, 4.0, 6.0/
DATA XDATA/2.0, 5.0, 7.0, 10.0, 14.0, 19.0, 26.0, 31.0, 34.0, &
38.0, 45.0, 52.0, 53.0, 60.0, 65.0/
!
IF (IOBS .LE. NOBS) THEN
WT = 1.0E0
FRQ = 1.0E0
IEND = 0
IF (IOPT .EQ. 0) THEN
E(1) = YDATA(IOBS) - THETA(1)*EXP(THETA(2)*XDATA(IOBS))
ELSE IF (IOPT .EQ. 1) THEN
DE(1, 1) = -EXP(THETA(2)*XDATA(IOBS))
DE(2, 1) = -THETA(1)*XDATA(IOBS)*EXP(THETA(2)*XDATA(IOBS))
ELSE IF (IOPT .EQ. 2) THEN
E(1) = YDATA(IOBS)
END IF
ELSE
IEND = 1
END IF
RETURN
END
Obs. Pred. Res. Lev. St
Res Del Res Cook D DFFIT Conf Interval
54.0 54.15
-0.14 0.40 -0.09 -0.09 0.00
-0.07 51.19 56.53
50.0 48.08 1.92
0.24 1.13 1.14 0.21 0.65
49.84 54.00
45.0 44.42 0.58
0.18 0.33 0.32 0.01 0.15
43.79 47.37
37.0 39.45 -2.45 0.13
-1.34 -1.39 0.13 -0.54 33.04
36.07
35.0 33.67 1.33 0.11
0.72 0.71 0.03 0.24 34.96
37.70
25.0 27.62 -2.63 0.11 -1.42
-1.49 0.12 -0.52 21.00 23.75
20.0
20.94 -0.94 0.12 -0.51 -0.50
0.02 -0.18 17.61 20.51
16.0 17.18
-1.18 0.12 -0.65 -0.63 0.03
-0.23 13.35 16.29
18.0 15.26 2.74
0.12 1.50 1.58 0.15 0.58
19.29 22.20
13.0 13.02 -0.02 0.11
-0.01 -0.01 0.00 0.00 11.56
14.40
8.0 9.87 -1.87 0.10
-1.01 -1.01 0.06 -0.33 4.81
7.45
11.0 7.48 3.52 0.08
1.88 2.12 0.15 0.62 13.33
15.70
8.0 7.19 0.81
0.08 0.43 0.42 0.01
0.12 7.64 9.97
4.0 5.45
-1.45 0.06 -0.77 -0.75 0.02
-0.19 1.53 3.57
6.0
4.47 1.53 0.05 0.80
0.79 0.02 0.18 6.61 8.45
PHONE: 713.784.3131 FAX:713.781.9260 |