Computes a lack of fit test based on near replicates for a fitted regression model.
X — NOBS by NCOL matrix containing the data. (Input)
IIND — Independent variable option. (Input)
IIND Meaning
< 0 The first −IIND columns of X contain the independent (explanatory) variables.
> 0 The IIND independent variables are specified by the column numbers in INDIND.
= 0 There are no independent variables.
There are NCOEF = INTCEP + |IIND| regressors—the intercept (if INTCEP = 1) and the independent variables.
INDIND — Index vector of length IIND containing the column numbers of X that are the independent variables. (Input, if IIND is positive)
If IIND is not positive, INDIND is not referenced and can be a vector of length one.
IRSP — Column number IRSP of X contains data for the response (dependent) variable. (Input)
B — Vector of length NCOEF containing a least-squares solution for the regression coefficients. (Input)
R — NCOEF by NCOEF upper triangular matrix containing the R matrix. (Input)
The R matrix can come from a regression fit based on a QR decomposition of the matrix of regressors or based on a Cholesky factorization $R^T R$ of the matrix of sums of squares and crossproducts of the regressors. Elements to the right of a diagonal element of R that is zero must also be zero. A zero row indicates a nonfull rank model. For an R matrix that comes from a regression fit with linear equality restrictions on the parameters, each row of R corresponding to a restriction must have a corresponding diagonal element that is negative. The remaining rows of R must have positive diagonal elements. Only the upper triangle of R is referenced.
DFE — Degrees of freedom for error from the fitted regression. (Input)
SSE — Sum of squares for error from the fitted regression. (Input)
NGROUP — Number of groups. (Input)
A cluster analysis based on NGROUP groups is performed. A good choice for NGROUP is the number of groups of near replicates in the data set.
IGROUP — Vector of length NOBS specifying group numbers. (Input, if ICLUST = 0; output, if ICLUST ≥ 1)
IGROUP(I) = J means row I of X is in the J-th group of near replicates (J = 0, 1, 2, …, NGROUP). Here, J = 0 indicates the group of observations not used in the analysis because NaN (not a number) was input for one or more of the values of the response, independent, frequency, or weight variables.
TESTLF — Vector of length 10 containing statistics relating to the test for lack of fit of the model. (Output)
Elem. Description
1 Degrees of freedom for lack of fit.
2 Degrees of freedom for error from the expanded model (one-way analysis of covariance model using clusters of near replicates as the groups).
3 Degrees of freedom for error (DFE = TESTLF(1) + TESTLF(2)).
4 Sum of squares for lack of fit.
5 Sum of squares for error from the expanded model.
6 Sum of squares for error (SSE = TESTLF(4) + TESTLF(5)).
7 Mean square for lack of fit.
8 Mean square for error from the expanded model.
9 F statistic.
10 p-value.
NOBS — Number of observations. (Input)
Default: NOBS = size (X,1).
NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
INTCEP — Intercept option. (Input)
Default: INTCEP = 1.
INTCEP Action
0 An intercept is not in the model.
1 An intercept is in the model.
IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies.
Default: IFRQ = 0.
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights.
Default: IWT = 0.
LDR — Leading dimension of R exactly as specified in the dimension statement in the calling program. (Input)
Default: LDR = size (R,1).
ICLUST — Clustering option. (Input)
Default: ICLUST = 1.
ICLUST Meaning
0 Cluster groups are input in IGROUP.
1 Cluster groups are obtained using Euclidean distance.
2 Cluster groups are obtained using Mahalanobis distance.
MAXIT — Maximum number of iterations for the cluster analysis to determine near replicates. (Input, if ICLUST is positive; otherwise, MAXIT is not referenced)
MAXIT = 30 is usually sufficient for convergence.
Default: MAXIT = 30.
TOL — Tolerance used in determining linear dependence for the one-way analysis of covariance model using clusters as the groups. (Input)
TOL = $\mathrm{EPS}^{2/3}$ is a good choice. For RLOFN, EPS = AMACH(4). See documentation for AMACH in Reference Material.
Default: TOL = 2.4e-5 for single precision and 3.6d-11 for double precision.
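The suggested tolerance can be reproduced without the library. The following is a minimal sketch that assumes the Fortran intrinsic EPSILON returns the same machine constant as AMACH(4) (true on IEEE hardware, but an assumption here); the program name and variable names are illustrative only.

```fortran
program default_tol
  implicit none
  integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
  real(sp) :: tol_sp
  real(dp) :: tol_dp

  ! Assumption: EPSILON stands in for AMACH(4) / DMACH(4).
  tol_sp = epsilon(1.0_sp)**(2.0_sp/3.0_sp)
  tol_dp = epsilon(1.0_dp)**(2.0_dp/3.0_dp)

  print '(a, es10.2)', 'single precision TOL: ', tol_sp
  print '(a, es10.2)', 'double precision TOL: ', tol_dp
end program default_tol
```

On IEEE machines the two values should come out close to the documented defaults for single and double precision.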
Generic: CALL RLOFN (X, IIND, INDIND, IRSP, B, R, DFE, SSE, NGROUP, IGROUP, TESTLF [,…])
Specific: The specific interface names are S_RLOFN and D_RLOFN.
Single: CALL RLOFN (NOBS, NCOL, X, LDX, INTCEP, IIND, INDIND, IRSP, IFRQ, IWT, B, R, LDR, DFE, SSE, ICLUST, MAXIT, TOL, NGROUP, IGROUP, TESTLF)
Double: The double precision name is DRLOFN.
Routine RLOFN computes a lack of fit test based on near replicates for a fitted regression model. The data need not be sorted prior to invoking RLOFN. The column indices of X for determining near replicates must correspond to the independent variables in the original fitted model. If the groups of near replicates are known prior to invoking RLOFN, the option ICLUST = 0 allows RLOFN to bypass the computation of the groups.
The data can contain missing values indicated by NaN. (NaN is AMACH(6). Routine AMACH is described in the section “Machine-Dependent Constants” in the Reference Material.) For ICLUST equal to 1 or 2, any row of X containing NaN as a value for the response, weight, frequency, or independent variables is omitted from the analysis. For ICLUST equal to 0, if the i-th row of X contains NaN for one of the variables in the analysis, the i-th element of IGROUP must be 0 on input.
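When the groups are supplied with ICLUST = 0, the caller has to zero out IGROUP for rows with missing values. A minimal sketch of that preparation step follows. It assumes that missing values are IEEE NaNs detectable with the standard ieee_arithmetic module; the module near_replicate_prep and the helper flag_missing_rows are hypothetical names and are not part of the library.

```fortran
module near_replicate_prep
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
contains
  ! Set igroup(i) = 0 for every row of x that has NaN in any of the
  ! columns listed in cols (hypothetical helper, not an IMSL routine).
  subroutine flag_missing_rows(x, cols, igroup)
    real, intent(in) :: x(:,:)
    integer, intent(in) :: cols(:)
    integer, intent(inout) :: igroup(:)
    integer :: i
    do i = 1, size(x, 1)
      if (any(ieee_is_nan(x(i, cols)))) igroup(i) = 0
    end do
  end subroutine flag_missing_rows
end module near_replicate_prep

program demo_flag_missing
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  use near_replicate_prep
  implicit none
  real :: x(3,3)
  integer :: igroup(3)

  ! Columns 1-2: independent variables, column 3: response.
  x(1,:) = [1.0, 0.0, 10.0]
  x(2,:) = [1.0, 0.0, 11.0]
  x(3,:) = [0.0, 1.0, 20.0]
  x(2,3) = ieee_value(1.0, ieee_quiet_nan)   ! missing response in row 2
  igroup = [1, 1, 2]

  call flag_missing_rows(x, [1, 2, 3], igroup)
  print '(3i3)', igroup   ! row 2 is now in group 0
end program demo_flag_missing
```

The cols argument would list the response column together with the independent, frequency, and weight columns that are actually used in the analysis.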
Routine KMEAN (see Chapter 11, Cluster Analysis) is used to compute k clusters or groups of near replicates. Prior to invoking KMEAN, a detached sort of the independent variables in the regression model is performed using routine SROWR (see Chapter 19, Utilities). If there are fewer than NGROUP distinct observations, a warning message is issued and k is set equal to the number of distinct observations. Otherwise, k equals NGROUP.
For purposes of the cluster analysis, ICLUST = 1 specifies Euclidean distance and ICLUST = 2 specifies Mahalanobis distance. For Mahalanobis distance, the data are transformed before invoking KMEAN so that the Euclidean metric applied by KMEAN for the transformed data is equivalent to the sample Mahalanobis distance for the original (untransformed) data.
Let X be the n × p matrix of regressors, and let R be the upper triangular matrix computed from the fitted regression model. The matrix R can be computed by routines RGLM, RGIVN, or RLEQU for fitting the regression model. A linear equality restriction on the regression parameters corresponds to a row of R with a negative diagonal element. Let D be a p × p diagonal matrix with diagonal elements
$$d_{ii} = \begin{cases} 1 & \text{if } r_{ii} > 0 \\ 0 & \text{otherwise} \end{cases}$$
Let $x_i^T$ be the i-th row of X, and let $t_i = D s_i$ where $s_i$ satisfies
$$R^T s_i = x_i$$
Then, the Mahalanobis distance from $x_i$ to $x_j$ equals the Euclidean distance from $t_i$ to $t_j$ because
$$(x_i - x_j)^T (R^T R)^{-} (x_i - x_j) = (s_i - s_j)^T D (s_i - s_j) = (t_i - t_j)^T (t_i - t_j)$$
where $(R^T R)^{-}$ denotes a generalized inverse.
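The equivalence of the two distances can be checked numerically on a small case. The sketch below assumes a 2 × 2 full rank R with positive diagonal elements, so D is the identity. Treating D as 1 for positive diagonal elements and 0 otherwise is itself an assumption of the sketch, and the module and function names are hypothetical.

```fortran
module mahalanobis_transform
  implicit none
contains
  ! Return t = D s, where R**T s = x is solved by forward substitution.
  ! Assumption: d_ii = 1 when r_ii > 0 and d_ii = 0 otherwise.
  function transform_row(r, x) result(t)
    real, intent(in) :: r(:,:), x(:)
    real :: t(size(x)), s(size(x))
    integer :: i
    do i = 1, size(x)
      if (r(i,i) /= 0.0) then
        ! Row i of R**T is column i of R.
        s(i) = (x(i) - dot_product(r(1:i-1, i), s(1:i-1))) / r(i,i)
      else
        s(i) = 0.0   ! zero row of R: component is arbitrary, D removes it
      end if
      t(i) = merge(s(i), 0.0, r(i,i) > 0.0)
    end do
  end function transform_row
end module mahalanobis_transform

program demo_mahalanobis
  use mahalanobis_transform
  implicit none
  real :: r(2,2), xi(2), xj(2), d(2), ti(2), tj(2), rtr_inv(2,2)
  real :: euclid2, mahal2

  r = reshape([2.0, 0.0, 1.0, 3.0], [2, 2])   ! column-major: R = [2 1; 0 3]
  xi = [4.0, 5.0]
  xj = [2.0, 4.0]

  ti = transform_row(r, xi)
  tj = transform_row(r, xj)
  euclid2 = sum((ti - tj)**2)

  ! Direct quadratic form with the explicit inverse of R**T R = [4 2; 2 10].
  rtr_inv = reshape([10.0, -2.0, -2.0, 4.0], [2, 2]) / 36.0
  d = xi - xj
  mahal2 = dot_product(d, matmul(rtr_inv, d))

  print '(2f8.4)', euclid2, mahal2   ! the two squared distances agree
end program demo_mahalanobis
```

For these inputs both squared distances equal 1, up to rounding in the explicit inverse.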
Once the clusters are identified by KMEAN, an expanded regression model (a one-way analysis of covariance model) is fitted to the original (untransformed) data. Denote the original model by y = Xβ + ɛ and the expanded model by y = Xβ + Zγ + ɛ. The added regressors that are contained in the n × k matrix Z in the expanded model are indicator variables specifying cluster membership. The lack of fit test that is computed is an exact test of the hypothesis that γ = 0 in the expanded model. This test was proposed as a lack of fit test by Christensen (1989).
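To make the indicator matrix Z concrete, the following sketch builds it from a vector of group numbers. The group numbers are the IGROUP values supplied in the second example of this document. The module and function names are hypothetical, and leaving the rows of excluded observations (group 0) all zero is an assumption of the sketch.

```fortran
module cluster_indicators
  implicit none
contains
  ! Build the n-by-k indicator matrix Z: z(i,j) = 1 when igroup(i) == j.
  ! Rows with igroup(i) == 0 (excluded observations) stay all zero.
  function indicator_matrix(igroup, k) result(z)
    integer, intent(in) :: igroup(:), k
    real :: z(size(igroup), k)
    integer :: i
    z = 0.0
    do i = 1, size(igroup)
      if (igroup(i) >= 1 .and. igroup(i) <= k) z(i, igroup(i)) = 1.0
    end do
  end function indicator_matrix
end module cluster_indicators

program demo_indicators
  use cluster_indicators
  implicit none
  integer, parameter :: igroup(20) = &
      [4, 4, 4, 4, 2, 4, 2, 4, 2, 4, 4, 4, 4, 4, 4, 4, 1, 4, 4, 3]
  real :: z(20, 4)

  z = indicator_matrix(igroup, 4)
  ! Column sums of Z are the cluster sizes.
  print '(4f6.1)', sum(z, dim=1)
end program demo_indicators
```

The column sums give cluster sizes of 1, 3, 1, and 15 for these group numbers.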
Let SSE(X, Z) be the error sum of squares from the fit of the expanded model and let SSE(X) be the error sum of squares from the fit of the original model. The lack of fit sum of squares is SSE(X) − SSE(X, Z) and the lack of fit degrees of freedom are DFE(X) − DFE(X, Z). The F statistic for the test of the null hypothesis of no lack of fit is
$$F = \frac{[\mathrm{SSE}(X) - \mathrm{SSE}(X, Z)] \,/\, [\mathrm{DFE}(X) - \mathrm{DFE}(X, Z)]}{\mathrm{SSE}(X, Z) \,/\, \mathrm{DFE}(X, Z)}$$
Under the hypothesis of no lack of fit, the computed F has an F distribution with numerator and denominator degrees of freedom DFE(X) − DFE(X, Z) and DFE(X, Z), respectively. The p-value for the test is computed as the probability that a random variable with this distribution is greater than or equal to the computed F statistic.
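The arithmetic behind the F statistic is simple enough to verify by hand. The sketch below recomputes it from the rounded sums of squares and degrees of freedom printed for NGROUP = 6 in the first example of this document. The module and function names are hypothetical, and the p-value is not computed because that would need an F distribution function.

```fortran
module lack_of_fit_f
  implicit none
contains
  ! F = (lack of fit mean square) / (expanded model error mean square).
  pure function lof_f(sse_x, dfe_x, sse_xz, dfe_xz) result(f)
    real, intent(in) :: sse_x, dfe_x, sse_xz, dfe_xz
    real :: f
    f = ((sse_x - sse_xz) / (dfe_x - dfe_xz)) / (sse_xz / dfe_xz)
  end function lof_f
end module lack_of_fit_f

program demo_lof_f
  use lack_of_fit_f
  implicit none
  ! SSE(X) = 163.9 on 16 df, SSE(X,Z) = 143.4 on 14 df (rounded values).
  print '(f7.3)', lof_f(163.9, 16.0, 143.4, 14.0)
end program demo_lof_f
```

With these inputs the lack of fit mean square is 20.5/2 = 10.25 and the expanded model mean square is about 10.24, so the ratio agrees with the F value of 1.001 reported in the example output.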
The error degrees of freedom and error sum of squares from the fit of the expanded model are computed as the error degrees of freedom and sum of squares from the reduced model where Z and y have been adjusted for X. Routine RCOV is used to fit the reduced model. Let e be the vector of residuals from the original fitted model, and let W be the diagonal matrix whose i-th diagonal element is the product of the weight and frequency for the i-th observation. The sum of squares and crossproducts matrix for the adjusted Z and y in the reduced model, which is input into RCOV, is
$$\begin{pmatrix} Z^T W Z - A^T A & Z^T W e \\ e^T W Z & e^T W e \end{pmatrix}$$
where A is a solution of $R^T A = D X^T W Z$.
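The role of A in adjusting Z for X can be sketched with a short derivation. It assumes a full rank model without equality restrictions, so that D = I and $R^T R = X^T W X$. The symbol $\tilde{Z}$ for the adjusted Z is introduced here for illustration only.

```latex
\begin{align*}
\tilde{Z} &= Z - X (X^T W X)^{-1} X^T W Z \\
\tilde{Z}^T W \tilde{Z}
  &= Z^T W Z - Z^T W X (R^T R)^{-1} X^T W Z \\
  &= Z^T W Z - (R^{-T} X^T W Z)^T (R^{-T} X^T W Z) \\
  &= Z^T W Z - A^T A, \qquad R^T A = X^T W Z \\
\tilde{Z}^T W e &= Z^T W e \quad \text{since } X^T W e = 0
\end{align*}
```

Under these assumptions the adjusted crossproducts need only one triangular solve with R, and the adjusted response is the residual vector e from the original fit.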
1. Workspace may be explicitly provided, if desired, by use of R2OFN/DR2OFN. The reference is:
CALL R2OFN (NOBS, NCOL, X, LDX, INTCEP, IIND, INDIND, IRSP, IFRQ, IWT, B, R, LDR, DFE, SSE, ICLUST, MAXIT, TOL, NGROUP, IGROUP, TESTLF, IWK, WK)
The additional arguments are as follows:
IWK — Work array of length 3 * NOBS + |IIND| + NGROUP + 3 + max{m + 2.8854 * ln(m) + 2, 3 * NGROUP, NCOEF}, if ICLUST is positive. If ICLUST = 0, IWK can be an array of length 1.
WK — Work array of length LWK.
2. Informational errors
Type Code
3 1 Convergence did not occur in the cluster analysis for the lack of fit test within MAXIT iterations. Better results may be obtained by increasing MAXIT.
4 2 An invalid weight or frequency is encountered. Weights and frequencies must be nonnegative.
3 3 The matrix of sum of squares and crossproducts computed for the within cluster model for testing lack of fit is not nonnegative definite within the tolerance defined by TOL.
4 4 At least one element in the columns containing the independent variables, IRSP, IFRQ, or IWT of X contains NaN (not a number), but the corresponding element in IGROUP is not zero. When ICLUST = 0, missing values in a row of X are indicated by setting the corresponding row of IGROUP to zero.
This example uses data from Draper and Smith (1981, page 374), which is input in X. A multiple linear regression of column 6 of X on an intercept and columns 1, 3, and 4 is computed using routine RGIVN. Tests for lack of fit are computed for choices of NGROUP equal to 4 and 6 using routine RLOFN. Note that for NGROUP equal to 6 the results are exactly the same as for routine RLOFE. (If there are exact replicates in the data and the number of clusters used by RLOFN equals the number of distinct cases of the independent variables, then RLOFN and RLOFE produce the same output.)
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER LDB, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, &
NIND, NOBS, J, INTCEP
PARAMETER (INTCEP=1, NCOL=6, NDEP=1, NIND=3, NOBS=20, &
LDSCPE=NDEP, LDX=NOBS, NCOEF=INTCEP+NIND, LDB=NCOEF, &
LDR=NCOEF)
!
INTEGER ICLUST, IDEP, IGROUP(NOBS), IIND, INDDEP(NDEP), &
INDIND(NIND), IRSP, NGROUP, NOUT, NRMISS, NROW
REAL B(LDB,NDEP), DFE, R(LDR,NCOEF), SCPE(LDSCPE,NDEP), &
SSE, TESTLF(10), X(LDX,NCOL)
!
DATA (X(1,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 246.0/
DATA (X(2,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 252.0/
DATA (X(3,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 253.0/
DATA (X(4,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 164.0/
DATA (X(5,J),J=1,6)/1.0, 1.0, 0.0, 0.0, 1.0, 203.0/
DATA (X(6,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 173.0/
DATA (X(7,J),J=1,6)/1.0, 1.0, 0.0, 0.0, 1.0, 210.0/
DATA (X(8,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 247.0/
DATA (X(9,J),J=1,6)/0.0, 1.0, 0.0, 1.0, 0.0, 120.0/
DATA (X(10,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 171.0/
DATA (X(11,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 167.0/
DATA (X(12,J),J=1,6)/0.0, 0.0, 1.0, 1.0, 0.0, 172.0/
DATA (X(13,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 247.0/
DATA (X(14,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 252.0/
DATA (X(15,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 248.0/
DATA (X(16,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 169.0/
DATA (X(17,J),J=1,6)/0.0, 1.0, 0.0, 0.0, 0.0, 104.0/
DATA (X(18,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 166.0/
DATA (X(19,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 168.0/
DATA (X(20,J),J=1,6)/0.0, 1.0, 1.0, 0.0, 0.0, 148.0/
DATA INDIND/1, 3, 4/, INDDEP/6/
!
NROW = NOBS
IIND = NIND
IDEP = NDEP
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, R=R, DFE=DFE, SCPE=SCPE)
SSE = SCPE(1,1)
IRSP = 6
ICLUST = 2
DO 10 NGROUP=4, 6, 2
   CALL RLOFN (X, IIND, INDIND, IRSP, B(1:, 1), R, DFE, SSE, NGROUP, &
               IGROUP, TESTLF, ICLUST=ICLUST)
   CALL UMACH (2, NOUT)
   WRITE (NOUT,*) ' '
   WRITE (NOUT,*) 'NGROUP = ', NGROUP
   CALL WRIRN ('IGROUP', IGROUP, 1, NOBS, 1)
   WRITE (NOUT,*) ' '
   WRITE (NOUT,99999) '                 Test for Lack of '// &
      'Fit'
   WRITE (NOUT,99999) '                        Sum of    Mean '// &
      '        Prob. of'
   WRITE (NOUT,99999) 'Source of Error    DF  Squares  Square '// &
      '     F  Larger F'
   WRITE (NOUT,99999) 'Lack of Fit     ', TESTLF(1), TESTLF(4), &
      TESTLF(7), TESTLF(9), TESTLF(10)
   WRITE (NOUT,99999) 'Expanded model  ', TESTLF(2), TESTLF(5), &
      TESTLF(8)
   WRITE (NOUT,99999) 'Original model  ', TESTLF(3), TESTLF(6)
10 CONTINUE
99999 FORMAT (A, F5.1, F9.1, F8.2, F7.3, F10.3)
END
 NGROUP =            4

                                     IGROUP
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  4   4   4   4   2   4   2   4   2   4   4   4   4   4   4   4   1   4   4   3

                 Test for Lack of Fit
                        Sum of    Mean         Prob. of
Source of Error    DF  Squares  Square      F  Larger F
Lack of Fit       1.0      0.4    0.38  0.035     0.855
Expanded model   15.0    163.6   10.90
Original model   16.0    163.9

 NGROUP =            6

                                     IGROUP
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  6   6   6   4   5   4   5   6   2   4   4   4   6   6   6   4   1   4   4   3

                 Test for Lack of Fit
                        Sum of    Mean         Prob. of
Source of Error    DF  Squares  Square      F  Larger F
Lack of Fit       2.0     20.5   10.25  1.001     0.393
Expanded model   14.0    143.4   10.24
Original model   16.0    163.9
This example uses the same data and model from Example 1. Here, the option ICLUST = 0 is input so that the group numbers for performing the lack of fit test are input.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER LDB, LDR, LDSCPE, LDX, NCOEF, NCOL, NDEP, &
NIND, NOBS, J, INTCEP
PARAMETER (INTCEP=1, NCOL=6, NDEP=1, NIND=3, NOBS=20, &
LDSCPE=NDEP, LDX=NOBS, NCOEF=INTCEP+NIND, LDB=NCOEF, &
LDR=NCOEF)
!
INTEGER ICLUST, IDEP, IGROUP(NOBS), IIND, &
INDDEP(NDEP), INDIND(NIND), IRSP, &
NGROUP, NOUT
REAL B(LDB,NDEP), DFE, R(LDR,NCOEF), SCPE(LDSCPE,NDEP), &
SSE, TESTLF(10), TOL, X(LDX,NCOL), &
XMAX(NCOEF), XMIN(NCOEF)
!
DATA (X(1,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 246.0/
DATA (X(2,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 252.0/
DATA (X(3,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 253.0/
DATA (X(4,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 164.0/
DATA (X(5,J),J=1,6)/1.0, 1.0, 0.0, 0.0, 1.0, 203.0/
DATA (X(6,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 173.0/
DATA (X(7,J),J=1,6)/1.0, 1.0, 0.0, 0.0, 1.0, 210.0/
DATA (X(8,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 247.0/
DATA (X(9,J),J=1,6)/0.0, 1.0, 0.0, 1.0, 0.0, 120.0/
DATA (X(10,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 171.0/
DATA (X(11,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 167.0/
DATA (X(12,J),J=1,6)/0.0, 0.0, 1.0, 1.0, 0.0, 172.0/
DATA (X(13,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 247.0/
DATA (X(14,J),J=1,6)/1.0, 1.0, 1.0, 0.0, 1.0, 252.0/
DATA (X(15,J),J=1,6)/1.0, 0.0, 1.0, 0.0, 1.0, 248.0/
DATA (X(16,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 169.0/
DATA (X(17,J),J=1,6)/0.0, 1.0, 0.0, 0.0, 0.0, 104.0/
DATA (X(18,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 166.0/
DATA (X(19,J),J=1,6)/0.0, 1.0, 1.0, 1.0, 0.0, 168.0/
DATA (X(20,J),J=1,6)/0.0, 1.0, 1.0, 0.0, 0.0, 148.0/
DATA INDIND/1, 3, 4/, INDDEP/6/
DATA IGROUP/4*4, 2, 4, 2, 4, 2, 7*4, 1, 2*4, 3/
!
IIND = NIND
IDEP = NDEP
CALL RGIVN (X, IIND, INDIND, IDEP, INDDEP, B, R=R, DFE=DFE, SCPE=SCPE)
SSE = SCPE(1,1)
IRSP = 6
ICLUST = 0
NGROUP = 4
CALL RLOFN (X, IIND, INDIND, IRSP, B(1:, 1), R, DFE, SSE, NGROUP, &
            IGROUP, TESTLF, ICLUST=ICLUST)
CALL UMACH (2, NOUT)
WRITE (NOUT,*) ' '
WRITE (NOUT,*) 'NGROUP = ', NGROUP
CALL WRIRN ('IGROUP', IGROUP, 1, NOBS, 1)
WRITE (NOUT,*) ' '
WRITE (NOUT,99999) '                 Test for Lack of '// &
   'Fit'
WRITE (NOUT,99999) '                        Sum of    Mean '// &
   '        Prob. of'
WRITE (NOUT,99999) 'Source of Error    DF  Squares  Square '// &
   '     F  Larger F'
WRITE (NOUT,99999) 'Lack of Fit     ', TESTLF(1), TESTLF(4), &
   TESTLF(7), TESTLF(9), TESTLF(10)
WRITE (NOUT,99999) 'Expanded model  ', TESTLF(2), TESTLF(5), &
   TESTLF(8)
WRITE (NOUT,99999) 'Original model  ', TESTLF(3), TESTLF(6)
99999 FORMAT (A, F5.1, F9.1, F8.2, F7.3, F10.3)
END
 NGROUP =            4

                                     IGROUP
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  4   4   4   4   2   4   2   4   2   4   4   4   4   4   4   4   1   4   4   3

                 Test for Lack of Fit
                        Sum of    Mean         Prob. of
Source of Error    DF  Squares  Square      F  Larger F
Lack of Fit       1.0      0.4    0.38  0.035     0.855
Expanded model   15.0    163.6   10.90
Original model   16.0    163.9