Computes model estimates and associated statistics for a hierarchical log-linear model.
NCLVAL — Vector of length NCLVAR containing, in its i-th element, the number of levels or categories of the i-th classification variable. (Input)
TABLE — Vector of
length NCLVAL(1)
* NCLVAL(2)
* … * NCLVAL(NCLVAR)
containing the entries in the cells of the table to be fit. (Input)
See Comment 3 for
comments on the ordering of the elements of TABLE.
NVEF — Vector of length NEF containing the number of classification variables associated with each effect. (Input)
INDEF — Vector of
length NVEF(1) +
… + NVEF(NEF) containing, in
consecutive positions, the indices of the variables that are included in each
effect. (Input)
The entries in INDEF are sequenced so
that the first NVEF(1) elements
contain the indices of the variables in effect 1, the next NVEF(2) elements of
INDEF contain
the indices of the variables in effect 2, etc. See Comment 4 for an
example.
FIT — Vector of
length NCLVAL(1)
* NCLVAL(2)
* … * NCLVAL(NCLVAR)
containing the model estimates of the cell frequencies.
(Input/Output)
On input, FIT
contains the initial estimates of the cell counts. Structural zeros in the model
are specified by setting the corresponding element of FIT
to 0.0. All other elements of FIT
may be set to 1.0 if no other estimate of the expected cell counts is available.
On output, FIT
contains the fitted table. See Comment 3 for the
ordering of the elements of FIT.
If an element of FIT
is positive but the corresponding element in TABLE is zero, then
the element is called a sampling zero. Sampling zeros may effect the number of
parameters that can be estimated, but they will not effect the degrees of
freedom in chi-squared tests. See the Description section.
NCOEF — Number of regression coefficients in the model. (Output)
COEF — NCOEF by 4 matrix
containing the estimated coefficients and associated statistics.
(Output)
Dummy variables used in fitting the log-linear model are generated
using the IDUMMY = 3
option of routine GRGLM (see Chapter 2, Regression;). For this option, the k-th dummy
variable for classification variable I is the (0, 1)
indicator variable for the k-th level of the classification variable
minus the (0, 1) indicator variable for the NCLVAL(I)-th level of the
classification variable.
Column |
Statistic |
1 |
Coefficient estimate |
2 |
Estimated standard error of the estimated coefficient |
3 |
Asymptotic normal score for testing that the coefficient is zero |
4 |
p-value associated with the normal score in column 3 (two-sided alternative). |
COV — NCOEF by NCOEF covariance matrix for the estimated parameters. (Output)
RESID — NCLVAL(1) * NCLVAL(2) * … * NCLVAL(NCLVAR) by 4 matrix containing residual statistics for each cell in the table. (Output)
Column |
Statistics |
1 |
Signed square root of the contribution to chi-squared |
2 |
Contribution to the likelihood ratio |
3 |
Freeman-Tukey deviate |
4 |
Residual difference |
STAT — Vector of length 4 containing output statistics for the model. (Output)
I |
STAT(I) |
1 |
Log-likelihood. |
2 |
Likelihood ratio statistic for testing the fit of the model. |
3 |
Degrees of freedom in the likelihood ratio statistic. This statistic corrects for parameters that cannot be estimated because of sampling zeros. |
4 |
p-value corresponding to the likelihood ratio statistic. |
NCLVAR — Number
of classification variables. (Input)
A variable specifying a
margin in the table is a classification variable. The first classification
variable is named A, the
second classification variable is named B, etc.
Default: NCLVAR = size (NCLVAL,1).
NEF — Number of
effects in the model. (Input)
A marginal table is implied by
each effect in the model. Lower-order effects should not be included since their
inclusion is automatic in the hierarchical models fit here (e.g., do not include
effects A or B if
effect AB is in the model).
Default: NEF = size (NVEF,1).
EPS — Convergence
criterion. (Input)
Convergence is assumed when the maximum
deviation between an observed and a fitted marginal total is less than EPS. EPS = 0.10 is a
typical value.
Default: EPS = 0.10.
MAXIT — Maximum
number of iterations. (Input)
MAXIT = 15 is a
typical value.
Default: MAXIT = 30
TOL — Tolerance
used in determining linear dependence in COV.
(Input)
TOL
= 100.0 AMACH(4)
is a common choice. See the documentation for routine AMACH
in Reference
Material.
Default: TOL = 1.19e-5 for
single precision and 2.d–14 for double precision.
IPRINT — Printing
option. (Input)
Default: IPRINT = 0.
IPRINT |
Action |
0 |
No printing is performed. |
1 |
TABLE, FIT, RESID, COEF, COV, and STAT are printed. |
LDCOEF — Leading
dimension of COEF exactly as
specified in the dimension statement in the calling program.
(Input)
Default: LDCOEF = size (COEF,1).
LDCOV — Leading
dimension of COV
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDCOV = size (COV,1).
LDRESI — Leading
dimension of RESID exactly as
specified in the dimension statement in the calling program.
(Input)
Default: LDRESI = size (RESID,1).
Generic: CALL CTLLN (NCLVAL, TABLE, NVEF, INDEF, FIT, NCOEF, COEF, COV, RESID, STAT [,…])
Specific: The specific interface names are S_CTLLN and D_CTLLN.
Single: CALL CTLLN (NCLVAR, NCLVAL, TABLE, NEF, NVEF, INDEF, EPS, MAXIT, TOL, IPRINT, FIT, NCOEF, COEF, LDCOEF, COV, LDCOV, RESID, LDRESI, STAT)
Double: The double precision name is DCTLLN.
Routine CTLLN
computes statistics of interest for a hierarchical model in a log-linear
analysis of a multidimensional contingency table. Among the statistics computed
are the expected cell values, cell residuals, the log-linear parameters and
their estimated variances and covariances, the
log-likelihood for the model
(plus a constant), and a likelihood-ratio test of the model (versus the
alternative that the cell probabilities are free to vary, subject only to the
marginal constraints). In addition, CTLLN
can print and label all statistics that it computes.
Routine PRPFT is used to find the maximum likelihood estimates of the expected cell counts (FIT). These expected values are then used as input to routine CTPAR in order to compute estimates of the parameters in the model and their estimated covariances.
The matrix RESID contains various residuals that may be used in analyzing the model. These residuals are discussed in detail by Bishop, Feinberg, and Holland (1975, pages 136-137), among others. Each is computed from the cell observed (oi) and expected (fitted, fi) values according to the following methods:
1. The signed square root of the contributions to Χ2 are computed as
2. The contributions to the likelihood ratio (G2) are computed as 2oi log(oi/fi)
3. Freeman-Tukey deviates are computed as
4. The residual differences are computed as oi − fi
The log-likelihood STAT(1) is computed as
where n is the number of cells in the table. The likelihood ratio statistic for testing the fit of the model is computed as
which for large samples follows a chi-squared distribution.
The number of degrees of freedom in G2 is computed as the number of cells in the table, excluding structural zeros, minus the number of parameters that could be estimated if there were no sampling zeros. When there are either structural or sampling zeros in the model, some parameters may not be estimable because they are infinite. Parameters that cannot be estimated due to structural zeros are not counted in the number of parameters estimated when computing the degrees of freedom for Χ2. Parameters that cannot be estimated because of sampling zeros are counted as estimated parameters when computing the degrees of freedom for Χ2.
To explain the calculation of degrees of freedom, note that extended maximum likelihood estimates may be written as
where
are coefficient vectors, and ρ → ∞. Routine CTLLN estimates the finite portion of the estimates,
The infinite portion,
ensures that the fitted values for zero marginal cells corresponding to a term in the hierarchical model have estimated expectation of zero. Thus, CTLLN fits the finite portion of extended maximum likelihood estimates where the extension is to ±∞. Because the Hessian elements corresponding to infinite parameters are zero, the Hessian is computed from a reduced likelihood in which cells leading to infinite estimates have been eliminated. The user is referred to Clarkson and Jennrich (1991) for details.
1. Workspace may be explicitly provided, if desired, by use of C2LLN/DC2LLN. The reference is:
CALL C2LLN (NCLVAR, NCLVAL, TABLE, NEF, NVEF, INDEF, EPS, MAXIT, TOL, IPRINT, FIT, NCOEF, COEF, LDCOEF, COV, LDCOV, RESID, LDRESI, STAT, AMAR, INDEX, NCVEF, IXEF, IINDEF, IA, INDCL, CLVAL, REG, X, D, XMIN, XMAX, COVWK, WK, IWK)
The additional arguments are as follows.
AMAR — Vector of length equal to the sum over all effects in the model (J = 1 to NEF) of the length of the marginal table required for the effect. The length of each marginal table is computed as the product of the number of class values for each classification variable in the effect (the product of the nonzero elements of NCLVAL(INDEF(I)) where I ranges from K(J) through K(J)+ NVEF(J) − 1. Here, K(1) = 1 and K(J + 1) = K(J) + NVEF(J).)
INDX — Vector of length NEF.
NCVEF — Vector of length 2NCLVAR − 1.
IXEF — Vector of length NCLVAR * 2NCLVAR−1.
IINDEF — Vector of length NVEF(1) + … + NVEF(NEF).
IA — Vector of length NCLVAR.
INDCL — Vector of length NCLVAR.
CLVAL — Vector of length NCLVAL(1) + … + NCLVAL(NCLVAR).
REG — Vector of length NCOEF + 1.
X — Vector of length NCOEF if there exists both structural structural and sampling zeros in TABLE; otherwise, it is of length NCLVAR.
D — Vector of length NCOEF + 1.
XMIN — Vector of length NCOEF.
XMAX — Vector of length NCOEF.
COVWK — Vector of length NCOEF2 if there exists both structural and sampling zeros in TABLE. Otherwise, COVWK is not referenced and can be dimensioned of length one.
WK — Vector of length max(g, NCOEF + 1) if IPRINT = 0; otherwise, WK is of length max(g, 6m, n) where m is the maximal element in NCLVAL, n is the length of TABLE, and g equals the maximum over all effects in the model (J = 1, NEF) of the length of the marginal table required for the effect. The length of the marginal table is computed as the product of the number of class values for each classification variable in the effect (the product of the nonzero elements of NCLVAL(INDEF(I)) where I ranges from K(J) through K(J) + NVEF(J) − 1, where K(1) = 1 and K(J + 1) = K(J) + NVEF(J)).
IWK — Vector of length 2 * NCLVAR + z + 1 where z is the number of structural zeros in TABLE.
2. Informational errors
Type Code
3 1 The optimization algorithm did not converge to the desired accuracy within MAXIT iterations. Some of the estimated statistics may not be accurate.
3 5 The label for one or more of the tables exceeds the buffer limit.
3 11 The label for one or more effects exceeds the buffer limit.
4 2 LDCOEF or LDCOV is less than NCOEF.
3. The cells
of the vectors TABLE
and ZERO are
sequenced so that the first variable cycles from 1 to NCLVAL(1)
the slowest, the second variable cycles from 1 to NCLVAL(2)
the next slowest, etc., up to the NCLVAR-th
variable, which cycles from 1 to NCLVAL(NCLVAR)
the fastest.
Example: For NCLVAR
= 3, NCLVAL(1)
= 2, NCLVAL(2)
= 3, and NCLVAL(3)
= 2, the cells of table X(I, J, K)
are entered into TABLE(1)
through TABLE(12)
in the following order:
X(1,
1, 1), X(1,
1, 2), X(1,
2, 1), X(1,
2, 2), X(1,
3, 1), X(1,
3, 2), X(2, 1, 1),
X(2, 1,
2), X(2,
2, 1), X(2,
2, 2), X(2,
3, 1), X(2,
3, 2). The elements of FIT
are similarly sequenced.
4. INDEF is used to describe the marginal tables to be fit. For example, if NCLVAR = 3 and the first effect is to fit the marginal table for variables 1 and 3 and the second effect is to fit the marginal table for variable 2, then: NEF = 2, NVEF(1) = 2, and NVEF(2) = 1. Since the sum of the NVEF(I) is 3, then INDEF is a vector of length 3 with values: INDEF(1) = 1, INDEF(2) = 3, and INDEF(3) = 2.
The example illustrates the use of CTLLN in a simple four-way table in which the first three factors have two levels, and the fourth factor has three levels. The data, taken from Lee (1977), involve brand preference in different situations.
USE CTLLN_INT
IMPLICIT NONE
INTEGER IPRINT, LDCOEF, LDCOV, LDRESI, LTAB, MAXIT, NCLVAR
REAL EPS
PARAMETER (EPS=0.01, IPRINT=1, LDCOEF=10, LDCOV=10, LDRESI=24, &
LTAB=24, MAXIT=10, NCLVAR=4)
!
INTEGER INDEF(6), NCLVAL(NCLVAR), NCOEF, NEF, NVEF(3)
REAL COEF(LDCOEF,4), COV(LDCOV,LDCOV), FIT(LTAB), &
RESID(LDRESI,4), STAT(4), TABLE(LTAB)
!
DATA TABLE/19, 57, 29, 63, 29, 49, 27, 53, 23, 47, 33, 66, 47, &
55, 23, 50, 24, 37, 42, 68, 43, 52, 30, 42/
DATA NEF/3/, NVEF/2, 2, 2/, INDEF/2, 4, 1, 4, 2, 3/
DATA NCLVAL/3, 2, 2, 2/, FIT/24*1.0/
!
CALL CTLLN (NCLVAL, TABLE, NVEF, INDEF, FIT, NCOEF, COEF, &
COV, RESID, STAT, EPS=EPS, MAXIT=MAXIT, IPRINT=IPRINT)
!
END
Fitted Model: (B*D, A*D, B*C)
Variable
Number of Levels
1
A
3
2
B
2
3
C
2
4
D
2
Model
Statistics
Log-likelihood
3.7906
Likelihood
ratio
11.89
Degrees of
freedom
14.0
P-value
0.6154
Coefficient
Statistics
Standard
Asymptotic
Coefficient Error
Z-statistic P-value
1
intercept
3.6827
0.0333
110.66 0.0000
2
A(1)
-0.0591
0.0475
-1.24 0.2341
3
A(2)
0.0278
0.0461
0.60 0.5562
4
B
-0.0166
0.0331
-0.50 0.6242
5
C
-0.0434
0.0319
-1.36 0.1943
6
D
-0.2783
0.0329
-8.45 0.0000
7
A*D(1)
-0.1016
0.0475
-2.14 0.0506
8
A*D(2)
0.0034
0.0461
0.07 0.9414
9
B*C
-0.1438
0.0319
-4.51 0.0005
10
B*D
-0.0684
0.0328 -2.09
0.0558
------------------------------
Table 1: C = 1 D =
1
B = 1 by A
(column)
1
2
3
Observed
19.00
23.00
24.00
Fit
19.52
23.65 26.09
Root
chi-square
-0.12
-0.13
-0.41
Likelihood
-1.03
-1.29
-4.02
Freeman-Tukey
-0.06
-0.08
-0.37
Residual
-0.52
-0.65
-2.09
B = 2 by A
(column)
1
2
3
Observed
29.00
47.00
43.00
Fit
30.85
37.37 41.23
Root
chi-square
-0.33
1.57
0.28
Likelihood
-3.58
21.54
3.62
Freeman-Tukey
-0.29
1.52
0.31
Residual
-1.85
9.63 1.77
------------------------------
Table 2: C = 1 D =
2
B = 1 by A
(column)
1
2
3
Observed
57.00
47.00
37.00
Fit
47.85
46.99 42.89
Root
chi-square
1.32
0.00
-0.90
Likelihood
19.95
0.03
-10.93
Freeman-Tukey
1.29
0.04
-0.89
Residual
9.15
0.01
-5.89
B = 2 by A
(column)
1
2
3
Observed
49.00 55.00
52.00
Fit
57.52
56.48 51.56
Root
chi-square
-1.12
-0.20
0.06
Likelihood
-15.70
-2.92
0.89
Freeman-Tukey
-1.13
-0.16
0.10
Residual
-8.52
-1.48
0.44
------------------------------
Table 3: C = 2 D =
1
B = 1 by A
(column)
1
2
3
Observed
29.00
33.00
42.00
Fit
28.39
34.40 37.94
Root
chi-square
0.11
-0.24
0.66
Likelihood
1.23
-2.73
8.53
Freeman-Tukey
0.16
-0.20
0.68
Residual
0.61
-1.40
4.06
B = 2 by A
(column)
1
2
3
Observed
27.00
23.00
30.00
Fit
25.24
30.58 33.73
Root
chi-square
0.35
-1.37 -0.64
Likelihood
3.64
-13.10
-7.04
Freeman-Tukey
0.39
-1.41
-0.61
Residual
1.76
-7.58
-3.73
------------------------------
Table 4: C = 2 D =
2
B = 1 by A
(column)
1
2
3
Observed
63.00
66.00
68.00
Fit
69.58
68.32 62.37
Root
chi-square
-0.79
-0.28
0.71
Likelihood
-12.51
-4.57
11.75
Freeman-Tukey
-0.78
-0.25
0.73
Residual
-6.58
-2.32
5.63
B = 2 by A
(column)
1
2
3
Observed
53.00 50.00
42.00
Fit
47.06
46.21 42.18
Root
chi-square
0.87
0.56
-0.03
Likelihood
12.61
7.88
-0.36
Freeman-Tukey
0.87
0.58
0.01
Residual
5.94
3.79
-0.18
Asymptotic Coefficient
Covariance
1
2
3
4
5
1 1.1076E-03
9.7132E-05 -3.5887E-05
4.3244E-05 4.3786E-05
2
2.2562E-03 -1.1408E-03 -3.4043E-11
2.6829E-11
3
2.1232E-03 2.5675E-11 -5.1643E-11
4
1.0968E-03 1.4480E-04
5
1.0146E-03
6
7
8
9 10
1 2.9815E-04 1.3065E-04
-1.6147E-05 1.4480E-04 7.6307E-05
2 1.3065E-04 7.2117E-04
-4.0976E-04 6.2343E-11 -1.0681E-11
3 -1.6147E-05 -4.0976E-04
5.7437E-04 -4.9217E-11 -2.3482E-11
4 7.6307E-05 1.2601E-11
-4.1730E-11 4.3786E-05 2.8917E-04
5 -1.4272E-11 -5.5301E-11
4.2801E-11 4.5231E-06 -4.6962E-11
6 1.0851E-03 9.7132E-05
-3.5887E-05 -4.9749E-11 3.0847E-05
7
2.2562E-03 -1.1408E-03 5.9300E-11
-1.0361E-10
8
2.1232E-03 -2.4481E-11 2.9160E-11
9
1.0146E-03
1.1201E-11
10
1.0743E-03
PHONE: 713.784.3131 FAX:713.781.9260 |