Performs iterative proportional fitting of a contingency table using a loglinear model.
NCLVAL — Vector of length NCLVAR containing, in its i-th element, the number of levels or categories of the i-th classification variable. (Input)
TABLE — Vector of
length NCLVAL(1)
* NCLVAL
(2) * … * NCLVAL(NCLVAR)
containing the entries in the cells of the table to be fit. (Input)
See Comment 3 for
comments on the ordering of the elements of TABLE.
NVEF — Vector of length NEF that contains the number of classification variables associated with each effect. (Input)
INDEF — Vector of
length NVEF(1) +
…+ NVEF(NEF) that contains, in
consecutive positions, the indices of the variables that are included in each
effect. (Input)
The entries in INDEF are sequenced so
that the first NVEF(1) elements
contain the indices of the variables in effect 1, the next NVEF(2) elements of
INDEF contain
the indices of the variables in effect 2, etc. See Comment 4 for an
example.
FIT — Vector of
length NCLVAL(1)
* NCLVAL(2)
* … * NCLVAL(NCLVAR).
(Input/Output)
On input, FIT
contains the initial estimates of the cell counts. Structural zeros in the model
are specified by setting the corresponding element of FIT
to 0.0. All other elements of FIT
must be positive. 1.0 may be used if no other estimate of the cell counts is
available. See Comment 3 for the
ordering of the elements of FIT.
On output, FIT
contains the fitted table.
NCLVAR — Number
of classification variables. (Input)
Default: NCLVAR
= size (NCLVAL,1).
NEF — Number of
effects in the model. (Input)
A marginal table is implied by
each effect in the model. Lower order effects should not be included since their
inclusion is automatic (e.g., do not include effects A or B if effect AB
is in the model).
Default: NEF
= size (NVEF,1).
EPS — Convergence
criterion. (Input)
Convergence is assumed when the maximum
deviation between an observed and a fitted marginal total is less than EPS. EPS = 0.10 is a
typical value.
Default: EPS
= .10.
MAXIT — Maximum
number of iterations. (Input)
MAXIT = 15 is a
typical value.
Default: MAXIT
= 30.
Generic: CALL PRPFT (NCLVAL, TABLE, NVEF, INDEF, FIT [,…])
Specific: The specific interface names are S_PRPFT and D_PRPFT.
Single: CALL PRPFT (NCLVAR, NCLVAL, TABLE, NEF, NVEF, INDEF, EPS, MAXIT, FIT)
Double: The double precision name is DPRPFT.
Routine PRPFT uses the iterative proportional-fitting algorithm to fit a log-linear hierarchical model to a contingency table. Structural zeros are allowed. A hierarchical model is a factorial model in which lower-order terms are always present. Thus, in a three-way table with classification variable names A, B, and C, the following models are all hierarchical models.
Many other hierarchical models exist for the three-way table. Since all hierarchical models can be completely specified by the higher-order interactions (the lower-order interactions will always be present), no lower-order effects are included in model specification.
Corresponding to each hierarchical interaction is a marginal table. Iterations in PRPFT proceed by fitting marginal tables successively until the desired precision is achieved.
A structural zero is a cell in the table that, by design or otherwise, can have no observations, i.e., the count for the cell must be zero. Structural zeros are specified by setting the corresponding element in FIT to zero on input. Routine PRPFT is best suited for tables with no structural zeros and in which the initial estimates input in FIT are all 1. The user should be aware that the algorithm may take (much) longer to converge when this is not the case.
Sampling zeros are cells that are not structural zeros, but for which no count is observed. Routine PRPFT requires the absence of sampling zeros in all marginal tables that are fit. One common way method of achieving this is to add a constant, often 0.5, to each cell prior to fitting the table.
1. Workspace may be explicitly provided, if desired, by use of P2PFT/DP2PFT. The reference is:
CALL P2PFT (NCLVAR, NCLVAL, TABLE, NEF, NVEF, INDEF, EPS, MAXIT, FIT, AMAR, INDEX, WK, IWK)
The additional arguments are as follows.
AMAR — Work vector with length equal to the sum from J = 1 to NEF of the product of the nonzero elements of NCLVAL(INDEF(I)) for I = 1 to NVEF(J).
INDEX — Work vector of length NEF.
WK — Work vector with length equal to the maximum over J = 1 to NEF of the product of the nonzero elements of NCLVAL(INDEF(I)), for I = 1 to NVEF(J).
IWK — Work vector of length 2 * NCLVAR.
2. Informational errors
Type Code
3 11 The algorithm did not converge to the desired accuracy within MAXIT iterations.
4 12 A marginal total for an effect is zero. Since FIT indicates this is not a structural zero, the algorithm will not converge properly. One way to proceed is to add a constant to all cells in the table.
3. The cells
of the vectors TABLE and FIT
are sequenced so that the first variable cycles from 1 to NCLVAL(1),
which is the slowest, the second variable cycles from 1 to NCLVAL(2),
which is the next slowest, etc., up to the NCLVAR-th
variable, which cycles from 1 to NCLVAL(NCLVAR)
the fastest.
Example. For NCLVAR
= 3, NCLVAL(1)
= 2, NCLVAL(2)
= 3, and NCLVAL(3)
= 2, the cells of table X(I, J, K) are entered into
TABLE(1) through
TABLE(12) in the
following order.
X(1, 1, 1), X(1, 1, 2), X(1, 2, 1), X(1, 2, 2), X(1, 3, 1), X(1, 3, 2), X(2, 1, 1),
X(2, 1, 2),
X(2, 2, 1),
X(2, 2, 2),
X(2, 3, 1),
X(2, 3, 2). The
elements of FIT
are similarly sequenced.
4. INDEF is used to describe the marginal tables to be fit. For example, if NCLVAR = 3 and the first effect is to fit the marginal table for variables 1 and 3 and the second effect is to fit the marginal table for variable 2, then: NEF = 2, NVEF(1) = 2, and NVEF(2) = 1.
Since the sum of the NVEF(I) is 3, then INDEF is a vector of length 3 with values. INDEF (1) = 1, INDEF(2) = 3, and INDEF(3) = 2.
5. Typically, MAXIT = 5 is sufficient. If PRPFT does not converge, try using double precision, increasing MAXIT, or using the values output in FIT as input for another call to PRPFT.
The following example is taken from Bishop, Feinberg, and
Holland (1975, page 87). The data are originally from Bartlett (1935). This
example examines the survival of plants (factor A = factor 2) at
different values for time of planting (factor C = factor 3) and length of
cutting
(factor B = factor 1). The sample size for each level of
B and C is fixed at 240.
B | ||||||||
|
|
1 |
|
|
2 |
| ||
|
|
A |
|
|
A |
| ||
|
|
1 |
2 |
|
|
1 |
2 |
|
C |
1 |
156 |
84 |
C |
1 |
84 |
156 |
|
|
2 |
107 |
133 |
|
2 |
31 |
209 |
|
The model to be fit is given by:
where mijk is the cell expected value for levels i, j, and k of factors A, B, and C, respectively.
USE PRPFT_INT
USE UMACH_INT
IMPLICIT NONE
INTEGER NCLVAR, NEF
PARAMETER (NCLVAR=3, NEF=3)
!
INTEGER INDEF(6), MAXIT, NCLVAL(NCLVAR), NOUT, NVEF(NEF)
REAL EPS, FIT(8), TABLE(8)
!
DATA NCLVAL/2, 2, 2/, NVEF/2, 2, 2/
DATA INDEF/1, 2, 1, 3, 2, 3/, EPS/0.0001/, MAXIT/15/
DATA TABLE/156, 107, 84, 31, 84, 133, 156, 209/
DATA FIT/8*1.0/
!
CALL PRPFT (NCLVAL, TABLE, NVEF, INDEF, FIT, EPS=EPS, MAXIT=MAXIT)
!
CALL UMACH (2, NOUT)
WRITE (NOUT,99999) FIT
99999 FORMAT (' FIT =', 8F7.1)
END
FIT = 161.1 101.9 78.9 36.1 78.9 138.1 161.1 203.9
PHONE: 713.784.3131 FAX:713.781.9260 |