Generates regressors for a general linear model.
X — NROW by NCOL matrix containing the data. (Input)
INDCL — Index vector of length NCLVAR containing the column numbers of X that are the classification variables. (Input)
NCLVAL — Vector of length NCLVAR containing the number of values taken on by each classification variable. (Input)
NCLVAL(I) is the number of distinct values for the I-th classification variable.
CLVAL — Vector of length NCLVAL(1) + NCLVAL(2) + … + NCLVAL(NCLVAR) containing the values of the classification variables. (Input)
The first NCLVAL(1) elements contain the values of the first classification variable. The next NCLVAL(2) elements contain the values of the second classification variable. … The last NCLVAL(NCLVAR) elements contain the values of the last classification variable.
NVEF — Vector of length NEF containing the number of variables associated with each effect in the model. (Input)
INDEF — Index vector of length NVEF(1) + NVEF(2) + … + NVEF(NEF). (Input)
The first NVEF(1) elements give the column numbers of X for each variable in the first effect. The next NVEF(2) elements give the column numbers for each variable in the second effect. … The last NVEF(NEF) elements give the column numbers for each variable in the last effect.
NREG — Number of columns in REG. (Output)
REG — NROW by NREG matrix containing the regressor variables generated from the matrix X. (Output, if IDUMMY > 0)
Because NREG is generally not known in advance, the user may need to invoke GRGLM first with IDUMMY < 0 to obtain NREG, dimension REG accordingly, and then invoke GRGLM again with IDUMMY > 0.
NROW — Number of rows of data in X. (Input)
Default: NROW = size (X,1).
NCOL — Number of columns in X. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
NCLVAR — Number of classification variables. (Input)
Default: NCLVAR = size (INDCL,1).
NEF — Number of effects (sources of variation) in the model. (Input)
Default: NEF = size (NVEF,1).
IDUMMY — Dummy variable option. (Input)
Default: IDUMMY = 1.
Indicator variables are defined for the I-th class variable as follows: Let J = NCLVAL(1) + NCLVAL(2) + … + NCLVAL(I − 1). NCLVAL(I) indicator variables are defined such that for K = 1, 2, …, NCLVAL(I) the K-th indicator variable for observation number IOBS takes the value 1.0 if X(IOBS, INDCL(I)) = CLVAL(J + K) and equals 0.0 otherwise. Dummy variables are generated from these indicator variables in one of the three following ways:
IDUMMY | Method
-1, 1 | The NCLVAL(I) indicator variables are the dummy variables.
-2, 2 | The first NCLVAL(I) − 1 indicator variables are the dummy variables. The last indicator variable is omitted.
-3, 3 | The K-th indicator variable minus the NCLVAL(I)-th indicator variable is the K-th dummy variable (K = 1, 2, …, NCLVAL(I) − 1).
If IDUMMY < 0, only NREG is computed, and X, CLVAL, and REG are not referenced.
LDREG — Leading dimension of REG exactly as specified in the dimension statement in the calling program. (Input)
Default: LDREG = size (REG,1).
NRMISS — Number of rows of REG containing NaN (not a number). (Output)
A row of REG contains NaN for a regressor when any of the variables involved in generation of the regressor equals NaN or if a value of one of the classification variables in the model is not given by CLVAL.
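The two-pass use of IDUMMY mentioned under REG can be sketched as follows. This is a minimal, untested sketch that assumes the IMSL library is installed and borrows the model and data of the example program on this page. Passing a one-column placeholder for REG on the first call is an assumption made here; the argument description states only that REG is not referenced when IDUMMY < 0.

```fortran
      USE GRGLM_INT
      IMPLICIT NONE
      INTEGER, PARAMETER :: NROW=6, NCOL=3
      INTEGER :: INDCL(2), NCLVAL(2), NVEF(7), INDEF(12), NREG, NRMISS
      REAL :: CLVAL(5), X(NROW,NCOL)
      REAL, ALLOCATABLE :: REG(:,:)
!                                 Model and data of the example program
      INDCL  = (/1, 2/)
      NCLVAL = (/2, 3/)
      CLVAL  = (/1.0, 2.0, 1.0, 2.0, 3.0/)
      NVEF   = (/1, 1, 2, 1, 2, 2, 3/)
      INDEF  = (/1, 2, 1, 2, 3, 1, 3, 2, 3, 1, 2, 3/)
      X(:,1) = (/1.0, 1.0, 1.0, 2.0, 2.0, 2.0/)
      X(:,2) = (/1.0, 2.0, 3.0, 1.0, 2.0, 3.0/)
      X(:,3) = (/1.11, 2.22, 3.33, 4.44, 5.55, 6.66/)
!                                 Pass 1: IDUMMY < 0 computes NREG only.
!                                 REG is a placeholder (assumption of
!                                 this sketch, see the text above).
      ALLOCATE (REG(NROW,1))
      CALL GRGLM (X, INDCL, NCLVAL, CLVAL, NVEF, INDEF, NREG, REG, &
                  IDUMMY=-2)
      DEALLOCATE (REG)
!                                 Pass 2: size REG from NREG, then
!                                 generate the regressors.
      ALLOCATE (REG(NROW,NREG))
      CALL GRGLM (X, INDCL, NCLVAL, CLVAL, NVEF, INDEF, NREG, REG, &
                  IDUMMY=2, NRMISS=NRMISS)
      WRITE (*,*) 'NREG = ', NREG, ' NRMISS = ', NRMISS
      END
```

Sizing REG from the first call avoids guessing an upper bound for the number of columns, such as the fixed NDREG used in the example program.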
Generic: CALL GRGLM (X, INDCL, NCLVAL, CLVAL, NVEF, INDEF, NREG, REG [,…])
Specific: The specific interface names are S_GRGLM and D_GRGLM.
Single: CALL GRGLM (NROW, NCOL, X, LDX, NCLVAR, INDCL, NCLVAL, CLVAL, NEF, NVEF, INDEF, IDUMMY, NREG, REG, LDREG, NRMISS)
Double: The double precision name is DGRGLM.
Routine GRGLM generates regressors for a general linear model from a data matrix. The data matrix can contain classification variables as well as continuous variables.
Regressors for effects composed solely of continuous variables are generated as powers and crossproducts. Consider a data matrix containing continuous variables as columns 3 and 4. The effect indices (3,3) (stored in INDEF) generate a regressor whose i-th value is the square of the i-th value in column 3. The effect indices (3,4) generate a regressor whose i-th value is the product of the i-th value in column 3 with the i-th value in column 4.
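The rule for continuous effects amounts to a row-wise product over the listed columns. The sketch below illustrates it in standard Fortran without calling GRGLM; the module, the function name effect_regressor, and the data values are hypothetical and chosen only for illustration.

```fortran
module continuous_effect_mod
  implicit none
contains
  ! Row-wise product of the listed columns of x.
  ! A repeated column index yields a power of that column.
  function effect_regressor(x, cols) result(r)
    real, intent(in) :: x(:, :)
    integer, intent(in) :: cols(:)
    real :: r(size(x, 1))
    r = product(x(:, cols), dim=2)
  end function effect_regressor
end module continuous_effect_mod

program continuous_effect_demo
  use continuous_effect_mod
  implicit none
  real :: x(3, 4)

  x = 0.0
  x(:, 3) = [1.0, 2.0, 3.0]   ! continuous variable in column 3
  x(:, 4) = [4.0, 5.0, 6.0]   ! continuous variable in column 4

  ! Effect indices (3,3): square of column 3
  print '(a, 3f7.2)', 'effect (3,3):', effect_regressor(x, [3, 3])
  ! Effect indices (3,4): product of columns 3 and 4
  print '(a, 3f7.2)', 'effect (3,4):', effect_regressor(x, [3, 4])
end program continuous_effect_demo
```

With these illustrative data the first regressor is (1, 4, 9) and the second is (4, 10, 18).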
Regressors for an effect (source of variation) composed of a single classification variable are generated using indicator variables. Let the classification variable A take on values $a_1, a_2, \ldots, a_n$ (stored in CLVAL). From this classification variable, GRGLM creates n indicator variables. For $k = 1, 2, \ldots, n$ we have
$$I_k = \begin{cases} 1 & \text{if } A = a_k \\ 0 & \text{otherwise} \end{cases}$$
For each classification variable, another set of variables is created from the indicator variables. We call these new variables dummy variables. Dummy variables are generated from the indicator variables in one of three manners:
1. the dummies are the n indicator variables,
2. the dummies are the first n − 1 indicator variables,
3. the n − 1 dummies are defined in terms of the indicator variables so that for balanced data, the usual summation restrictions are imposed on the regression coefficients.
In particular, for IDUMMY = 1, the dummy variables are $A_k = I_k$ ($k = 1, 2, \ldots, n$). For IDUMMY = 2, the dummy variables are $A_k = I_k$ ($k = 1, 2, \ldots, n - 1$). For IDUMMY = 3, the dummy variables are $A_k = I_k - I_n$ ($k = 1, 2, \ldots, n - 1$). The regressors generated for an effect composed of a single classification variable are the associated dummy variables.
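The three codings can be compared side by side for one observed value. The following is a minimal sketch in standard Fortran that does not call GRGLM; the module, the function name dummies, and the sample values are hypothetical. It also leaves out the NaN handling that GRGLM applies when a value is not listed in CLVAL.

```fortran
module dummy_coding_mod
  implicit none
contains
  ! Dummy variables for one observed value of a classification variable.
  ! clval holds the n distinct values; idummy selects coding 1, 2, or 3.
  function dummies(value, clval, idummy) result(d)
    real, intent(in) :: value, clval(:)
    integer, intent(in) :: idummy
    real, allocatable :: d(:)
    real :: ind(size(clval))
    integer :: n

    n = size(clval)
    ind = merge(1.0, 0.0, clval == value)   ! I_k = 1 if value equals a_k
    select case (idummy)
    case (1)
      d = ind                       ! A_k = I_k,       k = 1, ..., n
    case (2)
      d = ind(1:n-1)                ! A_k = I_k,       k = 1, ..., n-1
    case (3)
      d = ind(1:n-1) - ind(n)       ! A_k = I_k - I_n, k = 1, ..., n-1
    case default
      allocate (d(0))               ! unknown option: no dummies
    end select
  end function dummies
end module dummy_coding_mod

program dummy_coding_demo
  use dummy_coding_mod
  implicit none
  real, parameter :: clval(3) = [1.0, 2.0, 3.0]
  integer :: idummy

  ! Observed value 3.0 is the last of the n = 3 levels.
  do idummy = 1, 3
    print '(a, i1, a, 3f6.1)', 'IDUMMY = ', idummy, ':', &
          dummies(3.0, clval, idummy)
  end do
end program dummy_coding_demo
```

For an observation at the last level, the codings give (0, 0, 1), (0, 0), and (-1, -1) respectively, which shows how option 3 encodes the omitted level.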
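Let $m_j$ be the number of dummies generated for the j-th classification variable. Suppose there are two classification variables A and B with dummies $A_1, A_2, \ldots, A_{m_1}$ and $B_1, B_2, \ldots, B_{m_2}$, respectively. The regressors generated for an effect composed of two classification variables A and B are
$$A \otimes B = (A_1, A_2, \ldots, A_{m_1}) \otimes (B_1, B_2, \ldots, B_{m_2}) = (A_1 B_1, A_1 B_2, \ldots, A_1 B_{m_2}, A_2 B_1, \ldots, A_{m_1} B_{m_2})$$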
More generally, the regressors generated for an effect composed of several classification variables and several continuous variables are given by the Kronecker products of variables, where the order of the variables is specified in INDEF. Consider a data matrix containing classification variables in columns 1 and 2 and continuous variables in columns 3 and 4. Label these four columns A, B, X1, and X2. The regressors generated by the effect indices (1, 2, 3, 3, 4) are A ⊗ B ⊗ X1X1X2.
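For a single observation, the Kronecker product of dummy vectors can be sketched as below. This is an illustration in standard Fortran, not the library's implementation; the module and the function name kron are hypothetical. The input values correspond to observation 2 of the example data on this page (A = 1, B = 2, X1 = 2.22) with IDUMMY = 2.

```fortran
module kron_mod
  implicit none
contains
  ! Kronecker product of two row vectors; the index of b varies fastest.
  function kron(a, b) result(c)
    real, intent(in) :: a(:), b(:)
    real :: c(size(a) * size(b))
    integer :: i, j
    do i = 1, size(a)
      do j = 1, size(b)
        c((i - 1) * size(b) + j) = a(i) * b(j)
      end do
    end do
  end function kron
end module kron_mod

program kron_demo
  use kron_mod
  implicit none
  real :: a(1), b(2), x1(1)

  a  = [1.0]         ! dummy A1 (A = 1)
  b  = [0.0, 1.0]    ! dummies B1, B2 (B = 2)
  x1 = [2.22]        ! continuous variable X1

  print '(a, 2f6.2)', 'A (x) B        :', kron(a, b)
  print '(a, 2f6.2)', 'A (x) B (x) X1 :', kron(kron(a, b), x1)
end program kron_demo
```

The two printed vectors, (0.00, 1.00) and (0.00, 2.22), agree with the GAMMA11, GAMMA12 and THETA11, THETA12 entries in row 2 of the example output.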
Let the data matrix X = (A, B, X1) where A and B are classification variables, and X1 is a continuous variable. The model containing the effects A, B, AB, X1, AX1, BX1 and ABX1 is specified as follows: NCLVAR = 2, INDCL = (1, 2), NEF = 7, NVEF = (1, 1, 2, 1, 2, 2, 3), and INDEF = (1, 2, 1, 2, 3, 1, 3, 2, 3, 1, 2, 3).
For this model, suppose NCLVAL(1) = 2, NCLVAL(2) = 3, and CLVAL = (1.0, 2.0, 1.0, 2.0, 3.0). Let A1, A2, B1, B2, and B3 be the associated indicator variables. Given below, for each IDUMMY option, are the regressors in their order of appearance in REG.
IDUMMY | REG
1 | A1, A2, B1, B2, B3, A1B1, A1B2, A1B3, A2B1, A2B2, A2B3, X1, A1X1, A2X1, B1X1, B2X1, B3X1, A1B1X1, A1B2X1, A1B3X1, A2B1X1, A2B2X1, A2B3X1
2 | A1, B1, B2, A1B1, A1B2, X1, A1X1, B1X1, B2X1, A1B1X1, A1B2X1
3 | A1 − A2, B1 − B3, B2 − B3, (A1 − A2)(B1 − B3), (A1 − A2)(B2 − B3), X1, (A1 − A2)X1, (B1 − B3)X1, (B2 − B3)X1, (A1 − A2)(B1 − B3)X1, (A1 − A2)(B2 − B3)X1
Within a group of regressors corresponding to an interaction effect, the indicator variables composing the regressors vary most rapidly for the last classification variable, vary next most rapidly for the next to last classification variable, etc.
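The number of regressors listed for each option can be checked by walking INDEF effect by effect and multiplying the number of dummies of each classification variable in the effect. The sketch below does this in standard Fortran without calling GRGLM; the module and the function name count_regressors are hypothetical, and the sketch assumes that no classification variable is repeated within an effect.

```fortran
module nreg_mod
  implicit none
contains
  ! Number of regressors implied by a model specification.
  function count_regressors(indcl, nclval, nvef, indef, idummy) result(nreg)
    integer, intent(in) :: indcl(:), nclval(:), nvef(:), indef(:), idummy
    integer :: nreg, ief, iv, pos, width, k

    nreg = 0
    pos = 0
    do ief = 1, size(nvef)          ! loop over effects
      width = 1
      do iv = 1, nvef(ief)          ! loop over variables in this effect
        pos = pos + 1
        ! A classification variable contributes its number of dummies;
        ! a continuous variable leaves the width unchanged.
        do k = 1, size(indcl)
          if (indcl(k) == indef(pos)) then
            if (abs(idummy) == 1) then
              width = width * nclval(k)
            else
              width = width * (nclval(k) - 1)
            end if
          end if
        end do
      end do
      nreg = nreg + width
    end do
  end function count_regressors
end module nreg_mod

program nreg_demo
  use nreg_mod
  implicit none
  integer, parameter :: indcl(2) = [1, 2], nclval(2) = [2, 3]
  integer, parameter :: nvef(7) = [1, 1, 2, 1, 2, 2, 3]
  integer, parameter :: indef(12) = [1, 2, 1, 2, 3, 1, 3, 2, 3, 1, 2, 3]
  integer :: idummy

  do idummy = 1, 3
    print '(a, i1, a, i3)', 'IDUMMY = ', idummy, '  NREG = ', &
          count_regressors(indcl, nclval, nvef, indef, idummy)
  end do
end program nreg_demo
```

For the model above the counts are 23, 11, and 11, matching the lengths of the three regressor lists.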
In this example, regressors are generated for a two-way analysis-of-covariance model containing all the interaction terms. The model could be fitted by a subsequent invocation of routine RGIVN with INTCEP = 1. The regressors generated with the option IDUMMY = 2 are for the model whose mean function is
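$$\mu + \alpha_i + \beta_j + \gamma_{ij} + \delta x_{ij} + \zeta_i x_{ij} + \eta_j x_{ij} + \theta_{ij} x_{ij}, \quad i = 1, 2;\ j = 1, 2, 3$$
where $\alpha_2 = \beta_3 = \gamma_{13} = \gamma_{21} = \gamma_{22} = \gamma_{23} = \zeta_2 = \eta_3 = \theta_{13} = \theta_{21} = \theta_{22} = \theta_{23} = 0$.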
      USE GRGLM_INT
      USE UMACH_INT
      USE WRRRL_INT
      IMPLICIT NONE
      INTEGER    LDREG, LDX, LINDEF, MAXCL, NCLVAR, NCOL, NDREG, NEF, &
                 NROW
      PARAMETER  (LINDEF=12, MAXCL=5, NCLVAR=2, NCOL=3, NDREG=20, &
                 NEF=7, NROW=6, LDREG=NROW, LDX=NROW)
!
      INTEGER    IDUMMY, INDCL(NCLVAR), INDEF(LINDEF), J, &
                 NCLVAL(NCLVAR), NOUT, NREG, NRMISS, NVEF(NEF)
      REAL       CLVAL(MAXCL), REG(LDREG,NDREG), X(LDX,NCOL)
      CHARACTER  CLABEL(12)*7, RLABEL(1)*7
!                                 Classification variables A and B are
!                                 in columns 1 and 2 of X
      DATA INDCL/1, 2/, NCLVAL/2, 3/, CLVAL/1.0, 2.0, 1.0, 2.0, 3.0/
!                                 Effects A, B, AB, X1, AX1, BX1, ABX1
      DATA NVEF/1, 1, 2, 1, 2, 2, 3/, INDEF/1, 2, 1, 2, 3, 1, 3, 2, 3, &
           1, 2, 3/
!                                 Data rows: A, B, X1
      DATA (X(1,J),J=1,NCOL)/1.0, 1.0, 1.11/
      DATA (X(2,J),J=1,NCOL)/1.0, 2.0, 2.22/
      DATA (X(3,J),J=1,NCOL)/1.0, 3.0, 3.33/
      DATA (X(4,J),J=1,NCOL)/2.0, 1.0, 4.44/
      DATA (X(5,J),J=1,NCOL)/2.0, 2.0, 5.55/
      DATA (X(6,J),J=1,NCOL)/2.0, 3.0, 6.66/
      DATA RLABEL/'NUMBER'/, CLABEL/' ', 'ALPHA1', 'BETA1', &
           'BETA2', 'GAMMA11', 'GAMMA12', 'DELTA', 'ZETA1', &
           'ETA1', 'ETA2', 'THETA11', 'THETA12'/
!                                 Omit the last indicator variable of
!                                 each classification variable
      IDUMMY = 2
      CALL GRGLM (X, INDCL, NCLVAL, CLVAL, NVEF, INDEF, NREG, REG, &
                  IDUMMY=IDUMMY, NRMISS=NRMISS)
!                                 Get the output unit number and print
      CALL UMACH (2, NOUT)
      WRITE (NOUT,*) 'NREG = ', NREG, ' NRMISS = ', NRMISS
      CALL WRRRL ('%/REG', REG, RLABEL, CLABEL, NROW, NREG, FMT='(F7.2)')
      END
NREG = 11 NRMISS = 0

                                   REG
    ALPHA1    BETA1    BETA2  GAMMA11  GAMMA12    DELTA    ZETA1     ETA1
1     1.00     1.00     0.00     1.00     0.00     1.11     1.11     1.11
2     1.00     0.00     1.00     0.00     1.00     2.22     2.22     0.00
3     1.00     0.00     0.00     0.00     0.00     3.33     3.33     0.00
4     0.00     1.00     0.00     0.00     0.00     4.44     0.00     4.44
5     0.00     0.00     1.00     0.00     0.00     5.55     0.00     0.00
6     0.00     0.00     0.00     0.00     0.00     6.66     0.00     0.00

      ETA2  THETA11  THETA12
1     0.00     1.11     0.00
2     2.22     0.00     2.22
3     0.00     0.00     0.00
4     0.00     0.00     0.00
5     5.55     0.00     0.00
6     0.00     0.00     0.00