Uses Fisher's linear discriminant analysis method to reduce the number of variables.
XMEAN NGROUP by NVAR matrix containing the means of the variables in each group. (Input)
SUMWT Vector of length NGROUP containing the sum of the weights of the observations in each group. (Input)
COV NVAR by NVAR matrix containing the pooled within-groups variance-covariance matrix Sp. (Input)
NNV Number of eigenvectors extracted from the standardized between-groups variance-covariance matrix, $S_p^{-1} S_b$. (Output)
$S_p$ is the pooled within-groups variance-covariance matrix, and $S_b$ is the between-groups variance-covariance matrix. NNV is usually the minimum of NVAR and NGROUP − 1, but it may be smaller if any row of XMEAN or COV is a linear combination of the other rows.
EVAL Vector of length NNV containing the eigenvalues extracted from the standardized between-means variance-covariance matrix, in descending order. (Output)
NNV is less than or equal to the minimum of NVAR and NGROUP − 1.
COEF NVAR by NNV matrix of eigenvectors from the standardized between-means variance-covariance matrix. (Output)
The eigenvector coefficients have been standardized so that the canonical scores can be obtained directly by multiplying the original data by COEF.
CMEAN NGROUP by NNV matrix of group means of the canonical variables. (Output)
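As the COEF description notes, canonical scores come from multiplying an observation by COEF: each score is the inner product of the observation with one column of COEF. A minimal Python sketch of that step follows. The COEF values are copied from the example output for the iris data; the helper name `canonical_scores` and the observation `x` are hypothetical and only illustrative.

```python
# COEF from the example output: NVAR = 4 rows, NNV = 2 columns.
COEF = [
    [-0.829, 0.024],
    [-1.534, 2.165],
    [2.201, -0.932],
    [2.810, 2.839],
]

def canonical_scores(x, coef):
    """Hypothetical helper: z_j = sum_k x[k] * coef[k][j] for each canonical variable j."""
    nnv = len(coef[0])
    return [sum(x[k] * coef[k][j] for k in range(len(x))) for j in range(nnv)]

# Hypothetical observation with NVAR = 4 measurements.
x = [5.1, 3.5, 1.4, 0.2]
z = canonical_scores(x, COEF)
print([round(v, 3) for v in z])
```

The resulting score vector lies close to the first row of CMEAN in the example output.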
NGROUP Number of groups. (Input)
Default: NGROUP = size (XMEAN,1).
NVAR Number of variables. (Input)
Default: NVAR = size (XMEAN,2).
LDXMEA Leading dimension of XMEAN exactly as specified in the dimension statement in the calling program. (Input)
Default: LDXMEA = size (XMEAN,1).
LDCOV Leading dimension of COV exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOV = size (COV,1).
LDCOEF Leading dimension of COEF exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOEF = size (COEF,1).
LDCMEA Leading dimension of CMEAN exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCMEA = size (CMEAN,1).
Generic: CALL DMSCR (XMEAN, SUMWT, COV, NNV, EVAL, COEF, CMEAN [, …])
Specific: The specific interface names are S_DMSCR and D_DMSCR.
Single: CALL DMSCR (NGROUP, NVAR, XMEAN, LDXMEA, SUMWT, COV, LDCOV, NNV, EVAL, COEF, LDCOEF, CMEAN, LDCMEA)
Double: The double precision name is DDMSCR.
Routine DMSCR is a natural generalization of R. A. Fisher's linear discrimination procedure for two groups. This method of discrimination obtains those linear combinations of the observed random variables that maximize the between-groups variation relative to the within-groups variation. Denote the first of these linear combinations by $z = \beta_1^T x$, where $\beta_1$ is a column vector of coefficients of length NVAR and $x$ is an observation to be classified. On the basis of one linear combination, the discriminant rule assigns the observation, through its score $z$, to the group (characterized by the group mean of $z$) that minimizes the Euclidean distance between $z$ and the group mean.
To obtain $\beta_1$ (see, e.g., Tatsuoka 1971, page 158), let $S_p$ denote the pooled within-groups covariance matrix ($S_p$ is defined and can be computed via routine DSCRM), and let $S_b$ denote the between-groups covariance matrix defined by
$$S_b = \frac{1}{N - g} \sum_{i=1}^{g} w_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^T$$
where $g$ is the number of groups, $\bar{x}_i$ is the mean vector for the $i$-th group of observations, $\bar{x}$ denotes the vector of means over all observations, $w_i$ is the sum of the weights times the frequencies as input in SUMWT and as used in the computation of $\bar{x}_i$, and $N$ is the total number of observations used in computing COV. Then $\beta_1$, scaled so that $\beta_1^T S_p \beta_1 = 1$, can be computed as the vector $\beta$ that maximizes
$$\frac{\beta^T S_b \beta}{\beta^T S_p \beta}$$
This yields $\beta_1$ as the eigenvector associated with the largest eigenvalue of $S_p^{-1} S_b$.
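The between-groups matrix can be sketched in Python directly from the summary arrays that DMSCR takes as input. This sketch assumes $S_b = \frac{1}{N-g}\sum_{i=1}^{g} w_i(\bar{x}_i-\bar{x})(\bar{x}_i-\bar{x})^T$, with $\bar{x}$ computed as the $w_i$-weighted average of the group means. The function name `between_groups_cov` and the small three-group data set are hypothetical; they are neither IMSL code nor the example data.

```python
def between_groups_cov(xmean, sumwt, n_total):
    """Hypothetical helper: weighted between-groups matrix S_b.

    xmean   : g rows of p group means (like XMEAN)
    sumwt   : g group weight sums (like SUMWT)
    n_total : N, total number of observations used for COV
    Assumes S_b = sum_i w_i (m_i - m)(m_i - m)^T / (N - g).
    """
    g, p = len(xmean), len(xmean[0])
    wtot = sum(sumwt)
    # Grand mean over all observations = weighted average of the group means.
    grand = [sum(sumwt[i] * xmean[i][k] for i in range(g)) / wtot for k in range(p)]
    sb = [[0.0] * p for _ in range(p)]
    for i in range(g):
        d = [xmean[i][k] - grand[k] for k in range(p)]
        # Accumulate the weighted outer product w_i * d * d^T.
        for r in range(p):
            for c in range(p):
                sb[r][c] += sumwt[i] * d[r] * d[c]
    return [[v / (n_total - g) for v in row] for row in sb]

# Hypothetical summary: 3 groups, 2 variables, 10 observations per group.
xmean = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
sumwt = [10.0, 10.0, 10.0]
sb = between_groups_cov(xmean, sumwt, n_total=30)
```

Here the grand mean is $(2, 3)$, and $S_b$ comes out diagonal with entries $20/27$ and $60/27$.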
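Generally, $S_p^{-1} S_b$ has rank $m$, where $m = \min(g - 1, p)$ and $p$ = NVAR. Thus $S_p^{-1} S_b$ has $m$ eigenvectors associated with nonzero eigenvalues, and the matrix COEF is obtained as $(\beta_1, \beta_2, \ldots, \beta_m)$, where each $\beta_i$ is an eigenvector scaled so that $\beta_i^T S_p \beta_i = 1$.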
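To make the eigenvector step concrete, here is a hedged Python sketch that finds only $\beta_1$ by power iteration on $S_p^{-1} S_b$ and then rescales it so that $\beta_1^T S_p \beta_1 = 1$. Power iteration is a swapped-in technique chosen for brevity; it is not necessarily how DMSCR computes its eigenvectors. The 2 × 2 matrices are hypothetical, with a rank-one $S_b$ as in the two-group case. The sketch assumes that the largest eigenvalue is unique and that the all-ones start vector has a component along $\beta_1$.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a small, nonsingular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def quad(v, a, w):
    """Return v^T a w."""
    return sum(v[i] * a[i][j] * w[j] for i in range(len(v)) for j in range(len(w)))

def leading_discriminant(sp, sb, iters=500):
    """Power iteration on Sp^{-1} Sb; returns (eigenvalue, beta) with beta^T Sp beta = 1."""
    p = len(sp)
    v = [1.0] * p
    for _ in range(iters):
        sbv = [sum(sb[i][j] * v[j] for j in range(p)) for i in range(p)]
        w = solve(sp, sbv)  # w = Sp^{-1} Sb v, without forming the inverse
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of S_b")
        v = [t / norm for t in w]
    scale = math.sqrt(quad(v, sp, v))
    beta = [t / scale for t in v]  # enforce beta^T Sp beta = 1
    lam = quad(beta, sb, beta)     # Rayleigh quotient; its denominator is now 1
    return lam, beta

sp = [[2.0, 1.0], [1.0, 2.0]]
sb = [[4.0, 2.0], [2.0, 1.0]]
lam, beta = leading_discriminant(sp, sb)
```

For these matrices $S_p^{-1} S_b$ has eigenvalue 2 with eigenvector proportional to $(1, 0)^T$, so the sketch returns $\lambda \approx 2$ and $\beta_1 \approx (1/\sqrt{2}, 0)^T$. The sign of an eigenvector is arbitrary.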
The matrix CMEAN contains the group means of the linear combinations $z_i$ defined by the $\beta_i$. For each observation $x$, the scores
$$z_i = \beta_i^T x, \quad i = 1, \ldots, m$$
can be computed. Because of the restriction on $\beta_i$, the pooled within-groups sample variance of each $z_i$ is 1.0. The observation is classified into the group whose vector of canonical means (the corresponding row of CMEAN) is closest, in Euclidean distance, to the observation's vector of scores.
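The classification rule can be sketched in Python as a nearest-mean search over the rows of CMEAN. The CMEAN values are copied from the example output for the iris data. The helper name `classify` and the score vectors passed to it are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math

# CMEAN from the example output: one row per group, one column per canonical variable.
CMEAN = [
    [-5.502, 6.877],
    [3.930, 5.934],
    [7.888, 7.174],
]

def classify(z, cmean):
    """Return the 1-based group whose canonical mean is nearest to z (Euclidean distance)."""
    dists = [math.dist(z, row) for row in cmean]
    return 1 + dists.index(min(dists))

print(classify([-5.95, 6.96], CMEAN))  # → 1
```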
Note that the linear combinations $z_i$ are meaningful even when discrimination is not desired. The linear combination of the observed variables that best separates the $g$ groups is $z_1$; $z_2$ gives the second-highest such separation subject to being uncorrelated (within groups) with $z_1$, and so on. Thus, a plot of the group means of the first two canonical variables (the first two columns of CMEAN) gives a good two-dimensional summary of the relationships between the groups.
1. Workspace may be explicitly provided, if desired, by use of D2SCR/DD2SCR. The reference is:
CALL D2SCR (NGROUP, NVAR, XMEAN, LDXMEA, SUMWT, COV, LDCOV, NNV, EVAL, COEF, LDCOEF, CMEAN, LDCMEA, BCOV, EVAL2, EVEC, WKR, WK)
The additional arguments are as follows:
BCOV Work array of length NVAR * NVAR.
EVAL2 Work array of length NVAR.
EVEC Work array of length NVAR * NVAR.
WKR Work array of length NVAR * NVAR.
WK Work array of length 2 * NVAR.
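The work-array lengths listed above add up to $3\,\text{NVAR}^2 + 3\,\text{NVAR}$ values. A small Python helper tallies them, for example to check how much scratch memory a D2SCR call needs. The function `d2scr_workspace` is hypothetical and not part of IMSL.

```python
def d2scr_workspace(nvar):
    """Hypothetical helper: lengths of the D2SCR work arrays, as listed above."""
    return {
        "BCOV": nvar * nvar,
        "EVAL2": nvar,
        "EVEC": nvar * nvar,
        "WKR": nvar * nvar,
        "WK": 2 * nvar,
    }

# NVAR = 4, as in the iris example.
sizes = d2scr_workspace(4)
total = sum(sizes.values())  # equals 3*NVAR**2 + 3*NVAR
print(sizes, total)
```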
2. IMSL routine DSCRM may be used to calculate the input arrays for this routine from the original data.
The following example illustrates a typical sequence using Fisher's iris data (see routine GDATA, Chapter 19, "Utilities"). Routine DSCRM is first used to perform a discriminant analysis based on all the variables. COV and XMEAN are obtained from DSCRM, and the group weight sums passed in SUMWT are taken from the STAT array returned by DSCRM. Routine DMSCR, which uses these arrays, is then called.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER IGRP, IMTH, IPRINT, LDCLAS, LDCMEA, LDCO, LDCOEF, &
LDCOV, LDD2, LDPROB, LDX, LDXMEA, NCOL, &
NGROUP, NROW, NVAR
PARAMETER (IGRP=1, IMTH=3, IPRINT=0, LDCOV=4, NCOL=5, NGROUP=3, &
NROW=150, NVAR=4, LDCLAS=NGROUP, LDCMEA=NGROUP, &
LDCO=NGROUP, LDCOEF=NVAR, LDD2=NGROUP, LDPROB=NROW, &
LDX=NROW, LDXMEA=NGROUP)
!
INTEGER ICLASS(NROW), IND(4), NI(NGROUP), NNV, NOBS, NOUT, &
NRMISS, NV
REAL CLASS(LDCLAS,NGROUP), CMEAN(LDCMEA,NGROUP-1), &
CO(LDCO,NVAR+1), COEF(LDCOEF,NGROUP-1), &
COV(LDCOV,LDCOV,1), D2(LDD2,NGROUP), EVAL(NGROUP-1), &
PRIOR(3), PROB(LDPROB,NGROUP), &
STAT(6+2*NGROUP), SUMWT(NGROUP), X(LDX,NCOL), &
XMEAN(LDXMEA,NVAR)
!
! IGRP = 1: column 1 of X holds the group number.
! IND: columns 2-5 of X hold the four variables. Equal prior probabilities.
DATA IND/2, 3, 4, 5/, PRIOR/0.3333333, 0.3333333, 0.3333333/
!
! Retrieve Fisher's iris data (GDATA data set 3).
CALL GDATA (3, X, NOBS, NV)
!
! Discriminant analysis on all four variables. COV receives the
! pooled within-groups covariance matrix and XMEAN the group means.
CALL DSCRM (NROW, NVAR, X, NGROUP, COV(1:,1:,1), CO, ICLASS, &
PROB, CLASS, D2, STAT, IND=IND, IGRP=IGRP, IMTH=IMTH, &
PRIOR=PRIOR, XMEAN=XMEAN)
!
! Group weight sums for SUMWT, taken from the STAT output of DSCRM.
SUMWT(1) = STAT(6+NGROUP)
SUMWT(2) = STAT(7+NGROUP)
SUMWT(3) = STAT(8+NGROUP)
!
! Canonical discriminant analysis from the summary arrays.
CALL DMSCR (XMEAN, SUMWT, COV(1:,1:,1), NNV, EVAL, COEF, CMEAN)
! Print NNV, the eigenvalues, COEF, and CMEAN.
CALL UMACH (2, NOUT)
WRITE (NOUT,'('' NNV = '',I1)') NNV
CALL WRRRN ('EVAL', EVAL, 1, NNV, 1)
CALL WRRRN ('COEF', COEF)
CALL WRRRN ('CMEAN', CMEAN)
END
NNV = 2
        EVAL
     1        2
 32.19     0.29

          COEF
          1        2
1    -0.829    0.024
2    -1.534    2.165
3     2.201   -0.932
4     2.810    2.839

          CMEAN
          1        2
1    -5.502    6.877
2     3.930    5.934
3     7.888    7.174
PHONE: 713.784.3131 FAX: 713.781.9260