Performs k nearest neighbor discrimination.
X NROW by NVAR + 1 matrix
containing the data to be used on this call. (Input/Output)
One
column in X must
contain the group number for each observation. On output, X is sorted into a
k-d tree. The first NRULE + NCLASS rows of X must not contain
missing values in the columns specified by IND and IGRP.
K Number of nearest neighbors to be used in the discriminant rule. (Input)
NGROUP Number of groups in the data. (Input)
NRULE Number of
observations in X to be used in the
discriminant rule. (Input)
The first |NRULE| observations in
X are used as
the set defining the rule. If NRULE is positive,
then the NRULE
observations defining the rule are classified. If NRULE is negative, the
NRULE
observations defining the rule are not classified.
NCLASS Number
of observations in X to
classify. (Input)
NCLASS is the number
of observations in a second sample that may be used to test the rule formed from
the first NRULE
observations. If present, this sample is in rows
NRULE + 1 through
NRULE + NCLASS of X.
THRESH
Threshold for the posterior probabilities. (Input)
If the
maximum posterior probability is less than THRESH, the
observation is classified into group NGROUP
+ 1 (the group other).
PART Vector of length NRULE containing the values to be used in the partition of X for the k-d tree. (Output)
IDISCR Vector
of length NRULE
containing the element number in IND that points to the
column of X to
be used as the discriminator in the k-d tree.
(Output)
IDISCR(i) = 0
if the observation is a terminal node. IND(IDISCR(i)) is
the column number in X to be used as the
discriminator.
NI Vector of length NGROUP containing the number of observations in each group. (Output)
ICLASS Vector
of length m containing the group to which the observation was
classified. (Output)
If NRULE > 0, m
= NRULE + NCLASS; otherwise,
m = NCLASS. The
i-th element in ICLASS corresponds to
to i-th row in the sorted matrix X.
PROB m
by NGROUP
matrix containing the posterior probabilities for each observation.
(Output)
The i-th row in PROB corresponds to
the i-th row in the in the sorted matrix X.
CLASS NGROUP
by NGROUP
+ 1 matrix containing the classification table. (Output)
Each
observation that is classified and has a group number equal to 1.0, 2.0,
, NGROUP
is entered into the table. The rows of the table correspond to the known group
membership. The columns refer to the group to which the observation was
classified. Column NGROUP
+ 1 refers to the column other (see THRESH).
NROW Number of
rows of X that
contain an observation. (Input)
Default: NROW = size (X,1).
NVAR Number of
variables to be used in the discrimination. (Input)
Default:
NVAR = size
(X,2) 1.
NCOL Number of
columns in matrix X.
(Input)
Default: NCOL = size (X,2).
LDX Leading
dimension of X
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDX = size (X,1).
IND Vector of
length NVAR
containing the column numbers in X to be used in the
discrimination. (Input)
By default, IND(I)=1.
IGRP Column
number in X
containing the group numbers. (Input)
The group numbers must be
1.0, 2.0,
, NGROUP
for an observation to be used in the discriminant functions. (Note, however,
that the nearest integer (NINT) function is used
to obtain the group numbers.)
Default: IGRP = NVAR + 1.
METRIC Metric
to be used in computing the k nearest neighbors. (Input)
Default: METRIC = 0.
METRIC Metric used
0 Euclidean distance
1 L1 norm
2 L∞ norm
PRIOR Vector of
length NGROUP
containing the prior probabilities for each group. (Input, if PRIOR(1) is not −1.0;
input/output, if PRIOR(1) is −1.0)
If PRIOR(1) is not −1.0, then the
elements of PRIOR should sum to
1.0. Proportional priors can be selected by setting PRIOR(1) = −1.0. In this
case, the prior probabilities will be proportional to the sample size in each
group based upon the first NRULE observations,
and the elements of PRIOR will contain the
proportional prior probabilities on return from NNBRD.
Default:
PRIOR(1) = −1.0.
LDPROB Leading
dimension of PROB exactly as
specified in the dimension statement in the calling program.
(Input)
Default: LDPROB = size (PROB,1).
LDCLAS Leading
dimension of CLASS exactly as
specified in the dimension statement in the calling program.
(Input)
Default: LDCLAS = size (CLASS,1).
Generic: CALL NNBRD (X, K, NGROUP, NRULE, NCLASS, THRESH, PART, IDISCR, NI, ICLASS, PROB, CLASS [, ])
Specific: The specific interface names are S_NNBRD and D_NNBRD.
Single: CALL NNBRD (NROW, NVAR, NCOL, X, LDX, K, IND, IGRP, NGROUP, NRULE, NCLASS, METRIC, PRIOR, THRESH, PART, IDISCR, NI, ICLASS, PROB, LDPROB, CLASS, LDCLAS)
Double: The double precision name is DNNBRD.
Routine NNBRD performs k-th nearest neighbor discriminant function analysis. The k-d tree algorithm of Friedman, Bentley, and Finkel (1977) is used to find the nearest neighbors. Consult this reference for a discussion of k-d trees and how one goes about finding nearest neighbors in them.
In NNBRD, the k nearest neighbors of any observation used in forming the rule (i.e., one of the first NRULE observations in X), do not include the observation. Let ki(i = 1, , NGROUP) denote the number of nearest neighbors found from each of the groups for a given observation (Σiki = k); let pi = PRIOR(i)( Σipi = 1); and let
denote the estimated posterior probability of membership in group i. Compute
where g = NGROUP. (If nj = 0 for some j, the associated term in the denominator is excluded and
is set to 0.0.)
Let m denote the index of the maximum
and ɸ = THRESH. Then if
the observation is classified into group m. If
or if the maximum is not unique, then the observation is not classified into any
group and ICLASS
is set to zero.
Three metrics are available in NNBRD
for finding the nearest neighbors. These are Euclidean (L2) distance,
L1 norm, and
L∞
norm. In order to use Mahalanobis distance, xTΣ−1 x, a
transformation
y = Σ−1∕2 x is first
needed so that Var(y) = I. These transformations can be
accomplished by use of the mathematical routines. The L2 norm would then be
used with y as input to obtain the Mahalanobis metric.
Workspace may be explicitly provided, if desired, by use of N2BRD/DN2BRD. The reference is:
CALL N2BRD (NROW, NVAR, NCOL, X, LDX, K, IND, IGRP, NGROUP, NRULE, NCLASS, METRIC, PRIOR, THRESH, PART, IDISCR, NI, ICLASS, PROB, LDPROB, CLASS, LDCLAS, WK, IWK, ILOW, IHIGH, ISIDE, BNDL, BNDH, XKEY, IPQR, PQD)
The additional arguments are as follows:
WK Work vector of length NROW.
IWK Work vector of length NROW.
ILOW Work vector of length log2(NROW) + 3.
IHIGH Work vector of length log2(NROW) + 3.
ISIDE Work vector of length log2(NROW) + 3.
BNDL Work vector of length NVAR * (log2(NROW) + 3).
BNDH Work vector of length NVAR * (log2(NROW) + 3).
XKEY Work vector of length NVAR.
IPQR Work vector of length K + 1.
PQD Work vector of length K + 1.
Fishers iris data are used to illustrate routine NNBRD. The data consist of three types of iris. NNBRD is called with k = 5 and Euclidean distance as the metric. The results show a clear separation of the groups.
USE GDATA_INT
USE NNBRD_INT
USE WRRRN_INT
USE WRIRN_INT
IMPLICIT NONE
INTEGER IGRP, K, LDCLAS, LDPROB, LDX, NCLASS, NCOL, &
NGROUP, NROW, NRULE, NVAR
REAL THRESH
PARAMETER (IGRP=1, K=5, LDCLAS=3, LDPROB=150, LDX=150, &
NCLASS=0, NCOL=5, NGROUP=3, NROW=150, &
NRULE=150, NVAR=4, THRESH=0.10)
!
INTEGER ICLASS(NROW), IDISCR(NROW), IND(NVAR), NI(NGROUP), &
NRA, NRB
REAL CLASS(LDCLAS,NGROUP+1), PART(NRULE), PRIOR(NGROUP), &
PROB(LDPROB,NGROUP), X(LDX,NCOL)
!
DATA IND/2, 3, 4, 5/
!
CALL GDATA (3, X, NRA, NRB)
!
PRIOR(1) = -1.0
CALL NNBRD (X, K, NGROUP, NRULE, NCLASS, THRESH, PART, IDISCR, &
NI, ICLASS, PROB, CLASS, IND=IND, IGRP=IGRP, PRIOR=PRIOR)
CALL WRRRN ('The first 10 rows of X', X, 10, NCOL, LDX)
CALL WRRRN ('PRIOR', PRIOR, 1, NGROUP, 1)
CALL WRRRN ('The first 10 elements of PART', PART, 1, 10, 1)
CALL WRIRN ('The first 10 elements of IDISCR', IDISCR, 1, 10, 1)
CALL WRIRN ('NI', NI, 1, NGROUP, 1)
CALL WRIRN ('The first 10 elements of ICLASS', ICLASS, 1, 10, 1)
CALL WRRRN ('The first 10 rows of PROB', PROB, 10, NGROUP, LDPROB)
CALL WRRRN ('CLASS', CLASS, NGROUP, NGROUP, LDCLAS)
!
END
The
first 10 rows of X
1 2
3 4
5
1 1.000 4.500 2.300
1.300 0.300
2 1.000
4.400 2.900 1.400
0.200
3 1.000 4.800
3.000 1.400 0.300
4
1.000 4.400 3.000 1.300
0.200
5 1.000 4.800
3.000 1.400 0.100
6
1.000 4.300 3.000 1.100
0.100
7 1.000 4.600
3.100 1.500 0.200
8
1.000 4.900 3.100 1.500
0.100
9 1.000 4.900
3.000 1.400 0.200
10 1.000
4.900 3.100 1.500 0.200
PRIOR
1
2 3
0.3333
0.3333
0.3333
The first 10 elements of PART
1 2
3 4
5 6
7 8
9 10
0.000 0.000 3.000
0.000 3.000 0.000 0.000
4.900 0.000 3.100
The first 10
elements of IDISCR
1 2 3 4
5 6 7 8 9
10
0 0 2 0 2
0 0 1 0
2
NI
1
2 3
50 50
50
The first 10 elements of ICLASS
1
2 3 4 5 6
7 8 9 10
1 1
1 1 1 1 1
1 1 1
The first 10 rows of
PROB
1 2
3
1 1.000 0.000
0.000
2 1.000 0.000
0.000
3 1.000 0.000
0.000
4 1.000 0.000
0.000
5 1.000 0.000
0.000
6 1.000 0.000
0.000
7 1.000 0.000
0.000
8 1.000 0.000
0.000
9 1.000 0.000
0.000
10 1.000 0.000
0.000
CLASS
1 2
3
1 50.00 0.00
0.00
2 0.00 47.00
3.00
3 0.00 2.00 48.00
PHONE: 713.784.3131 FAX:713.781.9260 |