Search’s a k-d tree for the k nearest neighbors of a key.
XKEY − Vector of
length NVAR
containing the key for which nearest neighbors are desired. (Input)
Note that the elements in XKEY are not arranged
in the same manner as the columns in X.
K − Number of nearest neighbors to find. (Input)
X − NROW by NCOL matrix containing
the data used to form the k-d tree as output from routine QUADT. (Input)
X must not contain
missing values (NaN).
IND − Vector of
length NVAR
containing the column numbers in X used in forming the
k-d tree. (Input)
NBUCK − Bucket
size. (Input)
NBUCK is the maximum
number of observations in a leaf of the k-d tree. The value of
NBUCK should be
the same as the value used in forming the k-d tree (i.e. as input
to the routine QUADT).
IDISCR − Vector of
length NROW
containing the element number in IND that points to the
column of X to
be used as the discriminator in the k-d tree, as output from
routine QUADT.
(Input)
IDISCR(I) = 0 if the
observation is a terminal node. IND(IDISCR(I)) is the column
number in X to
be used as the discriminator.
PART − Vector of length NROW containing the median value to be used for the partition, as output from routine QUADT. (Input)
IPQR − Vector of length K containing the indices of the nearest neighbors. (Output)
PQD − Vector of length K containing the nearest neighbor distances. (Output)
NVAR − Number of
variables used to form the k-d tree. (Input)
Default: NVAR = size (XKEY,1).
NROW − Number of rows
of X used to
form the k-d tree. (Input)
Default: NROW = size (X,1).
NCOL − Number of
columns in X.
(Input)
Default: NCOL = size (X,2).
LDX − Leading
dimension of X
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDX = size (X,1).
METRIC − Metric to use
in computing the k nearest neighbors. (Input)
Default:
METRIC = 0.
METRIC Metric used
0 Euclidean distance
1 L1 norm
2 L∞ norm
Generic: CALL NGHBR (XKEY, K, X, IND, NBUCK, IDISCR, PART, IPQR, PQD [,…])
Specific: The specific interface names are S_NGHBR and D_NGHBR.
Single: CALL NGHBR (NVAR, XKEY, K, NROW, NCOL, X, LDX, IND, NBUCK, IDISCR, PART, METRIC, IPQR)
Double: The double precision name is DNGHBR.
Routine NGHBR finds the k nearest neighbors in an input k-d tree for an arbitrary key, XKEY in logarithmic time. A k-d tree is a form of B-tree that is especially useful for finding nearest neighbors. The k-d tree input into routine NGHBR should be produced by routine QUADT. Three metrics, Euclidean, L1, and L∞, are available for defining the nearest neighbors. The user should note that if the input key is a row of the k-d tree, then the row will be returned as one of the nearest neighbors. In this case, only k − 1 nearest neighbors will be found.
The algorithm is given by Friedman, Bentley, and Finkel (1977) and is summarized in the following. The basic idea is to traverse the k-d tree in order to determine which leaves of the tree need to be examined for the nearest neighbor. The algorithm is efficient because most leaves are not examined.
1. Let l = 1 and h = NROW.
2. Let k = (l + h)/2, and j and p be the k-th elements of IDISCR and PART, respectively.
3. If (h − l) is less than NBUCK, then go to Step 4. Otherwise, let m be the j-th element of IND. If the (k, m)-th element of X is greater than p, then let l = k + 1 and go to Step 2. Otherwise, set h = k and go to Step 2.
4. Examine each row in X from row l to row h to determine if it is a nearest neighbor. Check to see if rows in X (leaves of the tree) adjacent to these rows need to be examined (see Friedman, Bentley, and Finkel (1977)). If necessary, examine the adjacent rows for nearest neighbors.
The value used for the bucket size, NBUCK, must be the same value as was used in routine QUADT when the k-d tree was created. A common choice for NBUCK is three.
1. Workspace may be explicitly provided, if desired, by use of N2HBR/DN2HBR. The reference is:
CALL N2HBR (NVAR, XKEY, K, NROW, NCOL, X, LDX, IND, IDISCR, PART, METRIC, IPQR, PQD, ILOW, IHIGH, ISIDE, BNDL, BNDH)
The additional arguments are as follows:
ILOW − Work vector of length log2 (NROW) + 3.
IHIGH − Work vector of length log2 (NROW) + 3.
ISIDE − Work vector of length log2 (NROW) + 3.
BNDL − Work vector of length NVAR * (log2 (NROW) + 3).
BNDH − Work vector of length NVAR * (log2 (NROW) + 3).
2. Informational error
Type Code
4 1 The data structure input is not a k-d tree. Use routine QUADT to create the k-d tree.
The following example creates a k-d tree from
financial data collected for firms approximately 2 years prior to bankruptcy and
for financially sound firms at about the same point in time. The data on five
variables, X1 = (population),
X2 = (cash flow)/(total
dept),
X3 = (net income)/(total
assets), X4 = (current
assets)/(current liabilities),
and X5 = (current
assets)/(net sales) are taken from Johnson and Wichern, page 536. Routine NGHBR
is then used to determine the 5 nearest neighbors of the first row in X.
As expected, one of the nearest neighbors found is the key (the first row in
X).
USE QUADT_INT
USE NGHBR_INT
USE WRIRN_INT
USE WRRRN_INT
IMPLICIT NONE
INTEGER K, LDX, METRIC, NBUCK, NCOL, NROW, NVAR
PARAMETER (K=5, LDX=47, METRIC=1, NBUCK=3, NCOL=5, NROW=47, &
NVAR=4)
!
INTEGER I, IDISCR(NROW), IND(NVAR), IPQR(K)
REAL PART(NROW), PQD(K), X(LDX,NCOL), XKEY(NVAR)
!
DATA IND/2, 3, 4, 5/
DATA (X(I,1),I=1,47)/1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., &
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., &
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., &
2., 2., 2., 2., 2., 2./
DATA (X(I,2),I=1,47)/-0.4485, -0.5633, 0.0643, -0.0721, -0.1002, &
-0.1421, 0.0351, -0.0653, 0.0724, -0.1353, -0.2298, 0.0713, &
0.0109, -0.2777, 0.1454, 0.3703, -0.0757, 0.0451, 0.0115, &
0.1227, -0.2843, 0.5135, 0.0769, 0.3776, 0.1933, 0.3248, &
0.3132, 0.1184, -0.0173, 0.2169, 0.1703, 0.1460, -0.0985, &
0.1398, 0.1379, 0.1486, 0.1633, 0.2907, 0.5383, -0.3330, &
0.4785, 0.5603, 0.2029, 0.2029, 0.4746, 0.1661, 0.5808/
DATA (X(I,3),I=1,47)/-0.4106, -0.3114, -0.3114, -0.0930, &
-0.0917, -0.0651, 0.0147, -0.0566, -0.0076, -0.1433, &
-0.2961, 0.0205, 0.0011, -0.2316, 0.0500, 0.1098, -0.0821, &
0.0263, -0.0032, 0.1055, -0.2703, 0.1001, 0.0195, 0.1075, &
0.0473, 0.0718, 0.0511, 0.0499, 0.0233, 0.0779, 0.0695, &
0.0518, -0.0123, -0.0312, 0.0728, 0.0564, 0.0486, 0.0597, &
0.1064, -0.0854, 0.0910, 0.1112, 0.0792, 0.0792, 0.1380, &
0.0351, 0.0371/
DATA (X(I,4),I=1,47)/1.0865, 1.5134, 1.0077, 1.4544, 1.5644, &
0.7066, 1.5046, 1.3737, 1.3723, 1.4196, 0.3310, 1.3124, &
2.1495, 1.1918, 1.8762, 1.9941, 1.5077, 1.6756, 1.2602, &
1.1434, 1.2722, 2.4871, 2.0069, 3.2651, 2.2506, 4.2401, &
4.4500, 2.5210, 2.0538, 2.3489, 1.7973, 2.1692, 2.5029, &
0.4611, 2.6123, 2.2347, 2.3080, 1.8381, 2.3293, 3.0124, &
1.2444, 4.2918, 1.9936, 1.9936, 2.9166, 2.4527, 5.0594/
DATA (X(I,5),I=1,47)/0.4526, 0.1642, 0.3978, 0.2589, 0.6683,&
0.2794, 0.7080, 0.4032, 0.3361, 0.4347, 0.1824, 0.2497, &
0.6969, 0.6601, 0.2723, 0.3828, 0.4215, 0.9494, 0.6038, &
0.1655, 0.5128, 0.5368, 0.5304, 0.3548, 0.3309, 0.6279, &
0.6852, 0.6925, 0.3483, 0.3970, 0.5174, 0.5500, 0.5778, &
0.2643, 0.5151, 0.5563, 0.1978, 0.3786, 0.4835, 0.4730, &
0.1847, 0.4443, 0.3018, 0.3018, 0.4487, 0.1370, 0.1268/
!
! Create the k-d tree
!
CALL QUADT (X, IND, NBUCK, IDISCR, PART)
!
DO 10 I=1, NVAR
XKEY(I) = X(1,IND(I))
10 CONTINUE
!
CALL NGHBR (XKEY, K, X, IND, NBUCK, IDISCR, PART, IPQR, PQD, &
METRIC=METRIC)
!
CALL WRIRN ('Indices of the nearest neighbors, IPQR.', IPQR, 1, K, 1)
CALL WRRRN ('Nearest neighbor distances, PQD.', PQD, 1, K, 1)
!
END
Indices of the nearest neighbors, IPQR.
1 2 3 4 5
1 3 2 5 7
Nearest neighbor distances, PQD.
1 2 3 4 5
0.000 0.791 0.847 1.201 1.352
PHONE: 713.784.3131 FAX:713.781.9260 |