Forms a k-d tree.
X − NROW by NCOL matrix containing
the data to be used on this call. (Input/Output)
On output the
rows of X have
been rearranged to form the k-d tree. X must not contain
missing values (NaN).
IND − Vector of length NVAR containing the column numbers in X to be used in the forming the k-d tree. (Input)
NBUCK − Bucket
size. (Input)
NBUCK gives the
maximum number of observations in a leaf of the k-d tree. NBUCK = 3 is a common
choice. NBUCK
should be small when compared to NROW.
IDISCR − Vector of
length NROW
containing the element number in IND that points to the
column of X to
be used as the discriminator in the k-d tree. (Output)
IDISCR(I) = 0 if the
observation is a terminal node. IND(IDISCR(I)) is the column
number in X to
be used as the discriminator.
PART − Vector of length NROW containing the value to be used in the partition for this observation. (Output)
NROW − Number of rows
of X to be used
in forming the k-d tree. (Input)
Default: NROW = size (X,1).
NVAR − Number of
variables to be used in forming the tree. (Input)
Default: NVAR = size (IND,1).
NCOL − Number of
columns in X.
(Input)
Default: NCOL = size (X,2).
LDX − Leading
dimension of X
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDX = size (X,1).
Generic: CALL QUADT (X, IND, NBUCK, IDISCR, PART [,…])
Specific: The specific interface names are S_QUADT and D_QUADT.
Single: CALL QUADT (NROW, NVAR, NCOL, X, LDX, IND, NBUCK, IDISCR, PART)
Double: The double precision name is DQUADT.
Routine QUADT
creates the data structure required for a k-d tree. A
k-d tree is a multivariate form of B-tree that is especially
useful for finding nearest neighbors but may be of use in other
situations. Once the k-d tree has been formed, routine
NGHBR may be used to find the nearest
neighbors for any point in logarithmic time.
The basic algorithm is given by Friedman, Bentley, and Finkel (1977) and can be summarized as follows:
1. Let l = 1 and h = NROW.
2. Let k = (l + h)/2.
3. Each column in X to be used in forming the k-d tree is examined over the range [l, h] in order to find the column with the maximum spread. Let j equal this column number.
4. The k-th element of PART is set to the median value in the range [l, h] of the j-th column in X while IDISCR(k) is set to the element in IND that points to this column.
5. The rows of X are interchanged so that all rows of X with values in column j less than or equal to the median value computed in Step 4 occur before (or at) the k-th element.
6. Go to Step 2 repeatedly with zero, one, or two submatrices in X. Go to Step 2 with the submatrix formed from rows l to k of X if k − l is greater than NBUCK. Go to Step 2 with the submatrix formed from rows k + 1 to h of X if h − k − 1 is greater than NBUCK.
The bucket size, NBUCK, is the maximum number of observations allowed in the lowest level of the k-d tree, i.e., in the leaves of the tree. The choice of NBUCK can affect the speed with which nearest neighbors are found. A value of 3 or 5 is a common choice, but if the number of nearest neighbors to be obtained is large, a larger value for NBUCK should probably be used.
Workspace may be explicitly provided, if desired, by use of Q2ADT/DQ2ADT. The reference is:
CALL Q2ADT (NROW, NVAR, NCOL, X, LDX, IND, NBUCK, IDISCR, PART, ILOW, IHIGH, WK, IWK)
The additional arguments are as follows:
ILOW − Work vector of length log2 (NROW) + 3.
IHIGH − Work vector of length log2 (NROW) + 3.
WK − Work vector of length NROW.
IWK − Work vector of length NROW.
The following example creates a k-d tree from financial data collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on five variables, X1 = (population), X2 = (cash flow)/(total debt), X3 = (net income)/(total assets), X4 = (current assets)/(current liabilities), and X5 = (current assets)/(net sales) are taken from Johnson and Wichern (1988, page 536).
USE QUADT_INT
USE WRRRN_INT
USE WRIRN_INT
IMPLICIT NONE
INTEGER LDX, NBUCK, NCOL, NROW, NVAR, I
PARAMETER (LDX=47, NBUCK=3, NCOL=5, NROW=47, NVAR=4)
!
INTEGER IDISCR(NROW), IND(NVAR)
REAL PART(NROW), X(LDX,NCOL)
!
DATA IND/2, 3, 4, 5/
DATA (X(I,1),I=1,47)/1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., &
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., &
2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., &
2., 2., 2., 2., 2., 2./
DATA (X(I,2),I=1,47)/-0.4485, -0.5633, 0.0643, -0.0721, -0.1002, &
-0.1421, 0.0351, -0.0653, 0.0724, -0.1353, -0.2298, 0.0713, &
0.0109, -0.2777, 0.1454, 0.3703, -0.0757, 0.0451, 0.0115, &
0.1227, -0.2843, 0.5135, 0.0769, 0.3776, 0.1933, 0.3248, &
0.3132, 0.1184, -0.0173, 0.2169, 0.1703, 0.1460, -0.0985, &
0.1398, 0.1379, 0.1486, 0.1633, 0.2907, 0.5383, -0.3330, &
0.4785, 0.5603, 0.2029, 0.2029, 0.4746, 0.1661, 0.5808/
DATA (X(I,3),I=1,47)/-0.4106, -0.3114, -0.3114, -0.0930, &
-0.0917, -0.0651, 0.0147, -0.0566, -0.0076, -0.1433, &
-0.2961, 0.0205, 0.0011, -0.2316, 0.0500, 0.1098, -0.0821, &
0.0263, -0.0032, 0.1055, -0.2703, 0.1001, 0.0195, 0.1075, &
0.0473, 0.0718, 0.0511, 0.0499, 0.0233, 0.0779, 0.0695, &
0.0518, -0.0123, -0.0312, 0.0728, 0.0564, 0.0486, 0.0597, &
0.1064, -0.0854, 0.0910, 0.1112, 0.0792, 0.0792, 0.1380, &
0.0351, 0.0371/
DATA (X(I,4),I=1,47)/1.0865, 1.5134, 1.0077, 1.4544, 1.5644, &
0.7066, 1.5046, 1.3737, 1.3723, 1.4196, 0.3310, 1.3124, &
2.1495, 1.1918, 1.8762, 1.9941, 1.5077, 1.6756, 1.2602, &
1.1434, 1.2722, 2.4871, 2.0069, 3.2651, 2.2506, 4.2401, &
4.4500, 2.5210, 2.0538, 2.3489, 1.7973, 2.1692, 2.5029, &
0.4611, 2.6123, 2.2347, 2.3080, 1.8381, 2.3293, 3.0124, &
1.2444, 4.2918, 1.9936, 1.9936, 2.9166, 2.4527, 5.0594/
DATA (X(I,5),I=1,47)/0.4526, 0.1642, 0.3978, 0.2589, 0.6683, &
0.2794, 0.7080, 0.4032, 0.3361, 0.4347, 0.1824, 0.2497, &
0.6969, 0.6601, 0.2723, 0.3828, 0.4215, 0.9494, 0.6038, &
0.1655, 0.5128, 0.5368, 0.5304, 0.3548, 0.3309, 0.6279, &
0.6852, 0.6925, 0.3483, 0.3970, 0.5174, 0.5500, 0.5778, &
0.2643, 0.5151, 0.5563, 0.1978, 0.3786, 0.4835, 0.4730, &
0.1847, 0.4443, 0.3018, 0.3018, 0.4487, 0.1370, 0.1268/
!
CALL QUADT (X, IND, NBUCK, IDISCR, PART)
CALL WRRRN ('first 10 rows of X after QUADT', X, 10, NCOL, LDX)
CALL WRRRN ('PART', PART, 1, NROW, 1)
CALL WRIRN ('IDISCR', IDISCR, 1, NROW, 1)
!
END
first 10 rows of X after QUADT
1 2 3 4 5
1 1.000 -0.230 -0.296 0.331 0.182
2 2.000 0.140 -0.031 0.461 0.264
3 1.000 -0.142 -0.065 0.707 0.279
4 1.000 -0.449 -0.411 1.087 0.453
5 1.000 0.064 -0.311 1.008 0.398
6 1.000 0.123 0.105 1.143 0.166
7 1.000 -0.284 -0.270 1.272 0.513
8 1.000 -0.278 -0.232 1.192 0.660
9 1.000 0.012 -0.003 1.260 0.604
10 1.000 0.071
0.021 1.312 0.250
PART
1 2 3 4 5 6 7 8 9 10
0.000 0.461 0.857 0.000
0.064 1.168 0.000 -0.278
0.041 0.000
11 12 13 14 15 16 17 18 19 20
0.072 1.373 0.000 -0.072
0.412 0.000 0.435 -0.015
0.000 1.876
21 22 23 24 25 26 27 28 29 30
0.448 0.000 .708
1.994 0.000 0.203 2.152
0.000 2.308 0.390
31 32 33 34 35 36 37 38 39 40
0.000 .550 0.147
0.000 0.217 2.453 0.000
2.521 0.128 0.000
41 42 43 44 45 46 47
2.612 3.012 0.000
4.240 4.292 4.755 0.000
IDISCR
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 3 3 0 1
3 0 1 1 0
1 3 0 1 4
0 4 1 0 3
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
4 0 4 3 0
1 3 0 3 4
0 4 1 0 1
3 0 3 1 0
41 42 43 44 45 46 47
3 3 0 3 3 3 0
PHONE: 713.784.3131 FAX:713.781.9260 |