Computes a matrix of dissimilarities (or similarities) between the columns (or rows) of a matrix.
X — NROW by NCOL matrix containing the data. (Input)
DIST — m by m matrix containing the computed dissimilarities or similarities, where m = NROW if IROW = 1 and m = NCOL otherwise. (Output)
NROW — Number of rows in the matrix. (Input)
Default: NROW = size (X,1).
NCOL — Number of columns in the matrix. (Input)
Default: NCOL = size (X,2).
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
NDSTM — Number of rows (columns, if IROW = 1) to be used in computing the distance measure between the columns (rows). (Input)
Default: NDSTM = size (IND,1) if IND is present. Otherwise, a default value of 2 is used.
IND — Vector of length NDSTM containing the indices of the rows (columns, if IROW = 1) to be used in computing the distance measure. (Input)
If IND(1) = 0, the first NDSTM rows (columns) are used.
Default: The first NDSTM rows (columns) are used.
IMETH — Method to be used in computing the dissimilarities or similarities. (Input)
Default: IMETH = 0.
IMETH Method
0 Euclidean distance (L2 norm)
1 Sum of the absolute differences (L1 norm)
2 Maximum difference (L∞ norm)
3 Mahalanobis distance
4 Absolute value of the cosine of the angle between the vectors
5 Angle in radians (0, π) between the lines through the origin defined by the vectors
6 Correlation coefficient
7 Absolute value of the correlation coefficient
8 Number of exact matches
The algorithm section of the manual document has a more detailed description of each measure.
IROW — Row or column option. (Input)
If IROW = 1, distances are computed between the NROW rows of X. Otherwise, distances are computed between the NCOL columns of X.
Default: IROW = 1.
ISCALE — Scaling option. (Input)
ISCALE is not used for methods 3 through 8.
Default: ISCALE = 0.
ISCALE Scaling Performed
0 No scaling is performed.
1 Scale each row (column, if IROW = 1) by the standard deviation of the row (column).
2 Scale each row (column, if IROW = 1) by the range of the row (column).
LDDIST — Leading dimension of DIST exactly as specified in the dimension statement in the calling program. (Input)
Default: LDDIST = size (DIST,1).
Generic: CALL CDIST (X, DIST [,…])
Specific: The specific interface names are S_CDIST and D_CDIST.
Single: CALL CDIST (NROW, NCOL, X, LDX, NDSTM, IND, IMETH, IROW, ISCALE, DIST, LDDIST)
Double: The double precision name is DCDIST.
Routine CDIST computes an upper triangular matrix (excluding the diagonal) of dissimilarities (or similarities) between the columns or rows of a matrix. Nine different distance measures can be computed. For the first three measures, three different scaling options can be employed. Output from CDIST is generally used as input to clustering or multidimensional scaling routines.
The following discussion assumes that the distance measure is being computed between the columns of the matrix, i.e., that IROW is not 1. If distances between the rows of the matrix are desired, set IROW to 1.
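As a minimal sketch of the upper-triangular layout, assuming IROW = 1, IMETH = 0, ISCALE = 0, and that every column of X enters the distance, the Fortran module below fills only the entries DIST(i,j) with i < j and leaves the diagonal and lower triangle untouched. It is an illustration written for this page, not the IMSL implementation; the module and routine names (rowdist_sketch, euclid_rows) are hypothetical.

```fortran
! Hypothetical sketch, not the IMSL source: Euclidean distances
! (IMETH = 0, ISCALE = 0) between the rows of X (IROW = 1), using
! every column, written into the strict upper triangle of DIST.
module rowdist_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine euclid_rows(x, dist)
    real(dp), intent(in)    :: x(:, :)     ! nrow by ncol data matrix
    real(dp), intent(inout) :: dist(:, :)  ! nrow by nrow; only i < j is set
    integer :: i, j
    do j = 2, size(x, 1)
      do i = 1, j - 1
        ! Distance between row i and row j over all columns
        dist(i, j) = sqrt(sum((x(i, :) - x(j, :))**2))
      end do
    end do
  end subroutine euclid_rows
end module rowdist_sketch
```

Applied to the 4 by 2 matrix X of the example at the end of this section, with DIST zeroed beforehand, it gives DIST(1,2) = 1, DIST(1,3) = 2, DIST(1,4) = 1, DIST(2,3) = 1, DIST(2,4) = 2, and DIST(3,4) = 3, matching the printed output there.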
For IMETH = 0 to 2, each row of X is first scaled according to the value of ISCALE. The scaling parameter of a row is computed from the values in that row, either as their standard deviation or as their range; the standard deviation is computed from the unbiased estimate of the variance. If ISCALE is 0, no scaling is performed, and the scaling parameters $s_k$ in the following discussion are all 1.0. Once the scaling values (if any) have been computed, the distance between column i and column j is computed via the difference vector

$$z_k = \frac{x_k - y_k}{s_k}, \qquad k = 1, \ldots, \mathrm{NDSTM},$$

where $x_k$ denotes the k-th element in the i-th column, $y_k$ denotes the corresponding element in the j-th column, and $s_k$ is the scaling parameter of the corresponding row. For given $z_k$, the metrics 0 to 2 are defined as:
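IMETH Metric
0 Euclidean distance ($L_2$ norm): $\sqrt{\sum_{k=1}^{\mathrm{NDSTM}} z_k^2}$
1 Sum of the absolute differences ($L_1$ norm): $\sum_{k=1}^{\mathrm{NDSTM}} |z_k|$
2 Maximum difference ($L_\infty$ norm): $\max_{1 \le k \le \mathrm{NDSTM}} |z_k|$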
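The scaling step and the three scalable metrics can be sketched as two small Fortran functions, assuming distances between columns (IROW ≠ 1) so that each row is one variable. The hypothetical function row_scale returns $s_k$ for one row, and the hypothetical col_metric forms $z_k = (x_k - y_k)/s_k$ and applies the chosen norm. This is not the CDIST source, and it does not guard against a zero scale (a constant row under ISCALE = 1 or 2), whose handling by CDIST is not described here.

```fortran
! Hypothetical sketch, not the IMSL source: scaling (ISCALE) and the
! metrics IMETH = 0, 1, 2 between two columns, z_k = (x_k - y_k) / s_k.
! A zero scale factor (constant row) is not guarded against.
module metric_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Scale factor s_k for one row (variable): ISCALE = 1 gives the
  ! standard deviation (unbiased variance), 2 gives the range, and any
  ! other value (including 0) gives 1, i.e. no scaling.
  pure function row_scale(row, iscale) result(s)
    real(dp), intent(in) :: row(:)
    integer, intent(in)  :: iscale
    real(dp) :: s, mean
    select case (iscale)
    case (1)
      mean = sum(row) / size(row)
      s = sqrt(sum((row - mean)**2) / (size(row) - 1))
    case (2)
      s = maxval(row) - minval(row)
    case default
      s = 1.0_dp
    end select
  end function row_scale

  ! Distance between columns xcol and ycol for IMETH = 0, 1, or 2,
  ! where s holds one scale factor per row used.
  function col_metric(xcol, ycol, s, imeth) result(d)
    real(dp), intent(in) :: xcol(:), ycol(:), s(:)
    integer, intent(in)  :: imeth
    real(dp) :: d
    real(dp) :: z(size(xcol))
    z = (xcol - ycol) / s          ! scaled difference vector
    select case (imeth)
    case (0)
      d = sqrt(sum(z**2))          ! Euclidean distance (L2 norm)
    case (1)
      d = sum(abs(z))              ! sum of absolute differences (L1 norm)
    case (2)
      d = maxval(abs(z))           ! maximum difference (L-infinity norm)
    case default
      error stop 'col_metric: imeth must be 0, 1, or 2'
    end select
  end function col_metric
end module metric_sketch
```

For example, with s = (1, 1), the columns (1, 0) and (4, 4) give z = (−3, −4), so IMETH = 0, 1, and 2 yield 5, 7, and 4.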
Distance measures corresponding to IMETH = 3 to 8 do not allow for scaling. These measures are defined via the column vectors $X = (x_i)$, $Y = (y_i)$, and $Z = (x_i - y_i)$ as follows:
IMETH Metric
3 $\sqrt{Z^T \hat{\Sigma}^{-1} Z}$, the Mahalanobis distance, where $\hat{\Sigma}$ is the usual unbiased sample estimate of the covariance matrix of the rows.
4 $|\cos\theta|$, where $\cos\theta = \dfrac{X^T Y}{\|X\|\,\|Y\|}$ is the dot product of X and Y divided by the length of X times the length of Y.
5 $\theta$, where $\theta$ is defined in 4.
6 $\rho$, the usual (centered) estimate of the correlation between X and Y.
7 $|\rho|$, the absolute value of $\rho$ (where $\rho$ is defined in 6).
8 The number of times $x_i = y_i$, where $x_i$ and $y_i$ are elements of X and Y.
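The measures for IMETH = 4 to 8 need no scaling and map directly onto Fortran intrinsics (dot_product, norm2, acos, count). The module below is a hypothetical sketch of those definitions, not the CDIST source; in particular, it assumes that θ for IMETH = 5 is the arccosine of the signed cosine, which lies in [0, π]. norm2 requires a Fortran 2008 compiler.

```fortran
! Hypothetical sketch, not the IMSL source: the unscaled measures
! IMETH = 4 to 8 between two vectors x and y of equal length.
module similarity_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Signed cosine: x.y / (||x|| ||y||)
  pure function cosine(x, y) result(c)
    real(dp), intent(in) :: x(:), y(:)
    real(dp) :: c
    c = dot_product(x, y) / (norm2(x) * norm2(y))
  end function cosine

  ! IMETH = 4: |cos(theta)|
  pure function abs_cosine(x, y) result(c)
    real(dp), intent(in) :: x(:), y(:)
    real(dp) :: c
    c = abs(cosine(x, y))
  end function abs_cosine

  ! IMETH = 5: theta in radians; the cosine is clamped to [-1, 1]
  ! so that rounding cannot push it outside the domain of acos
  pure function angle(x, y) result(theta)
    real(dp), intent(in) :: x(:), y(:)
    real(dp) :: theta
    theta = acos(max(-1.0_dp, min(1.0_dp, cosine(x, y))))
  end function angle

  ! IMETH = 6: centered (Pearson) correlation coefficient
  pure function corr(x, y) result(rho)
    real(dp), intent(in) :: x(:), y(:)
    real(dp) :: rho
    real(dp) :: xc(size(x)), yc(size(y))
    xc = x - sum(x) / size(x)      ! center each vector on its mean
    yc = y - sum(y) / size(y)
    rho = dot_product(xc, yc) / sqrt(dot_product(xc, xc) * dot_product(yc, yc))
  end function corr

  ! IMETH = 7: |rho|
  pure function abs_corr(x, y) result(r)
    real(dp), intent(in) :: x(:), y(:)
    real(dp) :: r
    r = abs(corr(x, y))
  end function abs_corr

  ! IMETH = 8: number of positions where x(i) equals y(i) exactly
  pure function matches(x, y) result(n)
    real(dp), intent(in) :: x(:), y(:)
    integer :: n
    n = count(x == y)
  end function matches
end module similarity_sketch
```

For instance, the vectors (1, 2, 3) and (3, 2, 1) have ρ = −1, so IMETH = 6 gives −1 and IMETH = 7 gives 1.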
For the Mahalanobis distance, any variable used in computing the distance measure that is (numerically) linearly dependent upon the previous variables in the IND vector is omitted from the distance measure.
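One way to realize the Mahalanobis distance while dropping numerically dependent variables is a Cholesky factorization of $\hat{\Sigma}$ that skips any variable whose remaining pivot is negligible. With the skipped columns of the factor left at zero, the retained part is the Cholesky factor of the covariance of the retained variables, and forward substitution then gives $\sqrt{Z^T \hat{\Sigma}^{-1} Z}$ over those variables. This is a sketch under assumptions: the source does not say which factorization CDIST uses, the relative threshold tol is a hypothetical parameter, and row_cov, chol_drop, and mahal are illustrative names. As in the IROW ≠ 1 discussion, each row of the input to row_cov is one variable and each column one observation.

```fortran
! Hypothetical sketch, not the IMSL source: Mahalanobis distance
! (IMETH = 3) with numerically dependent variables omitted.
module mahal_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Unbiased covariance matrix of the rows of a (nvar by nobs);
  ! each row is one variable and each column one observation.
  function row_cov(a) result(s)
    real(dp), intent(in) :: a(:, :)
    real(dp) :: s(size(a, 1), size(a, 1))
    real(dp) :: c(size(a, 1), size(a, 2))
    integer :: i, n
    n = size(a, 2)
    do i = 1, size(a, 1)
      c(i, :) = a(i, :) - sum(a(i, :)) / n   ! center each row
    end do
    s = matmul(c, transpose(c)) / (n - 1)
  end function row_cov

  ! Lower Cholesky factor l of s, processing variables in order.
  ! keep(j) = .false. when the remaining pivot of variable j is at most
  ! tol * s(j, j), i.e. it is (numerically) a linear combination of the
  ! earlier retained variables; column j of l is then left at zero.
  subroutine chol_drop(s, l, keep, tol)
    real(dp), intent(in)  :: s(:, :), tol
    real(dp), intent(out) :: l(:, :)
    logical, intent(out)  :: keep(:)
    real(dp) :: d
    integer :: i, j
    l = 0.0_dp
    keep = .true.
    do j = 1, size(s, 1)
      d = s(j, j) - sum(l(j, 1:j-1)**2)
      if (d <= tol * s(j, j)) then
        keep(j) = .false.
        cycle
      end if
      l(j, j) = sqrt(d)
      do i = j + 1, size(s, 1)
        l(i, j) = (s(i, j) - sum(l(i, 1:j-1) * l(j, 1:j-1))) / l(j, j)
      end do
    end do
  end subroutine chol_drop

  ! sqrt(z' * inverse(s) * z) over the retained variables, computed by
  ! forward substitution l * w = z (entries of omitted variables stay zero).
  function mahal(z, l, keep) result(dist)
    real(dp), intent(in) :: z(:), l(:, :)
    logical, intent(in)  :: keep(:)
    real(dp) :: dist
    real(dp) :: w(size(z))
    integer :: i
    w = 0.0_dp
    do i = 1, size(z)
      if (.not. keep(i)) cycle
      w(i) = (z(i) - sum(l(i, 1:i-1) * w(1:i-1))) / l(i, i)
    end do
    dist = sqrt(sum(w**2))
  end function mahal
end module mahal_sketch
```

If the second variable is exactly twice the first, its pivot is zero up to rounding, so chol_drop marks it as omitted and only the first variable contributes to the distance.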
1. Workspace may be explicitly provided, if desired, by use of C2IST/DC2IST. The reference is:
CALL C2IST (NROW, NCOL, X, LDX, NDSTM, IND, IMETH, IROW, ISCALE, DIST, LDDIST, X1, X2, SCALE, WK, IND1)
The additional arguments are as follows:
X1 — Work vector of length NDSTM. Not used if IMETH = 8.
X2 — Work vector of length NDSTM. Not used if IMETH = 8.
SCALE — Work vector of length NDSTM if IMETH is less than 4; of length NCOL or NROW when IROW ≠ 1 or IROW = 1, respectively, and IMETH is 4 or 5; and of length 2 * NCOL or 2 * NROW when IROW ≠ 1 or IROW = 1, respectively, and IMETH is 6 or 7. SCALE is not used when IMETH is 8.
WK — Work vector of length NDSTM * NDSTM when IMETH is 3, or of length NDSTM when IMETH = 6 or 7. Not used otherwise.
IND1 — Integer work vector of length NDSTM.
2. Informational error
Type Code
3 3 A variable is numerically linearly dependent on the previous variables when IMETH is 3. The variable detected as being linearly dependent is omitted from the distance measure.
The following example illustrates the use of CDIST for computing the Euclidean distance between the rows of a matrix.
USE WRRRN_INT
USE CDIST_INT
IMPLICIT NONE
INTEGER LDDIST, LDX, NCOL, NROW
PARAMETER (NCOL=2, NROW=4, LDDIST=NROW, LDX=NROW)
!
REAL DIST(LDDIST,NROW), X(LDX,NCOL)
! X is stored by columns:
! column 1 = (1, 1, 1, 1), column 2 = (1, 0, -1, 2)
DATA X/1, 1, 1, 1, 1, 0, -1, 2/
DATA DIST/16*0.0/
! Print input matrix
CALL WRRRN ('X', X)
! Defaults apply: IMETH = 0 (Euclidean distance), IROW = 1
! (distances between rows), ISCALE = 0 (no scaling)
CALL CDIST (X, DIST)
! Print distance matrix
CALL WRRRN ('DIST', DIST)
!
END
X
       1        2
1  1.000    1.000
2  1.000    0.000
3  1.000   -1.000
4  1.000    2.000

DIST
       1        2        3        4
1  0.000    1.000    2.000    1.000
2  0.000    0.000    1.000    2.000
3  0.000    0.000    0.000    3.000
4  0.000    0.000    0.000    0.000