Chapter 11: Cluster Analysis

CNUMB

Computes cluster membership for a hierarchical cluster tree.

Required Arguments

NODE — Number of data points clustered.   (Input)

ICLSON — Vector of length NODE 1 containing the left son cluster numbers.   (Input)
Cluster NODE + I is formed by merging clusters ICLSON(I) and ICRSON(I).

ICRSON — Vector of length NODE 1 containing the right son cluster numbers.   (Input)
Cluster NODE + I is formed by merging clusters ICLSON(I) and ICRSON(I).

K — Desired number of clusters.   (Input)

ICLUS — Vector of length NODE containing the cluster membership of each observation.   (Output)
Observation I is in cluster ICLUS(I) when K clusters are specified.

NCLUS — Vector of length K containing the number of observations in each cluster.   (Output)

FORTRAN 90 Interface

Generic:                              CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)

Specific:                             The specific interface name is CNUMB.

FORTRAN 77 Interface

Single:                                CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)

Description

Given a fixed number of clusters (K) and the cluster tree (vectors ICRSON and ICLSON) produced by the hierarchical clustering algorithm (see routine CLINK), routine CNUMB determines the cluster membership of each observation. The routine CNUMB first determines the root nodes for the K distinct subtrees forming the K clusters and then traverses each subtree to determine the cluster membership of each observation. The routine CNUMB also returns the number of observations found in each cluster.

Comments

1.         Workspace may be explicitly provided, if desired, by use of C2UMB. The reference is:

CALL C2UMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS, IPT)

The additional argument is:

IPT — Work vector of length 2 * NODE.

2.         Informational errors

Type Code

4         1                  The tree structure specified by ICLSON and ICRSON is invalid because an attempt to assign an observation to more than one cluster is being made.

4         2                  The tree structure specified by ICLSON and ICRSON is incorrect because an observation is not assigned to a cluster.

Example 1

In the following example, cluster membership for K = 2 clusters is found for the displayed cluster tree. The output vector ICLUS contains the cluster numbers for each observation.

 

      USE CNUMB_INT

      USE WRIRN_INT

 

      IMPLICIT   NONE

      INTEGER    K, NODE

      PARAMETER  (K=2, NODE=5)

!

      INTEGER    ICLSON(NODE-1), ICLUS(NODE), ICRSON(NODE-1), NCLUS(K)

!

      DATA ICLSON/5, 6, 4, 7/

      DATA ICRSON/3, 1, 2, 8/

!                                 Compute cluster membership

      CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)

!                                 Print output

      CALL WRIRN ('ICLUS', ICLUS, 1, NODE, 1)

      CALL WRIRN ('NCLUS', NCLUS, 1, K, 1)

!

      END

Output

 

      ICLUS
1   2   3   4   5
1   2   1   2   1


      NCLUS
1   2
3   2

Additional Example

Example 2

This example illustrates the typical usage of CNUMB. The Fisher iris data (see routine GDATA, Chapter 19, Utilities. is clustered. First the distance between the irises are computed using routine CDIST. The resulting distance matrix is then clustered using routine CLINK. The cluster membership for 5 clusters is then obtained via routine CNUMB using the output from CLINK. The need for 5 clusters can be obtained either by theoretical means or by examining a cluster tree. Because the cluster tree is too large to be included in this example, the call to routine TREEP (see Chapter 16, Line Printer Graphics;) that would ordinarily print the cluster tree has been commented in the example code. The cluster membership for each of the iris observations is printed.

 

      USE IMSL_LIBRARIES

 

      IMPLICIT   NONE

      INTEGER    IDATA, IPRINT, IROW, K, &

                 LDDIST, LDX, NCOL, NODE, NODEX, NROW, NVAR

      PARAMETER  (IDATA=3, IPRINT=0, IROW=1, K=5, LDDIST=150, &

                  LDX=150, NCOL=5, NODE=150, NODEX=5, NROW=150, NVAR=4)

!

      INTEGER    I, ICLSON(NROW-1), ICLUS(NODE), ICRSON(NROW-1), &

                 IND(4), J, NCLUS(K), NSCALE, NXCOL, NXROW

      REAL       AMAX1, CLEVEL(NROW-1), DIST(LDDIST,LDDIST), RN, &

                SCALE(2), X(LDX,NCOL)

      CHARACTER  NODENM(NODE)*7

      INTRINSIC  AMAX1

!

      DATA IND/2, 3, 4, 5/

      DATA NSCALE/1/

      DATA SCALE/0.0, 3.5/

!                                 Get IRIS data.

      CALL GDATA (IDATA, X, NXROW, NXCOL)

!                                 Compute the dissimilarities.

      CALL CDIST (X, DIST, IND=IND)

!                                 Make sure each distance is unique,

!                                 then copy the upper triangle matrix

!                                 to the lower triangle matrix.

      CALL RNSET (4)

      DO 20  I=1, NODE

         DO 10  J=I + 1, NODE

      RN = RNUNF()

            DIST(I,J) = AMAX1(0.0,DIST(I,J)+(0.001*RN))

   10    CONTINUE

         DIST(I,I) = 0.0

         CALL SCOPY (I-1, DIST(1:,I), 1, DIST(I:,1), LDDIST)

   20 CONTINUE

!                                 The initial clustering

      CALL CLINK (DIST, CLEVEL, ICLSON, ICRSON)

!                                 Print the tree.

      NODENM(1) = 'DEFAULT'

!     CALL TREEP (ICLSON, ICRSON, CLEVEL, NSCALE, SCALE, NODENM)

!                                 Compute membership for 5 clusters

      CALL CNUMB (NODE, ICLSON, ICRSON, K, ICLUS, NCLUS)

 

 

 

 

!                                 Print output

      CALL WRIRN ('ICLUS', ICLUS, 1, NODE, 1)

      CALL WRIRN ('NCLUS', NCLUS, 1, K, 1)

!

      END

Output

 

                                   ICLUS
1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20
5  5  5  5  5  5  5  5  5   5   5   5   5   5   5   5   5   5   5   5

21 22 23 24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
 5  5  5  5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5

41 42 43 44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
 5  5  5  5   5   5   5   5   5   5   2   2   2   2   2   2   2   1   2   2

61 62 63 64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
 1  2  2  2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2

81 82 83 84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99
 2  2  2  2   2   2   2   2   2   2   2   2   2   1   2   2   2   2   1

100 101 102 103  104  105  106  107  108  109  110  111  112  113  114  115
  2   2   2   2    2    2    2    3    2    2    2    2    2    2    2    2

116 117 118 119  120  121  122  123  124  125  126  127  128  129  130  131
  2   2   4   2    2    2    2    2    2    2    2    2    2    2    2    2

132 133 134 135  136  137  138  139  140  141  142  143  144  145  146  147
  4   2   2   2    2    2    2    2    2    2    2    2    2    2    2    2

148  149  150
  2    2    2

         NCLUS
1    2    3    4    5
4   93    1    2   50



http://www.vni.com/
PHONE: 713.784.3131
FAX:713.781.9260