Computes the variance-covariance or correlation matrix.
NVAR — Number of variables. (Input)
The weight or frequency variables, if used, are not counted in NVAR.
X — |NROW| by NVAR + m matrix containing the data, where m is 0, 1, or 2 depending on whether any column(s) of X correspond to weights and/or frequencies. (Input)
COV — NVAR by NVAR matrix containing either the correlation matrix (possibly with the standard deviations on the diagonal), the variance-covariance matrix, or the corrected sums of squares and crossproducts matrix, as controlled by the COV option, ICOPT. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
The elements of COV correspond to the columns of X, except for the columns of X containing weights or frequencies (see XMEAN).
IDO — Processing option. (Input)
Default: IDO = 0.
IDO Action
0 This is the only invocation of CORVC for this data set, and all the data are input at once.
1 This is the first invocation, and additional calls to CORVC will be made. Initialization and updating for the NROW observations are performed. The means (in XMEAN) are output correctly, but the quantities output in COV are intermediate results.
2 This is an intermediate invocation of CORVC, and updating for the NROW observations is performed.
3 This is the final invocation of this routine. If NROW is not zero, updating is performed. The wrap-up computations for COV are performed.
It is possible to call CORVC twice in succession with IDO = 3 in order to first compute covariances (ICOPT = 0) and then compute correlations (ICOPT = 2 or 3). This ability is most important when pairwise deletion of missing values is used (MOPT = 3). The workspace (including WK, if C2RVC is used) must not be altered between calls.
NROW — The absolute value of NROW is the number of rows of data currently input in X. (Input)
Default: NROW = size (X,1).
NROW may be positive, zero, or negative. Negative NROW means that the −NROW rows of data are to be deleted from (most aspects of) the analysis. This should be done only if IDO is 2 or 3 and the wrap-up computations for COV have not been performed. When a negative value is input for NROW, it is assumed that each of the −NROW rows of X has been input (with positive NROW) in previous invocations of CORVC. Use negative values of NROW with care, since a constant variable in the remaining data may not be recognized as such.
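Deleting rows with negative NROW can be thought of as running an update backwards. The Fortran sketch below reverses a weighted provisional-means step for one previously added row; it is an illustration under that assumption, not CORVC's internal code, and the module and routine names (pm_downdate, pm_add, pm_remove) are hypothetical. The argument fw is the product of the row's frequency and weight.

```fortran
! Sketch only: adding and then removing one row from running means and
! corrected crossproducts. Hypothetical names; not the IMSL implementation.
module pm_downdate
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: pm_add, pm_remove

contains

  ! Add row x with combined frequency*weight fw (non-positive fw is skipped).
  subroutine pm_add(x, fw, xbar, c, sumfw)
    real(dp), intent(in)    :: x(:), fw
    real(dp), intent(inout) :: xbar(:), c(:,:), sumfw
    real(dp) :: xnew(size(x))
    integer  :: j, k
    if (fw <= 0.0_dp) return
    sumfw = sumfw + fw
    xnew = xbar + (x - xbar) * (fw / sumfw)
    do k = 1, size(x)
      do j = 1, size(x)
        ! old-mean deviation times new-mean deviation
        c(j, k) = c(j, k) + fw * (x(j) - xbar(j)) * (x(k) - xnew(k))
      end do
    end do
    xbar = xnew
  end subroutine pm_add

  ! Undo pm_add for a row that was previously added with the same fw.
  subroutine pm_remove(x, fw, xbar, c, sumfw)
    real(dp), intent(in)    :: x(:), fw
    real(dp), intent(inout) :: xbar(:), c(:,:), sumfw
    real(dp) :: xold(size(x)), sold
    integer  :: j, k
    if (fw <= 0.0_dp) return
    sold = sumfw - fw
    if (sold <= 0.0_dp) then
      ! Nothing (or less than nothing) left: return to the empty state.
      xbar = 0.0_dp
      c = 0.0_dp
      sumfw = 0.0_dp
      return
    end if
    xold = (sumfw * xbar - fw * x) / sold   ! means before x was added
    do k = 1, size(x)
      do j = 1, size(x)
        c(j, k) = c(j, k) - fw * (x(j) - xold(j)) * (x(k) - xbar(k))
      end do
    end do
    xbar = xold
    sumfw = sold
  end subroutine pm_remove

end module pm_downdate
```

Because the removal is done by subtraction, rounding can leave a small nonzero sum of squares for a variable that is constant in the remaining rows, which is one way the caution about constant variables can arise.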
LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
IFRQ — Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column IFRQ of X contains the frequencies.
Default: IFRQ = 0.
IWT — Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column IWT of X contains the weights. Observations with zero weight are counted as observations in the frequencies, but do not contribute to the means, variances, covariances, or correlations. Observations with negative weights are missing.
Default: IWT = 0.
MOPT — Missing value option. (Input)
NaN (not a number) is interpreted as the missing value code, and any NaN value in X is excluded from the computations (see routine AMACH/DMACH in the Reference Material). If MOPT is positive, various pairwise exclusion methods are used.
Default: MOPT = 0.
MOPT Action
0 The exclusion is listwise. (The entire row of X is excluded if any of the values of the row is equal to the missing value code.)
1 Raw crossproducts are computed from all valid pairs, and means and variances are computed from all valid data on the individual variables. Corrected crossproducts, covariances, and correlations are computed using these quantities.
2 Raw crossproducts, means and variances are computed as in the case of MOPT = 1. However, corrected crossproducts and covariances are computed only from the valid pairs of data. Correlations are computed using these covariances and the variances from all valid data.
3 Raw crossproducts, means, variances, and covariances are computed as in the case of MOPT = 2. Correlations are computed using these covariances, but the variances used are computed only from the valid pairs of data.
ICOPT — COV option. (Input)
Default: ICOPT = 0.
ICOPT Action
0 COV contains the variance-covariance matrix.
1 COV contains the corrected sums of squares and crossproducts matrix.
2 COV contains the correlation matrix.
3 COV contains the correlation matrix, except for the diagonal elements, which are the standard deviations.
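The four ICOPT forms are rescalings of one another. The Fortran sketch below converts between them, assuming unit weights and frequencies so that the variance-covariance matrix is the corrected sums of squares and crossproducts divided by NOBS - 1. The module name cov_options and its functions are hypothetical, and setting correlations that involve a zero standard deviation to NaN is an assumption modeled on the informational errors listed later.

```fortran
! Hypothetical conversions between ICOPT forms; not part of IMSL.
module cov_options
  use, intrinsic :: iso_fortran_env, only: dp => real64
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: sscp_to_cov, cov_to_corr

contains

  ! ICOPT = 1 to ICOPT = 0, assuming unit weights/frequencies (divisor nobs - 1).
  pure function sscp_to_cov(sscp, nobs) result(cov)
    real(dp), intent(in) :: sscp(:,:)
    integer, intent(in)  :: nobs
    real(dp) :: cov(size(sscp,1), size(sscp,2))
    cov = sscp / real(nobs - 1, dp)
  end function sscp_to_cov

  ! ICOPT = 0 to ICOPT = 2 (sd_diag = .false.) or ICOPT = 3 (sd_diag = .true.).
  function cov_to_corr(cov, sd_diag) result(r)
    real(dp), intent(in) :: cov(:,:)
    logical, intent(in)  :: sd_diag
    real(dp) :: r(size(cov,1), size(cov,2)), sd(size(cov,1))
    integer  :: j, k
    do j = 1, size(cov,1)
      sd(j) = sqrt(cov(j,j))
    end do
    do k = 1, size(cov,2)
      do j = 1, size(cov,1)
        if (sd(j) > 0.0_dp .and. sd(k) > 0.0_dp) then
          r(j,k) = cov(j,k) / (sd(j) * sd(k))
        else
          ! A constant variable: the correlation is undefined.
          r(j,k) = ieee_value(1.0_dp, ieee_quiet_nan)
        end if
      end do
    end do
    ! Diagonal: standard deviations for ICOPT = 3, otherwise 1.0
    ! (left as NaN for a constant variable).
    do j = 1, size(cov,1)
      if (sd_diag) then
        r(j,j) = sd(j)
      else if (sd(j) > 0.0_dp) then
        r(j,j) = 1.0_dp
      end if
    end do
  end function cov_to_corr

end module cov_options
```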
XMEAN — Vector of length NVAR containing the variable means. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
The elements of XMEAN correspond to the columns of X, except that if weights and/or frequencies are used, the elements of XMEAN beyond the IWT or IFRQ element are shifted relative to the columns of X.
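The index shift can be made concrete with a small helper. In this hypothetical Fortran function (not an IMSL routine), xmean_index returns the position in XMEAN, and the row or column of COV, that corresponds to column col of X, or 0 when that column holds the weights (IWT) or the frequencies (IFRQ).

```fortran
! Hypothetical helper showing the XMEAN/COV index shift; not an IMSL routine.
module xmean_map
  implicit none
  private
  public :: xmean_index

contains

  ! Position in XMEAN (and row/column of COV) for column col of X,
  ! or 0 if col is the weight column (iwt) or frequency column (ifrq).
  ! iwt = 0 or ifrq = 0 means that option is not used.
  pure function xmean_index(col, iwt, ifrq) result(idx)
    integer, intent(in) :: col, iwt, ifrq
    integer :: idx
    if (col == iwt .or. col == ifrq) then
      idx = 0
      return
    end if
    idx = col
    ! Each weight/frequency column to the left of col shifts it down by one.
    if (iwt > 0 .and. iwt < col) idx = idx - 1
    if (ifrq > 0 .and. ifrq < col) idx = idx - 1
  end function xmean_index

end module xmean_map
```

For example, with IWT = 2 and IFRQ = 0, columns 1 and 3 of X map to XMEAN(1) and XMEAN(2).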
LDCOV — Leading dimension of COV exactly as specified in the dimension statement in the calling program. (Input)
Default: LDCOV = size (COV,1).
INCD — Incidence matrix. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
If MOPT is zero, INCD is 1 by 1 and contains the number of valid observations. If MOPT is positive, INCD is NVAR by NVAR and contains the numbers of pairs of valid observations that are used in calculating the crossproducts for COV.
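The difference between the two INCD shapes comes from how missing values are counted. The Fortran sketch below, which assumes unit frequencies and weights and uses NaN as the missing value code, computes the single listwise count and an NVAR by NVAR matrix of valid-pair counts; the module and function names are hypothetical, not IMSL routines.

```fortran
! Hypothetical helpers contrasting listwise and pairwise counting; not IMSL code.
! Assumes unit frequencies and weights; NaN marks a missing value.
module missing_counts
  use, intrinsic :: iso_fortran_env, only: dp => real64
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  private
  public :: listwise_count, pairwise_counts

contains

  ! Rows with no NaN in any column (the 1 by 1 INCD case, MOPT = 0).
  function listwise_count(x) result(n)
    real(dp), intent(in) :: x(:,:)
    integer :: n, i
    n = 0
    do i = 1, size(x,1)
      if (.not. any(ieee_is_nan(x(i,:)))) n = n + 1
    end do
  end function listwise_count

  ! cnt(j, k): rows in which columns j and k are both valid (MOPT > 0).
  function pairwise_counts(x) result(cnt)
    real(dp), intent(in) :: x(:,:)
    integer :: cnt(size(x,2), size(x,2))
    integer :: j, k
    do k = 1, size(x,2)
      do j = 1, size(x,2)
        cnt(j,k) = count(.not. ieee_is_nan(x(:,j)) .and. .not. ieee_is_nan(x(:,k)))
      end do
    end do
  end function pairwise_counts

end module missing_counts
```

When different rows are missing different variables, the pairwise counts can exceed the listwise count, so pairwise statistics may use more of the data than MOPT = 0 does.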
LDINCD — Leading dimension of INCD exactly as specified in the dimension statement in the calling program. (Input)
Default: LDINCD = size(INCD,1).
NOBS — Total number of observations (that is, the total of the frequencies). (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
If MOPT = 0, observations with missing values are not included in NOBS. For other values of MOPT, all observations are included except for observations with missing values for the weight or the frequency.
NMISS — Total number of observations that contain any missing values. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
SUMWT — Sum of the weights of all observations that are processed. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3)
If MOPT = 0, observations with missing values are not included in SUMWT. For other values of MOPT, all observations are included except for observations with missing values for the weight or the frequency.
Generic: CALL CORVC (NVAR, X, COV [,…])
Specific: The specific interface names are S_CORVC and D_CORVC.
Single: CALL CORVC (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, ICOPT, XMEAN, COV, LDCOV, INCD, LDINCD, NOBS, NMISS, SUMWT)
Double: The double precision name is DCORVC.
Routine CORVC computes estimates of correlations, covariances, or sums of squares and crossproducts for a data matrix X. Weights and frequencies are allowed but not required. Also allowed are listwise or pairwise deletion of missing values. Routine CORVC is an “IDO routine,” so it may be called with all of the data in one invocation, or it may be called in several invocations with some (or none) of the data input during each call. By setting NROW to a negative integer, observations that have previously been added to the covariance/correlation statistics may be deleted from the statistics. Exercise care with this option, however, since the program may not be able to detect constant variables when negative NROW is used.
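The means, (corrected) sums of squares, and (corrected) sums of crossproducts are computed using the method of provisional means. Let $\bar{x}_{ki}$ denote the mean based upon $i$ observations for the $k$-th variable, $f_i$ denote the frequency of the $i$-th observation, $w_i$ denote the weight of the $i$-th observation, and let $c_{jki}$ denote the sum of crossproducts (or sum of squares if $j = k$) based upon $i$ observations. Then, the method of provisional means finds new means and sums of crossproducts as follows.
The means and crossproducts are initialized as:
$$\bar{x}_{k0} = 0.0, \quad k = 1, \ldots, p$$
$$c_{jk0} = 0.0, \quad j, k = 1, \ldots, p$$
where $p$ denotes the number of variables. Letting $x_{k(i+1)}$ denote the $k$-th variable on observation $i + 1$, each new observation leads to the following updates for $\bar{x}_{ki}$ and $c_{jki}$ using update constant $r_{i+1}$:
$$r_{i+1} = \frac{f_{i+1} w_{i+1}}{\sum_{l=1}^{i+1} f_l w_l}$$
$$\bar{x}_{k(i+1)} = \bar{x}_{ki} + \left(x_{k(i+1)} - \bar{x}_{ki}\right) r_{i+1}$$
$$c_{jk(i+1)} = c_{jki} + f_{i+1} w_{i+1} \left(x_{j(i+1)} - \bar{x}_{ji}\right)\left(x_{k(i+1)} - \bar{x}_{ki}\right)\left(1 - r_{i+1}\right)$$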
If there is no weight variable, weights of 1.0 are used. If there is no frequency column, frequencies of 1.0 are used. Means and variances are computed based upon all of the valid data for each variable or, if required, based upon all of the valid data for each pair of variables.
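A minimal Fortran sketch of the weighted provisional-means update follows. For each observation it forms the update constant r = f w / (running sum of f w), adds f w (1 - r) times the product of the deviations from the current means to each crossproduct, and then moves each mean by r times its deviation. The module and routine names are hypothetical, missing values are not handled, and this illustrates the method rather than reproducing the CORVC source.

```fortran
! Sketch of weighted provisional means; hypothetical names, not IMSL code.
module provisional_means
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: pm_init, pm_update

contains

  ! Set the means, crossproducts, and running sum of f*w to zero.
  subroutine pm_init(xbar, c, sumfw)
    real(dp), intent(out) :: xbar(:), c(:,:), sumfw
    xbar = 0.0_dp
    c = 0.0_dp
    sumfw = 0.0_dp
  end subroutine pm_init

  ! Fold one observation x (length p) with frequency f and weight w
  ! into the running means xbar and corrected crossproducts c.
  subroutine pm_update(x, f, w, xbar, c, sumfw)
    real(dp), intent(in)    :: x(:), f, w
    real(dp), intent(inout) :: xbar(:), c(:,:), sumfw
    real(dp) :: fw, r, d(size(x))
    integer  :: j, k

    fw = f * w
    if (fw == 0.0_dp) return          ! zero weight: nothing to accumulate
    sumfw = sumfw + fw
    r = fw / sumfw                     ! update constant r_{i+1}
    d = x - xbar                       ! deviations from the old means
    do k = 1, size(x)
      do j = 1, size(x)
        c(j, k) = c(j, k) + fw * d(j) * d(k) * (1.0_dp - r)
      end do
    end do
    xbar = xbar + d * r                ! new means
  end subroutine pm_update

end module provisional_means
```

Feeding the rows one at a time, or in any grouping across several calls, gives the same sums up to rounding, which is the property the IDO = 1, 2, 3 sequence depends on.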
1. Workspace may be explicitly provided, if desired, by use of C2RVC/DC2RVC. The reference is:
CALL C2RVC (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, ICOPT, XMEAN, COV, LDCOV, INCD, LDINCD, NOBS, NMISS, SUMWT, WK)
The additional argument is:
WK — Workspace of the length specified above. WK should not be changed between calls to C2RVC.
The workspace may contain statistics of interest. Let
m = NVAR
k = m(m + 1)/2
Statistics in the workspace that are parts of symmetric matrices are stored in symmetric storage mode; that is, only the lower triangular elements are stored. The workspace utilization is:
MOPT | IWT | Start | Length | Contents
All | All | 1 | m | Indicators of constant data
All | All | m + 1 | m | First nonmissing data
0 | All | 2m + 1 | m | Deviation from temporary mean
0 | Positive | 3m + 1 | 1 | Sum of weights
1, 2 | All | 2m + 1 | m² | Pairwise means
1, 2 | Positive | 2m + m² + 1 | k | Pairwise sums of weights
3 | All | 2m + 1 | m² | Pairwise means
3 | 0 | 2m + m² + 1 | m² | Pairwise sums of products
3 | Positive | 2m + m² + 1 | k | Pairwise sums of weights
3 | Positive | 2m + k + m² + 1 | m² | Pairwise sums of products
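Symmetric storage mode keeps k = m(m + 1)/2 values per matrix instead of m². The hypothetical Fortran function below maps an element (i, j) to its position in packed lower-triangular storage. It assumes the triangle is packed row by row; this page does not state the ordering, so it should be confirmed against the Reference Material before reading values out of WK.

```fortran
! Hypothetical helper; assumes the lower triangle is packed row by row.
module symmetric_storage
  implicit none
  private
  public :: packed_index

contains

  ! Position of element (i, j) of a symmetric m x m matrix in packed storage.
  ! Only the lower triangle is stored, so (i, j) and (j, i) share a position.
  pure function packed_index(i, j) result(idx)
    integer, intent(in) :: i, j
    integer :: idx, r, c
    r = max(i, j)                ! row in the lower triangle
    c = min(i, j)                ! column in the lower triangle
    idx = r * (r - 1) / 2 + c    ! rows 1..r-1 hold r(r-1)/2 elements
  end function packed_index

end module symmetric_storage
```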
2. Informational errors
Type Code
3 12 The sum of the weights is zero. The means, variance and covariances are set to NaN.
3 13 The sum of the weights is zero. The means and correlations are set to NaN.
3 14 Correlations are requested but the observations on a variable are constant. The pertinent correlations are set to NaN.
3 15 Variances and covariances are requested but fewer than two valid observations are present for some variables. The corresponding variances or covariances are set to NaN.
3 16 Pairwise correlations are requested but the observations on a variable are constant. The pertinent correlations are set to NaN.
3 17 Correlations are requested but fewer than two valid observations are present for some variables. The corresponding variances or covariances are set to NaN.
4 10 More observations have been deleted than were originally entered.
4 11 More observations have been deleted from COV(i, j) than were originally entered. INCD(i, j) is less than zero.
4 18 Different observations have been deleted from COV(i, j) than were originally entered. COV(i, j) is less than zero.
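In CORVC, each observation $x_{ki}$ with weight $w_i$ is assumed to have mean $\mu_k$ and variance $\sigma_k^2 / w_i$.
With these assumptions, CORVC uses the following definition of a sample mean:
$$\bar{x}_k = \frac{\sum_{i=1}^{n_r} f_i w_i x_{ki}}{\sum_{i=1}^{n_r} f_i w_i}$$
where $n_r$ is the number of cases. The following formula defines the sample covariance, $s_{jk}$, between variables $j$ and $k$:
$$s_{jk} = \frac{\sum_{i=1}^{n_r} f_i w_i \left(x_{ji} - \bar{x}_j\right)\left(x_{ki} - \bar{x}_k\right)}{\sum_{i=1}^{n_r} f_i - 1}$$
The sample correlation between variables $j$ and $k$, $r_{jk}$, is defined as:
$$r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}}$$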
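For checking results on small data sets, the definitions can be evaluated directly in two passes. The Fortran sketch below assumes the weighted mean x̄_k = Σ f_i w_i x_ki / Σ f_i w_i, the covariance s_jk = Σ f_i w_i (x_ji - x̄_j)(x_ki - x̄_k) / (Σ f_i - 1), and the correlation r_jk = s_jk / sqrt(s_jj s_kk). It ignores missing values, and the module and subroutine names are hypothetical.

```fortran
! Hypothetical two-pass evaluation of the weighted definitions; not IMSL code.
! Assumes no missing values and sum(f) > 1.
module direct_stats
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: weighted_cov

contains

  ! x(i, k): observation i of variable k; f(i), w(i): frequency and weight.
  subroutine weighted_cov(x, f, w, xbar, s, r)
    real(dp), intent(in)  :: x(:,:), f(:), w(:)
    real(dp), intent(out) :: xbar(size(x,2))
    real(dp), intent(out) :: s(size(x,2), size(x,2)), r(size(x,2), size(x,2))
    real(dp) :: fw(size(x,1))
    integer  :: j, k

    fw = f * w
    ! Pass 1: weighted means.
    do k = 1, size(x,2)
      xbar(k) = sum(fw * x(:,k)) / sum(fw)
    end do
    ! Pass 2: covariances with divisor sum(f) - 1.
    do k = 1, size(x,2)
      do j = 1, size(x,2)
        s(j,k) = sum(fw * (x(:,j) - xbar(j)) * (x(:,k) - xbar(k))) &
                 / (sum(f) - 1.0_dp)
      end do
    end do
    ! Correlations from the covariances.
    do k = 1, size(x,2)
      do j = 1, size(x,2)
        r(j,k) = s(j,k) / sqrt(s(j,j) * s(k,k))
      end do
    end do
  end subroutine weighted_cov

end module direct_stats
```

Under these formulas an observation with w_i = 0 adds nothing to the sums but still counts in Σ f_i, which agrees with the IWT description.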
The first example illustrates the use of CORVC when inputting all of the data at once. The first 50 observations in the Fisher iris data (see routine GDATA, Chapter 19, Utilities) are used. Note in this example that the first variable is constant over the first 50 observations.
USE GDATA_INT
USE UMACH_INT
USE CORVC_INT
USE WRRRN_INT
USE WRIRN_INT
IMPLICIT NONE
INTEGER LDCOV, LDINCD, LDX, NVAR
PARAMETER (LDCOV=5, LDINCD=1, LDX=150, NVAR=5)
!
INTEGER INCD(LDINCD,1), NMISS, NOBS, NOUT, NROW, NV
REAL COV(LDCOV,NVAR), SUMWT, X(LDX,NVAR), XMEAN(NVAR)
!
CALL GDATA (3, X, NOBS, NV)
!
CALL UMACH (2, NOUT)
NROW = 50
!
CALL CORVC (NVAR, X, COV, NROW=NROW, XMEAN=XMEAN, INCD=INCD, &
NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
!
CALL WRRRN ('XMEAN', XMEAN, 1, NVAR, 1, 0)
CALL WRRRN ('COV', COV)
CALL WRIRN ('INCD', INCD)
WRITE (NOUT,*) ' NOBS = ', NOBS, ' NMISS = ', NMISS, ' SUMWT = ', &
SUMWT
END
XMEAN
1 2 3 4 5
1.000 5.006 3.428 1.462 0.246
COV
1 2 3 4 5
1 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.0000 0.1242 0.0992 0.0164 0.0103
3 0.0000 0.0992 0.1437 0.0117 0.0093
4 0.0000 0.0164 0.0117 0.0302 0.0061
5 0.0000 0.0103 0.0093 0.0061 0.0111
INCD
50
NOBS = 50 NMISS = 0 SUMWT = 50.0000
In the second example, the IDO option is used. After the initialization step in which IDO = 1, the first 53 observations in the Fisher iris data are input, one observation at a time. The last three observations input are then deleted from the covariances by setting NROW = −1. Finally, the wrap-up step is accomplished by calling CORVC with IDO = 3. The output is identical to the output above.
USE IMSL_LIBRARIES
IMPLICIT NONE
INTEGER LDCOV, LDINCD, LDX, LDY, NVAR
PARAMETER (LDCOV=5, LDINCD=1, LDX=150, LDY=1, NVAR=5)
!
INTEGER I, IDO, INCD(LDINCD,1), NMISS, NOBS, NOUT, NROW, NV
REAL COV(LDCOV,NVAR), SUMWT, X(LDX,NVAR), XMEAN(NVAR), &
Y(LDY,NVAR)
!
CALL GDATA (3, X, NOBS, NV)
!
CALL UMACH (2, NOUT)
!
!
IDO = 1
NROW = 0
! Initialization
CALL CORVC (NVAR, Y, COV, IDO=IDO, NROW=NROW, XMEAN=XMEAN, &
INCD=INCD, NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
!
IDO = 2
NROW = 1
! Add the observations
DO 10 I=1, 53
! Copy observation I into the one-row buffer Y
Y(1,:) = X(I,:)
CALL CORVC (NVAR, Y, COV, IDO=IDO, NROW=NROW, XMEAN=XMEAN, &
INCD=INCD, NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
10 CONTINUE
! Delete the last 3 added
NROW = -1
DO 20 I=51, 53
! Copy observation I into Y; NROW = -1 removes it again
Y(1,:) = X(I,:)
CALL CORVC (NVAR, Y, COV, IDO=IDO, NROW=NROW, XMEAN=XMEAN, &
INCD=INCD, NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
20 CONTINUE
! Wrap-up
IDO = 3
NROW = 0
CALL CORVC (NVAR, Y, COV, IDO=IDO, NROW=NROW, XMEAN=XMEAN, &
INCD=INCD, NOBS=NOBS, NMISS=NMISS, SUMWT=SUMWT)
CALL WRRRN ('XMEAN', XMEAN, 1, NVAR, 1)
CALL WRRRN ('COV', COV)
CALL WRIRN ('INCD', INCD)
WRITE (NOUT,*) ' NOBS = ', NOBS, ' NMISS = ', NMISS, ' SUMWT = ', &
SUMWT
END
XMEAN
1 2 3 4 5
1.000 5.006 3.428 1.462 0.246
COV
1 2 3 4 5
1 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.0000 0.1242 0.0992 0.0164 0.0103
3 0.0000 0.0992 0.1437 0.0117 0.0093
4 0.0000 0.0164 0.0117 0.0302 0.0061
5 0.0000 0.0103 0.0093 0.0061 0.0111
INCD
50
NOBS = 50 NMISS = 0 SUMWT = 50.0000