Computes basic univariate statistics.
X |NROW| by NVAR + m matrix containing the data, where m is 0, 1, or 2 depending on whether any column(s) of X correspond to weights and/or frequencies. (Input)
STAT 15 by NVAR matrix containing in each row statistics on all of the variables. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3.)
The columns of STAT correspond to the columns of X, except for the columns of X containing weights or frequencies. (The columns beyond the weights or frequencies column are shifted to the left.)
I STAT(I, *)
1 contains means
2 contains variances
3 contains standard deviations
4 contains coefficients of skewness
5 contains coefficients of excess (kurtosis)
6 contains minima
7 contains maxima
8 contains ranges
9 contains coefficients of variation, when they are defined. If the coefficient of variation is not defined for a given variable, STAT(9, *) contains a zero in the corresponding position.
10 contains numbers (counts) of nonmissing observations
11 is used only when CONPRM is positive, and, in this case, contains the lower confidence limit for the mean (assuming normality)
12 is used only when CONPRM is positive, and, in this case, contains the upper confidence limit for the mean (assuming normality)
13 is used only when CONPRV is positive, and, in this case, contains the lower confidence limit for the variance (assuming normality).
14 is used only when CONPRV is positive, and, in this case, contains the upper confidence limit for the variance (assuming normality).
15 is used only when weighting is used (IWT is positive), and, in this case, contains the sums of the weights.
IDO Processing option. (Input)
Default: IDO = 0.
IDO Action
0 This is the only invocation of UVSTA for this data set, and all the data are input at once.
1 This is the first invocation, and additional calls to UVSTA will be made. Initialization and updating for the data in X are performed. The means are output correctly, but the other quantities output in STAT are intermediate quantities.
2 This is an intermediate invocation of UVSTA, and updating for the data in X is performed.
3 This is the final invocation of this routine. If NROW is not zero, updating is performed. The wrap-up computations for STAT are performed.
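The IDO values describe an initialize, update, and wrap-up pattern. The following is a minimal sketch of that pattern in plain Python for a single variable with frequencies. It is not IMSL code: the class name RunningStats and its methods are hypothetical, and the update rule is a standard running-mean recurrence chosen for illustration, not necessarily the one UVSTA uses internally. With the x data of the second example, it reproduces the intermediate mean and sum of squares (5.0 and 24.0) and the final variance (9.6) printed there.

```python
class RunningStats:
    """Running mean and sum of squares about the mean for one variable."""

    def __init__(self):
        self.count = 0.0   # total frequency seen so far
        self.mean = 0.0
        self.sum_sq = 0.0  # sum of squares about the current mean

    def update(self, x, freq=1.0):
        """Fold in value x occurring freq times (like a call with IDO = 1 or 2)."""
        new_count = self.count + freq
        delta = x - self.mean
        self.mean += freq * delta / new_count
        self.sum_sq += freq * delta * (x - self.mean)
        self.count = new_count

    def wrap_up(self):
        """Final mean and sample variance (like the wrap-up done for IDO = 3)."""
        return self.mean, self.sum_sq / (self.count - 1.0)


stats = RunningStats()
stats.update(3.0, freq=2.0)                      # first invocation
stats.update(9.0, freq=1.0)                      # intermediate invocation
intermediate = (stats.mean, stats.sum_sq)        # intermediate quantities
stats.update(1.0, freq=3.0)                      # data of the final invocation
mean, variance = stats.wrap_up()
```

Only the mean is meaningful before the wrap-up step; the sum of squares is an intermediate quantity, which is why STAT holds intermediate values until the call with IDO = 3.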
NROW The absolute value of NROW is the number of rows of data currently input in X. (Input)
Default: NROW = size (X,1).
NROW may be positive, zero, or negative. Negative NROW means that the −NROW rows of data are to be deleted from some aspects of the analysis, and this should be done only if IDO is 2 or 3 and the wrap-up computations for STAT have not been performed. When a negative value is input for NROW, it is assumed that each of the −NROW rows of X has been input (with positive NROW) in a previous invocation of UVSTA. Use of negative values of NROW should be made with care and with the understanding that some quantities in STAT cannot be updated properly in this case. In particular, the minima, maxima, and ranges are not updated because of deletion. It is also possible that a constant variable in the remaining data will not be recognized as such.
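To see why deletion can leave the minima, maxima, and ranges stale, consider the following Python sketch. It is an illustration under stated assumptions, not the UVSTA implementation: the functions add and remove and the tuple layout are hypothetical, and the false value 20.0 is invented for the demonstration. The mean and sum of squares can be reversed exactly, but once an extreme value has been recorded there is no way to recover the previous extreme without the original data.

```python
def add(state, x, freq):
    """Fold x (occurring freq times) into state = (count, mean, sum_sq, lo, hi)."""
    count, mean, sum_sq, lo, hi = state
    new_count = count + freq
    delta = x - mean
    new_mean = mean + freq * delta / new_count
    sum_sq += freq * delta * (x - new_mean)
    return (new_count, new_mean, sum_sq, min(lo, x), max(hi, x))


def remove(state, x, freq):
    """Undo an earlier add of x; the minimum and maximum cannot be undone."""
    count, mean, sum_sq, lo, hi = state
    new_count = count - freq
    new_mean = (count * mean - freq * x) / new_count
    sum_sq -= freq * (x - mean) * (x - new_mean)
    return (new_count, new_mean, sum_sq, lo, hi)


state = (0.0, 0.0, 0.0, float("inf"), float("-inf"))
state = add(state, 3.0, 2.0)
state = add(state, 9.0, 1.0)
state = add(state, 20.0, 3.0)     # false observation (hypothetical value)
state = remove(state, 20.0, 3.0)  # deleted again
count, mean, sum_sq, lo, hi = state
# mean and sum_sq are back to 5.0 and 24.0, but hi is still 20.0
```

In the second example of this document the false value lies inside the existing range, so the printed minima and maxima happen to remain correct.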
NVAR Number of variables (not including the weight or frequency variable, if used). (Input)
Default: NVAR = size (X,2).
LDX Leading dimension of X exactly as specified in the dimension statement in the calling program. (Input)
Default: LDX = size (X,1).
IFRQ Frequency option. (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies.
Default: IFRQ = 0.
IWT Weighting option. (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column IWT of X contains the weights.
Default: IWT = 0.
MOPT Missing value option. (Input)
NaN (not a number, the value returned by routine AMACH(6)) is interpreted as the missing value code, and any value in X equal to NaN is excluded from the computations.
Default: MOPT = 0.
MOPT Action
0 The exclusion is listwise. (The entire row of X is excluded if any of the values of the row is equal to the missing value code.)
1 The exclusion is elementwise. (Statistics for variables with nonmissing values are updated.)
CONPRM Confidence level for two-sided interval estimate of the means (assuming normality), in percent. (Input)
If CONPRM ≤ 0, no confidence interval for the mean is computed; otherwise, a CONPRM percent confidence interval is computed, in which case CONPRM must be between 0.0 and 100.0. CONPRM is often 90.0, 95.0, or 99.0. For a one-sided confidence interval with confidence level ONECL, set CONPRM = 100.0 − 2.0 * (100.0 − ONECL).
Default: CONPRM = 95.0.
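As a worked instance of the one-sided conversion, the Python sketch below evaluates the formula given for CONPRM. The function name conprm_for_one_sided is hypothetical and exists only for this illustration. The reasoning is that a two-sided 90 percent interval leaves 5 percent in each tail, so either of its limits alone is a one-sided 95 percent bound.

```python
def conprm_for_one_sided(onecl):
    """Two-sided level (percent) whose limits are one-sided ONECL percent bounds."""
    return 100.0 - 2.0 * (100.0 - onecl)


level_for_95 = conprm_for_one_sided(95.0)    # 90.0
level_for_975 = conprm_for_one_sided(97.5)   # 95.0
```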
CONPRV Confidence level for two-sided interval estimate of the variances (assuming normality), in percent. (Input)
The confidence intervals are symmetric in probability (rather than in length). See also the description of CONPRM.
Default: CONPRV = 95.0.
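"Symmetric in probability" can be illustrated with the usual equal-tail chi-square interval for a normal variance. The Python sketch below assumes that form (the text does not spell out the formula), uses a hypothetical function name, and takes the chi-square quantiles for 12 degrees of freedom from standard tables instead of computing them. Applied to variable 1 of the first example (variance 34.6026, 13 observations), it agrees with the printed limits 17.7930 and 94.2894 to about three decimal places.

```python
def variance_interval(variance, n, chi2_upper, chi2_lower):
    """Equal-tail interval for a normal variance.

    chi2_upper and chi2_lower are the chi-square quantiles with n - 1 degrees
    of freedom at probabilities 1 - alpha/2 and alpha/2, respectively.
    """
    df = n - 1
    return df * variance / chi2_upper, df * variance / chi2_lower


# Tabulated chi-square quantiles for 12 degrees of freedom (assumed inputs).
CHI2_0975_DF12 = 23.3367
CHI2_0025_DF12 = 4.40379

lower_clv, upper_clv = variance_interval(34.6026, 13, CHI2_0975_DF12, CHI2_0025_DF12)
```

Each tail carries 2.5 percent of probability, so the interval is not symmetric in length about the sample variance.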
IPRINT Printing option. (Input)
Default: IPRINT = 0.
IPRINT Action
0 No printing is performed.
1 Statistics in STAT are printed if IDO = 0 or 3.
2 Intermediate means, sums of squares about the mean, minima, maxima, and counts are printed when IDO = 1 or 2, and all statistics in STAT are printed when IDO = 0 or 3.
LDSTAT Leading dimension of STAT exactly as specified in the dimension statement in the calling program. (Input)
Default: LDSTAT = size (STAT,1).
NRMISS Number of rows of data encountered in calls to UVSTA that contain any missing values. (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3.)
Rows with a frequency of zero are not counted.
Generic: CALL UVSTA (X, STAT [, …])
Specific: The specific interface names are S_UVSTA and D_UVSTA.
Single: CALL UVSTA (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, CONPRM, CONPRV, IPRINT, STAT, LDSTAT, NRMISS)
Double: The double precision name is DUVSTA.
For the data in each column of X, except the columns containing frequencies or weights, UVSTA computes the sample mean, variance, minimum, maximum, and other basic statistics. It also computes confidence intervals for the mean and variance if the sample is assumed to be from a normal population.
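For the mean, the normal-theory interval is conventionally the Student t interval. The Python sketch below assumes that form, uses a hypothetical function name, and takes the t quantile for 12 degrees of freedom from standard tables. Applied to variable 1 of the first example (mean 7.4615, standard deviation 5.8824, 13 observations), it agrees with the printed limits 3.9068 and 11.0162 to about three decimal places.

```python
import math


def mean_interval(mean, std_dev, n, t_quantile):
    """Two-sided t interval for a normal mean.

    t_quantile is the Student t quantile with n - 1 degrees of freedom at
    probability 1 - alpha/2.
    """
    half_width = t_quantile * std_dev / math.sqrt(n)
    return mean - half_width, mean + half_width


# Tabulated t quantile at 0.975 for 12 degrees of freedom (assumed input).
T_0975_DF12 = 2.17881

lower_clm, upper_clm = mean_interval(7.4615, 5.8824, 13, T_0975_DF12)
```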
Missing values, that is, values equal to NaN (not a number, the value returned by routine AMACH(6)), are excluded from the computations. If MOPT is 0, the exclusion is listwise; that is, the entire observation is excluded and no computations are performed even for the variables with valid values. If MOPT is 1, the exclusion is elementwise, and the statistics for the variables with valid values are still updated. If frequencies or weights are specified, any observation whose frequency or weight is missing is excluded from the computations.
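The difference between listwise and elementwise exclusion shows up directly in the counts of nonmissing observations. The Python sketch below is an illustration only; the function nonmissing_counts and its row layout are hypothetical. It uses the three observations of the second example, where the last row has a missing y.

```python
import math


def nonmissing_counts(rows, listwise):
    """Total frequency contributing to each variable.

    rows is a list of (freq, values) pairs; NaN marks a missing value.
    """
    nvar = len(rows[0][1])
    counts = [0.0] * nvar
    for freq, values in rows:
        if math.isnan(freq):
            continue  # a missing frequency excludes the whole row
        missing = [math.isnan(v) for v in values]
        if listwise and any(missing):
            continue  # listwise: one missing value drops the entire row
        for j, is_missing in enumerate(missing):
            if not is_missing:
                counts[j] += freq  # elementwise: keep the valid entries
    return counts


rows = [(2.0, [3.0, 5.0]), (1.0, [9.0, 2.0]), (3.0, [1.0, math.nan])]
elementwise_counts = nonmissing_counts(rows, listwise=False)  # [6.0, 3.0]
listwise_counts = nonmissing_counts(rows, listwise=True)      # [3.0, 3.0]
```

The elementwise counts, 6 for x and 3 for y, match the Count column printed by the second example, which is run with elementwise exclusion.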
Frequencies are interpreted as multiple occurrences of the other values in the observations. That is, a row of X with a frequency variable having a value of 2 has the same effect as two rows with frequencies of 1. The total of the frequencies is used in computing all of the statistics based on moments (mean, variance, skewness, and kurtosis). Weights are not viewed as replication factors. The sum of the weights is used only in computing the mean (the weighted mean is then used in computing the central moments).
Both weights and frequencies can be zero, but neither can be negative. In general, a zero frequency means that the row is to be eliminated from the analysis; no further processing, counting of missing values, or error checking is done on the row. Although it is not required that frequencies be integers, the logic of their treatment implicitly assumes that they are. Weights, on the other hand, are allowed to be continuous. A weight of zero results in the row being counted, and updates are made of statistics and of the number of missing values.
A missing value for the frequency, or a missing value for the weight when the frequency is nonzero, results in the row being deleted from the analysis; but even in that case, if one of the two is nonmissing, it is an error for that nonmissing weight or frequency to be negative.
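The distinction between frequencies and weights can be made concrete with a small Python sketch. The formulas used here are an assumption consistent with the paragraph above, not a statement of the UVSTA internals: the mean divides by the sum of frequency times weight, while the variance divides by the total frequency minus one. The function name weighted_mean_variance is hypothetical.

```python
def weighted_mean_variance(x, freq=None, weight=None):
    """Mean and variance with frequencies as replication counts and weights
    entering only the weighted sums (assumed formulas, see the lead-in)."""
    n_rows = len(x)
    freq = freq or [1.0] * n_rows
    weight = weight or [1.0] * n_rows
    n = sum(freq)  # total frequency
    sum_fw = sum(f * w for f, w in zip(freq, weight))
    mean = sum(f * w * xi for f, w, xi in zip(freq, weight, x)) / sum_fw
    sum_sq = sum(f * w * (xi - mean) ** 2 for f, w, xi in zip(freq, weight, x))
    return mean, sum_sq / (n - 1.0)


as_freq = weighted_mean_variance([3.0, 9.0], freq=[2.0, 1.0])      # (5.0, 12.0)
replicated = weighted_mean_variance([3.0, 3.0, 9.0])               # (5.0, 12.0)
as_weight = weighted_mean_variance([3.0, 9.0], weight=[2.0, 1.0])  # (5.0, 24.0)
```

A frequency of 2 gives the same result as repeating the row, whereas a weight of 2 changes the mean in the same way but leaves the observation count, and therefore the variance divisor, at its unreplicated value.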
The definitions of some of the statistics are given below in terms of a single variable x. The i-th datum is x_i, with corresponding frequency f_i and weight w_i. If either frequencies or weights are not specified, f_i and/or w_i are identically one. The summation in each case is over the set of valid observations, based on the setting of MOPT and the presence of missing values in the data.
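The displayed formulas do not appear in this copy of the text. As a hedged stand-in, the Python sketch below computes skewness, excess kurtosis, and the coefficient of variation from central moments that divide by the total frequency n, with the standard deviation using n − 1. These definitions are an assumption; they were chosen because, for the data of the second example, they reproduce the printed values (1.4142, 0.5000, and 1.0328 for the first variable). Weights are omitted, and the function name shape_statistics is hypothetical.

```python
import math


def shape_statistics(x, freq):
    """Skewness, excess kurtosis, and coefficient of variation (assumed forms)."""
    n = sum(freq)
    mean = sum(f * xi for f, xi in zip(freq, x)) / n
    m2 = sum(f * (xi - mean) ** 2 for f, xi in zip(freq, x)) / n
    m3 = sum(f * (xi - mean) ** 3 for f, xi in zip(freq, x)) / n
    m4 = sum(f * (xi - mean) ** 4 for f, xi in zip(freq, x)) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0
    std_dev = math.sqrt(m2 * n / (n - 1.0))
    # Reported as zero when undefined, as described for STAT(9, *).
    coef_var = std_dev / mean if mean != 0.0 else 0.0
    return skewness, kurtosis, coef_var


skew_x, kurt_x, cv_x = shape_statistics([3.0, 9.0, 1.0], [2.0, 1.0, 3.0])
skew_y, kurt_y, cv_y = shape_statistics([5.0, 2.0], [2.0, 1.0])
```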
The arguments IDO and NROW allow data to be input a few at a time and even to be deleted after having been included in the analysis. The minima, maxima, and ranges are not updated when observations are deleted.
Workspace may be explicitly provided, if desired, by use of U2STA/DU2STA. The reference is
CALL U2STA (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, CONPRM, CONPRV, IPRINT, STAT, LDSTAT, NRMISS, WK)
The additional argument is
WK Real work vector of length specified above. WK should not be changed between calls to U2STA.
This example uses data from Draper and Smith (1981). There are 5 variables and 13 observations.
      USE UVSTA_INT
      USE GDATA_INT

      IMPLICIT NONE
      INTEGER    LDSTAT, LDX, NVAR
      PARAMETER  (LDSTAT=15, LDX=13, NVAR=5)
!
      INTEGER    IPRINT, NR, NROW, NV
      REAL       CONPRM, CONPRV, STAT(LDSTAT,NVAR), X(LDX,NVAR)
!     Get data for example.
      CALL GDATA (5, X, NR, NV)
!     All data are input at once (default IDO = 0).
      NROW = NR
!     No unequal frequencies or weights are used
!     (defaults IFRQ = 0 and IWT = 0).
!     Get 95% confidence limits (default CONPRM and CONPRV).
!     Delete any row containing a missing value (default MOPT = 0).
!     Print results.
      IPRINT = 1
      CALL UVSTA (X, STAT, NROW=NROW, IPRINT=IPRINT)
      END
                  Univariate Statistics from UVSTA
Variable        Mean    Variance   Std. Dev.    Skewness    Kurtosis
       1      7.4615     34.6026      5.8824     0.68768     0.07472
       2     48.1538    242.1410     15.5609    -0.04726    -1.32257
       3     11.7692     41.0256      6.4051     0.61064    -1.07916
       4     30.0000    280.1667     16.7382     0.32960    -1.01406
       5     95.4231    226.3136     15.0437    -0.19486    -1.34244

Variable     Minimum     Maximum       Range  Coef. Var.       Count
       1      1.0000     21.0000     20.0000      0.7884     13.0000
       2     26.0000     71.0000     45.0000      0.3231     13.0000
       3      4.0000     23.0000     19.0000      0.5442     13.0000
       4      6.0000     60.0000     54.0000      0.5579     13.0000
       5     72.5000    115.9000     43.4000      0.1577     13.0000

Variable   Lower CLM   Upper CLM   Lower CLV   Upper CLV
       1      3.9068     11.0162     17.7930     94.2894
       2     38.7505     57.5572    124.5113    659.8163
       3      7.8987     15.6398     21.0958    111.7918
       4     19.8852     40.1148    144.0645    763.4335
       5     86.3322    104.5139    116.3726    616.6877
In this example, we use some simple data to illustrate the use of frequencies, missing values, and the parameters IDO and NROW. In the data below, NaN represents a missing value.
  f      x      y
  2    3.0    5.0
  1    9.0    2.0
  3    1.0    NaN
We bring in the data one observation at a time in this example. Also, we bring in one false datum and then delete it on a subsequent call to UVSTA.
      USE IMSL_LIBRARIES

      IMPLICIT NONE
      INTEGER    LDSTAT, NVAR
      PARAMETER  (LDSTAT=15, NVAR=2)
!
      INTEGER    IDO, IFRQ, IPRINT, MOPT, NRMISS, NROW
      REAL       STAT(LDSTAT,NVAR), X1(1,NVAR+1)
!     All data are input one observation at a time in the vector X1.
      NROW = 1
!     Frequencies are in the first position. No weights are used.
      IFRQ = 1
!     Get 95% confidence limits (default CONPRM and CONPRV).
!     Elementwise deletion of missing values.
      MOPT = 1
!     Print results, intermediate as well.
      IPRINT = 2
!     Bring in the first observation.
      IDO = 1
      X1(1,1) = 2.0
      X1(1,2) = 3.0
      X1(1,3) = 5.0
      CALL UVSTA (X1, STAT, IDO=IDO, NVAR=NVAR, IFRQ=IFRQ, MOPT=MOPT, &
                  IPRINT=IPRINT, NRMISS=NRMISS)
!     Bring in the second observation.
      IDO = 2
      X1(1,1) = 1.0
      X1(1,2) = 9.0
      X1(1,3) = 2.0
      CALL UVSTA (X1, STAT, IDO=IDO, NVAR=NVAR, IFRQ=IFRQ, MOPT=MOPT, &
                  IPRINT=IPRINT, NRMISS=NRMISS)
!     Bring in a false observation.
      X1(1,1) = 3.0
      X1(1,2) = 6.0
      X1(1,3) = 3.0
      CALL UVSTA (X1, STAT, IDO=IDO, NVAR=NVAR, IFRQ=IFRQ, MOPT=MOPT, &
                  IPRINT=IPRINT, NRMISS=NRMISS)
!     Delete the false observation (negative NROW).
!     This may make the minima, maxima, and range incorrect.
      NROW = -1
      X1(1,1) = 3.0
      X1(1,2) = 6.0
      X1(1,3) = 3.0
      CALL UVSTA (X1, STAT, IDO=IDO, NROW=NROW, NVAR=NVAR, IFRQ=IFRQ, &
                  MOPT=MOPT, IPRINT=IPRINT, NRMISS=NRMISS)
      NROW = 1
!     Bring in the final observation; AMACH(6) returns NaN.
      IDO = 3
      X1(1,1) = 3.0
      X1(1,2) = 1.0
      X1(1,3) = AMACH(6)
      CALL UVSTA (X1, STAT, IDO=IDO, NROW=NROW, NVAR=NVAR, IFRQ=IFRQ, &
                  MOPT=MOPT, IPRINT=IPRINT, NRMISS=NRMISS)
      END
                 Intermediate Statistics from UVSTA
Variable        Mean    Sum Sqs.     Minimum     Maximum       Count
       1      3.0000      0.0000      3.0000      3.0000      2.0000
       2      5.0000      0.0000      5.0000      5.0000      2.0000

                 Intermediate Statistics from UVSTA
Variable        Mean    Sum Sqs.     Minimum     Maximum       Count
       1      5.0000     24.0000      3.0000      9.0000      3.0000
       2      4.0000      6.0000      2.0000      5.0000      3.0000

                 Intermediate Statistics from UVSTA
Variable        Mean    Sum Sqs.     Minimum     Maximum       Count
       1      5.5000     25.5000      3.0000      9.0000      6.0000
       2      3.5000      7.5000      2.0000      5.0000      6.0000

                 Intermediate Statistics from UVSTA
Variable        Mean    Sum Sqs.     Minimum     Maximum       Count
       1      5.0000     24.0000      3.0000      9.0000      3.0000
       2      4.0000      6.0000      2.0000      5.0000      3.0000

                  Univariate Statistics from UVSTA
Variable        Mean    Variance   Std. Dev.    Skewness    Kurtosis
       1      3.0000      9.6000      3.0984      1.4142      0.5000
       2      4.0000      3.0000      1.7321     -0.7071     -1.5000

Variable     Minimum     Maximum       Range  Coef. Var.       Count
       1      1.0000      9.0000      8.0000      1.0328      6.0000
       2      2.0000      5.0000      3.0000      0.4330      3.0000

Variable   Lower CLM   Upper CLM   Lower CLV   Upper CLV
       1     -0.2516      6.2516      3.7405     57.7470
       2     -0.3027      8.3027      0.8133    118.4935