Chapter 1: Basic Statistics

UVSTA

Computes basic univariate statistics.

Required Arguments

X — |NROW| by NVAR + m matrix containing the data, where m is 0, 1, or 2 depending on whether any column(s) of X correspond to weights and/or frequencies.   (Input)

STAT — 15 by NVAR matrix containing in each row statistics on all of the variables.   (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3.)
The columns of STAT correspond to the columns of X, except for the columns of X containing weights or frequencies. (The columns beyond the weights or frequencies column are shifted to the left.)

I          STAT(I, *)

1          contains means

2          contains variances

3          contains standard deviations

4          contains coefficients of skewness

5          contains coefficients of excess (kurtosis)

6          contains minima

7          contains maxima

8          contains ranges

9          contains coefficients of variation, when they are defined. If the coefficient of variation is not defined for a given variable, STAT(9, *) contains a zero in the corresponding position.

10        contains numbers (counts) of nonmissing observations

11        is used only when CONPRM is positive, and, in this case, contains the lower confidence limit for the mean (assuming normality)

12        is used only when CONPRM is positive, and, in this case, contains the upper confidence limit for the mean (assuming normality)

13        is used only when CONPRV is positive, and, in this case, contains the lower confidence limit for the variance (assuming normality).

14        is used only when CONPRV is positive, and, in this case, contains the upper confidence limit for the variance (assuming normality).

15        is used only when weighting is used (IWT is positive), and, in this case, contains the sums of the weights.

Optional Arguments

IDO — Processing option.   (Input)
Default: IDO = 0.

IDO  Action

0          This is the only invocation of UVSTA for this data set, and all the data are input at once.

1          This is the first invocation, and additional calls to UVSTA will be made. Initialization and updating for the data in X are performed. The means are output correctly, but the other quantities output in STAT are intermediate quantities.

2          This is an intermediate invocation of UVSTA, and updating for the data in X is performed.

3          This is the final invocation of this routine. If NROW is not zero, updating is performed. The wrap-up computations for STAT are performed.

NROW — The absolute value of NROW is the number of rows of data currently input in X.   (Input)
Default: NROW = size (X,1).
NROW may be positive, zero, or negative. A negative NROW means that the |NROW| rows of data in X are to be deleted from some aspects of the analysis; this should be done only if IDO is 2 or 3 and the wrap-up computations for STAT have not been performed. When a negative value is input for NROW, it is assumed that each of the |NROW| rows of X has been input (with positive NROW) in a previous invocation of UVSTA. Use negative values of NROW with care and with the understanding that some quantities in STAT cannot be updated properly in this case. In particular, the minima, maxima, and ranges are not updated when rows are deleted. It is also possible that a constant variable in the remaining data will not be recognized as such.
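The effect of deleting rows can be illustrated with a small accumulator. The following is a minimal Python sketch, not the UVSTA implementation: the class RunningStats and its methods are hypothetical names, and the update formulas are the usual ones for a frequency-weighted running mean and sum of squares about the mean. It shows why the mean and the sum of squares can be restored after a deletion while the minimum and maximum cannot.

```python
class RunningStats:
    """Running mean and sum of squares about the mean for one variable.

    Hypothetical helper for illustration only; UVSTA's internals may differ.
    """

    def __init__(self):
        self.count = 0.0                 # total frequency seen so far
        self.mean = 0.0
        self.sum_sq = 0.0                # sum of f * (x - mean)**2
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def add(self, x, f=1.0):
        """Include observation x with frequency f (like a positive NROW)."""
        if f == 0.0:
            return                       # zero frequency: row is ignored
        new_count = self.count + f
        delta = x - self.mean
        self.mean += f * delta / new_count
        self.sum_sq += f * delta * (x - self.mean)
        self.count = new_count
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def remove(self, x, f=1.0):
        """Delete a previously added observation (like a negative NROW)."""
        if f == 0.0:
            return
        new_count = self.count - f
        if new_count <= 0.0:
            self.count, self.mean, self.sum_sq = 0.0, 0.0, 0.0
            return
        old_mean = self.mean
        self.mean = (self.count * old_mean - f * x) / new_count
        self.sum_sq -= f * (x - old_mean) * (x - self.mean)
        self.count = new_count
        # The minimum and maximum are left untouched: the accumulator
        # no longer knows the extremes of the remaining data.


stats = RunningStats()
stats.add(3.0, 2.0)
stats.add(9.0, 1.0)
stats.add(6.0, 3.0)       # false observation
stats.remove(6.0, 3.0)    # delete it again
print(stats.mean, stats.sum_sq, stats.count)
```

The values used here are those of the first variable in Example 2, so the printed mean, sum of squares, and count can be compared with the intermediate output shown there. Had the false observation fallen outside the range of the other data, the stored minimum or maximum would have remained stale after the deletion.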

NVAR — Number of variables (not including the weight or frequency variable, if used).   (Input)
Default: NVAR = size (X,2).

LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program.   (Input)
Default: LDX = size (X,1).

IFRQ — Frequency option.   (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies.
Default: IFRQ = 0.

IWT — Weighting option.   (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column IWT of X contains the weights.
Default: IWT = 0.

MOPT — Missing value option.   (Input)
NaN (not a number, the value returned by routine AMACH(6)) is interpreted as the missing value code, and any value in X equal to NaN is excluded from the computations.
Default: MOPT = 0.

MOPT   Action

0          The exclusion is listwise. (The entire row of X is excluded if any of the values of the row is equal to the missing value code.)

1          The exclusion is elementwise. (Statistics for variables with nonmissing values are updated.)
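The difference between the two settings can be sketched in a few lines of Python. This is an illustration only: the function rows_used is a hypothetical name, and frequencies and weights are ignored for simplicity.

```python
import math


def rows_used(rows, mopt):
    """Return, for each variable, the values kept under the exclusion rule.

    mopt = 0: listwise (a row with any NaN is dropped entirely).
    mopt = 1: elementwise (only the NaN entries themselves are dropped).
    """
    nvar = len(rows[0])
    kept = [[] for _ in range(nvar)]
    for row in rows:
        has_missing = any(math.isnan(v) for v in row)
        if mopt == 0 and has_missing:
            continue                      # listwise: skip the whole row
        for j, v in enumerate(row):
            if not math.isnan(v):         # skip individual missing values
                kept[j].append(v)
    return kept


nan = float("nan")
data = [[3.0, 5.0], [9.0, 2.0], [1.0, nan]]
print(rows_used(data, 0))   # [[3.0, 9.0], [5.0, 2.0]]
print(rows_used(data, 1))   # [[3.0, 9.0, 1.0], [5.0, 2.0]]
```

With listwise exclusion both variables are computed from the same two rows; with elementwise exclusion the first variable still uses all three rows.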

CONPRM — Confidence level for two-sided interval estimate of the means (assuming normality), in percent.   (Input)
If CONPRM ≤ 0, no confidence interval for the mean is computed; otherwise, a CONPRM percent confidence interval is computed, in which case CONPRM must be between 0.0 and 100.0. CONPRM is often 90.0, 95.0, or 99.0. For a one-sided confidence interval with confidence level ONECL, set CONPRM = 100.0 - 2.0 * (100.0 - ONECL).
Default: CONPRM = 95.0.
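The conversion from a one-sided level to the two-sided value expected in CONPRM follows from the fact that each tail of the two-sided interval carries 100 - ONECL percent. A minimal Python sketch (the function name two_sided_level is hypothetical):

```python
def two_sided_level(onecl):
    """Two-sided level, in percent, whose limits bound a one-sided
    interval at confidence level onecl (also in percent)."""
    if not 50.0 < onecl < 100.0:
        # Outside this range the result would not lie between 0 and 100.
        raise ValueError("onecl must be between 50 and 100 percent")
    return 100.0 - 2.0 * (100.0 - onecl)


print(two_sided_level(95.0))   # 90.0
print(two_sided_level(97.5))   # 95.0
```

For example, a one-sided 95 percent limit is obtained by requesting a 90 percent two-sided interval and using only the lower or only the upper limit.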

CONPRV — Confidence level for two-sided interval estimate of the variances (assuming normality), in percent.   (Input)
The confidence intervals are symmetric in probability (rather than in length). See also the description of CONPRM.
Default: CONPRV = 95.0.
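For reference, the textbook intervals under normality are sketched below for the case without weights. This is a reconstruction, not a statement of the routine's internals; the symbols c_m = CONPRM/100 and c_v = CONPRV/100 are notation introduced here, n is the count of nonmissing observations, and s^2 is the sample variance. These formulas reproduce the confidence limits printed in Example 2.

```latex
\begin{align*}
\text{mean:} \quad
  & \bar{x} \;\pm\; t_{(1+c_m)/2,\; n-1}\, \frac{s}{\sqrt{n}} \\[4pt]
\text{variance:} \quad
  & \left[ \frac{(n-1)\, s^2}{\chi^2_{(1+c_v)/2,\; n-1}},\;
           \frac{(n-1)\, s^2}{\chi^2_{(1-c_v)/2,\; n-1}} \right]
\end{align*}
```

Here t_{p, n-1} and chi^2_{p, n-1} denote the p-th quantiles of the Student t and chi-squared distributions with n - 1 degrees of freedom. "Symmetric in probability" means that each tail outside the variance interval has probability (1 - c_v)/2, even though the interval is not centered on s^2.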

IPRINT — Printing option.   (Input)
Default: IPRINT = 0.

IPRINT  Action

0          No printing is performed.

1          Statistics in STAT are printed if IDO = 0 or 3.

2          Intermediate means, sums of squares about the mean, minima, maxima, and counts are printed when IDO = 1 or 2, and all statistics in STAT are printed when IDO = 0 or 3.

LDSTAT — Leading dimension of STAT exactly as specified in the dimension statement in the calling program.   (Input)
Default: LDSTAT = size (STAT,1).

NRMISS — Number of rows of data encountered in calls to UVSTA that contain any missing values.   (Output, if IDO = 0 or 1; input/output, if IDO = 2 or 3.)
Rows with a frequency of zero are not counted.

FORTRAN 90 Interface

Generic:                              CALL UVSTA (X, STAT [,…])

Specific:                             The specific interface names are S_UVSTA and D_UVSTA.

FORTRAN 77 Interface

Single:            CALL UVSTA (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, CONPRM, CONPRV, IPRINT, STAT, LDSTAT, NRMISS)

Double:                              The double precision name is DUVSTA.

Description

For the data in each column of X, except the columns containing frequencies or weights, UVSTA computes the sample mean, variance, minimum, maximum, and other basic statistics. It also computes confidence intervals for the mean and variance if the sample is assumed to be from a normal population.

Missing values, that is, values equal to NaN (not a number, the value returned by routine AMACH(6)), are excluded from the computations. If MOPT is zero, the exclusion is listwise; that is, the entire observation is excluded and no computations are performed even for the variables with valid values. If MOPT is one, the exclusion is elementwise. If frequencies or weights are specified, any observation whose frequency or weight is missing is excluded from the computations.

Frequencies are interpreted as multiple occurrences of the other values in the observations. That is, a row of X with a frequency variable having a value of 2 has the same effect as two rows with frequencies of 1. The total of the frequencies is used in computing all of the statistics based on moments (mean, variance, skewness, and kurtosis). Weights are not viewed as replication factors. The sum of the weights is used only in computing the mean (of course, then the weighted mean is used in computing the central moments). Both weights and frequencies can be zero, but neither can be negative. In general, a zero frequency means that the row is to be eliminated from the analysis; no further processing, counting of missing values, or error checking is done on the row. Although it is not required that frequencies be integers, the logic of their treatment implicitly assumes that they are. Weights, on the other hand, are allowed to be continuous. A weight of zero results in the row being counted, and updates are made of statistics and of the number of missing values. A missing value for the frequency or a missing value for the weight when the frequency is nonzero results in the row being deleted from the analysis; but even in that case, if one is nonmissing, it is an error for that nonmissing weight or frequency to be negative.
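The distinction between frequencies and weights can be made concrete with a short Python sketch. The function mean_and_variance is a hypothetical name, and the divisor n - 1 with n equal to the sum of the frequencies is an assumption based on the statement above that the sum of the weights is used only in computing the mean.

```python
def mean_and_variance(x, f=None, w=None):
    """Return (n, mean, variance) for one variable.

    f: frequencies (replication factors), default all 1.0
    w: weights (not replication factors), default all 1.0
    """
    rows = len(x)
    f = f or [1.0] * rows
    w = w or [1.0] * rows
    n = sum(f)                                   # count uses frequencies only
    sum_fw = sum(fi * wi for fi, wi in zip(f, w))
    mean = sum(fi * wi * xi for fi, wi, xi in zip(f, w, x)) / sum_fw
    ss = sum(fi * wi * (xi - mean) ** 2 for fi, wi, xi in zip(f, w, x))
    return n, mean, ss / (n - 1.0)


# A frequency of 2 behaves exactly like entering the row twice.
print(mean_and_variance([3.0, 9.0], f=[2.0, 1.0]))   # (3.0, 5.0, 12.0)
print(mean_and_variance([3.0, 3.0, 9.0]))            # (3.0, 5.0, 12.0)

# A weight of 2 gives the same mean but does not change the count.
print(mean_and_variance([3.0, 9.0], w=[2.0, 1.0]))   # (2.0, 5.0, 24.0)
```

The first two calls agree in every statistic, while the weighted call reports only two observations and therefore a different variance.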

The definitions of some of the statistics are given below in terms of a single variable x. The i-th datum is x_i, with corresponding frequency f_i and weight w_i. If either frequencies or weights are not specified, f_i and/or w_i are identically one. The summation in each case is over the set of valid observations, based on the setting of MOPT and the presence of missing values in the data.

Number of nonmissing observations, STAT(10, *)

Mean, STAT(1, *)

Variance, STAT(2, *)

Skewness, STAT(4, *)

Excess or Kurtosis, STAT(5, *)

Minimum, STAT(6, *)

Maximum, STAT(7, *)

Range, STAT(8, *)

Coefficient of Variation, STAT(9, *)
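The formulas belonging to the headings above are not present in this text. The following LaTeX block is a reconstruction consistent with the description in this section (frequencies act as replication factors, and the sum of the weights enters only through the weighted mean) and with the output of Example 2; it should be checked against the vendor documentation before being relied on. The symbols n, \bar{x}_w, and s_w are notation introduced here.

```latex
\begin{align*}
n &= \sum_i f_i
  && \text{STAT(10, *)} \\
\bar{x}_w &= \frac{\sum_i f_i w_i x_i}{\sum_i f_i w_i}
  && \text{STAT(1, *)} \\
s_w^2 &= \frac{\sum_i f_i w_i \,(x_i - \bar{x}_w)^2}{n - 1}
  && \text{STAT(2, *)} \\
\text{skewness} &=
  \frac{\sum_i f_i w_i \,(x_i - \bar{x}_w)^3 / n}
       {\left[ \sum_i f_i w_i \,(x_i - \bar{x}_w)^2 / n \right]^{3/2}}
  && \text{STAT(4, *)} \\
\text{kurtosis} &=
  \frac{\sum_i f_i w_i \,(x_i - \bar{x}_w)^4 / n}
       {\left[ \sum_i f_i w_i \,(x_i - \bar{x}_w)^2 / n \right]^{2}} - 3
  && \text{STAT(5, *)} \\
x_{\min} &= \min_i x_i, \qquad x_{\max} = \max_i x_i
  && \text{STAT(6, *), STAT(7, *)} \\
\text{range} &= x_{\max} - x_{\min}
  && \text{STAT(8, *)} \\
\text{CV} &= \frac{s_w}{\bar{x}_w}, \qquad \bar{x}_w \neq 0
  && \text{STAT(9, *)}
\end{align*}
```

The standard deviation in STAT(3, *) is the square root of the variance.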

The arguments IDO and NROW allow data to be input a few at a time and even to be deleted after having been included in the analysis. The minima, maxima, and ranges are not updated when observations are deleted.

Comments

Workspace may be explicitly provided, if desired, by use of U2STA/DU2STA. The reference is

CALL U2STA (IDO, NROW, NVAR, X, LDX, IFRQ, IWT, MOPT, CONPRM, CONPRV, IPRINT, STAT, LDSTAT, NRMISS, WK)

The additional argument is

WK — Real work vector of length specified above. WK should not be changed between calls to U2STA.

Example 1

This example uses data from Draper and Smith (1981). There are 5 variables and 13 observations.

 

      USE UVSTA_INT
      USE GDATA_INT

      IMPLICIT   NONE
      INTEGER    LDSTAT, LDX, NVAR
      PARAMETER  (LDSTAT=15, LDX=13, NVAR=5)
!
      INTEGER    IPRINT, NR, NROW, NV
      REAL       CONPRM, CONPRV, STAT(LDSTAT,NVAR), X(LDX,NVAR)
!                                 Get data for example.
      CALL GDATA (5, X, NR, NV)
!                                 All data are input at once.
      NROW = NR
!                                 No unequal frequencies or weights
!                                 are used.
!                                 Get 95% confidence limits.
!                                 Delete any row containing a missing
!                                 value.
!                                 Print results.
      IPRINT = 1
      CALL UVSTA (X, STAT, NROW=NROW, IPRINT=IPRINT)
      END

Output

 

Univariate Statistics from UVSTA


Variable      Mean       Variance    Std. Dev.       Skewness      Kurtosis
   1        7.4615        34.6026      5.8824         0.68768       0.07472
   2       48.1538       242.1410      15.5609       -0.04726      -1.32257
   3       11.7692        41.0256       6.4051        0.61064      -1.07916
   4       30.0000       280.1667      16.7382        0.32960      -1.01406
   5       95.4231       226.3136      15.0437       -0.19486      -1.34244


Variable    Minimum       Maximum        Range       Coef. Var.       Count
   1         1.0000       21.0000      20.0000          0.7884      13.0000
   2        26.0000       71.0000      45.0000          0.3231      13.0000
   3         4.0000       23.0000      19.0000          0.5442      13.0000
   4         6.0000       60.0000      54.0000          0.5579      13.0000
   5        72.5000      115.9000      43.4000          0.1577      13.0000

Variable  Lower CLM     Upper CLM    Lower CLV      Upper CLV
   1         3.9068       11.0162      17.7930        94.2894
   2        38.7505       57.5572     124.5113       659.8163
   3         7.8987       15.6398      21.0958       111.7918
   4        19.8852       40.1148     144.0645       763.4335
   5        86.3322      104.5139     116.3726       616.6877

Additional Examples

Example 2

In this example, we use some simple data to illustrate the use of frequencies, missing values, and the parameters IDO and NROW. In the data below, “NaN” represents a missing value.

f      x      y

2     3.0    5.0
1     9.0    2.0
3     1.0    NaN

We bring in the data one observation at a time in this example. Also, we bring in one false datum and then delete it on a subsequent call to UVSTA.

 

      USE IMSL_LIBRARIES

      IMPLICIT   NONE
      INTEGER    LDSTAT, NVAR
      PARAMETER  (LDSTAT=15, NVAR=2)
!
      INTEGER    IDO, IFRQ, IPRINT, MOPT, NRMISS, NROW
      REAL       STAT(LDSTAT,NVAR), X1(1,NVAR+1)
!                                 All data are input one observation
!                                 at a time in the vector X1.
      NROW = 1
!                                 Frequencies are in the first
!                                 position.  No weights are used.
      IFRQ = 1
!                                 Get 95% confidence limits.
!                                 Elementwise deletion of missing
!                                 values.
      MOPT = 1
!                                 Print results, intermediate as well.
      IPRINT = 2
!                                 Bring in the first observation.
      IDO   = 1
      X1(1,1) = 2.0
      X1(1,2) = 3.0
      X1(1,3) = 5.0
      CALL UVSTA (X1, STAT, IDO=IDO, NVAR=NVAR, IFRQ=IFRQ, MOPT=MOPT, &
                 IPRINT=IPRINT, NRMISS=NRMISS)
!                                 Bring in the second observation.
      IDO   = 2
      X1(1,1) = 1.0
      X1(1,2) = 9.0
      X1(1,3) = 2.0
      CALL UVSTA (X1, STAT, IDO=IDO, NVAR=NVAR, IFRQ=IFRQ, MOPT=MOPT, &
                 IPRINT=IPRINT, NRMISS=NRMISS)
!                                 Bring in a false observation.
      X1(1,1) = 3.0
      X1(1,2) = 6.0
      X1(1,3) = 3.0
      CALL UVSTA (X1, STAT, IDO=IDO, NVAR=NVAR, IFRQ=IFRQ, MOPT=MOPT, &
                 IPRINT=IPRINT, NRMISS=NRMISS)
!                                 Delete the false observation.
!                                 This may make the minima, maxima,
!                                 and range incorrect.
      NROW  = -1
      X1(1,1) = 3.0
      X1(1,2) = 6.0
      X1(1,3) = 3.0
      CALL UVSTA (X1, STAT, IDO=IDO, NROW=NROW, NVAR=NVAR, IFRQ=IFRQ, &
                 MOPT=MOPT, IPRINT=IPRINT, NRMISS=NRMISS)
      NROW = 1
!                                 Bring in the final observation.
      IDO   = 3
      X1(1,1) = 3.0
      X1(1,2) = 1.0
      X1(1,3) = AMACH(6)
      CALL UVSTA (X1, STAT, IDO=IDO, NROW=NROW, NVAR=NVAR, IFRQ=IFRQ, &
                 MOPT=MOPT, IPRINT=IPRINT, NRMISS=NRMISS)
      END

Output

 

                   Intermediate Statistics from UVSTA
Variable         Mean      Sum Sqs.       Minimum       Maximum       Count
   1           3.0000        0.0000        3.0000        3.0000      2.0000
   2           5.0000        0.0000        5.0000        5.0000      2.0000

                       Intermediate Statistics from UVSTA
Variable         Mean      Sum Sqs.       Minimum       Maximum       Count
   1           5.0000       24.0000        3.0000        9.0000      3.0000
   2           4.0000        6.0000        2.0000        5.0000      3.0000

                       Intermediate Statistics from UVSTA
Variable         Mean      Sum Sqs.       Minimum       Maximum       Count
   1           5.5000       25.5000        3.0000        9.0000      6.0000
   2           3.5000        7.5000        2.0000        5.0000      6.0000

                       Intermediate Statistics from UVSTA
Variable         Mean      Sum Sqs.       Minimum       Maximum       Count
   1           5.0000       24.0000        3.0000        9.0000      3.0000
   2           4.0000        6.0000        2.0000        5.0000      3.0000

                        Univariate Statistics from UVSTA
Variable         Mean      Variance     Std. Dev.      Skewness    Kurtosis
   1           3.0000        9.6000        3.0984        1.4142      0.5000
   2           4.0000        3.0000        1.7321       -0.7071     -1.5000


Variable      Minimum       Maximum         Range    Coef. Var.       Count
   1           1.0000        9.0000        8.0000        1.0328      6.0000
   2           2.0000        5.0000        3.0000        0.4330      3.0000


Variable     Lower CLM     Upper CLM     Lower CLV     Upper CLV
    1          -0.2516        6.2516        3.7405       57.7470
    2          -0.3027        8.3027        0.8133      118.4935
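The moment-based statistics in the final table can be cross-checked independently of the library. The following Python sketch is not part of UVSTA: the function univariate_stats is a hypothetical name, it handles frequencies but not weights, and it assumes the usual definitions (variance with divisor n - 1, skewness and kurtosis from central moments with divisor n, where n is the sum of the frequencies).

```python
import math


def univariate_stats(x, f):
    """Frequency-based statistics for one variable, skipping NaN values
    and rows with zero frequency (elementwise exclusion)."""
    pairs = [(xi, fi) for xi, fi in zip(x, f)
             if not math.isnan(xi) and fi > 0.0]
    n = sum(fi for _, fi in pairs)
    mean = sum(fi * xi for xi, fi in pairs) / n
    # Central moments with divisor n.
    m2 = sum(fi * (xi - mean) ** 2 for xi, fi in pairs) / n
    m3 = sum(fi * (xi - mean) ** 3 for xi, fi in pairs) / n
    m4 = sum(fi * (xi - mean) ** 4 for xi, fi in pairs) / n
    variance = m2 * n / (n - 1.0)
    std_dev = math.sqrt(variance)
    return {
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
        "coef_var": std_dev / mean,
        "count": n,
    }


f = [2.0, 1.0, 3.0]
x = [3.0, 9.0, 1.0]
y = [5.0, 2.0, float("nan")]
for name, column in (("x", x), ("y", y)):
    stats = univariate_stats(column, f)
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

The rounded values agree with the mean, variance, standard deviation, skewness, kurtosis, coefficient of variation, and count printed above for variables 1 and 2. The confidence limits are not reproduced here because they require quantiles of the t and chi-squared distributions.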


