Chapter 4: Analysis of Variance

ABALD

Analyzes a balanced complete experimental design for a fixed, random, or mixed model.

Required Arguments

NL — Vector of length NF containing the number of levels for each of the factors.   (Input)

Y — Vector of length NL(1) * NL(2) *…* NL(NF) containing the responses.   (Input)
Y must not contain NaN (not a number) for any of its elements, i.e., missing values are not allowed.

NRF — For positive NRF, NRF is the number of random factors.   (Input)
For negative NRF, NRF is the number of random effects (sources of variation).

INDRF — Index vector of length |NRF| containing either the factor numbers to be considered random (for NRF positive) or containing the effect numbers to be considered random (for NRF negative).   (Input)
If NRF = 0, INDRF is not referenced and can be a vector of length one.

NFEF — Vector of length NEF containing the number of factors associated with each effect in the model.   (Input)

INDEF — Index vector of length NFEF(1) + NFEF(2) + … + NFEF(NEF).   (Input)
The first NFEF(1) elements give the factor numbers in the first effect. The next NFEF(2) elements give the the factor numbers in the second effect. The last NFEF(NEF) elements give the factor numbers in the last effect. Main effects must appear before their interactions. In general, an effect E cannot appear after an effect F if all of the indices for E appear also in F .

AOV — Vector of length 15 containing statistics relating to the analysis of variance.   (Output)

I

AOV(I)

1

Degrees of freedom for regression

2

Degrees of freedom for error

3

Total degrees of freedom

4

Sum of squares for regression

5

Sum of squares for error

6

Total sum of squares

7

Regression mean square

8

Error mean square

9

F-statistic

10

p-value

11

R2 (in percent)

12

Adjusted R2 (in percent)

13

Estimated standard deviation of the model error

14

Mean of the response (dependent) variable

15

Coefficient of variation (in percent)

Optional Arguments

NF — Number of factors (number of subscripts) in the model, including
error.   (Input)
Default: NF = size (NL,1).

NEF — Number of effects (sources of variation) due to the model excluding the overall mean and error.   (Input)
Default: NEF = size (NFEF,1).

CONPER — Confidence level for two-sided interval estimates on the variance components, in percent.   (Input)
CONPER percent confidence intervals are computed, hence, CONPER must be in the interval [0.0, 100.0). CONPER often will be 90.0, 95.0, or 99.0. For one-sided intervals with confidence level ONECL, ONECL in the interval [50.0, 100.0), set
CONPER = 100.0  2.0 * (100.0 ONECL).
Default: CONPER = 95.0.

IPRINT — Printing option.   (Input)
Default: IPRINT = 0.

IPRINT

Action

0

No printing is performed.

1

All is performed.

-k

Printing restricted to exclude marginal means higher than k ways. For example, only one-way and two-way marginal means will be printed if IPRINT = 2.

Let

The value of IPRINT must be between n and 1, inclusively.

MODEL — Model Option.   (Input)
Default: MODEL = 0.

MODEL                    Meaning

0                              Searle model

1                              Scheffe model

For the Scheffe model, effects corresponding to interactions of fixed and random factors have their sum over the subscripts corresponding to fixed factors equal to zero. Also, the variance of a random interaction effect involving some fixed factors has a multiplier for the associated variance component that involves the number of levels in the fixed factors. The Searle model has no summation restrictions on the random interaction effects and has a multiplier of one for each variance component.

EMS — Vector of length (NEF + 1) * (NEF + 2)/2 containing expected mean square coefficients.   (Output)
Suppose the effects are A, B, and AB. The ordering of the coefficients in EMS is as follows:

                                            Error   AB   B             A
A                             EMS(1)        EMS(2)    EMS(3)    EMS(4)
B                             EMS(5)        EMS(6)    EMS(7)
AB                           EMS(8)        EMS(9)
Error                       EMS(10)

VC — NEF + 1 by 9 matrix containing statistics relating to the particular variance components or effects in the model and the error.   (Output)
Rows of VC correspond to the NEF effects plus error. Columns of VC are as follows:

Column

Description

1

Degrees of freedom

2

Sum of squares

3

Mean squares

4

F -statistic

5

p-value for F test

6

Variance component estimate

7

Percent of variance of y explained by random effect

8

Lower endpoint for a confidence interval on the variance component

9

Upper endpoint for a confidence interval on the variance component

            Columns 6 through 9 contain NaN (not a number) if the effect is fixed, i.e., if there is no variance component to be estimated. If the variance component estimate is negative, columns 8 and 9 contain NaN.

LDVC — Leading dimension of VC exactly as specified in the dimension statement of the calling program.   (Input)
Deafult:  LDVC = size( VC ,1).

YMEANS — Vector of length (NL(1) + 1) * (NL(2) + 1) * … * (NL(n) + 1) containing the subgroup means.   (Output)
Suppose the factors are A, B, and C. The ordering of the means is grand mean, A means, B means, C means, AB means, AC means, BC means, and ABC means.

FORTRAN 90 Interface

Generic:          CALL ABALD (NL, Y, NRF, INDRF, NFEF, INDEF, AOV [,…])

Specific:                             The specific interface names are S_ABALD and D_ABALD.

FORTRAN 77 Interface

Single:            CALL ABALD (NF, NL, Y, NRF, INDRF, NEF, NFEF, INDEF, CONPER, IPRINT, MODEL, AOV, EMS, VC, LDVC, YMEANS)

Double:                              The double precision name is DABALD.

Description

Routine ABALD analyzes a balanced complete experimental design for a fixed, random, or mixed model. The analysis includes an analysis of variance table, and computation of subgroup means and variance component estimates. A choice of two parameterizations of the variance components for the model can be made.

Scheffι (1959, pages 274289) discusses the parameterization for MODEL = 1. For example, consider the following model equation with fixed factor A and random factor B:

yijk = μ + α i + bj + cij + eijk     i = 1, 2, …, a; j = 1, 2, …, b; k = 1, 2, …, n

The fixed effects α i’s are subject to the restriction

the bj’s are random effects identically and independently distributed

cij are interaction effects each distributed

and are subject to the restrictions

and the eijk’s are errors identically and independently distributed N(0, σ2). In general, interactions of fixed and random factors have sums over subscripts corresponding to fixed factors equal to zero. Also in general, the variance of a random interaction effect is the associated variance component times a product of ratios for each fixed factor in the random interaction term. Each ratio depends on the number of levels in the fixed factor. In the earlier example, the random interaction AB has the ratio (a 1)/a as a multiplier of

and

In a three-way crossed classification model, an ABC interaction effect with A fixed, B random, and C fixed would have variance

Searle (1971, pages 400401) discusses the parameterization for MODEL = 0. This parameterization does not have the summation restrictions on the effects corresponding to interactions of fixed and random factors. Also, the variance of each random interaction term is the associated variance component, i.e., without the multiplier. This parameterization is also used with unbalanced data, which is one reason for is popularity with balanced data also. In the earlier example,

Searle (1971, pages 400404) compares these two parameterizations. Hocking (1973) considers these different parameterizations and concludes they are equivalent because they yield the same variance-covariance structure for the responses. Differences in covariances for individual terms, differences in expected mean square coefficients and differences in F tests are just a consequence of the definition of the individual terms in the model and are not caused by any fundamental differences in the models. For the earlier two-way model, Hocking states that the relations between the two parameterizations of the variance components are

where

are the variance components in the parameterization with MODEL = 0.

The computations for degrees of freedom and sums of squares are the same regardless of the option specified by MODEL. ABALD first computes degrees of freedom and sum of squares for a full factorial design. Degrees of freedom for effects in the factorial design that are missing from the specified model are pooled into the model effect containing the fewest subscripts but still containing the factorial effect. If no such model effect exists, the factorial effect is pooled into error. If more than one such effect exists, a terminal error message is issued indicating a misspecified model.

The analysis of variance method is used for estimating the variance components. This method solves a linear system in which the mean squares are set to the expected mean squares. A problem that Hocking (1985, pages 324330) discusses is that this method can yield a negative variance component estimate. Hocking suggests a diagnostic procedure for locating the cause of the negative estimate. It may be necessary to re-examine the assumptions of the model.

The percentage of variation explained by each random effect is computed (output in VC(i, 7)) as the variance of the associated random effect divided by the variance of y. The two parameterizations can lead to different values because of the different definitions of the individual terms in the model. For example, the percentage associated with the AB interaction term in the earlier two-way mixed model is computed for MODEL = 1 using the formula

while for the parameterization MODEL = 0, the percentage is computed using the formula

In each case, the variance compenents are replaced by their estimates (stored in VC(i, 6)).

Confidence intervals on the variance components are computed using the method discussed by Graybill (1976, Theorem 15.3.5, page 624, and Note 4, page 620). Routine CIDMS is used.

Comments

Workspace may be explicitly provided, if desired, by use of A2ALD/DA2ALD. The reference is:

CALL A2ALD (NF, NL, Y, NRF, INDRF, NEF, NFEF, INDEF, CONPER, IPRINT, MODEL, AOV, EMS, VC, LDVC, YMEANS, WK, IWK, CHWK)

The additional arguments are as follows:

WK — Work vector of length 3 * 2NFNF + 2 * NEF+ m + 4.

IWK — Work vector of length max(2NFNF, NF + NEF + LINDEF )+ 2NFNF 1 + NF * 2NFNF1.

CHWK — CHARACTER * 13 vector of length max(NEF + 3, 2n 1). If IPRINT = 0, CHWK is not referenced and can be a vector of length one.

Example 1

An analysis of a generalized randomized block design is performed using data discussed by Kirk (1982, Table 6.10-1, pages 293297). The model is

yijk = μ + α i + bj + cij + eijk     i = 1, 2, 3, 4; j = 1, 2, 3, 4; k = 1, 2

where yijk is the response for the k-th experimental unit in block j with treatment i; the α i’s are the treatment effects and are subject to the restriction

the bj’s are block effects identically and independently distributed

cij are interaction effects each distributed

and are subject to the restrictions

and the eijk’s are errors, identically and independently distributed N(0, σ2). The interaction effects are assumed to be distributed independently of the errors.

The data are given in the following table:

 

 

Block

Treatment

1

2

3

4

1

3, 6

3, 1

2, 2

3, 2

2

4, 5

4, 2

3, 4

3, 3

3

7, 8

7, 5

6, 5

6, 6

4

7, 8

9, 10

10, 9

8, 11

 

      USE ABALD_INT

 

      IMPLICIT   NONE

      INTEGER    LINDEF, NEF, NF, NOBS, NRF

      PARAMETER  (LINDEF=4, NEF=3, NF=3, NOBS=32, NRF=2)

!

      INTEGER    INDEF(LINDEF), INDRF(NRF), IPRINT, MODEL, NFEF(NEF), &

                 NL(NF)

      REAL       AOV(15), Y(NOBS)

!

      DATA NL/4, 4, 2/

      DATA INDRF/2, 3/

      DATA NFEF/1, 1, 2/

      DATA INDEF/1, 2, 1, 2/

      DATA Y/3.0, 6.0, 3.0, 1.0, 2.0, 2.0, 3.0, 2.0, 4.0, 5.0, 4.0, &

          2.0, 3.0, 4.0, 3.0, 3.0, 7.0, 8.0, 7.0, 5.0, 6.0, 5.0, &

          6.0, 6.0, 7.0, 8.0, 9.0, 10.0, 10.0, 9.0, 8.0, 11.0/

!

      IPRINT = 1

      MODEL  = 1

      CALL ABALD  (NL, Y, NRF, INDRF, NFEF, INDEF, AOV, IPRINT=IPRINT, &
      MODEL=MODEL)

      END

Output

 

Dependent  R-squared   Adjusted  Est. Std. Dev.              Coefficient of
Variable   (percent)  R-squared  of Model Error        Mean  Var. (percent)
Y             91.932     84.368            1.09       5.375           20.27

 


                   * * * Analysis of Variance * * *

                               Sum of        Mean             Prob. of

 Source                DF     Squares      Square  Overall F  Larger F

 Model                 15       216.5       14.43     12.154    0.0000

 Error                 16        19.0        1.19

 Corrected Total       31       235.5


                               Sum of        Mean             Prob. of

 Source                DF     Squares      Square          F  Larger F

 A                      3      194.50     64.8333     32.873    0.0000

 B                      3        4.25      1.4167      1.193    0.3440

 AB                     9       17.75      1.9722      1.661    0.1802


      * * * EMS * * *

        Error  AB   B   A

 A          1   2   0   8

 B          1   0   8

 AB         1   2

 Error      1


                 * * * Variance Components * * *

                                     95.0% Confidence Interval

 Variance                            --------------------------

 Component     Estimate     Percent   Lower Limit   Upper Limit

 B               0.0286       1.897       0.00000        2.3168

 AB              0.3924      19.483       0.00000        2.7580

 Error           1.1875      78.621       0.65868        2.7506


 * * * Subgroup Means * * *

  A Means (N=8)

 1        2.7500

 2        3.5000

 3        6.2500

 4        9.0000
  B Means (N=8)

 1        6.0000

 2        5.1250

 3        5.1250

 4        5.2500
   AB  Means (N=2)

 1  1        4.5000

 1  2        2.0000

 1  3        2.0000

 1  4        2.5000

 2  1        4.5000

 2  2        3.0000

 2  3        3.5000

 2  4        3.0000

 3  1        7.5000

 3  2        6.0000

 3  3        5.5000

 3  4        6.0000

 4  1        7.5000

 4  2        9.5000

 4  3        9.5000

 4  4        9.5000

Additional Examples

Example 2

An analysis of a split-plot design is performed using data discussed by Milliken and Johnson (1984, Table 24.1, page 297). Label the two treatment factors A and C. Denote the treatment combination aick as that at the i-th level of A and the k-th level of C. The model is

yijk = μ + α i + bj + dij + δik + eijk     i = 1, 2; j = 1, 2; k = 1, 2, 3, 4

where yijk is the response for the j-th experimental unit with treatment combination aick; the α i’s are the effects due to treatment factor A, the bj’s are block effects identically and independently distributed

the dij are split plot errors that are identically and independently distributed

the γk’s are the effects due to treatment factor C, the δik’s are interaction effects between factors A and C, and the eijk’s are identically and independently distributed N(0, σ2). The block effects, whole plot errors, and split plot errors are independent.

The data are given in the following table.

 

 

C

A

Block

1

2

1

1

2

35.4

41.6

37.9

40.3

2

1

2

36.7

42.7

38.2

41.6

3

1

2

34.8

43.6

36.4

42.8

4

1

2

39.5

44.5

40.0

47.6

 

      USE ABALD_INT

 

      IMPLICIT   NONE

      INTEGER    LDVC, LINDEF, NEF, NF, NOBS, NRF

      PARAMETER  (LINDEF=7, NEF=5, NF=3, NOBS=16, NRF=1)

!

      INTEGER    INDEF(LINDEF), INDRF(NRF), IPRINT, NFEF(NEF), NL(NF)

      REAL       AOV(15), Y(NOBS)

!

      DATA NL/4, 2, 2/

      DATA INDRF/2/

      DATA NFEF/1, 1, 2, 1, 2/

      DATA INDEF/1, 2, 1, 2, 3, 1, 3/

      DATA Y/35.4, 37.9, 41.6, 40.3, 36.7, 38.2, 42.7, 41.6, 34.8, &

          36.4, 43.6, 42.8, 39.5, 40.0, 44.5, 47.6/

!

      IPRINT = -2

      CALL ABALD (NL, Y, NRF, INDRF, NFEF, INDEF, AOV, IPRINT=IPRINT)

      END

Output

 

Dependent  R-squared   Adjusted  Est. Std. Dev.              Coefficient of
Variable   (percent)  R-squared  of Model Error        Mean  Var. (percent)
Y             95.574     83.401           1.452       40.22           3.609


                   * * * Analysis of Variance * * *

                               Sum of        Mean             Prob. of

 Source                DF     Squares      Square  Overall F  Larger F

 Model                 11       182.0       16.55      7.852    0.0306

 Error                  4         8.4        2.11

 Corrected Total       15       190.4


                               Sum of        Mean             Prob. of

 Source                DF     Squares      Square          F  Larger F

 A                      3      40.190      13.397      5.802    0.0914

 B                      1     131.102     131.102     56.775    0.0048

 AB                     3       6.928       2.309      1.096    0.4476

 C                      1       2.250       2.250      1.068    0.3599

 AC                     3       1.550       0.517      0.245    0.8612


          * * * EMS * * *

        Error  AC   C  AB   B   A

 A          1   0   0   2   0   4

 B          1   0   0   2   8

 AB         1   0   0   2

 C          1   0   8

 AC         1   2

 Error      1


                 * * * Variance Components * * *

                                     95.0% Confidence Interval

 Variance                            --------------------------

 Component     Estimate     Percent   Lower Limit   Upper Limit

 B               16.099      87.938        2.2597       16686.7

 AB               0.101       0.551        0.0000          15.1

 Error            2.108      11.512        0.7565          17.4


 * * * Subgroup Means * * *

  A Means (N=4)

 1       38.8000

 2       39.8000

 3       39.4000

 4       42.9000
  B Means (N=8)

 1       37.3625

 2       43.0875
  C Means (N=8)

 1       39.8500

 2       40.6000
   AB  Means (N=2)

 1  1       36.6500

 1  2       40.9500

 2  1       37.4500

 2  2       42.1500

 3  1       35.6000

 3  2       43.2000

 4  1       39.7500

 4  2       46.0500
   AC  Means (N=2)

 1  1       38.5000

 1  2       39.1000

 2  1       39.7000

 2  2       39.9000

 3  1       39.2000

 3  2       39.6000

 4  1       42.0000

 4  2       43.8000
   BC  Means (N=4)

 1  1       36.6000

 1  2       38.1250

 2  1       43.1000

 2  2       43.0750

Example 3

An analysis of a split-plot factorial design is performed using data discussed by Kirk (1982, Table 11.2-1, pages 493496). Label the two treatment factors A and C. Denote the treatment combination aick as that at the i-th level of A and the k-th level of C. The model is

yijk = μ + α i + bjj + γk + δik + eijk     i = 1, 2; j = 1, 2, 3, 4; k = 1, 2, 3, 4

where yijk is the response for the j-th experimental unit with treatment combination aick; the α i’s are the effects due to treatment factor A and are subject to the restriction

the bij’s are block effects identically and independently distributed

the γk’s are the effects due to treatment factor C and are subject to the restriction

the δik’s are interaction effects between factors A and C and are subject to the restrictions

for each k, and

for each i, and the eijk’s are identically and independently distributed N(0, σ2). The block effects are assumed to be distributed independently of the errors.

The data are given in the following table:

 

 

 

C

A

Block

1

2

3

4

1

1

3

4

7

7

 

2

6

5

8

8

 

3

3

4

7

9

 

4

3

3

6

8

2

5

1

2

5

10

 

6

2

3

6

10

 

7

2

4

5

9

 

8

2

3

6

11

 

      USE ABALD_INT

 

      IMPLICIT   NONE

      INTEGER    LINDEF, NEF, NF, NOBS, NRF

      PARAMETER  (LINDEF=6, NEF=4, NF=3, NOBS=32, NRF=-1)

!

      INTEGER    INDEF(LINDEF), INDRF(-NRF), IPRINT, MODEL, NFEF(NEF), &

                 NL(NF)

      REAL       AOV(15), Y(NOBS)

!

      DATA NL/2, 4, 4/

      DATA INDRF/2/

      DATA NFEF/1, 2, 1, 2/

      DATA INDEF/1, 1, 2, 3, 1, 3/

      DATA Y/3.0, 4.0, 7.0, 7.0, 6.0, 5.0, 8.0, 8.0, 3.0, 4.0, 7.0, 9.0, &

           3.0, 3.0, 6.0, 8.0, 1.0, 2.0, 5.0, 10.0, 2.0, 3.0, 6.0, &

            10.0, 2.0, 4.0, 5.0, 9.0, 2.0, 3.0, 6.0, 11.0/

!

      IPRINT = 1

      MODEL  = 1

      CALL ABALD (NL, Y, NRF, INDRF, NFEF, INDEF, AOV, IPRINT=IPRINT, &
      MODEL=MODEL)

      END

Output

 

Dependent  R-squared   Adjusted  Est. Std. Dev.              Coefficient of
Variable   (percent)  R-squared  of Model Error        Mean  Var. (percent)
Y             96.125     93.327           0.712       5.375           13.25


                   * * * Analysis of Variance * * *

                               Sum of        Mean             Prob. of

 Source                DF     Squares      Square  Overall F  Larger F

 Model                 13       226.4       17.41     34.350    0.0000

 Error                 18         9.1        0.51

 Corrected Total       31       235.5


                               Sum of        Mean             Prob. of

 Source                DF     Squares      Square          F  Larger F

 A                      1       3.125      3.1250      2.000    0.2070

 AB                     6       9.375      1.5625      3.082    0.0296

 C                      3     194.500     64.8333    127.890    0.0000

 AC                     3      19.375      6.4583     12.740    0.0001
        * * * EMS * * *

        Error  AC   C  AB   A

 A          1   0   0   4  16

 AB         1   0   0   4

 C          1   0   8

 AC         1   4
 Error      1

                 * * * Variance Components * * *

                                     95.0% Confidence Interval

 Variance                            --------------------------

 Component     Estimate     Percent   Lower Limit   Upper Limit

 AB             0.26389      34.234       0.00000        1.7760

 Error          0.50694      65.766       0.28944        1.1086


 * * * Subgroup Means * * *

 A Means (N=16)

 1        5.6875

 2        5.0625
  B Means (N=8)

 1        4.8750

 2        6.0000

 1        5.3750

 2        5.2500
  C Means (N=8)

 1        2.7500

 2        3.5000

 1        6.2500

 2        9.0000
   AB  Means (N=4)

 1  1        5.2500

 1  2        6.7500

 1  3        5.7500

 1  4        5.0000

 2  1        4.5000

 2  2        5.2500

 2  3        5.0000

 2  4        5.5000
   AC  Means (N=4)

 1  1        3.7500

 1  2        4.0000

 1  3        7.0000

 1  4        8.0000

 2  1        1.7500

 2  2        3.0000

 2  3        5.5000

 2  4       10.0000
   BC  Means (N=2)

 1  1        2.0000

 1  2        3.0000

 1  3        6.0000

 1  4        8.5000

 2  1        4.0000

 2  2        4.0000

 2  3        7.0000

 2  4        9.0000

 1  1        2.5000

 1  2        4.0000

 1  3        6.0000

 1  4        9.0000

 2  1        2.5000

 2  2        3.0000

 2  3        6.0000

 2  4        9.5000
   ABC   Means (N=1)

 1  1  1        3.0000

 1  1  2        4.0000

 1  1  3        7.0000

 1  1  4        7.0000

 1  2  1        6.0000

 1  2  2        5.0000

 1  2  3        8.0000

 1  2  4        8.0000

 1  3  1        3.0000

 1  3  2        4.0000

 1  3  3        7.0000

 1  3  4        9.0000

 1  4  1        3.0000

 1  4  2        3.0000

 1  4  3        6.0000

 1  4  4        8.0000

 2  1  1        1.0000

 2  1  2        2.0000

 2  1  3        5.0000

 2  1  4       10.0000

 2  2  1        2.0000

 2  2  2        3.0000

 2  2  3        6.0000

 2  2  4       10.0000

 2  3  1        2.0000

 2  3  2        4.0000

 2  3  3        5.0000

 2  3  4        9.0000

 2  4  1        2.0000

 2  4  2        3.0000

 2  4  3        6.0000

 2  4  4       11.0000



http://www.vni.com/
PHONE: 713.784.3131
FAX:713.781.9260