Chapter 2: Regression

RONE

Analyzes a simple linear regression model.

Required Arguments                             

XNOBS by NCOL matrix containing the data.   (Input)

IRSP — Column number IRSP of X contains the data for the response (dependent) variable.   (Input)

IND — Column number IND of X contains the data for the independent (explanatory) variable.   (Input)

AOV — Vector of length 15 containing statistics relating to the analysis of variance.   (Output)

I

AOV(I)

1

Degrees of freedom for regression

2

Degrees of freedom for error

3

Total degrees of freedom

4

Sum of squares for regression

5

Sum of squares for error

6

Total sum of squares

7

Regression mean square

8

Error mean square

9

F-statistic

10

p-value

11

R2 (in percent)

12

Adjusted R2 (in percent)

13

Estimated standard deviation of the model error

14

Mean of the response (dependent) variable

15

Coefficient of variation (in percent)

            If INTCEP = 1, the regression and total are corrected for the mean. If INTCEP = 0, the regression and total are not corrected for the mean, and AOV(14) and AOV(15) are set to NaN (not a number).

COEFINTCEP + 1 by 5 matrix containing statistics relating the regression coefficients.   (Output)
If INTCEP = 1, the first row corresponds to the intercept. Row INTCEP + 1 corresponds to the coefficient for the slope. The statistics in the columns are

Col.

Description

1

Coefficient estimate

2

Estimated standard error of the coefficient estimate

3

t-statistic for the test that the coefficient is zero

4

p-value for the two-sided t test

5

Variance inflation factor

COVBINTCEP + 1 by INTCEP + 1 matrix that is the estimated variance-covariance matrix of the estimated regression coefficients.   (Output)

TESTLF — Vector of length 10 containing statistics relating to the test for lack of fit of the model.   (Output)

Elem

Description

1

Degrees of freedom for lack of fit

2

Degrees of freedom for pure error

3

Degrees of freedom for error (TESTLF(1) + TESTLF(2))

4

Sum of squares for lack of fit

5

Sum of squares for pure error

6

Sum of squares for error

7

Mean square for lack of fit

8

Mean square for pure error

9

F statistic

10

p-value

            If there are no replicates in the data set, a test for lack of fit cannot be performed. In this case, elements 7, 8, 9, and 10 of TESTLF are set to NaN (not a number).

CASENOBS by 12 matrix containing case statistics.   (Output)
Columns 1 through 12 contain the following:

Col.

Description

1

Observed response

2

Predicted response

3

Residual

4

Leverage

5

Standardized residual

6

Jackknife residual

7

Cook’s distance

8

DFFITS

9, 10

Confidence interval on the mean

11, 12

Prediction interval

Optional Arguments

NOBS — Number of observations.   (Input)
Default: NOBS = size (X,1).

NCOL — Number of columns in X.   (Input)
Default: NCOL = size (X,2).

LDX — Leading dimension of X exactly as specified in the dimension statement in the calling program.   (Input)
Default: LDX = size (X,1).

INTCEP — Intercept option.   (Input)
Default: INTCEP = 1.

INTCEP                  Action

0          An intercept is not in the model.

1          An intercept is in the model.

IFRQ — Frequency option.   (Input)
IFRQ = 0 means that all frequencies are 1.0. For positive IFRQ, column number IFRQ of X contains the frequencies. If X(I, IFRQ) = 0.0, none of the remaining elements of row I of X are referenced, and updating of statistics is skipped for row I.
Default: IFRQ = 0.

IWT — Weighting option.   (Input)
IWT = 0 means that all weights are 1.0. For positive IWT, column number IWT of X contains the weights.
Default: IWT = 0.

IPRED — Prediction interval option.   (Input)
IPRED = 0 means that prediction intervals are computed for a single future response. For positive IPRED, a prediction interval is computed on the average of future responses, and column number IPRED of X contains the number of future responses in each average.
Default: IPRED =0.

CONPCM — Confidence level for two-sided interval estimates on the mean, in percent.   (Input)
CONPCM percent confidence intervals are computed, hence, CONPCM must be greater than or equal to 0.0 and less than 100.0. CONPCM often will be 90.0, 95.0, or 99.0. For one-sided intervals with confidence level ONECL, where ONECL is greater than or equal to 50.0 and less than 100.0, set CONPCM = 100.0 2.0 * (100.0 ONECL).
Default: CONPCM = 95.0.

CONPCP — Confidence level for two-sided prediction intervals, in percent.   (Input)
CONPCP percent prediction intervals are computed, hence, CONPCP must be greater than or equal to 0.0 and less than 100.0. CONPCP often will be 90.0, 95.0, or 99.0. For one-sided intervals with confidence level ONECL, where ONECL is greater than or equal to 50.0 and less than 100.0, set CONPCP = 100.0 2.0 * (100.0 ONECL).
Default: CONPCP = 95.0.

IPRINT — Printing option.   (Input)
Default: IPRINT = 0.

IPRINT

Action

0

No printing is performed.

1

AOV, COEF, TESTLF, and unusual rows of CASE are printed.

2

AOV, COEF, TESTLF, and unusual rows of CASE are printed. A plot of the data with the regression line is printed.

3

All printing is performed. A plot of the data with the regression line, a plot of the standardized residuals versus the independent variable, and a half-normal probability plot of the standardized residuals are printed.

LDCOEF — Leading dimension of COEF exactly as specified in the dimension statement in the calling program.   (Input)
Default: LDCOEF = size (COEF,1).

LDCOVB — Leading dimension of COVB exactly as specified in the dimension statement in the calling program.   (Input)
Default: LDCOVB = size (COVB,1).

LDCASE — Leading dimension of CASE exactly as specified in the dimension statement in the calling program.   (Input)
Default: LDCASE = size (CASE,1).

NRMISS — Number of rows of data encountered containing missing values for the independent, dependent, weight, or frequency variables.   (Output)
NaN (not a number) is used as the missing value code. Any row of X containing NaN as a value of the independent, dependent, weight, or frequency variables is omitted from the computations for fitting the model.

FORTRAN 90 Interface

Generic:                              CALL RONE (X, IRSP, IND, AOV, COEF, COVB, TESTLF, CASE [,…])

Specific:                             The specific interface names are S_RONE and D_RONE.

FORTRAN 77 Interface

Single:            CALL RONE (NOBS, NCOL, X, LDX, INTCEP, IRSP, IND, IFRQ, IWT, IPRED, CONPCM, CONPCP, IPRINT, AOV, COEF, LDCOEF, COVB, LDCOVB, TESTLF, CASE, LDCASE, NRMISS)

Double:                              The double precision name is DRONE.

Description

Routine RONE performs an analysis for the simple linear regression model. In addition to the fit, summary statistics (analysis of variance, t tests, lack-of-fit test), and confidence intervals and diagnostics for individual cases are computed. With the printing option, diagnostic plots can also be produced. Draper and Smith (1981, chapter 1) give formulas for many of the statistics computed by RONE. For definitions of the case diagnostics (stored in CASE), see the Usage Notes of this chapter.

Comments

1.         Workspace may be explicitly provided, if desired, by use of R2NE/DR2NE. The reference is:

CALL R2NE (NOBS, NCOL, X, LDX, INTCEP, IRSP, IND, IFRQ, IWT, IPRED, CONPCM, CONPCP, IPRINT, AOV, COEF, LDCOEF, COVB, LDCOVB, TESTLF, CASE, LDCASE, NRMISS, IWK, WK)

The additional arguments are as follows:

IWK — Work vector of length NOBS.

WK — Work vector of length 3 * NOBS.

2.         Informational errors

Type Code

3         5                  CONPCM is less than 50.0. Confidence percentages commonly used are 90.0, 95.0, and 99.0.

3         6                  CONPCP is less than 50.0. Confidence percentages commonly used are 90.0, 95.0, and 99.0.

4         1                  Negative weight encountered.

4         2                  Negative frequency encountered.

4         7                  Each row of X contains NaN.

Example 1

This example fits a line to a set of data discussed by Draper and Smith (1981, pages 933). The response y is the amount of steam used per month (in pounds), and the independent variable x is the average atmospheric temperature (in degrees Fahrenheit). The IPRINT = 1 option is selected. Hence, plots are not produced and only unusual cases are printed. Note in the case analysis, with the default page width, the observation number and the associated 12 statistics require two lines of output. (Routine PGOPT, Chapter 19, Utilities;, can be invoked to increase the page width to put all 12 statistics on the same line.) Also note that observation 11 is labeled with a “Y” to indicate an unusual y (response). The residual for this case is about 2 standard deviations from zero.

 

      USE RONE_INT

 

      IMPLICIT   NONE

      INTEGER    INTCEP, LDCASE, LDCOEF, LDCOVB, LDX, NCOEF, NCOL, NOBS

      INTEGER    J

      PARAMETER  (NOBS=25, LDX=25, LDCASE=25, INTCEP=1, NCOEF=INTCEP+1, &

                 LDCOEF=NCOEF, LDCOVB=NCOEF, NCOL=2)

!

      INTEGER    IND, IPRINT, IRSP, NRMISS

      REAL       AOV(15), CASE(LDCASE,12), COEF(LDCOEF,5), CONPCP, &

                 COVB(LDCOVB,NCOEF), TESTLF(10), X(LDX,NCOL)

!

      DATA (X(1,J),J=1,2)  /35.3, 10.98/

      DATA (X(2,J),J=1,2)  /29.7, 11.13/

      DATA (X(3,J),J=1,2)  /30.8, 12.51/

      DATA (X(4,J),J=1,2)  /58.8,  8.40/

      DATA (X(5,J),J=1,2)  /61.4,  9.27/

      DATA (X(6,J),J=1,2)  /71.3,  8.73/

      DATA (X(7,J),J=1,2)  /74.4,  6.36/

      DATA (X(8,J),J=1,2)  /76.7,  8.50/

      DATA (X(9,J),J=1,2)  /70.7,  7.82/

      DATA (X(10,J),J=1,2) /57.5,  9.14/

      DATA (X(11,J),J=1,2) /46.4,  8.24/

      DATA (X(12,J),J=1,2) /28.9, 12.19/

      DATA (X(13,J),J=1,2) /28.1, 11.88/

      DATA (X(14,J),J=1,2) /39.1,  9.57/

      DATA (X(15,J),J=1,2) /46.8, 10.94/

      DATA (X(16,J),J=1,2) /48.5,  9.58/

      DATA (X(17,J),J=1,2) /59.3, 10.09/

      DATA (X(18,J),J=1,2) /70.0,  8.11/

      DATA (X(19,J),J=1,2) /70.0,  6.83/

      DATA (X(20,J),J=1,2) /74.5,  8.88/

      DATA (X(21,J),J=1,2) /72.1,  7.68/

      DATA (X(22,J),J=1,2) /58.1,  8.47/

      DATA (X(23,J),J=1,2) /44.6,  8.86/

      DATA (X(24,J),J=1,2) /33.4, 10.36/

      DATA (X(25,J),J=1,2) /28.6, 11.08/

!

      IRSP   = 2

      IND    = 1

      CONPCP = 99.0

      IPRINT = 1

      CALL RONE (X, IRSP, IND, AOV, COEF, COVB, TESTLF, CASE, &

                CONPCP=CONPCP, IPRINT=IPRINT, NRMISS=NRMISS)

!

      END

Output

 

R-squared   Adjusted  Est. Std. Dev.              Coefficient of
(percent)  R-squared  of Model Error        Mean  Var. (percent)
   71.444     70.202          0.8901       9.424           9.445


                   * * * Analysis of Variance * * *

                              Sum of        Mean             Prob. of

Source                DF     Squares      Square  Overall F  Larger F

Regression             1       45.59       45.59     57.543    0.0000

Residual              23       18.22        0.79

Corrected Total       24       63.82


                * * * Inference on Coefficients * * *

                     Standard                 Prob. of    Variance

Coef.    Estimate       Error  t-statistic  Larger |t|   Inflation

    1       13.62      0.5815        23.43      0.0000       10.67

    2       -0.08      0.0105        -7.59      0.0000        1.00


                 * * * Test for Lack of Fit * * *

                          Sum of        Mean             Prob. of

Source            DF     Squares      Square  Overall F  Larger F

Lack of fit       22       17.40      0.7911      0.966    0.6801

Pure error         1        0.82      0.8192

Residual          23       18.22


                         * * * Case Analysis * * *

     Obs.   Observed  Predicted   Residual   Leverage  Std. Res.  Jack Res.

            Cook’s D     DFFITS   95.0% CI   95.0% CI   99.0% PI   99.0% PI

Y      11     8.2400     9.9189    -1.6789     0.0454    -1.9305    -2.0625

              0.0886    -0.4497     9.5267    10.3112     7.3640    12.4739

Figure 2- 2  Plot of Line and 99% One-at-a-Time Prediction Intervals

Additional Example

Example 2

This example fits a line to a data set discussed by Draper and Smith (1981, pages 3840). The data set contains several repeated x values in order to assess lack of fit of the straight line. The
IPRINT = 1 option is selected. Hence, plots are not produced and only unusual cases are printed. Note in the case analysis that observations 1 and 2 are labeled with an “X” to indicate an unusual x value. Each have leverage 0.1944 that exceeds the average leverage of p/n = 2/24 by a factor of 2.

 

      USE RONE_INT

 

      IMPLICIT   NONE

      INTEGER    LDCASE, LDCOEF, LDCOVB, LDX, NCOEF, NCOL, NOBS,J

      INTEGER    INTCEP, NRMISS

      PARAMETER  (INTCEP=1, NCOL=2, NOBS=24, LDCASE=NOBS, LDX=NOBS, &

                  NCOEF=INTCEP+1, LDCOEF=NCOEF, LDCOVB=NCOEF)

!

      INTEGER    IFRQ, IND, IPRED, IPRINT, IRSP

      REAL       AOV(15), CASE(LDCASE,12),COEF(LDCOEF,5), &

                 COVB(LDCOVB,NCOEF), TESTLF(10), X(LDX,NCOL)               

!

      DATA (X(1,J),J=1,2)  /2.3, 1.3/

      DATA (X(2,J),J=1,2)  /1.8, 1.3/

      DATA (X(3,J),J=1,2)  /2.8, 2.0/

      DATA (X(4,J),J=1,2)  /1.5, 2.0/

      DATA (X(5,J),J=1,2)  /2.2, 2.7/

      DATA (X(6,J),J=1,2)  /3.8, 3.3/

      DATA (X(7,J),J=1,2)  /1.8, 3.3/

      DATA (X(8,J),J=1,2)  /3.7, 3.7/

      DATA (X(9,J),J=1,2)  /1.7, 3.7/

      DATA (X(10,J),J=1,2) /2.8, 4.0/

      DATA (X(11,J),J=1,2) /2.8, 4.0/

      DATA (X(12,J),J=1,2) /2.2, 4.0/

      DATA (X(13,J),J=1,2) /5.4, 4.7/

      DATA (X(14,J),J=1,2) /3.2, 4.7/

      DATA (X(15,J),J=1,2) /1.9, 4.7/

      DATA (X(16,J),J=1,2) /1.8, 5.0/

      DATA (X(17,J),J=1,2) /3.5, 5.3/

      DATA (X(18,J),J=1,2) /2.8, 5.3/

      DATA (X(19,J),J=1,2) /2.1, 5.3/

      DATA (X(20,J),J=1,2) /3.4, 5.7/

      DATA (X(21,J),J=1,2) /3.2, 6.0/

      DATA (X(22,J),J=1,2) /3.0, 6.0/

      DATA (X(23,J),J=1,2) /3.0, 6.3/

      DATA (X(24,J),J=1,2) /5.9, 6.7/

!

      IRSP = 1

      IND  = 2

      IPRINT = 1

      CALL RONE (X, IRSP, IND, AOV, COEF, COVB, TESTLF, CASE, &

                IPRINT=IPRINT, NRMISS=NRMISS)

      END

Output

 

R-squared   Adjusted  Est. Std. Dev.              Coefficient of
(percent)  R-squared  of Model Error        Mean  Var. (percent)
   22.983     19.483          0.9815       2.858          34.34


                   * * * Analysis of Variance * * *

                              Sum of        Mean             Prob. of

Source                DF     Squares      Square  Overall F  Larger F

Regression             1        6.32       6.325      6.565    0.0178

Residual              22       21.19       0.963

Corrected Total       23       27.52


                * * * Inference on Coefficients * * *

                     Standard                 Prob. of    Variance

Coef.    Estimate       Error  t-statistic  Larger |t|   Inflation

    1       1.436      0.5900        2.435      0.0235       8.672

    2       0.338      0.1319        2.562      0.0178       1.000


                 * * * Test for Lack of Fit * * *

                          Sum of        Mean             Prob. of

Source            DF     Squares      Square  Overall F  Larger F

Lack of fit       11        8.72       0.793      0.700    0.7183

Pure error        11       12.47       1.134

Residual          22       21.19


                       * * * Case Analysis * * *

      Obs.  Observed  Predicted   Residual   Leverage  Std. Res.  Jack Res.

            Cook’s D     DFFITS   95.0% CI   95.0% CI   95.0% PI   95.0% PI

X       1     2.3000     1.8756     0.4244     0.1944     0.4817     0.4731

              0.0280     0.2324     0.9783     2.7730    -0.3489     4.1002

X       2     1.8000     1.8756    -0.0756     0.1944    -0.0859    -0.0839

              0.0009    -0.0412     0.9783     2.7730    -0.3489     4.1002

Y      13     5.4000     3.0245     2.3755     0.0460     2.4780     2.8515

              0.1481     0.6264     2.5877     3.4612     0.9426     5.1063

Y      24     5.9000     3.7002     2.1998     0.1537     2.4363     2.7855

              0.5391     1.1873     2.9021     4.4983     1.5138     5.8866

Figure 2- 3  Plot of Leverages hi and the Average (p/n = 2/24)



http://www.vni.com/
PHONE: 713.784.3131
FAX:713.781.9260