Performs a chi-squared goodness-of-fit test.
Required Arguments
CDF User-supplied FUNCTION to compute the cumulative distribution function (CDF) at a given value. The form is CDF(Y), where
   Y Value at which the CDF is to be evaluated. (Input)
   CDF Value of the CDF at Y. (Output)
CDF must be declared EXTERNAL in the calling program.
NELM The absolute value of NELM is the number of data elements currently input in X. (Input)
NELM may be positive, zero, or negative. Negative NELM means delete the −NELM data elements from the analysis.
X Vector of length |NELM| containing the data elements for this call. (Input)
If a data element is missing (NaN, not a number), then the observation is ignored.
NCAT The absolute value of NCAT is the number of cells into which the observations are to be tallied. (Input)
If NCAT is negative, then CHIGF chooses the cutpoints in CUTP so that the cells are equiprobable for continuous distributions. NCAT should not be negative for discrete distributions. The user must be careful to define cutpoints for discrete distributions, since no error message can be generated in this situation if NCAT is negative.
RNGE Vector of length 2 containing the lower and upper endpoints of the range of the distribution, respectively. (Input)
If the lower and upper endpoints are equal, a range on the whole real line is used. If the lower and upper endpoints are different, points outside of the range are ignored so that distributions conditional on the range can be used. In this case, the point RNGE(1) is excluded from the first interval, but the point RNGE(2) is included in the last interval.
NDFEST Number of parameters estimated in computing the CDF. (Input)
CUTP Vector of length |NCAT| − 1 containing the cutpoints defining the cells. (Input, if NCAT is positive; output, otherwise)
The |NCAT| − 1 cutpoints define the cells to be used. If NCAT is positive, then the cutpoints are input by the user; otherwise, CHIGF computes them. The intervals defined by the cutpoints are such that the lower endpoint is not included while the upper endpoint is included in the interval.
P p-value for the chi-squared statistic in CHISQ(|NCAT| + 1). (Output)
This chi-squared statistic has DF degrees of freedom.
Optional Arguments
IDO Processing option. (Input)
Default: IDO = 0.
IDO Action
0 This is the only call to CHIGF, and all of the data are input on this call.
1 This is the first call to CHIGF, and additional calls to CHIGF will be made. Initialization and updating for the data in X are performed.
2 This is an intermediate call to CHIGF. Updating for the data in X is performed.
3 This is the final call to CHIGF. Updating for the data in X and wrap-up computations are performed.
Calls to CHIGF with IDO = 2 or 3 may be intermixed. It is permissible for a call with IDO = 2 to follow a call with IDO = 3.
FRQ Vector containing the frequencies. (Input)
If the first element of FRQ is −1.0, then all frequencies are taken to be 1 and FRQ is of length 1. Otherwise, FRQ is of length |NELM|, and the elements in FRQ contain the frequency of the corresponding observation in X. If a frequency is missing (NaN, not a number) and FRQ(1) is not −1.0, the observation is ignored.
Default: FRQ(1) = −1.0.
COUNTS Vector of length |NCAT| containing the counts in each of the cells. (Output, if IDO = 0 or 1; input/output, if IDO > 1)
EXPECT Vector of length |NCAT| containing the expected count in each cell. (Output, if IDO = 0 or 3; not referenced otherwise)
CHISQ Vector of length |NCAT| + 1 containing the contributions to chi-squared. (Output, if IDO = 0 or 3; not referenced otherwise)
Elements 1 through |NCAT| contain the contributions to chi-squared for the corresponding cell. Element |NCAT| + 1 contains the total chi-squared statistic.
DF Degrees of freedom in chi-squared. (Output)
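The tallying bookkeeping implied by the NELM, X, RNGE, CUTP, IDO, and FRQ arguments can be sketched in plain Fortran. This is a hypothetical stand-in, not IMSL code: the module and subroutine names are invented for illustration, the cutpoints are assumed to be known already, and CHIGF's internal handling may differ. It shows a new tally starting on IDO = 0 or 1, a negative NELM deleting observations, FRQ(1) = −1.0 meaning unit frequencies, NaN values being ignored, points outside (RNGE(1), RNGE(2)] being dropped when the endpoints differ, and cells that exclude their lower endpoint and include their upper endpoint.

```fortran
! Hypothetical sketch of CHIGF-style tallying; not IMSL code.
module chigf_tally_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
contains
  subroutine tally_update(ido, nelm, x, frq, rnge, cutp, counts)
    integer, intent(in) :: ido, nelm
    real, intent(in) :: x(:), frq(:), rnge(2), cutp(:)
    real, intent(inout) :: counts(:)   ! size(cutp) + 1 cells
    integer :: i, j, icell
    real :: w, sgn

    ! IDO = 0 or 1 starts a new tally; IDO = 2 or 3 keeps the running counts.
    if (ido <= 1) counts = 0.0
    ! A negative NELM deletes -NELM observations instead of adding them.
    sgn = merge(-1.0, 1.0, nelm < 0)

    do i = 1, abs(nelm)
      ! FRQ(1) = -1.0 means every observation has frequency 1.
      if (frq(1) == -1.0) then
        w = 1.0
      else
        w = frq(i)
      end if
      ! A missing (NaN) data value or frequency means the observation is ignored.
      if (ieee_is_nan(x(i)) .or. ieee_is_nan(w)) cycle
      ! With distinct endpoints, only points in (RNGE(1), RNGE(2)] are used.
      if (rnge(1) /= rnge(2)) then
        if (x(i) <= rnge(1) .or. x(i) > rnge(2)) cycle
      end if
      ! Cells exclude their lower endpoint and include their upper endpoint,
      ! so x goes to the first cell whose upper cutpoint is >= x.
      icell = size(cutp) + 1
      do j = 1, size(cutp)
        if (x(i) <= cutp(j)) then
          icell = j
          exit
        end if
      end do
      counts(icell) = counts(icell) + sgn * w
    end do

    ! Counterpart of informational error 4.
    if (any(counts < 0.0)) error stop 'more observations deleted from a cell than added'
  end subroutine tally_update
end module chigf_tally_sketch
```

Calling TALLY_UPDATE once with IDO = 1, repeatedly with IDO = 2, and once with IDO = 3 accumulates the same counts as a single IDO = 0 call on all of the data, which is the calling pattern of the second example.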
FORTRAN 90 Interface
Generic: CALL CHIGF (CDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P [, ])
Specific: The specific interface names are S_CHIGF and D_CHIGF.
FORTRAN 77 Interface
Single: CALL CHIGF (IDO, CDF, NELM, X, FRQ, NCAT, RNGE, NDFEST, CUTP, COUNTS, EXPECT, CHISQ, P, DF)
Double: The double precision name is DCHIGF.
Description
Routine CHIGF performs a chi-squared goodness-of-fit test that a random sample of observations is distributed according to a specified theoretical cumulative distribution. The theoretical distribution, which may be continuous, discrete, or a mixture of discrete and continuous distributions, is specified via a user-defined FUNCTION. Because the user is allowed to specify a range for the observations, a test that is conditional upon the specified range is performed.
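For a test conditional on the range, a natural choice for the expected count in a cell is the number of observations times the cell probability divided by the probability of the whole range. The sketch below assumes the CDF has already been evaluated at every cell boundary; the module and function names are hypothetical, and this is not necessarily how CHIGF forms EXPECT internally.

```fortran
! Hypothetical sketch; not necessarily how CHIGF forms EXPECT.
module expected_counts_sketch
  implicit none
contains
  ! fb(1) = F at the lower end of the range, fb(2:m) = F at the cutpoints,
  ! fb(m+1) = F at the upper end (0 and 1 for the whole real line).
  function expected_counts(ntot, fb) result(e)
    real, intent(in) :: ntot, fb(:)
    real :: e(size(fb) - 1)
    real :: prange
    integer :: i
    ! Probability of the whole range; compare informational error 9.
    prange = fb(size(fb)) - fb(1)
    if (prange <= 0.0) error stop 'probability of the range is not positive'
    do i = 1, size(e)
      ! Conditional cell probability times the total number of observations.
      e(i) = ntot * (fb(i + 1) - fb(i)) / prange
    end do
  end function expected_counts
end module expected_counts_sketch
```

With the whole real line (F equal to 0 and 1 at the ends) and the binomial (n = 5, p = 0.3) CDF evaluated at the cutpoints 0.5, 1.5, ..., 4.5, this gives 168.07, 360.15, 308.70, 132.30, 28.35, and 2.43 for 1000 observations, which agree up to rounding with the expected counts printed in the first example.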
|NCAT| gives the number of intervals into which the observations are to be divided. These intervals can be specified via the vector CUTP, which contains the cutpoints (or endpoints) for the intervals. Alternatively, if NCAT is negative, CHIGF computes equiprobable intervals. Regardless of the method used to obtain them, the intervals are such that the lower endpoint is not included in the interval while the upper endpoint is always included. The user should determine the cutpoints when the cumulative distribution function has discrete elements, since CHIGF cannot determine them in this case. Regardless of how the cutpoints are determined, the lower endpoint of the first interval is RNGE(1) when RNGE(1) ≠ RNGE(2) and minus machine infinity otherwise. Similarly, the upper endpoint of the last interval is RNGE(2) when RNGE(1) ≠ RNGE(2) and plus machine infinity otherwise.
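When NCAT is negative, the cutpoints come from inverting the CDF at the probabilities i/|NCAT|. The sketch below does this for the standard normal distribution on the whole real line using plain bisection; the bisection method, the bracket [−10, 10], and all names are assumptions, since the inversion method CHIGF uses is not described here. Bisection also shows why the CDF must be continuous for this option: a step function may never equal i/|NCAT| exactly, so there is no true inverse to find.

```fortran
! Hypothetical sketch: equiprobable cutpoints for the standard normal
! distribution by bisection (not necessarily CHIGF's inversion method).
module equiprob_cutpoints_sketch
  implicit none
contains
  ! Standard normal CDF via the Fortran 2008 intrinsic ERFC.
  pure real function phi(y)
    real, intent(in) :: y
    phi = 0.5 * erfc(-y / sqrt(2.0))
  end function phi

  ! Cutpoints c(i) with phi(c(i)) = i/m for i = 1, ..., m-1,
  ! assuming the whole real line (RNGE(1) = RNGE(2)).
  function equiprob_cutpoints(m) result(c)
    integer, intent(in) :: m
    real :: c(m - 1)
    real :: lo, hi, mid, goal
    integer :: i, iter
    do i = 1, m - 1
      goal = real(i) / real(m)
      lo = -10.0          ! assumed bracket; phi is ~0 and ~1 at its ends
      hi = 10.0
      do iter = 1, 60
        mid = 0.5 * (lo + hi)
        if (phi(mid) < goal) then
          lo = mid
        else
          hi = mid
        end if
      end do
      c(i) = 0.5 * (lo + hi)
    end do
  end function equiprob_cutpoints
end module equiprob_cutpoints_sketch
```

With m = 10 the results are approximately −1.282, −0.842, −0.524, −0.253, 0.000, 0.253, 0.524, 0.842, and 1.282, the same cutpoints printed in the second example.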
Routine CHIGF tallies the observations in X as follows. If the cutpoints are determined by CHIGF, then the cumulative probability at $x_i$, $F(x_i)$, is computed via function CDF. The tally for $x_i$ is made in interval number $\lfloor m F(x_i) \rfloor + 1$, where $m = |\mathrm{NCAT}|$ and $\lfloor \cdot \rfloor$ is the floor function, which returns the greatest integer that is no larger than its argument. If the cutpoints are specified by the user, the tally is made in the interval to which $x_i$ belongs, using the endpoints specified by the user, so the CDF does not have to be evaluated at each observation. Thus, if the computer time required to calculate the cumulative distribution function is large, user-specified cutpoints may be preferred in order to reduce the total computing time.
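The equiprobable-cell rule amounts to one line of integer arithmetic. A minimal sketch with a hypothetical function name, assuming $F(x_i)$ has already been computed by the user's CDF; clamping $F(x_i) = 1$ into the last cell is an assumption added here so that the index never exceeds m.

```fortran
! Hypothetical helper illustrating the equiprobable-cell rule.
module equiprob_cell_sketch
  implicit none
contains
  ! Cell number floor(m * f) + 1 for an observation whose CDF value is f.
  ! Clamping f = 1 into cell m is an assumption added by this sketch.
  pure integer function equiprob_cell(f, m)
    real, intent(in) :: f
    integer, intent(in) :: m
    equiprob_cell = min(floor(m * f) + 1, m)
  end function equiprob_cell
end module equiprob_cell_sketch
```

With m = 10, an observation with $F(x_i) = 0.05$ falls in cell 1 and one with $F(x_i) = 0.55$ falls in cell 6.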
A common rule of thumb is that the chi-squared approximation may be suspect if the expected count in any cell is less than 1. CHIGF issues a warning in this case, and also when an expected value is less than 5 (informational errors 6 and 7, respectively).
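The contents of CHISQ, the small-expected-count check, and DF can be computed from COUNTS and EXPECT as sketched below. The Pearson form (O − E)**2/E is consistent with the contributions printed in both examples in this section, and DF = |NCAT| − 1 − NDFEST matches their reported degrees of freedom; the function names, and the choice to report only the more severe warning code, are assumptions of this sketch.

```fortran
! Hypothetical sketch of the chi-squared summary quantities.
module chisq_summary_sketch
  implicit none
contains
  ! Pearson contributions (O - E)**2 / E for each cell, with the total in the
  ! extra last element, mirroring the layout of the CHISQ argument.
  function chisq_contributions(counts, expect) result(chisq)
    real, intent(in) :: counts(:), expect(:)
    real :: chisq(size(counts) + 1)
    integer :: m
    m = size(counts)
    chisq(1:m) = (counts - expect)**2 / expect
    chisq(m + 1) = sum(chisq(1:m))
  end function chisq_contributions

  ! Small-expected-count check: 6 if any expected count is below 1,
  ! 7 if the smallest lies in [1, 5), 0 otherwise (reporting only the
  ! more severe code is an assumption of this sketch).
  pure integer function small_expect_code(expect)
    real, intent(in) :: expect(:)
    if (minval(expect) < 1.0) then
      small_expect_code = 6
    else if (minval(expect) < 5.0) then
      small_expect_code = 7
    else
      small_expect_code = 0
    end if
  end function small_expect_code

  ! Usual degrees of freedom: |NCAT| - 1 - NDFEST.
  pure integer function chisq_df(ncat, ndfest)
    integer, intent(in) :: ncat, ndfest
    chisq_df = abs(ncat) - 1 - ndfest
  end function chisq_df
end module chisq_summary_sketch
```

Feeding it the counts 170, 331, 320, 148, 28, 3 and the exact binomial expected counts 168.07, 360.15, 308.70, 132.30, 28.35, 2.43 reproduces the total 4.7963 reported in the first example, and the smallest expected count (2.43) yields code 7, matching the warning printed there.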
The user must supply a function CDF with calling sequence CDF(Y), which returns the value of the cumulative distribution function at any point Y in the range of the distribution. The supplied function must be declared in an EXTERNAL statement in the calling program. Many of the IMSL cumulative distribution functions in Chapter 17, Probability Distribution Functions and Inverses, can be used for CDF, either directly, if the calling sequence is correct, or indirectly, if, for example, the sample means and standard deviations are to be used in computing the theoretical CDF.
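As an illustration of the indirect use mentioned above, a normal CDF whose mean and standard deviation are sample estimates can be wrapped in a one-argument external function of the required form CDF(Y). This is a sketch under assumptions: the module NORMAL_FIT_PARAMS, the function NRMFIT, and the use of the Fortran 2008 intrinsic ERFC in place of an IMSL routine are hypothetical choices, not part of the documented interface.

```fortran
! Hypothetical parameter holder for a fitted normal CDF.
module normal_fit_params
  implicit none
  real :: xbar = 0.0   ! sample mean, set by the caller
  real :: sdev = 1.0   ! sample standard deviation, set by the caller
end module normal_fit_params

! External function of the required form CDF(Y): the normal CDF with the
! mean and standard deviation stored in NORMAL_FIT_PARAMS. Uses the
! Fortran 2008 intrinsic ERFC rather than an IMSL routine.
real function nrmfit(y)
  use normal_fit_params, only: xbar, sdev
  implicit none
  real, intent(in) :: y
  nrmfit = 0.5 * erfc(-(y - xbar) / (sdev * sqrt(2.0)))
end function nrmfit
```

A calling program would set XBAR and SDEV from the data in X, declare NRMFIT EXTERNAL, and pass it as the CDF argument; because two parameters are then estimated from the sample, NDFEST would normally be 2.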
Informational errors
Type  Code  Description
4     4     There are more observations deleted from a cell than added.
4     5     All observations are missing.
3     6     An expected value is less than 1.
3     7     An expected value is less than 5.
4     8     The function CDF is not a cumulative distribution function.
4     9     The probability of the range of the distribution is not positive.
4     10    An error has occurred when inverting the cumulative distribution function. This function must be continuous and defined over the whole real line. If all else fails, you must specify the cutpoints (i.e., NCAT must be positive).
Example 1
In this example, a discrete binomial random sample of size 1000 with binomial parameter p = 0.3 and binomial sample size 5 is generated via routine RNBIN (see Chapter 18, Random Number Generation). Routine RNSET is first used to set the seed. Because the distribution is discrete, the cutpoints are supplied by the user (NCAT is positive) and placed midway between the possible values 0, 1, ..., 5. One call to CHIGF is made. Routine BINDF (see Chapter 17, Probability Distribution Functions and Inverses) is used to compute the CDF.
      USE IMSL_LIBRARIES

      IMPLICIT   NONE
      INTEGER    ISEED, NCAT, NDFEST, NELM
      PARAMETER  (ISEED=123457, NCAT=6, NDFEST=0, NELM=1000)
!
      INTEGER    I, IX(NELM), NOUT
      REAL       CDF, CHISQ(NCAT+1), COUNTS(NCAT), CUTP(NCAT-1), DF, &
                 EXPECT(NCAT), P, RNGE(2), X(NELM)
      EXTERNAL   CDF
!                                 Equal endpoints: use the whole real line
      DATA RNGE/0.0, 0.0/
!                                 Cutpoints midway between 0, 1, ..., 5
      DATA CUTP/.5, 1.5, 2.5, 3.5, 4.5/
!
      CALL RNSET (ISEED)
!                                 Generate the data
      CALL RNBIN (5, 0.3, IX)
      DO 10  I=1, NELM
         X(I) = IX(I)
   10 CONTINUE
!                                 Perform the test in a single call
      CALL CHIGF (CDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P, &
                  COUNTS=COUNTS, EXPECT=EXPECT, CHISQ=CHISQ, DF=DF)
!                                 Print results
      CALL WRRRN ('Counts', COUNTS, 1, NCAT, 1)
      CALL WRRRN ('Expect', EXPECT, 1, NCAT, 1)
      CALL WRRRN ('Contributions to Chi-squared', CHISQ, 1, NCAT, 1)
      CALL UMACH (2, NOUT)
      WRITE (NOUT,99999) CHISQ(NCAT+1), P, DF
99999 FORMAT (///'0Chi-squared ', F8.4, /, ' P-value ' &
             , F8.4, /, ' Degrees of freedom', F8.4)
      END
!
      REAL FUNCTION CDF (Y)
      REAL       Y
!
      INTEGER    I
      REAL       BINDF
      EXTERNAL   BINDF
!                                 Truncate Y to an integer and return the
!                                 binomial (n = 5, p = 0.3) CDF at that value
      I   = Y
      CDF = BINDF(I,5,0.3)
      RETURN
      END
Output
 *** WARNING ERROR 7 from CHIGF. An expected value is less than 5.

                     Counts
     1       2       3       4       5       6
 170.0   331.0   320.0   148.0    28.0     3.0

                     Expect
     1       2       3       4       5       6
 168.1   360.2   308.7   132.3    28.3     2.4

           Contributions to Chi-squared
     1       2       3       4       5       6
 0.022   2.359   0.414   1.863   0.004   0.134

Chi-squared          4.7963
P-value              0.4412
Degrees of freedom   5.0000
Example 2
This example illustrates the use of CHIGF on a randomly generated sample from the normal distribution. One thousand randomly generated observations are tallied into 10 equiprobable intervals. Twelve calls to CHIGF are made. The first call is solely for initialization, since IDO = 1 and NELM = 0. The next 10 calls tally the data, 100 observations at a time, with IDO = 2 and NELM = 100. The last call is for wrap-up only, since IDO = 3 and NELM = 0. All twelve calls could have been replaced with one call to CHIGF with IDO = 0 and NELM = 1000; X would then need to be of length 1000. In this example, the null hypothesis is not rejected (p-value 0.1546).
      USE IMSL_LIBRARIES

      IMPLICIT   NONE
      INTEGER    ISEED, NCAT, NDFEST
      PARAMETER  (ISEED=123457, NCAT=-10, NDFEST=0)
!
      INTEGER    I, IDO, NOUT, NELM
      REAL       CHISQ(-NCAT+1), COUNTS(-NCAT), CUTP(-NCAT-1), &
                 DF, EXPECT(-NCAT), P, RNGE(2), X(100)
!                                 Equal endpoints: use the whole real line
      DATA RNGE/0.0, 0.0/
!
      CALL RNSET (ISEED)
!                                 Initialization (NCAT < 0, so CHIGF
!                                 computes equiprobable cutpoints in CUTP)
      IDO  = 1
      NELM = 0
      CALL CHIGF (S_ANORDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P, &
                  IDO=IDO, COUNTS=COUNTS, EXPECT=EXPECT, &
                  CHISQ=CHISQ, DF=DF)
!                                 Add the data, 100 observations at a time
      IDO  = 2
      NELM = 100
      DO 10  I=1, 10
         CALL RNNOR (X)
         CALL CHIGF (S_ANORDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P, &
                     IDO=IDO, COUNTS=COUNTS, EXPECT=EXPECT, &
                     CHISQ=CHISQ, DF=DF)
   10 CONTINUE
!                                 Wrap up
      IDO  = 3
      NELM = 0
      CALL CHIGF (S_ANORDF, NELM, X, NCAT, RNGE, NDFEST, CUTP, P, &
                  IDO=IDO, COUNTS=COUNTS, EXPECT=EXPECT, &
                  CHISQ=CHISQ, DF=DF)
!                                 Print results (CUTP has -NCAT-1 elements)
      CALL WRRRN ('Cutpoints', CUTP, 1, -NCAT-1, 1)
      CALL WRRRN ('Counts', COUNTS, 1, -NCAT, 1)
      CALL WRRRN ('Expect', EXPECT, 1, -NCAT, 1)
      CALL WRRRN ('Contributions to Chi-squared', CHISQ, 1, -NCAT, 1)
      CALL UMACH (2, NOUT)
      WRITE (NOUT,99999) CHISQ(-NCAT+1), P, DF
99999 FORMAT (///'0Chi-squared ', F8.4, /, ' P-value ' &
             , F8.4, /, ' Degrees of freedom', F8.4)
      END
Output
                              Cutpoints
      1       2       3       4       5       6       7       8       9
 -1.282  -0.842  -0.524  -0.253   0.000   0.253   0.524   0.842   1.282

                                   Counts
     1       2       3       4       5       6       7       8       9      10
 106.0   109.0    89.0    92.0    83.0    87.0   110.0   104.0   121.0    99.0

                                   Expect
     1       2       3       4       5       6       7       8       9      10
 100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0   100.0

                        Contributions to Chi-squared
     1       2       3       4       5       6       7       8       9      10
 0.360   0.810   1.210   0.640   2.890   1.690   1.000   0.160   4.410   0.010

Chi-squared         13.1806
P-value              0.1546
Degrees of freedom   9.0000