Computes the biserial and point-biserial correlation coefficients for a dichotomous variable and a numerically measurable classification variable.
A — 3 by K matrix containing the frequencies and the class marks of the measured classification variable. (Input)
The first row of A contains the frequencies for the classification variable when the dichotomous variable takes on one of its values, and the second row of A contains the frequencies when the dichotomous variable takes on its other value. The third row of A contains the values (class marks) of the classification variable. The elements of the first two rows of A must be nonnegative.
STAT — Vector of length 11 containing various statistics. (Output)
I STAT(I)
1 Total count of the first value of the dichotomous variable (the sum of the first row of A)
2 Total count for the second value
3 Total count (sum of STAT(1) and STAT(2))
4 Mean of the measured variable
5 Mean of the measured variable in the first class of the dichotomy
6 Mean of the measured variable in the second class of the dichotomy
7 Standard deviation of the measured variable
8 Biserial correlation coefficient estimate
9 Standard deviation estimate for the biserial correlation coefficient estimate
10 Asymptotic significance level of the biserial correlation coefficient, that is, the probability of a more extreme value
11 Point-biserial correlation coefficient estimate
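As a rough illustration of how the first seven entries of STAT relate to A, the following Python sketch computes them from the example data used later in this section. The function name `descriptive_stats` is hypothetical, and the divisor n (rather than n − 1) in the variance is an assumption; this text does not state which one the routine uses.

```python
import math


def descriptive_stats(a):
    """Return [STAT(1), ..., STAT(7)] for a 3 x K table given as three rows.

    a[0], a[1]: nonnegative frequencies for the two values of the dichotomy.
    a[2]:       class marks of the measured variable.
    """
    f1, f2, x = a
    if len(f1) != len(x) or len(f2) != len(x):
        raise ValueError("all three rows must have the same length K")
    if any(f < 0 for f in f1) or any(f < 0 for f in f2):
        raise ValueError("frequencies must be nonnegative")

    n1, n2 = sum(f1), sum(f2)
    if n1 == 0 or n2 == 0:
        raise ValueError("each value of the dichotomy needs a positive count")
    n = n1 + n2

    mean1 = sum(f * xj for f, xj in zip(f1, x)) / n1
    mean2 = sum(f * xj for f, xj in zip(f2, x)) / n2
    mean = (n1 * mean1 + n2 * mean2) / n

    # Assumption: variance of the measured variable with divisor n.
    var = sum((u + v) * (xj - mean) ** 2 for u, v, xj in zip(f1, f2, x)) / n
    return [n1, n2, n, mean, mean1, mean2, math.sqrt(var)]


A = [
    [50, 88, 155, 379, 18, 63],    # frequencies, first value of the dichotomy
    [43, 62, 110, 300, 14, 144],   # frequencies, second value
    [1, 2, 3, 4, 5, 6],            # class marks
]
stats = descriptive_stats(A)
```

To two decimals, the returned values agree with the first seven rows of the example output in this section.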
K — Number of classes for the measured classification variable. (Input)
Default: K = size (A,2).
LDA — Leading dimension of A exactly as specified in the dimension statement in the calling program. (Input)
Default: LDA = size (A,1).
Generic: CALL BSPBS (A, STAT [,…])
Specific: The specific interface names are S_BSPBS and D_BSPBS.
Single: CALL BSPBS (K, A, LDA, STAT)
Double: The double precision name is DBSPBS.
Routine BSPBS computes the biserial and point-biserial correlation coefficients for a dichotomous variable and a numerically measurable (classification) variable. Input to BSPBS is a 3 × K array, A. The first two rows of A contain the frequencies for the dichotomous variable as measured at each level of the classification variable. The third row contains the values (class marks) to be used for the classification variable.
The biserial correlation coefficient should be used in situations where the dichotomous variable and the classification variable are assumed to come from a bivariate normal distribution. If this is not the case (i.e., if the bivariate normal assumption cannot be made), then the point-biserial correlation should be used (see Kendall and Stuart 1979, page 331).
Let $a_{\cdot 1}$ and $a_{\cdot 2}$ denote the total counts in rows one and two of A, respectively, and let $n = a_{\cdot 1} + a_{\cdot 2}$. Let $\Phi$ denote the cumulative normal distribution function; let $a_{ij}$, $i = 1, 2$, $j = 1, \ldots, K$, denote the counts in rows 1 and 2 of A, and let $x_j$ denote the values in row 3 of A.
The biserial correlation coefficient $r_b$ is computed as follows:
Let
If the underlying distributions are normal with zero correlation, then z is asymptotically a standard normal deviate that may be used to test that the correlation is zero. The p-value for z is reported in STAT(10).
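For orientation, the usual textbook definitions of these quantities can be written as below. This is a sketch under assumptions, not a transcription of the routine's own formulas: the symbols $p$, $q$, $h$, $\phi$, $\bar{x}_i$ and $s_x$ are introduced here for illustration, the divisor $n$ in $s_x^2$ is assumed, and $z$ is taken to be $r_b$ divided by its asymptotic standard deviation under zero correlation. By the symmetry of the normal density, $\phi(\Phi^{-1}(p)) = \phi(\Phi^{-1}(q))$.

```latex
\begin{aligned}
p &= \frac{a_{\cdot 1}}{n}, \qquad q = 1 - p, \\
\bar{x}_i &= \frac{1}{a_{\cdot i}} \sum_{j=1}^{K} a_{ij}\, x_j \quad (i = 1, 2), \qquad
\bar{x} = \frac{1}{n} \sum_{j=1}^{K} (a_{1j} + a_{2j})\, x_j, \\
s_x^2 &= \frac{1}{n} \sum_{j=1}^{K} (a_{1j} + a_{2j})\,(x_j - \bar{x})^2, \\
h &= \Phi^{-1}(p), \qquad \phi(h) = \frac{1}{\sqrt{2\pi}}\, e^{-h^2/2}, \\
r_b &= \frac{(\bar{x}_1 - \bar{x}_2)\, p\, q}{s_x\, \phi(h)}, \\
z &= \frac{r_b\, \phi(h)\, \sqrt{n}}{\sqrt{p\, q}} .
\end{aligned}
```

With the example data of this section, these expressions give $r_b \approx -0.17$, in agreement with the reported value.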
The point-biserial correlation coefficient is computed as
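In its standard textbook form, the point-biserial coefficient is the ordinary product-moment correlation between the measured variable and a 0/1 indicator of the dichotomy. The expression below is offered as an assumption about what the routine computes. Here $p = a_{\cdot 1}/n$ and $q = 1 - p$ are introduced for illustration, $\bar{x}_1$ and $\bar{x}_2$ are the class means reported in STAT(5) and STAT(6), and $s_x$ is the standard deviation reported in STAT(7), assumed to use the divisor $n$.

```latex
r_{pb} = \frac{(\bar{x}_1 - \bar{x}_2)\, \sqrt{p\, q}}{s_x},
\qquad
r_b = r_{pb}\, \frac{\sqrt{p\, q}}{\phi\!\left(\Phi^{-1}(p)\right)}
```

The second identity shows how the two coefficients are related under this definition: since $\sqrt{pq}/\phi(\Phi^{-1}(p)) > 1$ for every $0 < p < 1$, the biserial estimate is always larger in magnitude than the point-biserial estimate.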
The example is taken from Kendall and Stuart (1979, page 327). The data classify criminals as alcoholic (first row) or nonalcoholic (second row) for each level of a crime-type classification. The severity of the crime decreases with increasing column number, and the column number is used as the class mark. The biserial correlation of −0.17 indicates that criminals responsible for the more serious crimes are more likely to be alcoholic.
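The reported coefficients can be cross-checked without the library. The Python sketch below applies the standard textbook formulas to the same data; it is not the routine's implementation. The function name `biserial_sketch`, the divisor n in the standard deviation, the use of the zero-correlation standard error for z, and the two-sided p-value are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def biserial_sketch(f1, f2, x):
    """Textbook biserial and point-biserial estimates from grouped data.

    f1, f2: frequencies for the two values of the dichotomy.
    x:      class marks of the measured variable.
    Returns (r_b, z, p_value, r_pb).
    """
    n1, n2 = sum(f1), sum(f2)
    n = n1 + n2
    p, q = n1 / n, n2 / n

    mean1 = sum(f * xj for f, xj in zip(f1, x)) / n1
    mean2 = sum(f * xj for f, xj in zip(f2, x)) / n2
    mean = p * mean1 + q * mean2

    # Assumption: standard deviation with divisor n.
    s_x = math.sqrt(
        sum((u + v) * (xj - mean) ** 2 for u, v, xj in zip(f1, f2, x)) / n
    )

    std_normal = NormalDist()
    # Normal ordinate at the point dividing the distribution into p and q.
    ordinate = std_normal.pdf(std_normal.inv_cdf(p))

    r_pb = (mean1 - mean2) * math.sqrt(p * q) / s_x
    r_b = r_pb * math.sqrt(p * q) / ordinate

    # Assumption: z uses the asymptotic standard error under zero correlation,
    # sqrt(p*q) / (ordinate * sqrt(n)), and the p-value is two-sided.
    z = r_b * ordinate * math.sqrt(n) / math.sqrt(p * q)
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return r_b, z, p_value, r_pb


alcoholic = [50, 88, 155, 379, 18, 63]
nonalcoholic = [43, 62, 110, 300, 14, 144]
crime_type = [1, 2, 3, 4, 5, 6]

r_b, z, p_value, r_pb = biserial_sketch(alcoholic, nonalcoholic, crime_type)
print(f"r-b = {r_b:.2f}, r-p = {r_pb:.2f}, p-value = {p_value:.2f}")
```

Rounded to two decimals, the sketch gives −0.17 for the biserial estimate, −0.14 for the point-biserial estimate and 0.00 for the p-value. It does not attempt to reproduce STAT(9), whose formula is not given in this text.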
      USE IMSL_LIBRARIES
      IMPLICIT NONE
      INTEGER    K, LDA
      PARAMETER  (K=6, LDA=3)
!
      REAL       A(LDA,K), STAT(11)
      CHARACTER  CLABEL(2)*10, RLABEL(11)*10
!                                 DATA fills A column by column, so each
!                                 triple is (row-1 count, row-2 count,
!                                 class mark) for one crime type
      DATA A/50, 43, 1, 88, 62, 2, 155, 110, 3, 379, 300, 4, &
          18, 14, 5, 63, 144, 6/
      DATA RLABEL/'Count-1', 'Count-2', 'Count', 'Mean(X)', &
          'Mean(X-1)', 'Mean(X-2)', 'S-X', 'r-b', 'std(r-b)', &
          'p-value', 'r-p'/
      DATA CLABEL/'Statistic', ' '/
!                                 Print the input table
      CALL WRRRN('A', A)
!                                 Compute the statistics
      CALL BSPBS (A, STAT)
!                                 Print the labeled statistics
      CALL WRRRL (' ', STAT, RLABEL, CLABEL, FMT='(W12.8)')
      END
                        A
        1       2       3       4       5       6
1    50.0    88.0   155.0   379.0    18.0    63.0
2    43.0    62.0   110.0   300.0    14.0   144.0
3     1.0     2.0     3.0     4.0     5.0     6.0
           Statistic
Count-1       753.00
Count-2       673.00
Count        1426.00
Mean(X)         3.72
Mean(X-1)       3.55
Mean(X-2)       3.91
S-X             1.31
r-b            -0.17
std(r-b)        0.03
p-value         0.00
r-p            -0.14