Performs nonparametric hazard rate estimation using kernel functions and quasi-likelihoods.
X — NOBS by m matrix containing the raw data, where m = 1 if ICEN = 0, and m = 2 otherwise. (Input)
IRT — Column number in X of the event times. (Input)
KMIN — Minimum
number for parameter k. (Input)
Parameter k is the
number of nearest neighbors to be used in computing the k-th nearest
neighbor distance.
INK — Increment between successive values of parameter k. (Input)
NK — Number of
values of k to be considered. (Input)
HAZRD finds the
optimal value of k over the grid given by: KMIN + (j − 1)
* INK, for
j
= 1, …,
NK.
ST — Vector of
length NOBS
containing the times of occurrence of the events, sorted from smallest to
largest. (Output)
Vector ST is obtained from
the matrix X and
should be used as input to routine HAZST.
JCEN — Vector of
length NOBS
containing the sorted censor codes. (Output)
Censor codes are
sorted corresponding to the events ST(i), with
censored observations preceding tied failures. Vector JCEN is obtained from
the censor codes in X, if present, and is
used as input to routine HAZST.
ALPHA — Optimal estimate for the parameter α. (Output)
BTA — Optimal estimate for the parameter β. (Output)
K — Optimal estimate for the parameter k. (Output)
VML — Optimum value of the criterion function. (Output)
H — Vector of
length NOBS
* 5 containing
constants needed to compute the k-th nearest failure distances, and the
observation weights. (Output)
H is used as input to
routine HAZST.
NOBS — Number of
observations. (Input)
Default: NOBS = size (X,1).
LDX — Leading
dimension of X
exactly as specified in the dimension statement in the calling
program. (Input)
Default: LDX = size (X,1).
ICEN — Censoring
option. (Input)
If ICEN = 0, then all of
the data is treated as exact data with no censoring. For ICEN > 0, column
ICEN of X contains the
censoring codes. A censoring code of 0 means an exact event (failure). A
censoring code of 1 means that the observation was right censored at the event
time.
Default: ICEN = 0.
IWTO — Weight
option. (Input)
If IWTO = 1, then weight
ln(1 + 1/(NOBS −
i + 1)) is used for the i-th smallest observation. Otherwise,
weight 1/(NOBS −
i + 1) is used.
Default: IWTO = 0.
NGRID — Grid
option. (Input)
If NGRID = 0, a default
grid is used to locate an initial starting value for parameter BTA. For NGRID > 0, a
user-defined grid is used. This grid is defined as
BSTART + (j −
1) * GINC, for j =
1, …, NGRID, where BSTART, GINC, and NGRID are
input.
Default: NGRID = 0.
BSTART — First
value to be used in the user-defined grid. (Input)
Not used if
NGRID = 0.
GINC — For a
user-defined grid, the increment between successive grid values of BTA.
(Input)
Not used if NGRID = 0.
IPRINT — Printing
option. (Input)
If IPRINT = 1, the grid
estimates and the optimized estimates are printed for each value of k.
Otherwise, no printing is performed.
Default: IPRINT = 0.
ISORT — Sorting
option. (Input)
If ISORT = 1, then the
event times are not automatically sorted by HAZRD. Otherwise,
sorting is performed with exact failure times following tied right-censored
times.
Default: ISORT = 0.
NMISS — Number of missing (NaN, not a number) values in X. (Output)
Generic: CALL HAZRD (X, IRT, KMIN, INK, NK, ST, JCEN, ALPHA, BTA, K, VML, H [,…])
Specific: The specific interface names are S_HAZRD and D_HAZRD.
Single: CALL HAZRD (NOBS, X, LDX, IRT, ICEN, IWTO, NGRID, BSTART, GINC, KMIN, INK, NK, IPRINT, ISORT, ST, JCEN, ALPHA, BTA, K, VML, H, NMISS)
Double: The double precision name is DHAZRD.
Routine HAZRD is an implementation of the methods discussed by Tanner and Wong (1984) for estimating the hazard rate in survival or reliability data with right censoring. It uses the biweight kernel,
and a modified likelihood to obtain data-based estimates of
the smoothing parameters α, β, and k
needed in the estimation of the hazard rate. For kernel K(x),
define the “smoothed” kernel
Ks(x −
x(j)) as
follows:
where djk is the distance to the k-th nearest failure from x(j), and x(j) is the j-th ordered observation (from smallest to largest). For given α and β, the hazard at point x is then
where N = NOBS,
δi is the i-th
observation’s censor code (1 = censored, 0 = failed), and
wi is the
i-th
ordered observation’s weight, which may be chosen as either 1/(N −
i + 1), or
ln(1 + 1/(N − i + 1)). After the smoothing
parameters have been obtained, the hazard may be estimated via HAZST.
Let
The likelihood is given by
,
where Π denotes product. Since the likelihood leads to degenerate estimates, Tanner and Wong (1984) suggest the use of a modified likelihood. The modification consists of deleting observation xi in the calculation of h(xi) and H(xi) when the likelihood term for xi is computed using the usual optimization techniques. α and β for given k can then be estimated.
Estimates for α and β are computed as follows: for given β, a closed form solution is available for α. The problem is thus reduced to the estimation of β. A grid search for β is first performed. Experience indicates that if the initial estimate of β from this grid search is greater than, say, e6, then the modified likelihood is degenerate because the hazard rate does not change with time. In this situation, β should be taken to be infinite, and an estimate of α corresponding to infinite β should be directly computed. When the estimate of β from the grid search is less than e6, a secant algorithm is used to optimize the modified likelihood. The secant algorithm iteration stops when the change in β from one iteration to the next is less than 10−5. Alternatively, the iterations may cease when the value of β becomes greater than e6, at which point an infinite β with a degenerate likelihood is assumed.
To find the optimum value of the likelihood with respect to k, a user-specified grid of k-values is used. For each grid value, the modified likelihood is optimized with respect to α and β. That grid point, which leads to the smallest likelihood, is taken to be the optimal k.
1. Informational Errors
Type Code
4 18 All observations are missing (NaN, not a number) values.
2.
In the optimization routines, the parameterization is changed to β* and α*, where
β* = −ln(β) and α* = −ln(α). The default grid uses −8, −4,
−3, −2.5, −2, −1.5, −1, −0.5, and 0.5 for β*. This corresponds to a grid in β of 2981, 54.6,
20.08, 12.18, 7.39, 4.48, 2.72, 1.64, and .61. The grid β that maximizes
the modified “likelihood” is used as the starting point for the iterations.
3. If the initial estimate of β as determined from the grid or as given by the user is greater than 400 (actually e6), then infinite β is assumed, and an analytic estimate of α based upon infinite β is used. In the optimization, if it is determined that β must be greater than 1000, then an infinite β is assumed. Infinite β corresponds to a “flat” hazard rate.
1. The routine HAZST may be used to estimate the hazard on a grid of points once the optimal values for α, β and k have been found. The user should also consider using the “easy-to-use” version of HAZRD, routine HAZEZ.
2. If sorting of the data is performed by HAZRD, then the sorted array will be such that all censored observations at a given time precede all failures at that time. To specify an arbitrary pattern of censored/failed observations at a given time point, the ISORT = 1 option must be used. In this case, it is assumed that the times have already been sorted from smallest to largest.
3. The smallest value of k must be greater than the largest number of tied failures since djk must be positive for all j. (Censored observations are not counted.) Similarly, the largest value of k must be less than the total number of failures. If the grid specified for k includes values outside the allowable range, then a warning error is issued; but k is still optimized over the allowable grid values.
4. The secant algorithm iterates on the transformed parameter β* = exp(− β). This assures a positive β, and it also seems to lead to a more desirable grid search. All results returned to the user are in the original parameterization, however.
5. Since local minimums have been observed in the modified likelihood, it is recommended that more than one grid of initial values for α and β be used.
The following example is taken from Tanner and Wong (1984). The data are from Stablein, Carter, and Novak (1981) and involve the survival times of individuals with nonresectable gastric carcinoma. Individuals treated with radiation and chemotherapy are used. For each value of k from 18 to 22 with increment of 2, the default grid search for β is performed. Using the optimal value of β in the grid, the optimal parameter estimates of α and β are computed for each value of k. The final solution is the parameter estimates for the value of k which optimizes the modified likelihood (VML). Because the IPRINT = 1 option is in effect, HAZRD prints all of the results in the output.
USE HAZRD_INT
USE UMACH_INT
USE WRRRN_INT
USE WRIRN_INT
IMPLICIT NONE
INTEGER ICEN, INK, IPRINT, IRT, ISORT, KMIN, LDX, &
NK, NOBS
PARAMETER (ICEN=2, INK=2, IPRINT=1, IRT=1, ISORT=1, &
KMIN=18, LDX=45, NK=3, NOBS=45)
!
INTEGER JCEN(NOBS), K, NMISS, NOUT
REAL ALPHA, BTA, H(5*NOBS), ST(NOBS), VML, X(NOBS,2)
!
DATA X/17, 42, 44, 48, 60, 72, 74, 95, 103, 108, 122, 144, 167, &
170, 183, 185, 193, 195, 197, 208, 234, 235, 254, 307, 315, &
401, 445, 464, 484, 528, 542, 567, 577, 580, 795, 855, 882, &
892, 1031, 1033, 1306, 1335, 1366, 1452, 1472, 36*0, 9*1/
!
CALL HAZRD (X, IRT, KMIN, INK, NK, ST, JCEN, ALPHA, BTA, &
K, VML, H, ICEN=ICEN, IPRINT=IPRINT, ISORT=ISORT, &
NMISS=NMISS)
!
CALL UMACH (2, NOUT)
WRITE (NOUT,99999) NMISS
99999 FORMAT (/' NMISS = ', I4/)
CALL WRRRN ('ST', ST, 1, NOBS, 1)
CALL WRIRN ('JCEN', JCEN, 1, NOBS, 1)
CALL WRRRN ('H', H, NOBS, 5, NOBS)
END
*** GRID SEARCH FOR K = 18 ***
ALPHA BETA VML
4.578322 2980.958008 -266.804504
4.543117 54.598148 -266.619690
4.336464 20.085537 -265.541168
4.019334 12.182494 -264.001404
3.542742 7.389056 -262.540100
2.990577 4.481689 -262.511810
2.351537 2.718282 -262.633911
1.584173 1.648721 -262.158264
0.966332 1.000000 -262.868408
*** OPTIMAL PARAMETER ESTIMATES ***
ALPHA BETA VML
1.695147 1.769263 -262.118530
*** GRID SEARCH FOR K = 20 ***
ALPHA BETA VML
4.053934 2980.958008 -266.525970
4.032835 54.598148 -266.401428
3.905046 20.085537 -265.648315
3.687815 12.182494 -264.401672
3.304344 7.389056 -262.665924
2.822716 4.481689 -262.080078
2.252759 2.718282 -262.445251
1.555777 1.648721 -261.772278
0.955586 1.000000 -262.617645
*** OPTIMAL PARAMETER ESTIMATES ***
ALPHA BETA VML
1.540533 1.631551 -261.771484
*** GRID SEARCH FOR K = 22 ***
ALPHA BETA VML
3.656405 2980.958008 -267.595337
3.641593 54.598148 -267.498596
3.550560 20.085537 -266.903870
3.388752 12.182494 -265.859131
3.071474 7.389056 -264.066040
2.645036 4.481689 -263.038696
2.137399 2.718282 -263.334717
1.512606 1.648721 -262.639740
0.936368 1.000000 -262.682739
*** OPTIMAL PARAMETER ESTIMATES ***
ALPHA BETA VML
1.342176 1.450016 -262.561188
*** THE FINAL SOLUTION (K = 20)
***
ALPHA BETA VML
1.540533 1.631551 -261.771484
NMISS = 0
ST
1 2 3 4 5 6 7 8
17.0 42.0 44.0 48.0 60.0 72.0 74.0 95.0
9 10
11 12
13 14
15 16
103.0 108.0 122.0 144.0 167.0 170.0 183.0 185.0
17 18
19 20
21 22
23 24
193.0 195.0 197.0 208.0 234.0 235.0 254.0 307.0
25 26
27 28
29 30
31 32
315.0 401.0 445.0 464.0 484.0 528.0 542.0 567.0
33 34
35 36
37 38
39 40
577.0 580.0 795.0 855.0 882.0 892.0 1031.0 1033.0
41 42
43 44
45
1306.0 1335.0 1366.0 1452.0 1472.0
JCEN
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 22
23 24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39 40
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
41 42
43 44 45
1 1 1 1 1
H
1 2 3 4 5
1 217.0 218.0 1.0 21.0 1.0
2 192.0 193.0 1.0 21.0 0.5
3 190.0 191.0 1.0 21.0 0.3
4 186.0 187.0 1.0 21.0 0.2
5 174.0 175.0 1.0 21.0 0.2
6 162.0 163.0 1.0 21.0 0.2
7 160.0 161.0 1.0 21.0 0.1
8 139.0 140.0 1.0 21.0 0.1
9 131.0 132.0 1.0 21.0 0.1
10 126.0 127.0 1.0 21.0 0.1
11 112.0 113.0 1.0 21.0 0.1
12 102.0 110.0 2.0 22.0 0.1
13 123.0 125.0 3.0 23.0 0.1
14 126.0 128.0 3.0 23.0 0.1
15 132.0 135.0 5.0 25.0 0.1
16 130.0 137.0 5.0 25.0 0.1
17 133.0 145.0 5.0 25.0 0.1
18 135.0 147.0 5.0 25.0 0.1
19 137.0 149.0 5.0 25.0 0.1
20 148.0 160.0 5.0 25.0 0.1
21 167.0 174.0 6.0 26.0 0.0
22 166.0 175.0 6.0 26.0 0.0
23 182.0 191.0 6.0 26.0 0.0
24 204.0 212.0 9.0 29.0 0.0
25 212.0 213.0 9.0 29.0 0.0
26 231.0 234.0 14.0 34.0 0.0
27 275.0 278.0 14.0 34.0 0.0
28 294.0 297.0 14.0 34.0 0.0
29 311.0 314.0 15.0 35.0 0.0
30 343.0 345.0 16.0 36.0 0.0
31 357.0 359.0 16.0 38.0 0.0
32 382.0 384.0 16.0 38.0 0.0
33 392.0 394.0 16.0 38.0 0.0
34 395.0 397.0 16.0 38.0 0.0
35 610.0 612.0 16.0 43.0 0.0
36 670.0 672.0 16.0 45.0 0.0
37 689.0 697.0 17.0 45.0 0.0
38 699.0 707.0 17.0 45.0 0.0
39 838.0 846.0 17.0 45.0 0.0
40 840.0 848.0 17.0 45.0 0.0
41 1113.0 1121.0 17.0 45.0 0.0
42 1142.0 1150.0 17.0 45.0 0.0
43 1173.0 1181.0 17.0 45.0 0.0
44 1259.0 1267.0 17.0 45.0 0.0
45 1279.0 1287.0 17.0 45.0 0.0
PHONE: 713.784.3131 FAX:713.781.9260 |