|
Estimates missing values in a time series.
Required Arguments
ITIME_POINTS —
Vector of length NOBSW containing the
reference time points at which the time
series was observed. (Input)
The reference time points must be of data type
integer and in strictly increasing order. Reference time points for missing
values must lie in the open interval . It is assumed that the time series, after estimation of
missing values, contains values at equidistant time points where the distance
between two consecutive time points is one.
W — Vector of
length NOBSW
containing the time series values. (Input)
The values must be ordered in
accordance with the values in vector ITIME_POINTS. It is
assumed that the time series, after estimation of missing values, contains
values at equidistant time points where the distance between two consecutive
time points is one. If the non-missing time series values are observed at time
points , then missing values
between
and
,
, exist if
. The size of the gap between
and
is then
. The total length of the time series with non-missing and
estimated missing values is
,
or ITIME_POINTS(NOBSW)- ITIME_POINTS(1)+1.
For example, a time series with six observations with the fourth observation missing would appear as follows:
ITIME_POINTS |
W |
t1 = 1 |
.0019 |
t 2 = 2 |
.0018 |
t 3 = 3 |
.0019 |
|
|
t 4 = 5 |
.0021 |
t 5 = 6 |
.0022 |
t 6 = 7 |
.002 |
In this example, NOBSW = 6 and the
length of the time series, Z, including missing
values is .
Z — Allocatable array which on return will contain the time series values together with estimates for the missing values. (Output)
NOBSW — Number of
observations in time series W. (Input)
Default:
NOBSW =
size(W).
IMETH — Method
used for missing value estimation. (Input)
Default: IMETH = 3.
IMETH |
Method |
0 |
Median. |
1 |
Cubic Spline Interpolation. |
2 |
AR(1) model. |
3 |
AR(p) model. |
If
IMETH = 2 and the first gap
begins at or
, the corresponding time series values are estimated using
method Median. If IMETH = 3 is chosen and the
first gap starts at
, then the values of this
gap are also estimated by method Median. If the length of the series before a
gap, denoted len, is greater than 1 and less than 2
MAXLAG, then MAXLAG is reduced to
len/2 for the estimate of the missing values within this gap.
MAXLAG — Maximum
number of autoregressive parameters, p, in the AR(p) model when
IMETH = 3 is
chosen. (Input)
See AUTO_UNI_AR
for further information.
Default: MAXLAG = 10.
MAXBC — Maximum
length of backcasting in the least-squares algorithm when IMETH = 2 is
chosen. (Input)
MAXBC must be greater
than or equal to zero. See NSLSE
for further information.
Default: MAXBC = 0.
TOLBC — Tolerance
level used to determine convergence of the backcast algorithm used when IMETH = 2 is
chosen. (Input)
Backcasting terminates when the absolute value
of a backcast is less than TOLBC. Typically,
TOLBC is set to
a fraction of wstdev where wstdev is an estimate of the standard
deviation of the time series. See NSLSE
for further information.
Default: TOLBC = 0.01 * wstdev.
TOLSS — Tolerance
level used to determine convergence of the nonlinear least-squares algorithm
when IMETH = 2
is chosen. (Input)
TOLSS represents the
minimum relative decrease in the sum of squares between two iterations required
to determine convergence. Hence, TOLSS must be greater
than zero and less than one. See NSLSE
for further information.
The default value is: max{10−10, EPS2∕3} for single precision
and
max{10−20,
EPS2∕3} for double precision,
where EPS = AMACH(4) for single precision and EPS = DMACH(4) for double precision. See the documentation for routine AMACH in the Reference Material;.
RELERR— Stopping
criterion for use in the nonlinear equation solver when
IMETH = 2 is
chosen. (Input)
See NSLSE
for further information.
Default: RELERR= 100.0 * AMACH(4)
for single
precision,
RELERR= 100.0
* DMACH(4)
for double precision.
See the documentation for routine AMACH
in the Reference Material.
MAXIT — Maximum
number of iterations allowed in the nonlinear equation solver when
IMETH = 2 is chosen.
(Input)
See NSLSE
for further information.
Default: MAXIT= 200.
WMEAN — Estimate
of the mean of time series W. (Input)
Default:
WMEAN = 0.0.
NOBSZ — Number of
elements in the time series Z, with estimated
missing values,
NOBSZ =
ITIME_POINTS(NOBSW) – ITIME_POINTS(1) + 1. (Output)
ITIME_POINTS_FULL — Vector of length NOBSZ containing the reference time points of the time series Z. (Output)
MISS_INDEX — Vector of length NOBSZ-NOBSW containing the indices for the missing values in vector ITIME_POINTS_FULL. (Output)
Generic: CALL ESTIMATE_MISSING (ITIME_POINTS, W, Z [,…])
Specific: The specific interface names are S_ESTIMATE_MISSING and D_ESTIMATE_MISSING.
Traditional time series analysis, as described by Box, Jenkins and Reinsel (1994), requires the observations be made at equidistant time points with no missing values. When observations are missing, the problem of determining suitable estimates occurs. Routine ESTIMATE_MISSING offers four methods for estimating missing values from an equidistant time series.
The Median method, IMETH = 0, estimates the missing observations in a gap by the median of the last four time series values before and the first four values after the gap. If enough values are not available before or after the gap then the number is reduced accordingly. This method is very fast and simple, but its use is limited to stationary ergodic series without outliers and level shifts.
The Cubic Spline Interpolation method, IMETH = 1, uses a cubic spline interpolation method to estimate missing values. Here the interpolation is again done over the last four time series values before and the first four values after the gap. The missing values are estimated by the resulting interpolant. This method gives smooth transitions across missing values.
The AR(1) method, IMETH
= 2, assumes that the time series before the gap can be well described by an
AR(1) process. If the last observation prior to the gap is made at time point
then it uses the time
series values at
to compute the
one-step-ahead forecast at origin
. This value is taken as an estimate for the missing value at
time point
. If the value at
is also missing then the values at time points
are used to recompute the AR(1) model, estimate the value
at
and so on. The
coefficient
in the AR(1) model is
computed internally by the method of least squares from routine NSLSE.
The AR(p) method, IMETH
= 3 ,uses an AR(p) model to estimate missing values by a one-step-ahead
forecast . First, routine AUTO_UNI_AR
, applied to the time series prior to the missing values, is used to determine
the optimum p from the set {0, 1, …, MAXLAG}
of possible values and to compute the parameters of the resulting AR(p) model. The parameters
are estimated by the method of moments. Denoting the mean of the series
by μ the
one-step-ahead forecast at origin
,
, can be computed by
the formula
This value is used as an estimate for the missing value. The procedure, starting with AUTO_UNI_AR, is then repeated for every further missing value in the gap.
All four estimation methods treat gaps of missing values in increasing time order.
Consider the AR(1) process
We assume that is a Gaussian white noise process,
. Then,
and
(see Anderson, p. 174).
The time series in this example was artificially generated
from an AR(1) process characterized by and
. This process is stationary
with
. An initial value,
was used. The sequence
was generated by a random
number generator.
From the original series, we remove the observations at time points t =130, t =140, t =141, t =160, t =175, and t =176. Then, ESTIMATE_MISSING is used to compute estimates for the missing values by all 4 estimation methods available. The estimated values are compared with the actual values.
use estimate_missing_int
!
implicit none
integer :: i, j, k, nout
integer, dimension(200) :: times_1, times_2
integer :: n_obs, n_miss
integer :: ntimes, miss_ind
integer, dimension(:), allocatable :: times, missing_index
real, dimension(200) :: x_1, x_2
real, dimension(:), allocatable :: result
real, dimension(200) :: y = (/ 1.30540,-1.37166,1.47905, &
-0.91059,1.36191,-2.16966,3.11254, -1.99536,2.29740, &
-1.82474,-0.25445,0.33519,-0.25480,-0.50574,-0.21429,&
-0.45932,-0.63813,0.25646,-0.46243,-0.44104,0.42733, &
0.61102,-0.82417,1.48537,-1.57733,-0.09846,0.46311, &
0.49156,-1.66090,2.02808,-1.45768,1.36115,-0.65973, &
1.13332,-0.86285,1.23848,-0.57301,-0.28210,0.20195, &
0.06981,0.28454,0.19745, -0.16490,-1.05019,0.78652, &
-0.40447,0.71514,-0.90003,1.83604, -2.51205,1.00526, &
-1.01683,1.70691,-1.86564,1.84912,-1.33120, 2.35105, &
-0.45579,-0.57773,-0.55226,0.88371,0.23138,0.59984, &
0.31971,0.59849,0.41873,-0.46955,0.53003,-1.17203, &
1.52937,-0.48017,-0.93830,1.00651,-1.41493,-0.42188,&
-0.67010,0.58079, -0.96193,0.22763,-0.92214,1.35697, &
-1.47008,2.47841,-1.50522, 0.41650,-0.21669,-0.90297,&
0.00274,-1.04863,0.66192,-0.39143, 0.40779,-0.68174,&
-0.04700,-0.84469,0.30735,-0.68412,0.25888, -1.08642,&
0.52928,0.72168,-0.18199,-0.09499,0.67610,0.14636, &
0.46846,-0.13989,0.50856,-0.22268,0.92756,0.73069, &
0.78998,-1.01650,1.25637,-2.36179,1.99616,-1.54326, &
1.38220,0.19674,-0.85241,0.40463,0.39523,-0.60721, &
0.25041,-1.24967,0.26727,1.40042,-0.66963,1.26049, &
-0.92074,0.05909,-0.61926,1.41550, 0.25537,-0.13240, &
-0.07543,0.10413,1.42445,-1.37379,0.44382, -1.57210, &
2.04702,-2.22450,1.27698,0.01073,-0.88459,0.88194, &
-0.25019,0.70224,-0.41855,0.93850,0.36007,-0.46043, &
0.18645,0.06337,0.29414,-0.20054,0.83078,-1.62530, &
2.64925,-1.25355,1.59094,-1.00684,1.03196,-1.58045, &
2.04295,-2.38264,1.65095,-0.33273,-1.29092,0.14020, &
-0.11434,0.04392,0.05293,-0.42277,0.59143,-0.03347, &
-0.58457,0.87030,0.19985,-0.73500,0.73640, 0.29531, &
0.22325,-0.60035,1.42253,-1.11278,1.30468,-0.41923, &
-0.38019,0.50937,0.23051,0.46496,0.02459,-0.68478, &
0.25821,1.17655,-2.26629,1.41173,-0.68331 /)
integer, dimension(200) :: tpoints=(/1,2,3,4,5,6,7,8,9,10,&
11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,&
29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,&
47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,&
65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,&
83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,&
101,102,103,104,105,106,107,108,109,110,111,112,113,114,&
115,116,117,118,119,120,121,122,123,124,125,126,127,128,&
129,130,131,132,133,134,135,136,137,138,139,140,141,142,&
143,144,145,146,147,148,149,150,151,152,153,154,155,156,&
157,158,159,160,161,162,163,164,165,166,167,168,169,170,&
171,172,173,174,175,176,177,178,179,180,181,182,183,184,&
185,186,187,188,189,190,191,192,193,194,195,196,197,198,&
199,200 /)
n_miss = 0
times_1(1) = tpoints(1)
times_2(1) = tpoints(1)
x_1(1) = y(1)
x_2(1) = y(1)
k = 0
DO i=1,199
times_1(i+1) = tpoints(i+1)
x_1(i+1) = y(i+1)
! Generate series with missing values
IF ( i/=129 .AND. i/=139 .AND. i/=140 .AND. i/=159 &
.AND. i/=174 .AND. i/=175 ) THEN
k = k+1
times_2(k+1) = times_1(i+1)
x_2(k+1) = x_1(i+1)
END IF
END DO
!
n_obs = k + 1
ntimes = tpoints(200) - tpoints(1) + 1
n_miss = ntimes - n_obs
!
ALLOCATE(times(ntimes), missing_index(n_miss))
DO j=0,3
IF (j <= 2) THEN
CALL estimate_missing(times_2, x_2, result, NOBSW=n_obs, &
IMETH=j, ITIME_POINTS_FULL=times, &
MISS_INDEX=missing_index)
ELSE
CALL estimate_missing(times_2, x_2, result, NOBSW=n_obs, &
IMETH=j, ITIME_POINTS_FULL=times, &
MAXLAG=20, MISS_INDEX=missing_index)
END IF
CALL UMACH (2, nout)
IF (j == 0) WRITE (nout,*) "Method: Median"
IF (j == 1) WRITE (nout,*) "Method: Cubic Spline Interpolation"
IF (j == 2) WRITE (nout,*) "Method: AR(1) Forecast"
IF (j == 3) WRITE (nout,*) "Method: AR(p) Forecast"
WRITE(nout,99998)
DO i=0,n_miss-1
miss_ind = missing_index(i+1)
WRITE(nout,99999) times(miss_ind), x_1(miss_ind), &
result(miss_ind), &
ABS(x_1(miss_ind)-result(miss_ind))
END DO
WRITE(nout,*) ""
!
IF (ALLOCATED(result)) DEALLOCATE(result)
END DO
!
IF (ALLOCATED(times)) DEALLOCATE(times)
IF (ALLOCATED(missing_index)) DEALLOCATE(missing_index)
!
99998 FORMAT("time",6x,"actual",6x,"predicted",6x,"difference")
99999 FORMAT(I4,6x,F6.3,8x,F6.3,7x,F6.3)
END
Method: Median
time actual predicted difference
130 -0.921 0.261 1.182
140 0.444 0.057 0.386
141 -1.572 0.057 1.630
160 2.649 0.047 2.602
175 -0.423 0.048 0.471
176 0.591 0.048 0.543
Method: Cubic Spline Interpolation
time actual predicted difference
130 -0.921 1.541 2.462
140 0.444 -0.407 0.851
141 -1.572 2.497 4.069
160 2.649 -2.947 5.596
175 -0.423 0.251 0.673
176 0.591 0.380 0.211
Method: AR(1) Forecast
time actual predicted difference
130 -0.921 -0.916 0.005
140 0.444 1.019 0.575
141 -1.572 -0.714 0.858
160 2.649 1.228 1.421
175 -0.423 -0.010 0.413
176 0.591 0.037 0.555
Method: AR(p) Forecast
time actual predicted difference
130 -0.921 -0.901 0.020
140 0.444 1.024 0.580
141 -1.572 -0.706 0.867
160 2.649 1.233 1.417
175 -0.423 -0.002 0.421
176 0.591 0.039 0.553
PHONE: 713.784.3131 FAX:713.781.9260 |