Chapter 8: Time Series Analysis and Forecasting

ESTIMATE_MISSING

TextonSpedometerHPClogo.gif



Estimates missing values in a time series.

Required Arguments

ITIME_POINTS — Vector of length NOBSW containing the reference time points  at which the time series was observed. (Input)
The reference time points must be of data type integer and in strictly increasing order. Reference time points for missing values must lie in the open interval . It is assumed that the time series, after estimation of missing values, contains values at equidistant time points where the distance between two consecutive time points is one.

W — Vector of length NOBSW containing the time series values. (Input)
The values must be ordered in accordance with the values in vector ITIME_POINTS. It is assumed that the time series, after estimation of missing values, contains values at equidistant time points where the distance between two consecutive time points is one. If the non-missing time series values are observed at time points , then missing values between  and , , exist if . The size of the gap between  and   is then . The total length of the time series with non-missing and estimated missing values is , or  ITIME_POINTS(NOBSW)- ITIME_POINTS(1)+1.

For example, a time series with six observations with the fourth observation missing would appear as follows:

 

ITIME_POINTS

  W

t1 = 1

.0019

t 2 = 2

.0018

t 3 = 3

.0019

 

 

t 4 = 5

.0021

t 5 = 6

.0022

t 6 = 7

.002

In this example, NOBSW = 6 and the length of the time series, Z, including missing values is .

Z — Allocatable array which on return will contain the time series values together with estimates for the missing values. (Output)

Optional Arguments

NOBSW — Number of observations in time series W. (Input)
Default: NOBSW = size(W).

IMETH — Method used for missing value estimation. (Input)
Default: IMETH = 3.

IMETH

Method

0

Median.

1

Cubic Spline Interpolation.

2

AR(1) model.

3

AR(p) model.

 

If IMETH = 2 and the first gap begins at  or , the corresponding time series values are estimated using method Median. If IMETH = 3 is chosen and the first gap starts at, then the values of this gap are also estimated by method Median. If the length of the series before a gap, denoted len, is greater than 1 and less than 2MAXLAG, then MAXLAG is reduced to len/2 for the estimate of the missing values within this gap.

MAXLAG — Maximum number of autoregressive parameters, p, in the AR(p) model when IMETH = 3 is chosen. (Input)
See AUTO_UNI_AR for further information.
Default: MAXLAG = 10.

MAXBC — Maximum length of backcasting in the least-squares algorithm when IMETH = 2 is chosen.   (Input)
MAXBC must be greater than or equal to zero.  See NSLSE for further information.
Default: MAXBC = 0.

TOLBC — Tolerance level used to determine convergence of the backcast algorithm used when IMETH = 2 is chosen.   (Input)
Backcasting terminates when the absolute value of a backcast is less than TOLBC. Typically, TOLBC is set to a fraction of wstdev where wstdev is an estimate of the standard deviation of the time series.  See NSLSE for further information.
Default: TOLBC = 0.01 * wstdev.

TOLSS — Tolerance level used to determine convergence of the nonlinear least-squares algorithm when IMETH = 2 is chosen.   (Input)
TOLSS represents the minimum relative decrease in the sum of squares between two iterations required to determine convergence. Hence, TOLSS must be greater than zero and less than one.  See NSLSE for further information.
The default value is:  max{1010, EPS23} for single precision and
                                   max{1020, EPS23} for double precision,

            where EPS = AMACH(4) for single precision and EPS = DMACH(4) for double precision. See the documentation for routine AMACH in the Reference Material;.

RELERR— Stopping criterion for use in the nonlinear equation solver when
IMETH = 2 is chosen.   (Input)
See NSLSE for further information.
Default: RELERR= 100.0 * AMACH(4)  for single precision,
              RELERR= 100.0 * DMACH(4)  for double precision.
See the documentation for routine AMACH in the Reference Material.

MAXIT — Maximum number of iterations allowed in the nonlinear equation solver when
IMETH = 2 is chosen. (Input)
See NSLSE for further information.
Default: MAXIT= 200.

WMEAN — Estimate of the mean of time series W. (Input)
Default: WMEAN = 0.0.

NOBSZ — Number of elements in the time series Z, with estimated missing values,
NOBSZ = ITIME_POINTS(NOBSW) ITIME_POINTS(1) + 1. (Output)

ITIME_POINTS_FULL — Vector of length NOBSZ containing the reference time points of the time series Z. (Output)

MISS_INDEX —  Vector of length NOBSZ-NOBSW containing the indices for the missing values in vector ITIME_POINTS_FULL. (Output)

FORTRAN 90 Interface

Generic:                              CALL ESTIMATE_MISSING (ITIME_POINTS, W, Z [,…])

Specific:                             The specific interface names are S_ESTIMATE_MISSING and                                 D_ESTIMATE_MISSING.

Description

Traditional time series analysis, as described by Box, Jenkins and Reinsel (1994), requires the observations be made at equidistant time points with no missing values. When observations are missing, the problem of determining suitable estimates occurs. Routine ESTIMATE_MISSING offers four methods for estimating missing values from an equidistant time series.

The Median method, IMETH = 0, estimates the missing observations in a gap by the median of the last four time series values before and the first four values after the gap. If enough values are not available before or after the gap then the number is reduced accordingly.  This method is very fast and simple, but its use is limited to stationary ergodic series without outliers and level shifts.

The Cubic Spline Interpolation method, IMETH = 1, uses a cubic spline interpolation method to estimate missing values. Here the interpolation is again done over the last four time series values before and the first four values after the gap. The missing values are estimated by the resulting interpolant. This method gives smooth transitions across missing values. 

The AR(1) method, IMETH = 2, assumes that the time series before the gap can be well described by an AR(1) process. If the last observation prior to the gap is made at time point  then it uses the time series values at  to compute the one-step-ahead forecast at origin . This value is taken as an estimate for the missing value at time point. If the value at  is also missing then the values at time points  are used to recompute the AR(1) model, estimate the value at   and so on. The coefficient  in the AR(1) model is computed internally by the method of least squares from routine NSLSE.

The AR(p) method, IMETH = 3 ,uses an AR(p) model to estimate missing values by a one-step-ahead forecast . First, routine AUTO_UNI_AR , applied to the time series prior to the missing values, is used to determine the optimum p from the set {0, 1, , MAXLAG} of possible values and to compute the parameters   of the resulting AR(p) model. The parameters are estimated by the method of moments. Denoting the mean of the series  by μ the one-step-ahead forecast at origin  , ,  can be computed by the formula

 

This value is used as an estimate for the missing value. The procedure, starting with AUTO_UNI_AR, is then repeated for every further missing value in the gap.

All four estimation methods treat gaps of missing values in increasing time order.

Example

Consider the AR(1) process

We assume that is a Gaussian white noise process, . Then,  and   (see Anderson, p. 174).

The time series in this example was artificially generated from an AR(1) process characterized by  and . This process is stationary with .  An initial value,  was used. The sequence was generated by a random number generator.

From the original series, we remove the observations at time points t =130, t =140, t =141, t =160, t =175, and t =176. Then, ESTIMATE_MISSING is used to compute estimates for the missing values by all 4 estimation methods available. The estimated values are compared with the actual values.

 

   use estimate_missing_int

!

   implicit none

   integer                            :: i, j, k, nout

   integer, dimension(200)            :: times_1, times_2

   integer                            :: n_obs, n_miss

   integer                            :: ntimes, miss_ind

   integer, dimension(:), allocatable :: times, missing_index

   real, dimension(200)               :: x_1, x_2

   real, dimension(:), allocatable    :: result

 

   real, dimension(200) :: y = (/ 1.30540,-1.37166,1.47905, &

       -0.91059,1.36191,-2.16966,3.11254, -1.99536,2.29740, &

       -1.82474,-0.25445,0.33519,-0.25480,-0.50574,-0.21429,&

       -0.45932,-0.63813,0.25646,-0.46243,-0.44104,0.42733, &

        0.61102,-0.82417,1.48537,-1.57733,-0.09846,0.46311, &

        0.49156,-1.66090,2.02808,-1.45768,1.36115,-0.65973, &

        1.13332,-0.86285,1.23848,-0.57301,-0.28210,0.20195, &

        0.06981,0.28454,0.19745, -0.16490,-1.05019,0.78652, &

       -0.40447,0.71514,-0.90003,1.83604, -2.51205,1.00526, &

       -1.01683,1.70691,-1.86564,1.84912,-1.33120, 2.35105, &

       -0.45579,-0.57773,-0.55226,0.88371,0.23138,0.59984,  &

        0.31971,0.59849,0.41873,-0.46955,0.53003,-1.17203,  &

        1.52937,-0.48017,-0.93830,1.00651,-1.41493,-0.42188,&

       -0.67010,0.58079, -0.96193,0.22763,-0.92214,1.35697, &

       -1.47008,2.47841,-1.50522, 0.41650,-0.21669,-0.90297,&

        0.00274,-1.04863,0.66192,-0.39143, 0.40779,-0.68174,&

       -0.04700,-0.84469,0.30735,-0.68412,0.25888, -1.08642,&

        0.52928,0.72168,-0.18199,-0.09499,0.67610,0.14636,  &

        0.46846,-0.13989,0.50856,-0.22268,0.92756,0.73069,  &

        0.78998,-1.01650,1.25637,-2.36179,1.99616,-1.54326, &

        1.38220,0.19674,-0.85241,0.40463,0.39523,-0.60721,  &

        0.25041,-1.24967,0.26727,1.40042,-0.66963,1.26049,  &

       -0.92074,0.05909,-0.61926,1.41550, 0.25537,-0.13240, &

       -0.07543,0.10413,1.42445,-1.37379,0.44382, -1.57210, &

        2.04702,-2.22450,1.27698,0.01073,-0.88459,0.88194,  &

       -0.25019,0.70224,-0.41855,0.93850,0.36007,-0.46043,  &

        0.18645,0.06337,0.29414,-0.20054,0.83078,-1.62530,  &

        2.64925,-1.25355,1.59094,-1.00684,1.03196,-1.58045, &

        2.04295,-2.38264,1.65095,-0.33273,-1.29092,0.14020, &

       -0.11434,0.04392,0.05293,-0.42277,0.59143,-0.03347,  &

       -0.58457,0.87030,0.19985,-0.73500,0.73640, 0.29531,  &

        0.22325,-0.60035,1.42253,-1.11278,1.30468,-0.41923, &

       -0.38019,0.50937,0.23051,0.46496,0.02459,-0.68478,   &

        0.25821,1.17655,-2.26629,1.41173,-0.68331 /)

    

   integer, dimension(200) :: tpoints=(/1,2,3,4,5,6,7,8,9,10,&

       11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,&

       29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,&

       47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,&

       65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,&

      83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,&

     101,102,103,104,105,106,107,108,109,110,111,112,113,114,&

     115,116,117,118,119,120,121,122,123,124,125,126,127,128,&

     129,130,131,132,133,134,135,136,137,138,139,140,141,142,&

     143,144,145,146,147,148,149,150,151,152,153,154,155,156,&

     157,158,159,160,161,162,163,164,165,166,167,168,169,170,&

     171,172,173,174,175,176,177,178,179,180,181,182,183,184,&

     185,186,187,188,189,190,191,192,193,194,195,196,197,198,&

     199,200 /)

  

   n_miss = 0

   times_1(1) = tpoints(1)

   times_2(1) = tpoints(1)

   x_1(1) = y(1)

   x_2(1) = y(1)

   k = 0

  

   DO i=1,199

      times_1(i+1) = tpoints(i+1)

      x_1(i+1) = y(i+1)

      !        Generate series with missing values

      IF ( i/=129 .AND. i/=139 .AND. i/=140 .AND. i/=159 &

              .AND. i/=174 .AND. i/=175 ) THEN

         k = k+1

         times_2(k+1) = times_1(i+1)

         x_2(k+1) = x_1(i+1)

      END IF

   END DO

   !

   n_obs = k + 1

   ntimes = tpoints(200) - tpoints(1) + 1

   n_miss = ntimes - n_obs

!

   ALLOCATE(times(ntimes), missing_index(n_miss))

  

   DO j=0,3

      IF (j <= 2) THEN

         CALL estimate_missing(times_2, x_2, result, NOBSW=n_obs, &

                               IMETH=j, ITIME_POINTS_FULL=times,  &

                               MISS_INDEX=missing_index)

      ELSE

         CALL estimate_missing(times_2, x_2, result, NOBSW=n_obs, &

                               IMETH=j, ITIME_POINTS_FULL=times,  &

                               MAXLAG=20, MISS_INDEX=missing_index)

      END IF

 

      CALL UMACH (2, nout)

      IF (j == 0) WRITE (nout,*) "Method: Median"

      IF (j == 1) WRITE (nout,*) "Method: Cubic Spline Interpolation"

      IF (j == 2) WRITE (nout,*) "Method: AR(1) Forecast"

      IF (j == 3) WRITE (nout,*) "Method: AR(p) Forecast"

 

      WRITE(nout,99998)

 

      DO i=0,n_miss-1

        miss_ind = missing_index(i+1)

        WRITE(nout,99999) times(miss_ind), x_1(miss_ind),      &

                  result(miss_ind),                            &

                  ABS(x_1(miss_ind)-result(miss_ind))

 

      END DO

      WRITE(nout,*) ""

!

      IF (ALLOCATED(result)) DEALLOCATE(result)

   END DO

!

   IF (ALLOCATED(times)) DEALLOCATE(times)

   IF (ALLOCATED(missing_index)) DEALLOCATE(missing_index)

!

99998  FORMAT("time",6x,"actual",6x,"predicted",6x,"difference")

99999  FORMAT(I4,6x,F6.3,8x,F6.3,7x,F6.3)

END

Output

 

Method: Median

time      actual      predicted      difference

 130      -0.921         0.261        1.182

 140       0.444         0.057        0.386

 141      -1.572         0.057        1.630

 160       2.649         0.047        2.602

 175      -0.423         0.048        0.471

 176       0.591         0.048        0.543

 

Method: Cubic Spline Interpolation

time      actual      predicted      difference

 130      -0.921         1.541        2.462

 140       0.444        -0.407        0.851

 141      -1.572         2.497        4.069

 160       2.649        -2.947        5.596

 175      -0.423         0.251        0.673

 176       0.591         0.380        0.211

 

Method: AR(1) Forecast

time      actual      predicted      difference

 130      -0.921        -0.916        0.005

 140       0.444         1.019        0.575

 141      -1.572        -0.714        0.858

 160       2.649         1.228        1.421

 175      -0.423        -0.010        0.413

 176       0.591         0.037        0.555

 

Method: AR(p) Forecast

time      actual      predicted      difference

 130      -0.921        -0.901        0.020

 140       0.444         1.024        0.580

 141      -1.572        -0.706        0.867

 160       2.649         1.233        1.417

 175      -0.423        -0.002        0.421

 176       0.591         0.039        0.553



http://www.vni.com/
PHONE: 713.784.3131
FAX:713.781.9260