Tuesday, January 30, 2018

How to make the local level model with seasonal effect

Overview

On time-series analytics, we frequently need to think about a recurring pattern. In the context of time-series analytics, a recurring pattern is referred to as a seasonal effect.

For example, please see the image below. This image is the plotting of a time series data. As you can see, it has recurrent pattern.

enter image description here
On this article, I'll make the local level model with seasonal effects on Stan.



Data

I'll use an air passenger data, which is typical time series data. You can download this from the link below.
On this article, to make the point clean-cut, I don't download the data on the code.
Anyway, import the necessary libraries and load the data.

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pystan

data_loc = '/workspace/AirPassengers.csv'
data = pd.read_csv(data_loc)

print(data.head())
     Month  #Passengers
0  1949-01          112
1  1949-02          118
2  1949-03          132
3  1949-04          129
4  1949-05          121

This data is very simple. The first columns means time and the second one is the amount of the passengers. From this point, I'll use just #Passengers column.
By visualizing this, we can check the behavior of the data.

ts_data = data['#Passengers']
plt.plot(ts_data)
plt.show()

enter image description here

Roughly, we can see the recurrent pattern. To make it clear, let’s visualize the autocorrelation. About the visualization of the autocorrelation, please check the article, How to check autocorrelation on Python.

import statsmodels.api as sm

sm.graphics.tsa.plot_pacf(ts_data, lags=40)

enter image description here

We can see that the periodicity is more or less twelve days.

Stan Model


We can express local model with seasonal effects as followings.



On this data, I'll assign eleven on . The Stan code becomes the followings.

data {
    int N;
    int pred_N;
    vector[N] y;
}

parameters {
    vector[N] s;
    vector[N] u;
    real<lower=0> s_s;
    real<lower=0> s_u;
    real<lower=0> s_y;
}

model {
    s[12:N] ~ normal(-s[1:N-11]-s[2:N-10]-s[3:N-9]-s[4:N-8]-s[5:N-7]-s[6:N-6]-s[7:N-5]-s[8:N-4]-s[9:N-3]-s[10:N-2]-s[11:N-1], s_s);
    u[2:N] ~ normal(u[1:N-1], s_u);
    y ~ normal(u+s, s_y);
}

generated quantities {
    vector[N+pred_N] s_pred;
    vector[N+pred_N] u_pred;
    vector[N+pred_N] y_pred;

    s_pred[1:N] = s;
    u_pred[1:N] = u;
    y_pred[1:N] = y;

    for (t in (N+1):(N+pred_N)){
        s_pred[t] = normal_rng(-s_pred[t-11]-s_pred[t-10]-s_pred[t-9]-s_pred[t-8]-s_pred[t-7]-s_pred[t-6]-s_pred[t-5]-s_pred[t-4]-s_pred[t-3]-s_pred[t-2]-s_pred[t-1], s_s);
        u_pred[t] = normal_rng(u_pred[t-1], s_u);
        y_pred[t] = normal_rng(u_pred[t]+s_pred[t], s_y);
    }
}

Execute


On Python, we can execute sampling and check the outcome.

data_feed = {'N': data.shape[0], 'pred_N': 100, 'y': ts_data}
season_fit = pystan.stan('seasonal_term.stan', data=data_feed, iter=1000)

By visualizing the sampled points, we can see that the points fluctuate by responding season.

samples = season_fit.extract(permuted=True)
y_pred = samples['y_pred'].mean(axis=0)

plt.plot(list(range(ts_data.shape[0])), ts_data)
plt.plot(list(range(len(y_pred))), y_pred)
plt.show()

enter image description here

Of course, we can check the seasonal effects.

s_pred = samples['s_pred'].mean(axis=0)
plt.plot(list(range(0, s_pred.shape[0])), s_pred)
plt.show()

enter image description here