Tuesday, February 20, 2018

Simple Bayesian Modeling in Edward with Hamiltonian Monte Carlo

Overview

In this article, I'll rewrite in Edward a simple Bayesian model that was originally written in Stan.
The target model comes from the following article.
Actually, I already wrote the article Simple regression by Edward: variational inference, where I rewrote this model in Edward. But at that time, I used variational inference.
Stan uses Hamiltonian Monte Carlo, so this time I'll use Hamiltonian Monte Carlo in Edward to rewrite the model.



Data


I'll use the same data as the original Stan code. cars is a dataset built into R, so the following R code writes it out as a CSV file.

write.csv(cars, "cars.csv", row.names=FALSE)
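
If you don't have R at hand, the same built-in dataset can also be fetched from Python. This is just a sketch using statsmodels' Rdatasets helper, not part of the original post, and it needs network access:

import statsmodels.api as sm

# fetch R's built-in 'cars' dataset and write it out as cars.csv
cars = sm.datasets.get_rdataset('cars').data
cars.to_csv('cars.csv', index=False)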

In Python, you can read the data and check it.

import pandas as pd

data = pd.read_csv('cars.csv')
print(data.head())
   speed  dist
0      4     2
1      4    10
2      7     4
3      7    22
4      8    16
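
As a quick sanity check (this call isn't in the original post, but it's standard pandas), we can also summarize both columns:

# basic summary statistics for speed and dist
print(data.describe())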

The data has two columns. The goal of the model is to predict the ‘dist’ value using the ‘speed’ value as the explanatory variable.
By plotting, we can check the shape of the data.

import matplotlib.pyplot as plt
plt.scatter(data['speed'], data['dist'])
plt.show()
[Figure: scatter plot of speed vs. dist]

Edward Code

Basically, the code is almost the same as in Simple regression by Edward: variational inference. The model is a simple linear regression, dist ~ Normal(a + b * speed, 1), with standard Normal priors on a and b. The difference here is that I used the HMC() function for Hamiltonian Monte Carlo, and the approximating distributions are Empirical(), which store the sampled values.

import tensorflow as tf
import edward as ed
from edward.models import Normal, Empirical
import numpy as np

# data: speed as an (n, 1) matrix, dist as a vector; cast to float32 for TensorFlow
speed = np.array(data['speed']).reshape([data.shape[0], 1]).astype(np.float32)
dist = np.array(data['dist']).astype(np.float32)

# placeholder for the explanatory variable
speed_ph = tf.placeholder(tf.float32, [None, 1])

# priors on the coefficients: intercept a and slope b
a = Normal(loc=tf.zeros(1), scale=tf.ones(1))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))

# model: dist ~ Normal(a + b * speed, 1)
y_hat = ed.dot(speed_ph, b) + a
output = Normal(loc=y_hat, scale=tf.ones(1))

# inference: each Empirical holds the T values that HMC will sample
T = 1000
q_a = Empirical(params=tf.Variable(tf.random_normal([T, 1])))
q_b = Empirical(params=tf.Variable(tf.random_normal([T, 1])))

inference = ed.HMC({a: q_a, b: q_b}, {speed_ph: speed, output: dist})
inference.run(step_size=1e-3)
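
Since each Empirical just stores the T values produced by HMC, we can also pull the raw trace out of the fitted approximations and summarize it directly. A minimal sketch, assuming the inference above has finished; discarding the first half as burn-in is my own illustrative choice, not something from the original post:

sess = ed.get_session()

# the Empirical params hold the raw HMC trace, one row per step
a_trace = sess.run(q_a.params)  # shape (T, 1)
b_trace = sess.run(q_b.params)  # shape (T, 1)

# drop the first half as burn-in (an arbitrary choice for illustration)
burn = T // 2
print('a: mean %.3f, std %.3f' % (a_trace[burn:].mean(), a_trace[burn:].std()))
print('b: mean %.3f, std %.3f' % (b_trace[burn:].mean(), b_trace[burn:].std()))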

We can check the sampled points by visualizing them.

# histograms of 1000 draws from each fitted approximation
plt.subplot(2, 1, 1)
plt.title('a')
plt.hist(q_a.sample(1000).eval())
plt.subplot(2, 1, 2)
plt.title('b')
plt.hist(q_b.sample(1000).eval())
plt.show()

[Figure: histograms of sampled values for a and b]
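
Histograms hide the order of the samples, so a trace plot is another quick check on how the chain moved. A minimal sketch reusing the traces extracted above:

# sampled value at each HMC step
plt.subplot(2, 1, 1)
plt.title('a')
plt.plot(a_trace)
plt.subplot(2, 1, 2)
plt.title('b')
plt.plot(b_trace)
plt.show()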

When I did the same thing in Stan, the outcome was as follows. The two results are somewhat different; differences in priors, step size, and burn-in handling can all contribute to that.

[Figure: sampling outcome from the Stan version, for comparison]

Let's plot the data and the sampled predictions together to check how the model worked. Here, I used the mean as the representative value. But when you check an outcome through a representative value, you should consider which summary statistic is appropriate; a sketch using the median and an interval instead follows after the figure below.

n_samples = 100
prob_lst = []

# draw one (a, b) sample per iteration and evaluate the fitted line on the data
for _ in range(n_samples):
    b_samp = q_b.sample()
    a_samp = q_a.sample()
    prob = ed.dot(speed, b_samp) + a_samp
    prob_lst.append(prob.eval())

# use the mean of the sampled lines as the representative prediction
added_mean = np.mean(prob_lst, axis=0)

plt.scatter(speed, added_mean)
plt.scatter(speed, dist)
plt.show()

[Figure: observed data and the mean of the sampled predictions plotted together]
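
As noted above, the mean is only one possible summary. Here is a minimal sketch, not from the original post, that uses the posterior median and a rough 95% interval over the sampled lines instead:

preds = np.stack(prob_lst)  # shape (n_samples, n_points)

# posterior median and a rough 95% interval of the fitted lines
median_pred = np.median(preds, axis=0)
lower, upper = np.percentile(preds, [2.5, 97.5], axis=0)

order = np.argsort(speed[:, 0])
plt.scatter(speed, dist)
plt.plot(speed[order, 0], median_pred[order])
plt.fill_between(speed[order, 0], lower[order], upper[order], alpha=0.3)
plt.show()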