Tuesday, November 14, 2017

Simple Bayesian Neural Network with Edward

Overview

Edward lets us convert TensorFlow code into Bayesian models. I'm not yet used to Edward, so as practice I'm working on converting existing TensorFlow code to Edward. In this article, I convert a simple neural network model into a Bayesian neural network.

The purpose of this article is to convert, with Edward, the TensorFlow code I posted before into a Bayesian model.

Bayesian neural network model

With Edward, we can relatively easily convert a TensorFlow model into a probabilistic one.
The regression model for the iris data comes from the article below.

Simple regression model by TensorFlow

A neural network is composed of input, hidden, and output layers, and the number of hidden layers is up to us, so the simplest architecture has just one hidden layer. In that article, I built the simplest neural network for regression with TensorFlow.


In a nutshell, the model predicts one target value from three features. For the details, please check that article.
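The snippets below refer to x_data, x_train, and y_train without defining them, so here is a minimal sketch of the setup they assume. The exact preprocessing in the original article may differ, and the choice of predicting sepal length from the other three iris features is mine.

import numpy as np
import tensorflow as tf
from sklearn import datasets

# iris: predict the first column (sepal length) from the other three
iris = datasets.load_iris()
x_train = iris.data[:, 1:4].astype(np.float32)  # three features
y_train = iris.data[:, 0].astype(np.float32)    # one target value

# placeholder that will be fed with the feature matrix
x_data = tf.placeholder(tf.float32, shape=[None, 3])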



Model part

The code below is the model part of the full code from that article.

# parameters
# first layer: three features -> six hidden units
W1 = tf.Variable(tf.random_normal(shape=[3, 6]))
b1 = tf.Variable(tf.random_normal(shape=[6]))
hidden = tf.nn.relu(tf.add(tf.matmul(x_data, W1), b1))

# second layer: six hidden units -> one output
W2 = tf.Variable(tf.random_normal(shape=[6, 1]))
b2 = tf.Variable(tf.random_normal(shape=[1]))
output = tf.nn.relu(tf.add(tf.matmul(hidden, W2), b2))

Concretely, to make the model probabilistic, it is enough to rewrite this part as follows. The weights and biases become Normal priors, the output becomes a Normal whose mean is the network's forward pass, and each q_ variable is the variational distribution that will approximate the corresponding posterior; tf.nn.softplus keeps the learned scales positive.

import edward as ed
from edward.models import Normal

# priors: standard Normal over every weight and bias
W1 = Normal(loc=tf.zeros([3, 6]), scale=tf.ones([3, 6]))
B1 = Normal(loc=tf.zeros(6), scale=tf.ones(6))

W2 = Normal(loc=tf.zeros([6, 1]), scale=tf.ones([6, 1]))
B2 = Normal(loc=tf.zeros(1), scale=tf.ones(1))

# likelihood: the network's forward pass gives the mean of the output
hidden = tf.nn.relu(tf.add(tf.matmul(x_data, W1), B1))
output = Normal(loc=tf.nn.relu(tf.add(tf.matmul(hidden, W2), B2)),
                scale=tf.ones(1))

# variational distributions approximating the posteriors
q_W1 = Normal(loc=tf.Variable(tf.zeros([3, 6])),
              scale=tf.nn.softplus(tf.Variable(tf.zeros([3, 6]))))
q_B1 = Normal(loc=tf.Variable(tf.zeros(6)),
              scale=tf.nn.softplus(tf.Variable(tf.zeros(6))))
q_W2 = Normal(loc=tf.Variable(tf.zeros([6, 1])),
              scale=tf.nn.softplus(tf.Variable(tf.zeros([6, 1]))))
q_B2 = Normal(loc=tf.Variable(tf.zeros(1)),
              scale=tf.nn.softplus(tf.Variable(tf.zeros(1))))

Inference

After the model is defined, we run variational inference, which fits the q_ distributions by minimizing the Kullback-Leibler divergence between them and the true posterior.

inference = ed.KLqp({W1: q_W1, W2: q_W2, B1: q_B1, B2: q_B2},
                    data={x_data: x_train,
                          output: y_train.reshape([len(y_train), 1])})
inference.run(n_iter=10000)
10000/10000 [100%] ██████████████████████████████ Elapsed: 37s | Loss: 119.921
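Incidentally, inference.run is a convenience wrapper; Edward also lets us drive the loop ourselves through its initialize/update API, which is handy for watching the loss. A minimal sketch of the equivalent manual loop:

sess = ed.get_session()
inference.initialize(n_iter=10000)
tf.global_variables_initializer().run()
for _ in range(inference.n_iter):
    info_dict = inference.update()       # one step; contains the current loss
    inference.print_progress(info_dict)
inference.finalize()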

Criticism


We can evaluate the fitted model. Here y_post is a copy of output in which the prior weights are replaced by the posterior means, and we compute the mean squared error on the training data.

# posterior predictive: swap the priors for the fitted posterior means
y_post = ed.copy(output, {W1: q_W1.mean(), W2: q_W2.mean(),
                          B1: q_B1.mean(), B2: q_B2.mean()})
print(ed.evaluate('mean_squared_error',
                  data={x_data: x_train,
                        y_post: y_train.reshape([len(y_train), 1])}))
0.562092
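For what it's worth, ed.evaluate supports other metrics with the same call shape; for example, the mean absolute error (reusing y_post from above):

print(ed.evaluate('mean_absolute_error',
                  data={x_data: x_train,
                        y_post: y_train.reshape([len(y_train), 1])}))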

Points of uncertainty

I still have some uncertain points about Edward.
One of them is: how can I efficiently make predictions using the sampled weights?

I found the same question on Stack Overflow.

How to obtain prediction results in edward

Thank you for this community. I am a beginner and I have a very dumb question on using Edward. I am using a tutorial regression model.


Unfortunately, no one has answered it.
Of course, we can take the means of the weights and rebuild a prediction model with NumPy and so on, but isn't there a more efficient way?
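Until a better answer appears, here is a sketch of both routes: the mean-based NumPy model described above, and drawing posterior predictive samples by copying output with the full variational distributions instead of their means. Note that x_new is a hypothetical batch of new inputs.

import numpy as np

sess = ed.get_session()
x_new = x_train[:5]  # hypothetical new inputs; any [N, 3] float32 array works

# (a) plug the posterior means into a plain NumPy forward pass
w1, b1, w2, b2 = sess.run([q_W1.mean(), q_B1.mean(),
                           q_W2.mean(), q_B2.mean()])
h = np.maximum(0.0, x_new.dot(w1) + b1)    # relu
y_point = np.maximum(0.0, h.dot(w2) + b2)  # point predictions

# (b) posterior predictive: each sess.run draws fresh weights and outputs
y_sample = ed.copy(output, {W1: q_W1, B1: q_B1, W2: q_W2, B2: q_B2})
draws = np.stack([sess.run(y_sample, feed_dict={x_data: x_new})
                  for _ in range(100)])
print(draws.mean(axis=0))  # Monte Carlo predictive mean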

Anyway, I'll keep using Edward to work out a better workflow.

Reference