Monday, December 18, 2017

Classification by deep neural network using tf.estimator of TensorFlow

Overview

In the article below, I looked at how to write a deep neural network with tf.estimator, but that was a regression case.


tf.estimator of TensorFlow lets us concisely write deep neural network

In this article, I'll rewrite the simple deep neural network model for the iris data with tf.estimator. According to the official page, TensorFlow's high-level machine learning API (tf.estimator) makes it easy to configure, train, and evaluate a variety of machine learning models. By comparing with the original code, I'll check how much more concise it becomes and how to use tf.estimator.
Here, for completeness, I'll go through the classification case. It is essentially the same as the tutorial on the official page, and in terms of code there are very few differences between regression and classification. But classification and regression are two of the most basic tasks in machine learning and data science, so I'll work through it myself.



Data

As in the regression case, I'll use the iris data set, which has four features and one class label.

import numpy as np
import tensorflow as tf
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# set random seeds for reproducibility
seed = 42
tf.set_random_seed(seed)
np.random.seed(seed)

# data
iris = datasets.load_iris()
x = iris['data']
y = iris['target']

x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.7)

# normalization
mms = MinMaxScaler()
x_train = mms.fit_transform(x_train)
x_test = mms.transform(x_test)
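
For reference, the scaling step can also be sketched by hand. This is a minimal illustration of min-max scaling in plain NumPy, not scikit-learn's actual implementation, and the small arrays `train` and `test` are made-up values used only for the example. The point is that the minimum and maximum come from the training data only and are then reused for the test data, so test values can fall slightly outside [0, 1].

```python
import numpy as np

# Made-up toy data (2 features), only for illustration
train = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 40.0]])
test = np.array([[2.5, 50.0]])

# "fit": per-feature minimum and range from the training data only
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

# "transform": reuse the training statistics for both splits
train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range

print(train_scaled)
print(test_scaled)
```

Fitting on the training split alone keeps information about the test data out of the preprocessing step.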

To check the data, let's print part of it.

print(x_train[:10])
print(y_train[:10])
[[ 0.35294118  0.18181818  0.46428571  0.375     ]
 [ 0.58823529  0.36363636  0.71428571  0.58333333]
 [ 0.61764706  0.5         0.78571429  0.70833333]
 [ 0.67647059  0.45454545  0.58928571  0.54166667]
 [ 0.85294118  0.72727273  0.89285714  1.        ]
 [ 0.41176471  0.40909091  0.55357143  0.5       ]
 [ 0.97058824  0.45454545  0.98214286  0.83333333]
 [ 0.38235294  0.45454545  0.60714286  0.58333333]
 [ 0.23529412  0.68181818  0.05357143  0.04166667]
 [ 1.          0.36363636  1.          0.79166667]]
[1 2 2 1 2 1 2 1 0 2]

Each integer in the target labels represents a class. This data set has three classes.
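
As a small illustration of how to read these labels: in scikit-learn's iris data, the integers 0, 1, and 2 correspond to setosa, versicolor, and virginica (also available as `iris['target_names']`). The sketch below hard-codes that mapping and the ten labels printed above, so it runs without loading the data.

```python
from collections import Counter

# Label-to-name mapping of the iris data set (hard-coded here for the sketch)
target_names = ["setosa", "versicolor", "virginica"]

# The first ten training labels printed above
labels = [1, 2, 2, 1, 2, 1, 2, 1, 0, 2]

# Translate each integer label into its class name and count the classes
names = [target_names[i] for i in labels]
counts = Counter(names)

print(names)
print(counts)
```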

Write model


With tf.estimator, we can write a classification model in almost the same way as a regression model. The only difference is that for regression I used tf.estimator.DNNRegressor(), and here I use tf.estimator.DNNClassifier().

# Specify that all features have real-value data
feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

# Build a classifier with two hidden layers (6 and 4 units) and 3 output classes
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[6, 4],
                                        n_classes=3,
                                        model_dir="/tmp/iris_model")

# Define the training inputs
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_train},
    y=y_train,
    num_epochs=None,
    shuffle=True)

classifier.train(input_fn=train_input_fn, steps=50)
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_master': '', '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': '/tmp/iris_model', '_save_checkpoints_secs': 600, '_task_id': 0, '_save_checkpoints_steps': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x11f338ac8>, '_task_type': 'worker', '_save_summary_steps': 100, '_num_ps_replicas': 0, '_service': None, '_tf_random_seed': None, '_log_step_count_steps': 100, '_is_chief': True, '_num_worker_replicas': 1, '_session_config': None}
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into /tmp/iris_model/model.ckpt.
INFO:tensorflow:step = 1, loss = 139.555
INFO:tensorflow:Saving checkpoints for 50 into /tmp/iris_model/model.ckpt.
INFO:tensorflow:Loss for final step: 72.8592.
Out[21]:
<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x11edb8438>
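
To make the shape of this model concrete, here is a rough NumPy sketch of the forward pass for a network with `hidden_units=[6, 4]` and `n_classes=3`: 4 inputs, two ReLU hidden layers, and a softmax over 3 logits. The weights are random and the helper names (`relu`, `softmax`, `forward`) are my own. This only illustrates the architecture under those assumptions and is not how DNNClassifier is implemented internally.

```python
import numpy as np

rng = np.random.RandomState(42)

# Layer sizes: 4 features -> 6 units -> 4 units -> 3 classes
sizes = [4, 6, 4, 3]
weights = [rng.normal(scale=0.5, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x):
    h = x
    # Hidden layers with ReLU activation
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)
    # Output layer: 3 logits turned into class probabilities
    logits = h @ weights[-1] + biases[-1]
    return softmax(logits)

# Two scaled samples taken from the printed training data
x = np.array([[0.35294118, 0.18181818, 0.46428571, 0.375],
              [0.23529412, 0.68181818, 0.05357143, 0.04166667]])
probs = forward(x)

print(probs)                 # one row of 3 class probabilities per sample
print(probs.argmax(axis=1))  # predicted class ids
```

Training adjusts the weights and biases so that the probability assigned to the correct class goes up, which is what the decreasing loss in the log reflects.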

Evaluation


As in the regression case, we just define the test inputs and evaluate the model with the evaluate() method.

# Define the test inputs
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_test},
    y=y_test,
    num_epochs=1,
    shuffle=False)

# Evaluate by test data.
eva = classifier.evaluate(input_fn=test_input_fn)
print(eva)
INFO:tensorflow:Starting evaluation at 2017-12-17-23:45:00
INFO:tensorflow:Restoring parameters from /tmp/iris_model/model.ckpt-50
INFO:tensorflow:Finished evaluation at 2017-12-17-23:45:00
INFO:tensorflow:Saving dict for global step 50: accuracy = 0.733333, average_loss = 0.537562, global_step = 50, loss = 24.1903
{'loss': 24.19031, 'accuracy': 0.73333335, 'average_loss': 0.53756243, 'global_step': 50}

Unlike the regression case, the classification result includes the accuracy.
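
The numbers in the returned dictionary are related to each other, and a little arithmetic makes that visible. The sketch below assumes that `loss` is the sum of the per-example losses in a batch, that `average_loss` is the per-example mean, and that all test examples fit into a single batch. Under those assumptions, the ratio of the two gives the number of test examples.

```python
import math

# Values copied from the evaluation output above
eva = {"loss": 24.19031, "accuracy": 0.73333335,
       "average_loss": 0.53756243, "global_step": 50}

# Assumption: loss = average_loss * (number of examples in the single batch)
n_examples = round(eva["loss"] / eva["average_loss"])
n_correct = round(eva["accuracy"] * n_examples)

# Baseline: a model that assigns probability 1/3 to every class
# has a cross-entropy of ln(3) per example
chance_loss = math.log(3)

print(n_examples, n_correct)
print(chance_loss)
```

With 150 samples and a 70/30 split, 45 test examples is what we would expect, and 33 of them are classified correctly. The average loss of about 0.54 is also well below the chance level of ln 3, which is about 1.10.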

Predict


Prediction works in much the same way as evaluation.

# Define the prediction inputs
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_test},
    y=y_test,
    num_epochs=1,
    shuffle=False)

predictions = list(classifier.predict(input_fn=predict_input_fn))
predicted_classes = [p["classes"] for p in predictions]
print(predicted_classes)
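
One thing to be aware of: as far as I know, with DNNClassifier in TensorFlow 1.x each prediction is a dictionary in which `p["classes"]` holds the class as a byte string inside an array (for example `[b'1']`), while `p["class_ids"]` holds it as an integer. The sketch below uses hand-written dictionaries (`fake_predictions` and `fake_labels` are made up, not real model output) to show how such predictions could be turned into plain integers and compared with the labels.

```python
# Made-up predictions mimicking the assumed structure of classifier.predict()
fake_predictions = [
    {"classes": [b"1"], "class_ids": [1]},
    {"classes": [b"0"], "class_ids": [0]},
    {"classes": [b"2"], "class_ids": [2]},
    {"classes": [b"1"], "class_ids": [1]},
]
fake_labels = [1, 0, 2, 2]  # made-up true labels

# Decode the byte strings into integers
predicted = [int(p["classes"][0].decode("utf-8")) for p in fake_predictions]

# Fraction of predictions that match the labels
n_match = sum(int(a == b) for a, b in zip(predicted, fake_labels))
accuracy = n_match / len(fake_labels)

print(predicted)
print(accuracy)
```

With the real output, the same comparison against y_test should give the accuracy reported by evaluate().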