HTW Berlin - Angewandte Informatik - Advanced Topics - Exercise - Multiclass Logistic Regression (Softmax) with TensorFlow
Introduction
In this exercise notebook you will implement a multiclass logistic regression model using TensorFlow. Normally, one would use TensorFlow's predefined functions for the softmax prediction, the cross-entropy cost and an optimizer based on the gradient descent update algorithm.
Here you will not use any of them but implement them yourself, using only basic TensorFlow functions like tf.matmul, tf.transpose, etc. An exception is the tf.gradients function, which returns the gradient of a function with respect to a variable or a list of variables. This gradient can then be used to define the update algorithm.
Besides consolidating your theoretical knowledge about gradient descent, knowing how to use TensorFlow's automatic differentiation can be very useful when you want to compute anything that requires a gradient but is not covered by the standard built-ins, e.g. defining your own cost and update function.
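To make the idea of "the gradient of a function with respect to a variable" concrete, the following is a minimal NumPy sketch, not TensorFlow code. tf.gradients derives the gradient from the graph, whereas this sketch approximates it with central differences. The toy cost, the helper numerical_gradient and the learning rate alpha are assumptions made for illustration only.

```python
import numpy as np

def cost(w):
    """Toy cost J(w) = sum(w_i^2); its exact gradient is 2 * w."""
    return np.sum(w ** 2)

def numerical_gradient(f, w, eps=1e-6):
    """Approximate dJ/dw with central differences, one component at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step.flat[i] = eps
        grad.flat[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

w = np.array([1.0, -2.0, 0.5])
grad = numerical_gradient(cost, w)

# One gradient descent step with a hypothetical learning rate alpha.
alpha = 0.1
w_new = w - alpha * grad
```

The step moves w against the gradient, so cost(w_new) is smaller than cost(w). The same pattern applies later to W and b, with the gradient supplied by tf.gradients.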
In order to detect errors in your own code, execute the notebook cells containing assert or assert_almost_equal. In this notebook, however, these cells will only detect a small portion of possible errors, e.g. your implemented function returning a wrong shape.
Requirements
Knowledge
To complete this exercise notebook, you should possess knowledge about the following topics.
 Logistic regression
 Softmax function
 Cross-entropy
 Gradient descent
 Basic TensorFlow dataflow (see below)
The following material can help you to acquire this knowledge:
 Softmax, cross-entropy, gradient descent:
 Chapter 5 and 6 of the Deep Learning Book
 Chapter 5 of the book Pattern Recognition and Machine Learning by Christopher M. Bishop [BIS07]
 Logistic Regression (binary):
 Video 15.3 and following in the playlist Machine Learning
 TensorFlow:
 TensorFlow dataflow
 TensorFlow gradient computation
Python Modules
# External Modules
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from numpy.testing import assert_almost_equal
if int(tf.__version__.split('.')[0]) != 1:
    raise Warning('ATTENTION: This notebook was designed with tensorflow version 1.xx in mind.\n\n Suggested tensorflow methods during the exercises might NOT function as described. We suggest installing the latest 1.xx version of tensorflow in a separate environment to solve this exercise.')
%matplotlib inline
tf.reset_default_graph()
sess = tf.InteractiveSession()
Exercise - Multiclass Logistic Regression (Softmax) with TensorFlow
Training Data
We are given $ m $ examples in our training data $ \mathcal D = \{(\vec x^{(1)}, y^{(1)}),(\vec x^{(2)},y^{(2)}), \dots (\vec x^{(m)},y^{(m)})\} $, with $ \vec x^{(1)} $ denoting the first feature vector and $ y^{(1)} $ the corresponding class.
We will create our own training data by drawing samples from different Gaussian distributions. Our model should be able to generalize from these samples. To make things concrete, we will be using:
 two features $ \vec x = (x_1, x_2)^T $
 three classes: $ y \in \{ 0, 1, 2\} $
 100 examples for each class
# class 0:
# covariance matrix and mean
cov0 = np.array([[5,4],[4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 100
# class 1
# covariance matrix and mean
cov1 = np.array([[5,3],[3,3]])
mean1 = np.array([0.5,0.5])
m1 = 100
# class 2
# covariance matrix and mean
cov2 = np.array([[2,0],[0,2]])
mean2 = np.array([8.,5])
m2 = 100
# generate m0 gaussian distributed data points with
# mean0 and cov0.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)
r2 = np.random.multivariate_normal(mean2, cov2, m2)
def plot_data(r0, r1, r2):
    plt.figure(figsize=(7., 7.))
    plt.scatter(r0[..., 0], r0[..., 1], c='r', marker='o', label="Class 0")
    plt.scatter(r1[..., 0], r1[..., 1], c='y', marker='o', label="Class 1")
    plt.scatter(r2[..., 0], r2[..., 1], c='b', marker='o', label="Class 2")
    plt.xlabel("$x_1$")
    plt.ylabel("$x_2$")
    # show the class labels defined in the scatter calls
    plt.legend()
# Let's visualize our training data
plot_data(r0, r1, r2)
X = np.concatenate((r0, r1, r2), axis=0)
X.shape
y = np.concatenate((np.zeros(m0), np.ones(m1), 2 * np.ones(m2)))
y.shape
# shuffle the data
assert X.shape[0] == y.shape[0]
perm = np.random.permutation(np.arange(X.shape[0]))
#print(perm)
X = X[perm]
y = y[perm]
Implement the Model
Since we have discrete classes rather than continuous values, we have to implement logistic regression (as opposed to linear regression). Logistic regression implies the use of the logistic function. But since the number of classes exceeds two, we have to use its generalized form, the softmax function.
Task:
Implement softmax regression. This can be split into three subtasks:
 1. Implement the softmax function for prediction.
 2. Implement the computation of the cross-entropy loss.
 3. Implement vanilla gradient descent.
Softmax
Task 1:
Implement the softmax prediction $ h_i $, defined for each class $ i $ as:
$ h_i = \frac{\exp(z_i)}{\sum_{k=1}^c\exp (z_k)} $
with $ c $ denoting the number of classes and $ z_i $ the net output for class $ i $, where the whole vector $ \vec z $ is defined as:
$ \vec z = W \vec{x} + \vec b $
Hint:
Remember that your functions should be able to handle multiple or even all feature vectors $ \vec x $ at once.
Evaluating softmax should look like:
in> h.eval(feed_dict={x: X})
out> array([[1.62411915e-08, 1.70372473e-03, 9.98296261e-01],
       [3.72431863e-08, 3.27572320e-03, 9.96724248e-01],
       [9.83378708e-01, 1.66097078e-02, 1.15793373e-05],
.....
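As a reference for what the batched computation looks like, here is a minimal NumPy sketch of a row-wise softmax. It is not the TensorFlow implementation asked for in this task. The name softmax_np, the demo arrays and the max-subtraction for numerical stability are assumptions of this sketch. The shape conventions follow the notebook: inputs of shape (n_examples, n_features) and weights of shape (n_features, n_classes).

```python
import numpy as np

def softmax_np(z):
    """Row-wise softmax for z of shape (n_examples, n_classes)."""
    # Subtracting the row maximum leaves the result unchanged
    # but keeps exp() from overflowing for large net outputs.
    z_shifted = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(z_shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Hypothetical weights, bias and inputs, for illustration only.
W_demo = np.array([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6]])   # (n_features, n_classes)
b_demo = np.zeros(3)                   # (n_classes,)
X_demo = np.array([[1.0, 2.0],
                   [3.0, -1.0]])       # (n_examples, n_features)

# One row of net outputs per example; b_demo is broadcast over the rows.
z_demo = X_demo @ W_demo + b_demo
h_demo = softmax_np(z_demo)
```

Each row of h_demo is a probability distribution over the three classes, which is the same property the assert_almost_equal cells check for the TensorFlow version.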
### First we define Variables for the weights W and bias b.
### From Docstring:
### "A variable maintains state in the graph across calls to run() ...
### ... constructor requires an initial value ..."
NUM_LABELS = 3
NUM_FEATURES = 2
D_TYPE = tf.float32
I_TYPE = tf.int32
W = tf.Variable(tf.random_uniform([NUM_FEATURES, NUM_LABELS], dtype=D_TYPE))
b = tf.Variable(tf.zeros([NUM_LABELS], dtype=D_TYPE))
### And placeholders for the training data.
### From Docstring:
### "This tensor will produce an error if evaluated. Its value must
### be fed using the `feed_dict` optional argument to `Session.run()`"
### Using None in the first dimension allows feeding a variable number
### of examples.
x = tf.placeholder(shape=[None, NUM_FEATURES], dtype=D_TYPE, name="features")
t = tf.placeholder(shape=[None], dtype=I_TYPE, name="targets")
### Variables must be initialized by running an `init` Op after having
### launched the graph. We first have to add the `init` Op to the graph.
init_op = tf.global_variables_initializer()
sess.run(init_op)
### Implement this function
def net_output(x, W, b):
    """
    Calculates the net output z = W * x + b.

    :x: Features.
    :x type: 2D-Tensor of type float32 with
        shape (n_examples, n_features).
    :W: Weight matrix.
    :W type: 2D-Tensor of type float32 with
        shape (n_features, n_classes).
    :b: Bias vector.
    :b type: 1D-Tensor of type float32 with
        shape (n_classes).
    :returns: The net output.
    :r type: 2D-Tensor of type float32
        with shape (n_examples, n_classes).
    """
    raise NotImplementedError()
### Implement this function
def softmax(z):
    """
    Returns the softmax-normalized predictions for the net outputs z.

    :z: Net outputs.
    :z type: 2D-Tensor of type float32 with
        shape (n_examples, n_classes).
    :returns: Softmax prediction.
    :r type: Tensor with same type and shape as z.
    """
    raise NotImplementedError()
z = net_output(x, W, b)
h = softmax(z)
some_predictions = h.eval(feed_dict={x: X[0:2]})
print(some_predictions)
assert_almost_equal(some_predictions[0].sum(), 1.0)
assert_almost_equal(some_predictions[1].sum(), 1.0)
Cross-Entropy
Task 2:
Implement the computation of the cross-entropy loss. Don't use any built-in function of TensorFlow for the cross-entropy.
Reminder:
\begin{equation} \begin{split} H(p, q) & = \sum_{i=1}^c p_i(x) \cdot \log \frac{1}{q_i(x)} \\ & = - \sum_{i=1}^c p_i(x) \cdot \log q_i(x) \\ \end{split} \end{equation}
with
 the number of classes $ c $
 the correct class distribution $ p(x) $
 and the predictions of our net $ q(x) $ (softmax output)
Hint:
Return the cross-entropy average: $ J(W,b) = \frac{1}{m} \sum_{j=1}^m H\left(p(\vec x^{(j)}),q(\vec x^{(j)})\right) $
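The averaged cost can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the TensorFlow solution: the function name cross_entropy_np, the one-hot encoding via np.eye and the small constant eps that guards against log(0) are choices of this sketch. Because the correct class distribution has a single 1 at the true class, only the log-probability of the true class contributes to each example's cost.

```python
import numpy as np

def cross_entropy_np(targets, predictions, eps=1e-12):
    """Mean cross-entropy for integer targets and softmax predictions."""
    n_examples, n_classes = predictions.shape
    # One-hot encode the targets: p has a 1 at the true class, 0 elsewhere.
    p = np.eye(n_classes)[targets]
    # eps (an assumption of this sketch) guards against log(0).
    per_example = -np.sum(p * np.log(predictions + eps), axis=1)
    return per_example.mean()

# Hypothetical targets and predictions, for illustration only.
targets_demo = np.array([0, 2])
predictions_demo = np.array([[0.7, 0.2, 0.1],
                             [0.1, 0.3, 0.6]])
loss_demo = cross_entropy_np(targets_demo, predictions_demo)
```

For these two examples the result equals the mean of -log(0.7) and -log(0.6), up to the tiny offset introduced by eps.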
### Implement this function
def cross_entropy(targets, predictions):
    """
    Computes the cross-entropy average.

    :targets: True classes as scalars.
    :targets type: tf.Tensor with the shape (n_examples).
    :predictions: Predictions as softmax output.
    :predictions type: tf.Tensor with shape (n_examples, n_classes).
    :returns: Cross-entropy average.
    :r type: Tensor of type float32
    """
    raise NotImplementedError()
# t is the tensorflow placeholder for the targets (class labels)
cost = cross_entropy(t, h)
some_cost = cost.eval(feed_dict={x: X, t: y})
print(some_cost)
assert some_cost.dtype == np.float32
Gradient Descent
Task 3:
Implement gradient descent and train the model:

Implement the gradient descent update rule. Don't use any TensorFlow built-in optimizer!
 Use tf.gradients for computing the gradient.
 Use tf.assign for updating.
 Iteratively apply the update rule to minimize the loss.
 Train for 100 epochs
 Use minibatches with size 50
 Keep track of the costs after each epoch
 Decide about an appropriate learning rate
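One way to organize the minibatch handling listed above is a small generator that reshuffles the data and yields one batch at a time. The following NumPy sketch assumes 300 examples (three classes with 100 examples each, as in this notebook). The helper name iterate_minibatches and the random stand-in data are hypothetical.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs covering the data once."""
    perm = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)       # fixed seed only for reproducibility
X_demo = rng.normal(size=(300, 2))   # stand-in for the 300 training examples
y_demo = rng.integers(0, 3, size=300)

# One pass over all minibatches corresponds to one epoch.
batch_shapes = [xb.shape
                for xb, yb in iterate_minibatches(X_demo, y_demo, 50, rng)]
```

With a minibatch size of 50, one epoch consists of six update steps. In the notebook, each batch would be passed to the update operations through feed_dict.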
Reminder:
Equation for the update rule:
$ \begin{align} W' & = W - \alpha \cdot \frac{\partial}{\partial W} J(W, b)\\\\ b' & = b - \alpha \cdot \frac{\partial}{\partial b} J(W, b) \end{align} $
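The update rule can be tried out without TensorFlow. The NumPy sketch below uses the closed-form gradient of the mean cross-entropy for softmax regression instead of tf.gradients. This is a swapped-in technique for illustration, not what the task asks for. The toy data, the helper names and the learning rate alpha = 0.5 are assumptions of this sketch.

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cost_and_gradients(X, y, W, b):
    """Mean cross-entropy J(W, b) and its gradients for softmax regression."""
    m = X.shape[0]
    h = softmax_rows(X @ W + b)             # (m, n_classes)
    p = np.eye(W.shape[1])[y]               # one-hot targets
    cost = -np.mean(np.sum(p * np.log(h + 1e-12), axis=1))
    delta = (h - p) / m                     # dJ/dz for softmax + cross-entropy
    return cost, X.T @ delta, delta.sum(axis=0)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 2))
# Toy labels in {0, 1, 2}: number of positive features per example.
y_demo = (X_demo[:, 0] > 0).astype(int) + (X_demo[:, 1] > 0).astype(int)
W_demo = np.zeros((2, 3))
b_demo = np.zeros(3)
alpha = 0.5                                 # hypothetical learning rate

costs = []
for _ in range(20):
    cost, grad_W, grad_b = cost_and_gradients(X_demo, y_demo, W_demo, b_demo)
    costs.append(cost)
    W_demo = W_demo - alpha * grad_W        # W' = W - alpha * dJ/dW
    b_demo = b_demo - alpha * grad_b        # b' = b - alpha * dJ/db
```

With all parameters initialized to zero, the first cost equals log(3), because every class is predicted with probability 1/3. The recorded costs then decrease with each step.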
### Complete this cell
nb_epochs = 100
minibatch_size = 50
learning_rate = 1337 ### Decide about an appropriate learning rate
cost_per_epoch = []
### Your code ...
Plot
Cost (Loss) over Iterations
Plot of the cost progress vs. iterations.
The output should look similar to the following:
plt.plot(range(len(cost_per_epoch)), cost_per_epoch)
plt.xlabel('# of iterations')
plt.ylabel('cost')
plt.title('Learning Progress')
Decision Boundary After Training
The following function plots the data with the decision boundaries after the training. The model should be trained well enough to separate most (roughly 95%) of the data correctly. Use the following code for plotting.
The output should look similar to the following:
def plot_decision_boundary(iteration=None, x_min=-10, x_max=14, y_min=-10, y_max=10):
    fig = plt.figure(figsize=(8, 8))
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    delta = 0.1
    a = np.arange(x_min, x_max + delta, delta)
    b = np.arange(y_min, y_max + delta, delta)
    A, B = np.meshgrid(a, b)
    # one row per grid point: shape (n_points, 2)
    x_ = np.dstack((A, B)).reshape(-1, 2)
    out = h.eval(feed_dict={x: x_})
    # reshape the predictions to (n_classes, grid_rows, grid_cols)
    ns = list()
    ns.append(3)
    ns.extend(A.shape)
    out = out.T.reshape(ns)
    plt.pcolor(A, B, out[0], cmap="Blues", alpha=0.2)
    plt.pcolor(A, B, out[1], cmap=('Oranges'), alpha=0.2)
    plt.pcolor(A, B, out[2], cmap=('Greens'), alpha=0.2)
    # let's visualize the data:
    plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
    plt.title("Decision boundaries in data space.")
plot_decision_boundary()
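The share of correctly separated examples can also be checked numerically rather than by eye. The following NumPy sketch computes the accuracy from softmax outputs. The function name accuracy and the small demo arrays are hypothetical.

```python
import numpy as np

def accuracy(predictions, targets):
    """Fraction of examples whose most probable class equals the target."""
    predicted_classes = np.argmax(predictions, axis=1)
    return np.mean(predicted_classes == targets)

# Hypothetical softmax outputs and labels, for illustration only.
predictions_demo = np.array([[0.8, 0.1, 0.1],
                             [0.2, 0.5, 0.3],
                             [0.3, 0.3, 0.4],
                             [0.6, 0.3, 0.1]])
targets_demo = np.array([0, 1, 2, 1])
acc_demo = accuracy(predictions_demo, targets_demo)
```

In this notebook, the predictions would come from h.eval(feed_dict={x: X}) and the targets from y.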
Decision Boundary Before Training
Now we reinitialize our model's variables to visualize how the decision boundaries might have looked before the training. Since we initialize our weights with tf.random_uniform, this will look different for every execution.
sess.run(init_op)
plot_decision_boundary()
Literature
Licenses
Notebook License (CC-BY-SA 4.0)
The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).
HTW Berlin - Angewandte Informatik - Advanced Topics - Exercise - Multiclass Logistic Regression (Softmax) with tensorflow
by Christian Herta, Klaus Strohmenger
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.
Code License (MIT)
The following license only applies to code cells of the notebook.
Copyright 2018 Christian Herta, Klaus Strohmenger
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.