# Exercise - Multiclass Logistic Regression (Softmax) with pytorch

## Introduction

Teaching objectives of this notebook are:

• Implementing a logistic regression model using PyTorch
• Understanding how to use PyTorch's autograd feature by implementing gradient descent.

In order to detect errors in your own code, execute the notebook cells containing assert or assert_almost_equal.

## Requirements

### Knowledge

• Logistic regression
• Softmax
• Chapter 5 and 6 of the Deep Learning Book
• Chapter 5 of the book Pattern Recognition and Machine Learning by Christopher M. Bishop [BIS07]
• Video 15.3 and following in the playlist Machine Learning

### Python Modules

import numpy as np

import scipy.stats
from scipy.stats import norm

from matplotlib import pyplot as plt
from IPython.core.pylabtools import figsize

%matplotlib inline
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

## Trianing Data

#### Classification

• $m$-Training data $\mathcal D = \{(\vec x^{(1)}, y^{(1)}),(\vec x^{(2)},y^{(2)}), \dots (\vec x^{(m)},y^{(m)})\}$

here with

• two features $\vec x = (x_1, x_2)^T$
• three classes: $y \in \{ 0, 1, 2\}$
# class 0:
# covariance matrix and mean
cov0 = np.array([[5,-4],[-4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 100

# class 1
# covariance matrix
cov1 = np.array([[5,-3],[-3,3]])
mean1 = np.array([0.5,0.5])
m1 = 100

# class 2
# covariance matrix
cov2 = np.array([[2,0],[0,2]])
mean2 = np.array([8.,-5])
m2 = 100

# generate m0 gaussian distributed data points with
# mean0 and cov0.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)
r2 = np.random.multivariate_normal(mean2, cov2, m2)

def plot_data(r0, r1, r2):
plt.figure(figsize=(7.,7.))
plt.scatter(r0[...,0], r0[...,1], c='r', marker='o', label="Klasse 0")
plt.scatter(r1[...,0], r1[...,1], c='y', marker='o', label="Klasse 1")
plt.scatter(r2[...,0], r2[...,1], c='b', marker='o', label="Klasse 2")
plt.xlabel("$x_0$")
plt.ylabel("$x_1$")
plot_data(r0, r1, r2)
X = np.concatenate((r0, r1, r2), axis=0)
y = np.concatenate((np.zeros(m0), np.ones(m1), 2 * np.ones(m2)))
y.shape
# shuffle the data
assert X.shape[0] == y.shape[0]
perm = np.random.permutation(np.arange(X.shape[0]))
X = X[perm]
y = y[perm]

### Exercise - Implement the Model

Since we have concrete classes and not contiunous values, we have to implement logistic regression (opposed to linear regression). logistic regression implies the use of the logistic function. But as the number of classes exceeds two, we have to use the generalized form, the softmax function.

Implement softmax regression. This can be split into three subtasks: 1. Implement the softmax function for prediction. 2. Implement the computation of the cross-entropy loss. 3. Implement vanilla gradient descent.

#### Softmax

Implement Softmax Regression as an nn.Module

Hint:

In our case, with two features, the input data has the shape:

    tensor([[-0.6617, -0.0426],
[-1.3328,  0.5161],
....


The forward method should return the probabilities for the three classes, e.g.

    tensor([[ 0.1757,  0.3948,  0.4295],
[ 0.0777,  0.3502,  0.5721],
....

class SoftmaxRegression(nn.Module):  # inheriting from nn.Module!

def __init__(self, num_labels, num_features):
raise NotImplementedError()

def forward(self, x):
# should return the probabilities for the classes, e.g.
# tensor([[ 0.1757,  0.3948,  0.4295],
#         [ 0.0777,  0.3502,  0.5721],
#          ....
raise NotImplementedError()

NUM_LABELS = 3
NUM_FEATURES = 2
model = SoftmaxRegression(NUM_LABELS, NUM_FEATURES)
# test if the probabilities for an example sum to 1:
data = torch.randn(4, 2) # 4 examples with two features each
pred = model(data)

# the probabilities for an example should sum to 1:
np.testing.assert_allclose(pred.detach().numpy().sum(axis=1), 1.)
print (pred)

#### Cross-Entropy

Implement the computation of the cross-entropy loss. Don't use any build-in function of PyTorch for the cross-entropy.

Reminder:

\begin{split} H(p, q) & = \sum_{i=0}^c p_i(x) \cdot \log \frac{1}{q_i(x)} \\ & = -\sum_{i=0}^c p_i(x) \cdot \log q_i(x) \\ \end{split}

with

• the number of classes c
• $p(x)$ the true probabilities for the classes
• Hint: We assume this is always 1.0 for the correct class and 0.0 for the other classes
• and the predictions of our net $q(x)$ (softmax output)

Hint:

Return the cross-entropy average: $J(\theta) = \frac{1}{m} \sum_{j=1}^m H\left(p(\vec x^{(j)}),q(\vec x^{(j)})\right)$

# method that returns the crossentropy computed with pytorch
def cross_entropy(predictions, targets):
# crossentropy average as pytorch tensor (scalar)
raise NotImplementedError()

targets = torch.tensor([0,2,1,0], dtype=torch.int64)
pred[np.arange(4), targets]

costs = cross_entropy(pred, t).item()
print(costs)

# costs shoudl be a float >= 0.0
assert costs >= 0.0

Train the model with gradient descent.

• Convert the data to torch tensors.
• Implement the gradient descent update rule.
• Apply iteratively the update rule to minimize the loss.

Reminder:

Equation for the update rule:

\begin{aligned} \theta_j' & = \theta_j - \alpha \cdot \frac{\partial}{\partial \theta_j} J(\theta)\\\\ \end{aligned}
def update_step(model, loss_function, x_, y_, lr):
raise NotImplementedError()
def gradient_descent(data, targets, loss_function, model, lr = 0.5, nb_epochs = 1000):
raise NotImplementedError()
nb_epochs = 1000
cost = gradient_descent(X_tensor, y_tensor, loss_function=cross_entropy, model=model, lr = 0.5, nb_epochs = nb_epochs)
# cost is a numpy array with the cost function value at each iteration.
# used below to print the progrss during learning

### Plot Cost (Loss ) Values over iterations

Plot the costs per epoch. Just execute the cells. The output should look similar to the following:

plt.plot(range(nb_epochs), cost)
plt.xlabel('# of iterations')
plt.ylabel('cost')
plt.title('Learning Progress')

### Plot Decision Boundary

Plot the data with the decisions. Just execute the cells. The output should look similar to the following:

def plot_decision_boundary():
fig = plt.figure(figsize=(8,8))

x_start = -5
x_end = 15
y_start = -10
y_end = 10
plt.xlim(x_start, x_end)
plt.ylim(y_start, y_end)

delta = 0.1
a = np.arange(x_start, x_end+delta, delta)
b = np.arange(y_start, y_end+delta, delta)
A, B = np.meshgrid(a, b)

x_ = np.dstack((A, B)).reshape(-1, 2)
x = torch.tensor(x_, dtype=torch.float32)
pred = model(x)
out = pred.detach().numpy()

ns = list()
ns.append(3)
ns.extend(A.shape)
out = out.T.reshape(ns)

plt.pcolor(A, B, out[0], cmap="Blues", alpha=0.2)
plt.pcolor(A, B, out[1], cmap=('Oranges'), alpha=0.2)
plt.pcolor(A, B, out[2], cmap=('Greens'), alpha=0.2)
#out.shape
# lets visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)

plt.title("Data and class predictions in data space.")
plot_decision_boundary()

## Literature

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

HTW Berlin - Angewandte Informatik - Advanced Topics - Exercise - Multiclass Logistic Regression (Softmax) with tensorflow
by Christian Herta, Klaus Strohmenger