# Exercise - Neural Network with PyTorch

## Introduction

In this exercise you will be presented with a classification problem with two classes and two features. The classes are not linearly separable. First you will implement logistic regression, which will yield a very bad decision boundary. Then you will extend your model with a hidden layer consisting of only two hidden neurons. By executing the plot cells you will see that these two hidden neurons are already almost enough to find a decision boundary that separates our data much better.

Finally, you will implement a neural network with multiple hidden layers to solve the problem without any misclassifications.

## Requirements

### Knowledge

You should have a basic knowledge of:

• Logistic regression
• Logistic function
• Tanh as activation function
• Cross-entropy loss
• numpy
• matplotlib


### Python Modules

By deep.TEACHING convention, all python modules needed to run the notebook are loaded centrally at the beginning.

# External Modules
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(1)

%matplotlib inline

## Data Generation

For convenience and visualization, we will only use two features in this notebook, so we are still able to plot them together with the target class and the decision boundary.

First we will create some artificial data:

• $m_1 = 10$ examples for class 0
• $m_2 = 15$ examples for class 1
• $n = 2$ features for each example

No exercise yet, just execute the cells.

m1 = 10  ### number of examples for class 0
m2 = 15  ### number of examples for class 1
m = m1 + m2
n = 2    ### number of features
X = np.empty((m, n))
X.shape
y = np.zeros((m))
y[m1:] = y[m1:] + 1.0
y
### Execute this to generate linearly separable data
def x2_function_class_0(x):
    return -x*2 + 2

def x2_function_class_1(x):
    return -x*2 + 4
### Execute this to generate NOT linearly separable data
def x2_function_class_0(x):
    return np.sin(x)

def x2_function_class_1(x):
    return np.sin(x) + 1
x1_min = -5
x1_max = +5

X[:m1,0] = np.linspace(x1_min, x1_max, m1)
X[m1:,0] = np.linspace(x1_min+0.5, x1_max-0.2, m2)
X[:m1,1] = x2_function_class_0(X[:m1,0])
X[m1:,1] = x2_function_class_1(X[m1:,0])
def plot_data():
    plt.scatter(X[:m1,0], X[:m1,1], alpha=0.5, label='class 0 train data')
    plt.scatter(X[m1:,0], X[m1:,1], alpha=0.5, label='class 1 train data')

    plt.plot(x1_line, x2_line_class_0, alpha=0.2, label='class 0 true target func')
    plt.plot(x1_line, x2_line_class_1, alpha=0.2, label='class 1 true target func')
    plt.legend(loc=1)
x1_line = np.linspace(x1_min, x1_max, 100)
x2_line_class_0 = x2_function_class_0(x1_line)
x2_line_class_1 = x2_function_class_1(x1_line)

plot_data()

## Convert the Data to torch tensors

• Convert numpy arrays to tensors.
###############################
###############################
#
# Task: Convert numpy arrays to tensors
#

###############################
###############################
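A minimal sketch of one possible solution (the explicit dtype and the reshape to a column vector are needed so the shape checks below pass):

X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32).view(-1, 1)  ### reshape from [m] to [m,1]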
### If your implementation is correct, these tests should not throw an exception

print(X_tensor.shape) ### should be [25,2]
print(y_tensor.shape) ### should be [25,1]

assert X_tensor.shape[0] == 25
assert X_tensor.shape[1] == 2
assert y_tensor.shape[0] == 25
assert y_tensor.shape[1] == 1

## Logistic Regression

• Implement the class below for logistic regression.
• Use torch.nn.Linear and torch.nn.Sigmoid.
• Add both as class members.
• The data flow should be: $x \rightarrow linear \rightarrow sigmoid \rightarrow \hat{y}$
• mathematically: $\hat{y} = sigmoid(linear(\vec{x}))$
• with $\hat{y}$ the prediction of your model

The following picture visualizes the data flow:

class LogisticRegression(nn.Module):  # inheriting from nn.Module!
    def __init__(self, num_labels, num_features):

        super(LogisticRegression, self).__init__()

        ###############################
        ###############################

        raise NotImplementedError()

        ###############################
        ###############################

    def forward(self, x):

        ###############################
        ###############################

        raise NotImplementedError()

        ###############################
        ###############################

NUM_LABELS = 1
NUM_FEATURES = 2
model = LogisticRegression(NUM_LABELS, NUM_FEATURES)
### Should output something like:
###
### LogisticRegression(
###  (linear): Linear(in_features=2, out_features=1, bias=True)
###  (sigmoid): Sigmoid()
### )
print(model)
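For reference, a minimal implementation sketch (the member names linear and sigmoid are assumptions, chosen to match the expected print output above; the class is named differently so it does not overwrite your own solution):

class LogisticRegressionSketch(nn.Module):
    def __init__(self, num_labels, num_features):
        super().__init__()
        self.linear = nn.Linear(num_features, num_labels)  ### affine map: 2 features -> 1 logit
        self.sigmoid = nn.Sigmoid()                        ### squashes the logit to [0,1]

    def forward(self, x):
        return self.sigmoid(self.linear(x))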

• Use torch.nn.BCELoss as cost function
• Use any Optimizer from torch.optim

Hint:

• Print the costs every ~100 epochs for instant feedback
###############################
###############################
#
# Task: Create a new model and train it with built-in cost and optimizer

###############################
###############################
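A minimal training-loop sketch (the optimizer choice, learning rate and epoch count are assumptions, not prescribed values):

model = LogisticRegression(NUM_LABELS, NUM_FEATURES)
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)  ### lr is an assumption

for epoch in range(1000):
    optimizer.zero_grad()              ### reset gradients from the previous step
    y_hat = model(X_tensor)            ### forward pass
    loss = criterion(y_hat, y_tensor)  ### cross-entropy between prediction and target
    loss.backward()                    ### backpropagate
    optimizer.step()                   ### update parameters
    if epoch % 100 == 0:
        print(epoch, loss.item())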
### With this function you can access your trained model parameters
model.state_dict()
model.state_dict().keys()
### depending on the name of your class members you might have to adjust the keys
weights = model.state_dict()['linear.weight'].detach().numpy()
bias = model.state_dict()['linear.bias'].detach().numpy()

### Plot Decision Boundary

With access to our trained model parameters we can plot the decision boundary together with our data. Executing the cell below should result in a plot like the following:

As expected, plain logistic regression cannot separate our dataset very well.
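The formula in the next code cell follows from the definition of the decision boundary: the boundary is where the model is undecided, i.e. $sigmoid(w_1 x_1 + w_2 x_2 + b) = 0.5$, which holds exactly when $w_1 x_1 + w_2 x_2 + b = 0$. Solving for $x_2$ yields $x_2 = \frac{-b - w_1 x_1}{w_2}$, which is what the cell computes.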

### Plot the data and decision boundary, just execute this cell

x2_boundary = (-bias[0] -weights[0,0]*x1_line)/weights[0,1]
plt.plot(x1_line, x2_boundary, c='g', label='boundary')

plot_data()

Now we are going to add one hidden layer consisting of two neurons. For the hidden layer neurons use the activation function torch.nn.Tanh instead of torch.nn.Sigmoid.

• Implement the class HiddenLayerNN
• Use nn.ModuleList as a class member to store the linear layers
• Use another of these list objects to store the activation functions (tanh and sigmoid)
• Also store the intermediate results in class members (linear_results, activation_results) when doing the calculations in the forward pass
• The data flow should be: $x \rightarrow linear \rightarrow tanh \rightarrow linear \rightarrow sigmoid \rightarrow \hat{y}$
• mathematically: $\hat{y} = sigmoid(linear(tanh(linear(\vec{x}))))$
• with $\hat{y}$ the prediction of your model

class HiddenLayerNN(nn.Module):  # inheriting from nn.Module!
    def __init__(self, num_labels, num_features, num_hidden):

        super(HiddenLayerNN, self).__init__()

        self.linear_modules = nn.ModuleList()
        self.activation_modules = nn.ModuleList()
        self.linear_results = []
        self.activation_results = []

        ###############################
        ###############################
        #
        # Task: add linear modules and tanh and sigmoid functions to the ModuleLists

        ###############################
        ###############################

    def forward(self, x):

        self.linear_results = [] ### clear after every run
        self.activation_results = [] ### clear after every run

        ###############################
        ###############################
        #
        # Task: iterate through both ModuleLists
        #
        #       save intermediate results in the python lists

        ###############################
        ###############################

        return x_

NUM_LABELS = 1
NUM_FEATURES = 2
NUM_HIDDEN = 2
model = HiddenLayerNN(NUM_LABELS, NUM_FEATURES, NUM_HIDDEN)
### Should output something like:
###
### HiddenLayerNN(
###   (linear_modules): ModuleList(
###     (0): Linear(in_features=2, out_features=2, bias=True)
###     (1): Linear(in_features=2, out_features=1, bias=True)
###   )
###   (activation_modules): ModuleList(
###     (0): Tanh()
###     (1): Sigmoid()
###   )
### )
print(model)
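A possible implementation sketch matching the expected output (again named differently so it does not shadow your own class):

class HiddenLayerNNSketch(nn.Module):
    def __init__(self, num_labels, num_features, num_hidden):
        super().__init__()
        self.linear_modules = nn.ModuleList([
            nn.Linear(num_features, num_hidden),  ### input -> hidden
            nn.Linear(num_hidden, num_labels),    ### hidden -> output
        ])
        self.activation_modules = nn.ModuleList([nn.Tanh(), nn.Sigmoid()])
        self.linear_results = []
        self.activation_results = []

    def forward(self, x):
        self.linear_results = []      ### clear after every run
        self.activation_results = []  ### clear after every run
        x_ = x
        for linear, activation in zip(self.linear_modules, self.activation_modules):
            z = linear(x_)
            self.linear_results.append(z)
            x_ = activation(z)
            self.activation_results.append(x_)
        return x_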

• Use torch.nn.BCELoss as cost function
• Use any Optimizer from torch.optim

Sidenote:

• With only one hidden layer of 2 hidden neurons, we can be unlucky with a bad weight initialization. If the plots a few cells below do not look like the sample pictures after training, rerun your training several times.
###############################
###############################
#
# Task: Create a new model and train with built-in cost and optimizer

###############################
###############################
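The training loop itself is unchanged compared to the logistic regression above; only the model construction differs, e.g. model = HiddenLayerNN(NUM_LABELS, NUM_FEATURES, NUM_HIDDEN), followed by the same BCELoss and optimizer steps.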

Now we are going to plot two things:

• 1st: The two neurons in the hidden layer represent the original data, transformed into another 2D space. However, for our data following two different $\sin$ functions, these two hidden neurons are still not enough to transform the data to be linearly separable.

• 2nd: We can also plot the decision boundary in our original space.

If your implementation is correct and training succeeded, your plots could look similar to the following:

### plot hidden transformation of X with learned w1s (transformation) and learned w2s (boundary)
###
### ATTENTION: ONLY WORKS IF THE HIDDEN LAYER has 2 neurons only
###

def plot_last_hidden_layer_feature_space(model):
    preds = model(X_tensor)
    A1 = model.activation_results[-2].detach().numpy()
    plt.scatter(A1[:m1,0], A1[:m1,1], alpha=0.5, label='class 0')
    plt.scatter(A1[m1:,0], A1[m1:,1], alpha=0.5, label='class 1')

    ### Plot true target functions
    data_tmp_tensor = torch.tensor(np.empty((len(x1_line), 2)), dtype=torch.float32)
    data_tmp_tensor[:,0] = torch.from_numpy(x1_line)
    data_tmp_tensor[:,1] = torch.from_numpy(x2_line_class_0)

    preds = model(data_tmp_tensor)
    A1 = model.activation_results[-2].detach().numpy()
    plt.plot(A1[:,0], A1[:,1])

    data_tmp_tensor[:,1] = torch.from_numpy(x2_line_class_1)
    preds = model(data_tmp_tensor)
    A1 = model.activation_results[-2].detach().numpy()
    plt.plot(A1[:,0], A1[:,1])

    ### Plot boundary
    preds = model(X_tensor)
    keys = list(model.state_dict().keys())
    weights = model.state_dict()[keys[-2]].detach().numpy()
    bias = model.state_dict()[keys[-1]].detach().numpy()

    x1_boundary_mlp = np.linspace(-1, +1, 10)
    x2_boundary_mlp = (-bias[0] - weights[0,0]*x1_boundary_mlp)/weights[0,1]
    plt.plot(x1_boundary_mlp, x2_boundary_mlp, c='g')
    plt.legend(loc=2)
    plt.title('Data and boundary in hidden space')

### Plot transformations
plot_last_hidden_layer_feature_space(model)
def plot_boundary_in_original_space(model):
    grid_density = 100
    x1 = np.linspace(X[:,0].min()-1, X[:,0].max()+1, grid_density)
    x2 = np.linspace(X[:,1].min()-1, X[:,1].max()+1, grid_density)
    mesh = np.meshgrid(x1, x2)

    data_tmp = np.empty((grid_density**2, n))
    data_tmp[:,0] = mesh[0].flatten()
    data_tmp[:,1] = mesh[1].flatten()
    data_tmp_tensor = torch.tensor(data_tmp, dtype=torch.float32)

    preds = model(data_tmp_tensor).detach().numpy()
    c0 = data_tmp[preds.flatten() < 0.5]
    c1 = data_tmp[preds.flatten() >= 0.5]
    plt.scatter(c0[:,0], c0[:,1], alpha=1.0, marker='s', color="#aaccee")
    plt.scatter(c1[:,0], c1[:,1], alpha=1.0, marker='s', color="#eeccaa")
    plot_data()
    plt.title('Data and boundary in original space')

### plot boundary in original space
plot_boundary_in_original_space(model)

### Adding more Layers and Parametrization

Now implement the class MultiHiddenLayerNN. This class accepts a parameter num_hidden, a list specifying how many neurons each layer should have.

Instead of the parameter num_features, the first entry in num_hidden specifies the number of our features (2).

Finalize with the sigmoid function.

Hint:

• Use a loop to iterate through the entries of num_hidden to add linear layer modules and tanh activation functions.
• Again, save intermediate results during your forward function in the according python lists.
• Note that any hidden layer may have a different number of neurons, BUT the last hidden layer must have exactly 2 neurons if we want to visualize the last stage of the feature space transformation (a sketch of the layer-building loop follows after the instantiation cell below).

class MultiHiddenLayerNN(nn.Module):  # inheriting from nn.Module!
    def __init__(self, num_labels, num_hidden):
        super(MultiHiddenLayerNN, self).__init__()

        self.linear_modules = nn.ModuleList()
        self.activation_modules = nn.ModuleList()
        self.linear_results = []
        self.activation_results = []

        ###############################
        ###############################
        #
        # Task: add linear modules and tanh and sigmoid functions to the ModuleLists

        ###############################
        ###############################

    def forward(self, x):

        self.linear_results = [] ### clear after every run
        self.activation_results = [] ### clear after every run

        ###############################
        ###############################
        #
        # Task: iterate through both ModuleLists
        #
        #       save intermediate results in the python lists

        ###############################
        ###############################

        return x_
NUM_LABELS = 1

### Meaning: 2 features,
### 20 neurons in 1st hidden layer,
### 20 in a 2nd hidden layer,
### 2 in a 3rd hidden layer
NUM_HIDDEN = [2,20,20,2]
model = MultiHiddenLayerNN(NUM_LABELS, NUM_HIDDEN)
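A sketch of the layer-building loop (it assumes a tanh activation between all hidden layers and a final sigmoid, as described above; forward would be identical to the HiddenLayerNN sketch, and the class is named differently so it does not shadow yours):

class MultiHiddenLayerNNSketch(nn.Module):
    def __init__(self, num_labels, num_hidden):
        super().__init__()
        self.linear_modules = nn.ModuleList()
        self.activation_modules = nn.ModuleList()
        self.linear_results = []
        self.activation_results = []
        ### one tanh-activated linear layer per consecutive pair in num_hidden
        for n_in, n_out in zip(num_hidden[:-1], num_hidden[1:]):
            self.linear_modules.append(nn.Linear(n_in, n_out))
            self.activation_modules.append(nn.Tanh())
        ### final layer: last hidden size -> num_labels, finished by a sigmoid
        self.linear_modules.append(nn.Linear(num_hidden[-1], num_labels))
        self.activation_modules.append(nn.Sigmoid())

    ### forward is identical to the HiddenLayerNN sketch above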

• Use torch.nn.BCELoss as cost function
• Use any Optimizer from torch.optim
###############################
###############################
#
# Task: Create a new model and train with built-in cost and optimizer

###############################
###############################

The plots should now show that we managed to transform our data into a feature space where the data is linearly separable:

Plots should look similar to:

### Plot transformations
plot_last_hidden_layer_feature_space(model)
### plot boundary in original space
plot_boundary_in_original_space(model)

### Freestyle Exercise

When you are finished you can try out a lot of different things and see how the hidden feature space and the final decision boundary change. Here are some ideas to try out:

• adjust the number of neurons in the hidden layer (more / less)
• add more / bigger layers
• different activation functions ($tanh$, $logistic$, $relu$)
• you can try to use mean-squared-error as cost-function
• you can also try to change the true target functions and make it harder for the network to separate the data!

Things you should note when trying different things:

• In order to plot the transformed hidden space, the last hidden layer may only consist of 2 neurons
• When using $relu$ as activation function, do not use it for a layer with only 2 neurons. Chances are high that the weights are negative and you end up with 2 dead neurons.

## Summary and Outlook

In this notebook you learned how to set up a more complex and deeper neural network using the tools from previous exercises. You also saw how these more complex networks are able to perform non-linear classification due to feature transformation using (more) hidden layers, and how to visualize this process. All techniques learned here can be used for even more complex problems while using PyTorch as a framework.

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

Exercise - Neural Network with PyTorch
by Klaus Strohmenger