# ML-Fundamentals - Simple Neural Network

## Introduction

In this exercise you will be presented with a classification problem with two classes and two features. The classes are not linearly separable. First you will implement logistic regression, which will yield a very poor decision boundary. Then you will extend your model with a hidden layer consisting of only two hidden neurons. By executing the plots you will see that these two hidden neurons are just enough to find a decision boundary that separates our data much better.

## Requirements

### Knowledge

You should have a basic knowledge of:

• Logistic regression
• Logistic function
• Tanh as activation function
• Relu as activation function
• Mean squared error
• Cross-entropy loss
• Gradient descent
• Backpropagation
• numpy
• matplotlib

Suitable sources for acquiring this knowledge are:

### Python Modules

By deep.TEACHING convention, all python modules needed to run the notebook are loaded centrally at the beginning.

# External Modules
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

## Exercises

For convenience and visualization, we will use only two features in this notebook, so we are still able to plot them together with the target class and the decision boundary.

### Data Generation

First we will create some artificial data:

• $m_1$ examples for class 0
• $m_2$ examples for class 1
• $n$ features for each example

No exercise yet, just execute the cells.

m1 = 10  # number of examples for class 0
m2 = 15  # number of examples for class 1
m = m1 + m2
n = 2  # number of features
X = np.empty((m, n))  # allocate the (uninitialized) feature matrix
X.shape
y = np.zeros(m)
y[m1:] = 1.0
y
### Execute this to generate linearly separable data
def x2_function_class_0(x):
    return -x*2 + 2

def x2_function_class_1(x):
    return -x*2 + 4
### Execute this to generate NOT linearly separable data
def x2_function_class_0(x):
    return np.sin(x)

def x2_function_class_1(x):
    return np.sin(x) + 1
x1_min = -5
x1_max = +5

X[:m1,0] = np.linspace(x1_min, x1_max, m1)
X[m1:,0] = np.linspace(x1_min+0.5, x1_max-0.2, m2)
X[:m1,1] = x2_function_class_0(X[:m1,0])
X[m1:,1] = x2_function_class_1(X[m1:,0])
def plot_data():
    plt.scatter(X[:m1,0], X[:m1,1], alpha=0.5, label='class 0 train data')
    plt.scatter(X[m1:,0], X[m1:,1], alpha=0.5, label='class 1 train data')

    plt.plot(x1_line, x2_line_class_0, alpha=0.2, label='class 0 true target func')
    plt.plot(x1_line, x2_line_class_1, alpha=0.2, label='class 1 true target func')
    plt.legend()
x1_line = np.linspace(x1_min, x1_max, 100)
x2_line_class_0 = x2_function_class_0(x1_line)
x2_line_class_1 = x2_function_class_1(x1_line)

plot_data()

### Activation and Cost Functions

In order to implement logistic regression and a neural net with a hidden layer, we need at least:

• An activation function like tanh
• A cost function like cross-entropy
• The sigmoid (or logistic function)

Task:

Implement at least the following functions and their derivatives:

• Tanh: $\frac{e^x - e^{-x}}{e^x + e^{-x}}$
• Tanh derivative: $1 - x^2$ (where $x$ is already the *output* of tanh)
• Logistic: $\frac{1}{1 + e^{-x}}$
• Logistic derivative: $x \cdot (1-x)$ (where $x$ is already the *output* of the logistic function)
• Cross-entropy: $\frac{1}{m}\sum_i^m -y^i \cdot \log(\hat y^i) - (1-y^i) \cdot \log(1-\hat y^i)$
• Cross-entropy derivative: $\hat y - y$

Optionally (to play around with) also implement:

• ReLU: $max(0,x)$
• ReLU derivative: $0 \text{ if } x \leq 0 \text{ else } 1$
• Mean squared error: $\frac{1}{m}\sum_i^m (y^i - \hat y^i)^2$
• Mean squared error derivative: $2 \cdot (\hat y - y)$
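One possible reading of these formulas in numpy is sketched below. This is not the required solution, just a sketch to illustrate the signatures; note the convention (also used by the plotting cell below) that for `deriv=True` the tanh and logistic derivatives expect the function's *output*, not its input:

```python
import numpy as np

def logistic(x, deriv=False):
    if deriv:
        # x is assumed to be logistic(z), so the derivative is x * (1 - x)
        return x * (1.0 - x)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x, deriv=False):
    if deriv:
        # x is assumed to be tanh(z), so the derivative is 1 - x^2
        return 1.0 - x ** 2
    return np.tanh(x)

def relu(x, deriv=False):
    if deriv:
        return (x > 0).astype(float)
    return np.maximum(0.0, x)

def cross_entropy(y_preds, y, deriv=False):
    if deriv:
        return y_preds - y
    return np.mean(-y * np.log(y_preds) - (1.0 - y) * np.log(1.0 - y_preds))
```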

If your implementations are correct, the plot of the activation functions and their derivatives (produced by executing the last cell of this section) should look like the following:

def logistic(x, deriv=False):
    if deriv:
        raise NotImplementedError()
    raise NotImplementedError()

def tanh(x, deriv=False):
    if deriv:
        raise NotImplementedError()
    raise NotImplementedError()

def relu(x, deriv=False):
    if deriv:
        raise NotImplementedError()
    raise NotImplementedError()
def cross_entropy(y_preds, y, deriv=False):
    if deriv:
        raise NotImplementedError()
    raise NotImplementedError()

def mean_squared_error(y_preds, y, deriv=False):
    if deriv:
        raise NotImplementedError()
    raise NotImplementedError()
### Just execute to plot your implementations

plt.figure(figsize=(16,4))
x_tmp = np.linspace(-5,5,100)

ax = plt.subplot(1,3,1)
ax.plot(x_tmp, logistic(x_tmp), label='logistic function')
ax.plot(x_tmp, logistic(logistic(x_tmp), True), label='logistic function derivative')
ax.legend()

ax = plt.subplot(1,3,2)
ax.plot(x_tmp, tanh(x_tmp), label='tanh')
ax.plot(x_tmp, tanh(tanh(x_tmp), True), label='tanh derivative')
ax.legend()

ax = plt.subplot(1,3,3)
ax.plot(x_tmp, relu(x_tmp), label='relu')
ax.plot(x_tmp, relu(x_tmp, True), label='relu derivative')
ax.legend()

### Logistic Regression

Implement the iterative gradient_descent function with:

• Forward pass: $\hat y = logistic( \vec x w + b )$
• Print the cost (you can try cross-entropy and mean squared error)
• Gradient descent update rule:

$w_{i_{new}} \leftarrow w_{i_{old}} - \alpha \cdot \frac{\partial \, cost(logistic(\vec x w + b), y)}{\partial w_{i_{old}}}$

If your implementation is correct, running the training cell and the plot cell below should result in one of the following plots (depending on which data generation process you chose):
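As an orientation, the loop body can be sketched as follows. This is only one possible implementation (it assumes a `logistic` function as above and uses the well-known simplification that the cross-entropy and logistic derivatives combine to $\hat y - y$); your own version may differ:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def gradient_descent(x, y, ws, b, lrate, epochs):
    m = len(y)
    for i in range(epochs):
        # forward pass: y_hat = logistic(x w + b)
        y_pred = logistic(x @ ws + b)
        # cross-entropy cost, printed every 100 epochs
        cost = np.mean(-y * np.log(y_pred) - (1.0 - y) * np.log(1.0 - y_pred))
        if i % 100 == 0:
            print('cost:', cost)
        # backward pass: cross-entropy and logistic derivatives combine to (y_hat - y)
        dz = (y_pred - y) / m
        dws = x.T @ dz
        db = dz.sum()
        # gradient descent update
        ws = ws - lrate * dws
        b = b - lrate * db
    # return new ws, b and prediction of the last iteration
    return ws, b, y_pred
```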

def gradient_descent(x, y, ws, b, lrate, epochs):

    for i in range(epochs):

        # forward

        # calculate and print costs

        # backward (calculation of partial derivatives for ws and b)

        # update ws, b

        pass

    # return new ws, b and prediction of last iteration
    raise NotImplementedError()
### TRAINING HERE, just execute

ws = np.array([1.,2.])
b = 0.
ws, b, y_pred = gradient_descent(X, y, ws, b, lrate=0.01, epochs=1000)
y_pred[y_pred >= 0.5] = 1.
y_pred[y_pred < 0.5] = 0.
print(y)
print(y_pred)
### Plot the data and decision boundary, just execute this cell

x2_boundary = (-b -ws[0]*x1_line)/ws[1]
plt.plot(x1_line, x2_boundary, c='g', label='boundary')

plot_data()

### Adding Hidden Layer

Now we are going to add a hidden layer consisting of two neurons. For the hidden layer neurons use the activation function $tanh$ instead of $logistic$.

Task:

• Implement the function for the forward_pass. It should return:

• Z1s (the result for $\vec x \cdot w_{11} + b_1$ and $\vec x \cdot w_{12} + b_1$)
• A1s (the resutl of passing Z1 into activation function)
• Z2s (the result of $\vec {A1} \cdot w_{21} + b_2$ and $\vec {A1} \cdot w_{22} + b_2$)
• A1 (also y, the resutl of passing Z2s into $logistic$ function)
• And then backprop
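With the shapes used in the training cell below (`ws[0]` of shape $(2,2)$, `ws[1]` of shape $(2,1)$, one bias per layer), the forward pass reduces to two matrix products. The following is only one possible sketch, assuming activation functions with the signature from the previous section:

```python
import numpy as np

def tanh(x, deriv=False):
    return 1.0 - x**2 if deriv else np.tanh(x)

def logistic(x, deriv=False):
    return x * (1.0 - x) if deriv else 1.0 / (1.0 + np.exp(-x))

def forward_pass(x, ws, bs, act_fs):
    # hidden layer: Z1 = x W1 + b1, A1 = tanh(Z1)
    Z1 = x @ ws[0] + bs[0]
    A1 = act_fs[0](Z1)
    # output layer: Z2 = A1 W2 + b2, A2 = logistic(Z2) = y_hat
    Z2 = A1 @ ws[1] + bs[1]
    A2 = act_fs[1](Z2)
    return Z1, A1, Z2, A2
```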
def forward_pass(x, ws, bs, act_fs):
    raise NotImplementedError()
def backprop(x, y, ws, bs, act_fs, cost_f, lrate, epochs):
    for i in range(epochs):

        ### forward
        Z1, A1, Z2, A2 = forward_pass(x, ws, bs, act_fs)

        ### cost

        ### backward

        ### updates

    return ws, bs, Z1, A1, Z2, A2
### Training, just execute this cell

ws = [
np.random.randn(2, 2)*0.1,
np.random.randn(2, 1)*0.1
]
bs = np.full((len(ws),1),0.)
act_fs = [tanh, logistic] ### possible: tanh / logistic / relu
cost_f = cross_entropy ### possible: mean_squared_error / cross_entropy

y = y.reshape((len(X),1))

ws, bs, Z1, A1, Z2, A2 = backprop(X, y, ws, bs, act_fs, cost_f, 0.1, 5000)

### Applying a threshold to our predictions
A2[A2<0.5] = 0.
A2[A2>0.5] = 1.
### Then we can compare our predictions with the true labels
print(A2.flatten())
print(y.flatten())

Now we are going to plot two things:

• 1st: The two neurons in the hidden layer represent the original data transformed into another 2D space, in which it is more likely to be linearly separable. However, for our data following two different $sin$ functions, these two hidden neurons are just not enough to separate all data correctly.

• 2nd: We can also plot the decision boundary in our original space.

If your implementation is correct and training did succeed, your plots could look like the following:

### plot hidden transformation of X with learned w1s (transformation) and learned w2s (boundary)
###
### ATTENTION: ONLY WORKS IF THE HIDDEN LAYER HAS EXACTLY 2 NEURONS
###

### Plot transformations
Z1, A1, Z2, A2 = forward_pass(X, ws, bs, act_fs)
plt.scatter(A1[:m1,0], A1[:m1,1], alpha=0.5, label='class 0')
plt.scatter(A1[m1:,0], A1[m1:,1], alpha=0.5, label='class 1')

### Plot true target functions
data_tmp = np.ndarray((len(x1_line), 2))
data_tmp[:,0] = x1_line

data_tmp[:,1] = x2_line_class_0
Z1, A1, Z2, A2 = forward_pass(data_tmp, ws, bs, act_fs)
plt.plot(A1[:,0], A1[:,1])

data_tmp[:,1] = x2_line_class_1
Z1, A1, Z2, A2 = forward_pass(data_tmp, ws, bs, act_fs)
plt.plot(A1[:,0], A1[:,1])

### Plot boundary
Z1, A1, Z2, A2 = forward_pass(X, ws, bs, act_fs)
x1_boundary_mlp = np.linspace(A1[:,0].min(), A1[:,0].max(), 10)
print(x1_boundary_mlp)
x2_boundary_mlp = (-bs[-1][-1] -ws[-1][0,0]*x1_boundary_mlp)/ws[-1][1,0]
plt.plot(x1_boundary_mlp, x2_boundary_mlp, c='g')
plt.legend()
plt.title('Data and boundary in hidden space')
### plot boundary in original space

grid_density = 100
x1 = np.linspace(X[:,0].min()-1,X[:,0].max()+1,grid_density)
x2 = np.linspace(X[:,1].min()-1,X[:,1].max()+1,grid_density)
mesh = np.meshgrid(x1, x2)

data_tmp = np.empty((grid_density**2, n))
data_tmp[:,0] = mesh[0].flatten()
data_tmp[:,1] = mesh[1].flatten()

Z1, A1, Z2, A2 = forward_pass(data_tmp, ws, bs, act_fs)
print(data_tmp.shape)
print(A2.shape)
c0 = data_tmp[A2[:,0] < 0.5]
c1 = data_tmp[A2[:,0] >= 0.5]
plt.scatter(c0[:,0],c0[:,1], alpha=1.0, marker='s', color="#aaccee")
plt.scatter(c1[:,0],c1[:,1], alpha=1.0, marker='s', color="#eeccaa")
plot_data()
plt.title('Data and boundary in original space')

### Adding more Layers and Parametrization

Task:

Now write the forward_pass and backprop functions again, but this time fully parametrize them, so you can use them with a different number of layers, a different activation function for each layer, and so on.
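The generalization of the forward pass is a loop over the layers, where each layer takes the previous layer's activation as input. The following is only a possible sketch (again assuming activation functions with the earlier signature), not the required solution:

```python
import numpy as np

def tanh(x, deriv=False):
    return 1.0 - x**2 if deriv else np.tanh(x)

def logistic(x, deriv=False):
    return x * (1.0 - x) if deriv else 1.0 / (1.0 + np.exp(-x))

def forward_pass(x, ws, bs, act_fs):
    Zs, As = [], []
    A = x  # the input acts as "activation" of layer 0
    for w, b, act_f in zip(ws, bs, act_fs):
        Z = A @ w + b      # linear step of this layer
        A = act_f(Z)       # non-linear activation of this layer
        Zs.append(Z)
        As.append(A)
    return Zs, As
```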

def forward_pass(x, ws, bs, act_fs):

    raise NotImplementedError()
    return Zs, As
def backprop(x, y, ws, bs, act_fs, cost_f, lrate, epochs):

    for i in range(epochs):

        ### forward
        Zs, As = forward_pass(x, ws, bs, act_fs)

        ### cost

        ### backward

        ### update

    return ws, bs, Zs, As
ws = [
np.random.randn(2, 20)*0.1,
np.random.randn(20, 2)*0.1,
np.random.randn(2, 1)*0.1
]
bs = np.full((len(ws),1),0.)
act_fs = [tanh, tanh, logistic] ### tanh / logistic / relu
cost_f = cross_entropy ### mean_squared_error / cross_entropy

y = y.reshape((len(X),1))

ws, bs, Zs, As = backprop(X, y, ws, bs, act_fs, cost_f, 0.1, 2000)

results = As[-1].flatten()
results[results < .5] = 0.
results[results >= .5] = 1.
print(results, results.shape)
print(y.flatten())

Now we are going to plot again:

• 1st: The data for the two neurons in the LAST hidden layer.
• 2nd: The decision boundary in our original space.

If your implementation is correct and training did succeed, your plots could look like the following:

# plot hidden transformation of X with learned w1s (transformation) and learned w2s (boundary)
#
# ATTENTION: ONLY WORKS IF THE SECOND-TO-LAST LAYER HAS EXACTLY 2 NEURONS
#
Zs, As = forward_pass(X, ws, bs, act_fs)
plt.scatter(As[-2][:m1,0], As[-2][:m1,1], alpha=0.5, label='class 0')
plt.scatter(As[-2][m1:,0], As[-2][m1:,1], alpha=0.5, label='class 1')

data_tmp = np.ndarray((len(x1_line), 2))
data_tmp[:,0] = x1_line
print(data_tmp.shape)
data_tmp[:,1] = x2_line_class_0
Zs, As = forward_pass(data_tmp, ws, bs, act_fs)
plt.plot(As[-2][:,0], As[-2][:,1])

data_tmp[:,1] = x2_line_class_1
Zs, As = forward_pass(data_tmp, ws, bs, act_fs)
plt.plot(As[-2][:,0], As[-2][:,1])

#x1_boundary_mlp = np.linspace(As[-2][:,0].min(),As[-2][:,0].max(), 10)
x1_boundary_mlp = np.linspace(-1, +1, 10)
x2_boundary_mlp = (-bs[-1][-1] -ws[-1][0,0]*x1_boundary_mlp)/ws[-1][1,0]
plt.plot(x1_boundary_mlp, x2_boundary_mlp, c='g')
plt.title('Data and boundary in hidden space')
grid_density = 100
x1 = np.linspace(X[:,0].min()-1,X[:,0].max()+1,grid_density)
x2 = np.linspace(X[:,1].min()-1,X[:,1].max()+1,grid_density)
mesh = np.meshgrid(x1, x2)

data_tmp = np.empty((grid_density**2, n))
data_tmp[:,0] = mesh[0].flatten()
data_tmp[:,1] = mesh[1].flatten()

Zs, As = forward_pass(data_tmp, ws, bs, act_fs)
print(data_tmp.shape)
print(As[-1].shape)
c0 = data_tmp[As[-1][:,0] < 0.5]
c1 = data_tmp[As[-1][:,0] >= 0.5]
plt.scatter(c0[:,0],c0[:,1], alpha=1.0, marker='s', color="#aaccee")
plt.scatter(c1[:,0],c1[:,1], alpha=1.0, marker='s', color="#eeccaa")
plot_data()
plt.title('Data and boundary in original space')

### Freestyle Exercise

When you are finished, you can try different activation functions ($tanh$, $logistic$, $relu$) and/or different cost functions when calling the backprop function for the neural network. You can also try to add more layers.

Things you should note when trying different things:

• In order to plot the transformed hidden space, the last hidden layer may only consist of 2 neurons.
• When using $relu$ as the activation function for a layer with only 2 neurons, chances are high that the weights become negative and you end up with 2 dead neurons, so make sure not to initialize them with negative values.

[TODO]

## Licenses

### Notebook License (CC-BY-SA 4.0)

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

Exercise: Simple Neural Network
by Klaus Strohmenger
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.

### Code License (MIT)

The following license only applies to code cells of the notebook.

Copyright 2019 Klaus Strohmenger

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.