# Neural Networks - Exercise: Simple MNIST Network

## Introduction

In this exercise, you will analyze an existing implementation of a neural network that recognises handwritten digits. It is an adaptation of the network presented by Michael Nielsen in his book 'Neural Networks and Deep Learning'. The aim is to identify the components that make up the network and to understand how they interact. With this overview, you will refactor the script-like approach into a more modular implementation that clearly separates the forward pass, the computation of the loss, the backward pass and training.

## Requirements

### Python Modules

# third party
import numpy as np
import matplotlib.pyplot as plt

# internal
from deep_teaching_commons.data.fundamentals.mnist import Mnist

## Data

# create mnist loader from deep_teaching_commons
# assumption: data_dir='data' names the local directory the loader downloads to
mnist_loader = Mnist(data_dir='data')

# load all data: labels are one-hot encoded, images are flattened and pixel values scaled to [0, 1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(one_hot_enc=True, normalized=True)

# shuffle training data
shuffle_index = np.random.permutation(len(train_images))
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]
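The preprocessing described in the comments above can be tried out without downloading MNIST. The following is a minimal sketch on a hypothetical stand-in data set (six random 28x28 "images" with made-up labels); it only illustrates flattening, scaling to [0, 1], one-hot encoding and shuffling images and labels with the same permutation, and is not the loader's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for MNIST: 6 "images" of 28x28 pixels with values 0..255
raw_images = rng.integers(0, 256, size=(6, 28, 28))
raw_labels = np.array([3, 0, 9, 1, 3, 7])

# Flatten each image to a 784-vector and scale the pixel values to [0, 1]
images = raw_images.reshape(len(raw_images), -1) / 255.0

# One-hot encode: row i has a single 1 in column raw_labels[i]
labels = np.eye(10)[raw_labels]

# Apply the same permutation to both arrays so image/label pairs stay aligned
shuffle_index = rng.permutation(len(images))
images_shuffled, labels_shuffled = images[shuffle_index], labels[shuffle_index]

print(images_shuffled.shape, labels_shuffled.shape)  # (6, 784) (6, 10)
```

Indexing both arrays with one shared index array is what keeps every image paired with its label after shuffling.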

## Simple MNIST Network

The presented network is an adaptation of Michael Nielsen's introductory example to neural networks. It is recommended, though not necessary, to read the first two chapters of his great online book 'Neural Networks and Deep Learning' for a better understanding of the given example. Compared to the original by Nielsen, the present variant is vectorized and replaces the sigmoid activation function with a rectified linear unit (ReLU). As a result, the code is considerably more compact and the model is optimized more efficiently.
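The two changes mentioned above, vectorization and ReLU, can be illustrated in isolation. The sketch below uses a hypothetical mini-batch and a single bias-free layer with made-up sizes. It compares one matrix product over the whole batch with an equivalent per-sample loop, and contrasts the derivatives of ReLU and the sigmoid.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0)

rng = np.random.default_rng(0)
X = rng.random((4, 5))                  # hypothetical mini-batch: 4 samples, 5 features
w = rng.standard_normal((5, 3)) * 0.1   # one layer with 3 units and no bias

# Vectorized: a single matrix product handles the whole batch
batch_out = relu(X.dot(w))

# Equivalent per-sample loop, shown only for comparison
loop_out = np.stack([relu(x.dot(w)) for x in X])

z = np.array([-2.0, 0.0, 2.0])
relu_grad = (z > 0).astype(float)              # 0 for z <= 0, 1 for z > 0
sigmoid_grad = sigmoid(z) * (1 - sigmoid(z))   # never larger than 0.25

print(np.allclose(batch_out, loop_out))
```

The boolean mask `(z > 0)` is the whole ReLU derivative, which is one reason the backward pass of a ReLU network can be written so briefly.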

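delta_hist = []

def feed_forward(X, weights):
    # collect the activations of every layer, starting with the input batch
    a = [X]
    for w in weights:
        a.append(np.maximum(a[-1].dot(w), 0))  # ReLU of the weighted input
    return a

def grads(X, Y, weights):
    # assumption: this function's header, gradient bookkeeping and return value are
    # reconstructed from the call grads(X, Y, weights) in the training loop
    a = feed_forward(X, weights)
    # https://brilliant.org/wiki/backpropagation/ or https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
    delta = a[-1] - Y  # error on the output layer
    delta_hist.append(np.sum(delta*Y)/len(X))
    gradients = [a[-2].T.dot(delta)]
    for i in range(len(a)-2, 0, -1):
        delta = (a[i] > 0) * delta.dot(weights[i].T)  # propagate the error through the ReLU
        gradients.insert(0, a[i-1].T.dot(delta))
    return [g / len(X) for g in gradients]

trX, trY, teX, teY = train_images, train_labels, test_images, test_labels
weights = [np.random.randn(*w) * 0.1 for w in [(784, 200), (200, 100), (100, 10)]]
num_epochs, batch_size, learn_rate = 20, 50, 0.1
for i in range(num_epochs):
    for j in range(0, len(trX), batch_size):
        X, Y = trX[j:j+batch_size], trY[j:j+batch_size]
        # update each weight matrix separately, since the layers differ in shape
        weights = [w - learn_rate * g for w, g in zip(weights, grads(X, Y, weights))]
    prediction_test = np.argmax(feed_forward(teX, weights)[-1], axis=1)
    print(i, np.mean(prediction_test == np.argmax(teY, axis=1)))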

## Exercise - Understanding an Implementation

Your goal is to understand how the implementation works. To that end, you can do the following:

• Plot delta_hist, which stores the delta value calculated on the output layer during each iteration.
• Add a boolean argument verbose to the functions. When it is set to true, print meaningful information about what the network is doing.
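When plotting a per-iteration quantity such as delta_hist, it can help to look at a smoothed version next to the raw values. The following is a minimal sketch of one possible approach, a moving average. The helper `moving_average`, the window size and the synthetic `fake_delta_hist` are all assumptions made for illustration and are not part of the original implementation.

```python
import numpy as np

def moving_average(values, window=100):
    """Mean over a sliding window; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Hypothetical stand-in for delta_hist: a noisy curve rising from -1 towards 0
rng = np.random.default_rng(0)
steps = np.arange(2000)
fake_delta_hist = -np.exp(-steps / 400) + rng.normal(0, 0.05, size=steps.size)

smoothed = moving_average(fake_delta_hist, window=100)
print(len(fake_delta_hist), len(smoothed))  # 2000 1901
```

With the real data, `plt.plot(delta_hist, alpha=0.3)` followed by `plt.plot(moving_average(delta_hist))` would overlay both curves; the smoothed one is shorter by `window - 1` points.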

After working through the implementation, try and answer the following questions:

1. Which cost function is used, what is its derivative and how is it implemented?
2. Why do the values in your plot lie in [-1, 0]? Why is the plot so noisy, how can you reduce the noise, and how does it differ from a typical plot of a loss function?
3. How does the network implement the backpropagation algorithm?

## Exercise - Step towards a NN-Framework

The presented implementation is compact and efficient, but hard to modify or extend. However, a modular design is crucial if you want to experiment with a neural network to understand the influence of its components. You will now make the first changes towards your own 'toy neural network framework', which you will expand over the course.

Rework the implementation from above to fit the classes and methods given below. Again, you do not have to re-engineer the whole neural network in this step. Rework the code to match the given specification and make only the necessary modifications. To aid your understanding, you may rename variables to something more fitting.

class FullyConnectedNetwork:
    def __init__(self, layers):
        pass  # TODO: implement

    def forward(self, data):
        pass  # TODO: implement

    def backward(self, X, Y):
        pass  # TODO: implement

    def predict(self, data):
        pass  # TODO: implement

class Optimizer:
    def __init__(self, network, train_data, train_labels, test_data=None, test_labels=None, epochs=100, batch_size=20, learning_rate=0.01):
        pass  # TODO: implement

    def sgd(self):
        pass  # TODO: implement

# Following code should run:
mnist_NN = FullyConnectedNetwork([(784, 200), (200, 100), (100, 10)])
epochs, batch_size, learning_rate = 20, 500, 0.1
Optimizer(mnist_NN, train_images, train_labels, test_images, test_labels, epochs, batch_size, learning_rate)
plt.plot(mnist_NN.delta_hist)

### Literature

The following license applies to the complete notebook, including code cells. It does not, however, apply to any referenced external media (e.g., images).

Neural Networks - Exercise: Simple MNIST Network
by Benjamin Voigt