# Neural Network Framework - Exercise: Fully Connected Network

## Introduction

For a better understanding of neural networks, you will start to implement a framework on your own. The given notebook explains some core functions and concepts of the framework, so all of you have the same starting point. Our previous exercises were self-contained and not very modular. You are going to change that. Let us begin with a fully connected network on the now well-known MNIST dataset. The pipeline will be:

• Define a model architecture
• Construct a neural network based on the architecture
• Define an evaluation criterion
• Optimize the model (training)

Read the whole notebook carefully to understand how the pipeline works, even where no specific implementation work is required from you.

## Requirements

TODO

### Python-Modules

# third party
import numpy as np
from deep_teaching_commons.data.fundamentals.mnist import Mnist

### Data

We load the MNIST dataset. Have a look at the data structure that is necessary to feed data into the framework. A batch is a 4-d tensor with shape (image_i, channel, width, height).
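As a quick illustration of that convention (using a dummy array, not the actual loader), a batch of 16 grayscale 28x28 images would look like this:

```python
import numpy as np

# dummy batch following the framework convention (image_i, channel, width, height):
# 16 grayscale images (1 channel) of size 28x28
batch = np.zeros((16, 1, 28, 28))
print(batch.shape)  # (16, 1, 28, 28)
```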

```python
# create mnist loader from deep_teaching_commons
mnist_loader = Mnist()

# load all data; labels are not one-hot encoded, pixels are normalized to [0,1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(flatten=False, one_hot_enc=False, normalized=True)
print(train_images.shape, train_labels.shape)

# reshape to match the general framework architecture
train_images, test_images = train_images.reshape(60000, 1, 28, 28), test_images.reshape(10000, 1, 28, 28)
print(train_images.shape, test_images.shape)

# shuffle training data
shuffle_index = np.random.permutation(60000)
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]
```

## Towards a Neural Network Framework

To create custom models you have to be able to define layers and activation functions in a modular way. Layers and activation functions are therefore modelled as objects. Each object that you want to use has to implement a forward and a backward method that is used later by the NeuralNetwork class. Additionally, the self.params attribute is mandatory to meet the specification of the NeuralNetwork class. It is used to store all learnable parameters that you need for the optimization algorithm. Implemented that way, you can use the objects as building blocks and stack them up to create a custom model. Make sure to use an activation function after each layer except the last one, because the softmax function is applied by default during loss calculation on the network output.
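To make the required interface concrete, here is a minimal sketch of such a building block (the `Identity` layer below is a made-up example for illustration, not part of the framework):

```python
import numpy as np

class Identity(object):
    ''' Minimal example of the layer interface: forward, backward and self.params '''
    def __init__(self):
        # no learnable parameters, but the attribute is mandatory
        self.params = []

    def forward(self, X):
        # pass the input through unchanged
        return X

    def backward(self, dout):
        # pass the upstream gradient through unchanged;
        # the second return value is the (empty) list of parameter gradients
        return dout, []

layer = Identity()
out = layer.forward(np.ones((2, 3)))
dX, grads = layer.backward(np.ones((2, 3)))
```

Every layer and activation function in this notebook follows exactly this pattern.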

After completing this notebook you can move the implemented functions to the script files for further development. The framework consists of the following files:

• layer.py
• activation_func.py
• network.py
• cost_func.py
• optimizer.py
• utils.py

After working through the notebook, it should become clear which functionality belongs in which file.

### Exercise: Define Layers

The first layers added to the framework are a flatten and a fully-connected layer, which we need to build an architecture for the corresponding fully connected network (side note: depending on the framework, the term dense layer is sometimes used instead of fully-connected).

All kinds of neural network layers and regularization techniques that can be inserted as layers into an architecture will be implemented in the file layer.py later.

Implement the methods FullyConnected.forward and FullyConnected.backward.

```python
class Flatten(object):
    ''' Flatten layer used to reshape inputs into vector representation

    Layer should be used in the forward pass before a dense layer to
    transform a given tensor into a vector.
    '''
    def __init__(self):
        self.params = []

    def forward(self, X):
        ''' Reshapes an n-dim representation into a vector
            by preserving the number of input rows.

        Example:
            (10000, 1, 28, 28) -> (10000, 784)
        '''
        self.X_shape = X.shape
        self.out_shape = (self.X_shape[0], -1)
        out = X.reshape(self.out_shape)
        return out

    def backward(self, dout):
        ''' Restores the dimensions from before the flattening operation
        '''
        out = dout.reshape(self.X_shape)
        return out, []
```

```python
class FullyConnected(object):
    ''' Fully connected layer implementing the linear function hypothesis
        in the forward pass and its derivative in the backward pass.
    '''
    def __init__(self, in_size, out_size):
        ''' Initialize all learnable parameters in the layer

        Weights will be initialized with modified Xavier initialization.
        Biases will be initialized with zero.
        '''
        self.W = np.random.randn(in_size, out_size) * np.sqrt(2. / in_size)
        self.b = np.zeros((1, out_size))
        self.params = [self.W, self.b]

    def forward(self, X):
        ''' Linear combination of images, weights and bias terms

        Args:
            X: Matrix of images (flattened representation)

        Returns:
            out: Sum of X*W+b
        '''
        self.X = X
        ############################################
        #                   TODO                   #
        ############################################
        # out =
        ############################################
        #             END OF YOUR CODE             #
        ############################################
        return out

    def backward(self, dout):
        ''' Computes the derivatives with respect to the layer inputs

        Args:
            dout: Derivative of the local output

        Returns:
            dX: Derivative with respect to X
            dW: Derivative with respect to W
            db: Derivative with respect to b
        '''
        ############################################
        #                   TODO                   #
        ############################################
        # dX =
        # dW =
        ############################################
        #             END OF YOUR CODE             #
        ############################################
        db = np.sum(dout, axis=0)
        return dX, [dW, db]
```

### Testing

Once you've connected many types of layers in a network and you notice an error in your training, it can be difficult to track down which layer exactly has a buggy implementation. Since you're implementing each layer in a modular fashion you can also test them individually. So, it's a good practice to write tests for each of your layers at this point already.

There are properties you know should hold true about the input and output of your layer. In the FullyConnected layer, you may want to test:

• In the forward pass: which shape should the return value have?
• In the backward pass: which shape should the derivatives dX, dW and db have?
• In the backward pass: which shape do you expect from the argument dout?
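A shape test along those lines might look as follows. Note that this sketch spells out the derivatives of the linear map directly as reference values, so treat it as one possible cross-check for your own implementation rather than part of the framework:

```python
import numpy as np

batch_size, in_size, out_size = 4, 784, 500
X = np.random.randn(batch_size, in_size)
W = np.random.randn(in_size, out_size) * np.sqrt(2. / in_size)
b = np.zeros((1, out_size))

# forward: one row of out_size scores per sample
out = X @ W + b
assert out.shape == (batch_size, out_size)

# backward: dout has the same shape as out, and each derivative
# matches the shape of the quantity it refers to
dout = np.ones_like(out)
dX = dout @ W.T
dW = X.T @ dout
db = np.sum(dout, axis=0)
assert dX.shape == X.shape
assert dW.shape == W.shape
assert db.shape == (out_size,)
```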

### Exercise: Define Activation Function

First, remember that activation functions are non-linearities added to your architecture. As an example the classic ReLU function is implemented here:

$f ( x ) = \left\{ \begin{array} { l l } { x } & { \text { if } x > 0 } \\ { 0 } & { \text { otherwise } } \end{array} \right.$

The ReLU function matches the current weight initialization in the fully-connected layer. Note that the initialization may have to be changed if you implement other activation functions.

Strictly speaking, the activation functions belong in layer.py; for the sake of clarity, however, they are put into a separate file, activation_func.py.

Implement the ReLU class.

```python
class ReLU(object):
    ''' Implements the activation function rectified linear unit (ReLU)

    The ReLU activation function is defined as the positive part of
    its argument.
    '''
    def __init__(self):
        self.params = []

    def forward(self, X):
        ''' In the forward pass return the identity for x > 0

        Save the input for backprop and forward all values that are above 0.
        '''
        self.X = X
        ############################################
        #                   TODO                   #
        ############################################
        # return
        ############################################
        #             END OF YOUR CODE             #
        ############################################

    def backward(self, dout):
        ''' Derivative of ReLU

        Returns:
            dX: dout wherever x > 0 in the forward pass, 0 elsewhere
            []: ReLU has no learnable parameters, hence no parameter gradients
        '''
        dX = dout.copy()
        ############################################
        #                   TODO                   #
        ############################################
        # dX =
        ############################################
        #             END OF YOUR CODE             #
        ############################################
        return dX, []
```
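Besides shape tests, a numerical gradient check is a common way to validate a backward pass: compare the analytic derivative against a central finite-difference approximation. The helper below is a sketch (the function name `numerical_grad` is made up for illustration), demonstrated on the simple loss `sum(x**2)` whose gradient is known to be `2*x`:

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    ''' Central finite-difference approximation of df/dx for a scalar-valued f '''
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + eps
        f_plus = f(x)
        x[idx] = old - eps
        f_minus = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

x = np.random.randn(3, 4)
analytic = 2 * x  # known gradient of sum(x**2)
numeric = numerical_grad(lambda v: np.sum(v ** 2), x)
assert np.allclose(analytic, numeric, atol=1e-4)
```

The same idea applies to your layers: wrap the forward pass (plus a scalar loss) in a function of W, b or X and compare the result against the derivatives your backward method returns.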

### NeuralNetwork Class

A NeuralNetwork object connects all layers and activation functions of a model architecture using the forward and backward methods of the containing objects. Calling forward on the NeuralNetwork object will pass a given input through the whole computational graph. The backward function calculates the gradients via backpropagation.

It further creates a global list of all parameters in the network during initialization, which is used later in the optimization process.

A predict function implements a forward pass with the application of a given score function at the end of the calculation. At the moment it is suited for the softmax function, taking only the max argument.

```python
class NeuralNetwork(object):
    ''' Creates a neural network from a given layer architecture

    This class is suited for fully-connected network and
    convolutional neural network architectures. It connects
    the layers and passes the data from one end to another.
    '''
    def __init__(self, layers, score_func=None):
        ''' Set up a global parameter list and initialize a
            score function that is used for predictions.

        Args:
            layers: neural network architecture based on layer and activation function objects
            score_func: function that is used as classifier on the output
        '''
        self.layers = layers
        self.params = []
        for layer in self.layers:
            self.params.append(layer.params)
        self.score_func = score_func

    def forward(self, X):
        ''' Pass input X through all layers in the network
        '''
        for layer in self.layers:
            X = layer.forward(X)
        return X

    def backward(self, dout):
        ''' Backprop through the network and keep a list of the gradients
            from each layer.
        '''
        grads = []
        for layer in reversed(self.layers):
            dout, grad = layer.backward(dout)
            grads.append(grad)
        # note: gradients are collected from the last layer to the first
        return grads

    def predict(self, X):
        ''' Run a forward pass and use the score function to classify
            the output.
        '''
        X = self.forward(X)
        return np.argmax(self.score_func(X), axis=1)
```

### Exercise: Define a Cost Function

Implementations of different cost functions should be placed into cost_func.py. A cost function object defines the criterion your network is evaluated on during the optimization process. Furthermore, the class contains score functions that can be used as classification criteria for predictions with a given model. It is therefore necessary to provide a cost function object to an optimization algorithm for the training process.

Implement the softmax method.

```python
class CostCriteria(object):
    ''' Implements different types of loss and score functions for neural networks

    Todo:
        - Implement init that defines score and loss function
    '''
    def softmax(X):
        ''' Numerically stable calculation of softmax
        '''
        ############################################
        #                   TODO                   #
        ############################################
        # return
        ############################################
        #             END OF YOUR CODE             #
        ############################################

    def cross_entropy_softmax(X, y):
        ''' Computes loss and prepares dout for backprop

        https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
        '''
        m = y.shape[0]
        p = CostCriteria.softmax(X)
        log_likelihood = -np.log(p[range(m), y])
        loss = np.sum(log_likelihood) / m
        dout = p.copy()
        dout[range(m), y] -= 1
        return loss, dout
```

### Testing

Softmax turns each row (each sample) into a probability distribution over the output classes. So you may want to test:

• that the return value has the same number of samples as the input
• that each row is a valid probability distribution: all values of the return value should lie in [0, 1] and each row should sum to 1.
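Assuming a completed softmax, such a test could read as follows. To keep the sketch self-contained, a numerically stable reference implementation (`softmax_ref`, a made-up name) is inlined; your own `CostCriteria.softmax` should pass the same assertions:

```python
import numpy as np

def softmax_ref(X):
    # subtract the row-wise maximum for numerical stability
    e = np.exp(X - np.max(X, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

X = np.random.randn(8, 10)  # 8 samples, 10 classes
p = softmax_ref(X)

assert p.shape == X.shape                    # one distribution per sample
assert np.all(p >= 0) and np.all(p <= 1)     # valid probabilities
assert np.allclose(np.sum(p, axis=1), 1.0)   # each row sums to 1
```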

### Optimization with SGD

The file optimizer.py contains implementations of optimization algorithms. Your optimizer needs your custom network, the data, a loss function and some additional hyperparameters as arguments to optimize your model.

```python
class Optimizer(object):
    def get_minibatches(X, y, batch_size):
        ''' Decomposes data set into small subsets (batches)
        '''
        m = X.shape[0]
        batches = []
        for i in range(0, m, batch_size):
            X_batch = X[i:i + batch_size, :, :, :]
            y_batch = y[i:i + batch_size, ]
            batches.append((X_batch, y_batch))
        return batches

    def sgd(network, X_train, y_train, cost_function, batch_size=32, epoch=100, learning_rate=0.001, X_test=None, y_test=None, verbose=None):
        ''' Optimize a given network with stochastic gradient descent

        Args:
            network: the network to optimize
            X_train: training data
            y_train: training labels (ground truth)
            cost_function: cost function
            batch_size: size of a single batch
            epoch: number of epochs
            learning_rate: the rate the gradient is multiplied with
            X_test: test data if you want to evaluate your model in each epoch
            y_test: test labels
            verbose: if set it prints out training accuracy and test accuracy

        Returns:
            Model with optimized parameters
        '''
        minibatches = Optimizer.get_minibatches(X_train, y_train, batch_size)
        for i in range(epoch):
            loss = 0
            if verbose:
                print('Epoch', i)
            for X_mini, y_mini in minibatches:
                # calculate loss and derivation of the last layer
                loss, dout = cost_function(network.forward(X_mini), y_mini)
                # do not train in epoch 0, so we know the performance before training
                if i > 0:
                    # backprop: gradients come back from the last layer to the first
                    grads = network.backward(dout)
                    # run vanilla sgd update for all learnable parameters in network.params
                    for param, grad in zip(network.params, reversed(grads)):
                        for j in range(len(grad)):
                            param[j] += - learning_rate * grad[j]
            if verbose:
                train_acc = np.mean(y_train == network.predict(X_train))
                test_acc = np.mean(y_test == network.predict(X_test))
                print("Loss = {0} :: Training = {1} :: Test = {2}".format(loss, train_acc, test_acc))
        return network
```
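The batch decomposition in `get_minibatches` can be checked on toy data; with 10 samples and a batch size of 4 the last batch is simply smaller. The standalone helper below mirrors the method's slicing logic so the sketch runs on its own:

```python
import numpy as np

def get_minibatches(X, y, batch_size):
    # same slicing logic as Optimizer.get_minibatches
    m = X.shape[0]
    return [(X[i:i + batch_size], y[i:i + batch_size])
            for i in range(0, m, batch_size)]

X = np.zeros((10, 1, 28, 28))
y = np.arange(10)
batches = get_minibatches(X, y, batch_size=4)
print([X_batch.shape[0] for X_batch, _ in batches])  # [4, 4, 2]
```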

### Put it All Together

Now you have to put all parts together to create and train a fully connected neural network. First, you have to define an individual network architecture by flattening the input and stacking fully-connected layers with activation functions, e.g.:

Input -> Flatten -> Dense -> Activation -> Dense -> Activation -> Dense -> Activation -> Dense

You have to initialize all objects you need to build your custom architecture and put them into a list afterward. Your architecture is then given to a NeuralNetwork object that handles the inter-layer communication during the forward and backward pass. The evaluation criterion is applied to the network output during loss calculation, which is why your architecture should end with a fully-connected layer rather than an activation function. The pipeline above is implemented in the following cell.

Finally, you can train the model with an optimization algorithm and a cost function, here stochastic gradient descent and cross-entropy with softmax. That kind of pipeline is similar to the one you would create with a more sophisticated framework like Tensorflow or PyTorch.

```python
# design a three-hidden-layer architecture with dense layers
# and ReLU as activation function
def fcn_mnist():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    output = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, output]

# create a neural network with the specified architecture and softmax as score function
fcn = NeuralNetwork(fcn_mnist(), score_func=CostCriteria.softmax)

# optimize the network with sgd and a cross-entropy softmax loss
fcn = Optimizer.sgd(fcn, train_images, train_labels, CostCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=test_images, y_test=test_labels, verbose=True)
```

### Exercise: Experiment with the Framework

Here is your last exercise for this notebook:

Now you have a basic idea of how to build a fully-connected neural network with the framework. The next steps are straightforward. Download all script files of the framework from Moodle or the exercise repository and move the implemented functions into the correct script files:

• layer.py
• activation_func.py
• cost_func.py

Afterwards, choose a dataset you like and create a data loader in the script file utils.py. Load your data, build a neural network and try to build a good classifier. Have fun!

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

Neural Networks - Exercise: Simple MNIST Network
by Benjamin Voigt