Neural Networks - Exercise: Simple MNIST Network

Introduction

In this exercise you will work through a compact, fully vectorized implementation of a feed-forward neural network that classifies handwritten MNIST digits. First you will study the given code in detail, then you will rework it into the beginnings of a small, modular neural network framework that you will extend during the course.

Requirements

Knowledge

You should have a basic understanding of feed-forward neural networks and the backpropagation algorithm. The first two chapters of Michael Nielsen's online book 'Neural Networks and Deep Learning' (http://neuralnetworksanddeeplearning.com/) cover everything needed here.

Python-Modules

# third party
import numpy as np
import matplotlib.pyplot as plt

# internal
from deep_teaching_commons.data.fundamentals.mnist import Mnist

Data

# create mnist loader from deep_teaching_commons
mnist_loader = Mnist(data_dir='data')

# load all data, labels are one-hot encoded, images are flattened and pixel values squashed into [0, 1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(one_hot_enc=True, normalized=True)

# shuffle training data
shuffle_index = np.random.permutation(len(train_images))
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]
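
Before training, a quick sanity check of the loaded arrays can be helpful; assuming the loader returns the standard MNIST splits, the output should match the shapes noted in the comments:

# optional sanity check of the loaded data
print(train_images.shape, train_labels.shape)  # expected: (60000, 784) (60000, 10)
print(test_images.shape, test_labels.shape)    # expected: (10000, 784) (10000, 10)
print(train_images.min(), train_images.max())  # pixel values squashed into [0, 1]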

Simple MNIST Network

The presented network is an adaptation of Michael Nielsen's introductory example to neural networks. It is recommended, though not necessary, to read the first two chapters of his great online book 'Neural Networks and Deep Learning' for a better understanding of the given example. Compared to Nielsen's original, this variant is vectorized and replaces the sigmoid activation function with a rectified linear unit (ReLU). As a result, the code is much more compact and the optimization of the model much more efficient.
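For reference, the change in activation function is sketched below; sigmoid is Nielsen's original choice, relu is the replacement. Both function names here are purely illustrative, since the code that follows inlines np.maximum directly:

def sigmoid(z):
    # Nielsen's original activation: squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # the replacement used below: zero for negative inputs, identity otherwise
    return np.maximum(z, 0)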

delta_hist = []

def feed_forward(X, weights):
    a = [X]
    for w in weights:
        a.append(np.maximum(a[-1].dot(w), 0))
    return a

def grads(X, Y, weights):
    grads = np.empty_like(weights)
    a = feed_forward(X, weights)
    # https://brilliant.org/wiki/backpropagation/ or https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
    delta = a[-1] - Y
    delta_hist.append(np.sum(delta * Y) / len(X))
    grads[-1] = a[-2].T.dot(delta)
    for i in range(len(a) - 2, 0, -1):
        delta = (a[i] > 0) * delta.dot(weights[i].T)
        grads[i - 1] = a[i - 1].T.dot(delta)
    return grads / len(X)
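
If you want to convince yourself that the dimensions line up, the following sketch runs both functions on a tiny random batch; the variables _w, _X and _Y are purely illustrative and not part of the network:

# optional shape check on a tiny random batch
_w = np.array([np.random.randn(784, 200), np.random.randn(200, 100), np.random.randn(100, 10)], dtype=object)
_X = np.random.rand(5, 784)                   # five fake flattened images
_Y = np.eye(10)[np.random.randint(0, 10, 5)]  # five fake one-hot labels
for g, w in zip(grads(_X, _Y, _w), _w):
    assert g.shape == w.shape                 # each gradient matches its weight matrix
delta_hist.clear()                            # discard the entry recorded by this check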

trX, trY, teX, teY = train_images, train_labels, test_images, test_labels
# dtype=object lets NumPy hold the differently-shaped weight matrices in one array
weights = np.array([np.random.randn(*w) * 0.1 for w in [(784, 200), (200, 100), (100, 10)]], dtype=object)
num_epochs, batch_size, learn_rate = 20, 50, 0.1
for i in range(num_epochs):
    for j in range(0, len(trX), batch_size):
        X, Y = trX[j:j+batch_size], trY[j:j+batch_size]
        weights -= learn_rate * grads(X, Y, weights)
    prediction_test = np.argmax(feed_forward(teX, weights)[-1], axis=1)
    print(i, np.mean(prediction_test == np.argmax(teY, axis=1)))

Exercise - Understanding an Implementation

Your goal is to understand how the implementation works. To that end, do the following:

Tasks:

  • Plot delta_hist, which stores the delta values calculated at the output layer during each iteration.
  • Add comments to the functions and lines of code. Follow the Google Python Style Guide or a similar convention.
  • Add a boolean verbose argument to the functions that adds meaningful print statements to the network.

Answer the Questions

Tasks:

Hopefully, the implementation of the given neural network is clear after your work on the previous exercise. Answer the following questions:

1. Which cost function is used, what is its derivative, and how is it implemented?
2. Why are the boundaries of your plot between [-1, 0], why is it so noisy, how can you reduce the noise, and what is the difference compared to a usual plot of a loss function?
3. How does the network implement the backpropagation algorithm?

Exercise - Step towards a NN-Framework

The presented implementation is compact and efficient, but hard to modify or extend. A modular design, however, is crucial if you want to experiment with a neural network and understand the influence of its individual components. Now you will take the first steps towards your own 'toy neural network framework', which you will expand as the course progresses.

Rework the implementation above into the classes and methods given below. You do not have to re-engineer the whole neural network at this step: adapt the code to match the given specification and make only the necessary modifications. For readability, you may rename variables to more fitting names.

class FullyConnectedNetwork:
    def __init__(self, layers):
        raise NotImplementedError("This is your duty")
        
    def forward(self, data):
        raise NotImplementedError("This is your duty")

    def backward(self, X, Y):
        raise NotImplementedError("This is your duty")

    def predict(self, data):
        raise NotImplementedError("This is your duty")
            
class Optimizer:
    def __init__(self, network, train_data, train_labels, test_data=None, test_labels=None, epochs=100, batch_size=20, learning_rate=0.01):
        raise NotImplementedError("This is your duty")
        
    def sgd(self):
        raise NotImplementedError("This is your duty")

    
# The following code should run:
mnist_NN = FullyConnectedNetwork([(784, 200), (200, 100), (100, 10)])
epochs, batch_size, learning_rate = 20, 500, 0.1
optimizer = Optimizer(mnist_NN, train_images, train_labels, test_images, test_labels, epochs, batch_size, learning_rate)
optimizer.sgd()
plt.plot(mnist_NN.delta_hist)

Licenses

Notebook License (CC-BY-SA 4.0)

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

Neural Networks - Exercise: Simple MNIST Network
by Benjamin Voigt
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Code License (MIT)

The following license only applies to code cells of the notebook.

Copyright 2018 Benjamin Voigt

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.