# Introduction to Neural Networks

In Introduction to Machine Learning you learned about linear and logistic regression to predict continuous values or classes. You learned about activation functions (the sigmoid function), cost functions (mean squared error, cross-entropy), and how to iteratively reduce the cost by optimizing your model parameters with gradient descent.

In this course you will extend the logistic regression model to a fully connected neural network in order to solve classification tasks that are not linearly separable. To accomplish this, you will learn about the backpropagation algorithm and implement it. Lastly, you will get to know more activation functions (tanh, ReLU) and more advanced weight initialization methods (Xavier).

## Notebooks

In Introduction to Machine Learning (highly recommended), you calculated the gradient by hand and used only the final formula. In this exercise you will learn how to derive the individual functions separately and chain them programmatically. This makes it possible to build computational graphs programmatically and differentiate them with respect to certain variables, knowing only the derivatives of the most basic functions.
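To make the idea of chaining derivatives concrete, here is a minimal sketch of a computational graph in plain Python. It is not the code used in the notebook; the names `Node`, `sigmoid`, and `backward` are assumptions chosen for illustration. Each node stores only the local derivative with respect to its direct inputs, and the chain rule combines them.

```python
import math


class Node:
    """A value in a computational graph that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_node, local derivative d(self)/d(parent)).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def sigmoid(node):
    s = 1.0 / (1.0 + math.exp(-node.value))
    return Node(s, ((node, s * (1.0 - s)),))


def backward(output):
    """Accumulate d(output)/d(node) in node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Depth-first traversal: parents are appended before their consumers.
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local


# y = sigmoid(w * x + b), with hypothetical example values
w, x, b = Node(2.0), Node(3.0), Node(-5.0)
y = sigmoid(w * x + b)
backward(y)
print(w.grad, x.grad, b.grad)
```

Only the derivatives of addition, multiplication, and the sigmoid are written down by hand; the gradient of the whole expression follows from the graph structure.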

Here you will learn to visualize a neural network given its weight matrices and to compute the forward pass using matrix and vector operations.
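As a rough sketch of such a forward pass, the following NumPy snippet pushes one input vector through a small network. The layer sizes, the weight values, and the convention that each weight matrix has shape `(n_out, n_in)` are assumptions made for this example; the notebook may use a different layout.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, biases):
    """Return the activations of every layer, input included."""
    activations = [x]
    for W, b in zip(weights, biases):
        # W has shape (n_out, n_in): row i holds the weights into neuron i.
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations


# A 3-2-1 network: 3 inputs, 2 hidden neurons, 1 output neuron.
weights = [np.array([[0.1, -0.2, 0.3],
                     [0.4, 0.5, -0.6]]),
           np.array([[1.0, -1.0]])]
biases = [np.zeros(2), np.zeros(1)]

x = np.array([1.0, 2.0, 3.0])
layers = forward(x, weights, biases)
print([a.shape for a in layers])
```

With this layout, the shape of each matrix also describes the drawing of the network: a matrix of shape `(2, 3)` corresponds to six edges from three neurons into two.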

Knowing how to compute the forward pass and the backward pass with backpropagation, you are ready for a simple neural network. First you will refresh your knowledge of logistic regression, but this time you will implement it using a computational graph. Then you will add a hidden layer. Finally, you will see what happens to the data in the hidden layer by plotting it.
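The effect of a hidden layer can be illustrated on the XOR problem, which logistic regression alone cannot solve. The sketch below uses hand-picked (not trained) weights and a ReLU hidden layer; both are assumptions chosen so the numbers are easy to follow, and the notebook's dataset and activation may differ.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# The four XOR inputs, one per row, and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hand-picked parameters, for illustration only.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])      # shape (n_hidden, n_in)
b1 = np.array([0.0, -1.0])
w2 = np.array([10.0, -20.0])     # weights of the logistic output unit
b2 = -5.0

H = relu(X @ W1.T + b1)          # hidden representation, shape (4, 2)
p = sigmoid(H @ w2 + b2)         # predicted probability of class 1

print(H)
print((p > 0.5).astype(int))
```

In the input space no single line separates the two classes. In the hidden space the points become (0, 0), (1, 0), (1, 0), and (2, 1), which the output unit can separate with one line. Plotting `H` instead of `X` shows this directly.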

For a better understanding of neural networks, you will start to implement a framework of your own. The given notebook explains some core functions and concepts of the framework, so that everyone has the same starting point. Our previous exercises were self-contained and not very modular. You are going to change that. Let us begin with a fully connected network on the now well-known MNIST dataset. The pipeline will be

In the previous exercises we initialized our weights with a normal distribution centered around 0. Although this was not the worst choice, there are better ways. One is the Xavier initialization [GLO10], which is still used in practice in state-of-the-art neural network architectures.
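A minimal sketch of the uniform variant of Xavier initialization is shown below. The function name `xavier_uniform`, the `(n_out, n_in)` weight layout, and the hidden layer size of 100 are assumptions for illustration; 784 is the number of pixels in an MNIST image.

```python
import numpy as np


def xavier_uniform(n_in, n_out, rng):
    """Sample an (n_out, n_in) weight matrix from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))


rng = np.random.default_rng(0)   # fixed seed, only for reproducibility
W = xavier_uniform(n_in=784, n_out=100, rng=rng)

# The variance of U(-a, a) is a**2 / 3, which gives 2 / (n_in + n_out) here.
target_var = 2.0 / (784 + 100)
print(W.var(), target_var)
```

The point of the scaling is that the weight variance depends on the number of incoming and outgoing connections, rather than being a fixed constant for every layer.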

• TODO: exercise-weight-initilization

So far, the vanilla gradient descent update rule has served well for our toy examples. In real-world applications, however, you are much better off choosing a more sophisticated variant, and there are plenty of them. Almost all are centered around the idea of an adaptive learning rate, which helps to speed up training and to avoid overshooting the minimum.
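As one example of an adaptive learning rate, the sketch below compares vanilla gradient descent with Adagrad on a badly scaled quadratic. Adagrad is picked here only because it is short to write down; the text above does not single out a particular optimizer, and the test function, learning rates, and step count are assumptions for illustration.

```python
import numpy as np


def loss(w):
    """f(w) = 0.5 * (w[0]**2 + 100 * w[1]**2), steep in one direction only."""
    return 0.5 * (w[0] ** 2 + 100.0 * w[1] ** 2)


def grad(w):
    return np.array([1.0, 100.0]) * w


def gradient_descent(w, lr, steps):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def adagrad(w, lr, steps, eps=1e-8):
    g_sq_sum = np.zeros_like(w)      # running sum of squared gradients
    for _ in range(steps):
        g = grad(w)
        g_sq_sum += g ** 2
        # Each parameter gets its own effective learning rate.
        w = w - lr * g / (np.sqrt(g_sq_sum) + eps)
    return w


w0 = np.array([1.0, 1.0])
w_gd = gradient_descent(w0, lr=0.01, steps=100)
w_ada = adagrad(w0, lr=0.5, steps=100)
print(loss(w0), loss(w_gd), loss(w_ada))
```

On this function, vanilla gradient descent diverges for learning rates above 2/100 because of the steep direction, so the flat direction is forced to move slowly. Adagrad rescales each coordinate by its own gradient history, so both directions make comparable progress.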