Status: Draft

Introduction to Machine Learning

To really understand how neural networks work, it is essential to understand their core concepts. Practically speaking, a multi-layer perceptron (a fully connected feed-forward network) consists of many stacked logistic regression units. So to understand neural networks, you first need to understand logistic regression, which in turn is essentially linear regression extended with a sigmoid function.

Linear regression is used to predict continuous values (e.g. tomorrow's temperature) based on some features (e.g. today's temperature, humidity, calendar week, etc.). By adding a sigmoid function to the output of linear regression, the final output becomes a value between 0.0 and 1.0. Applying a threshold at 0.5 then yields a binary classifier (e.g. warm or cold day tomorrow).
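The step from a linear output to a binary decision can be sketched in a few lines. The weights and feature value below are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical linear-regression output: w * x + b with made-up values
linear_output = 1.2 * 0.8 - 0.5

# the sigmoid turns it into a value between 0.0 and 1.0
probability = sigmoid(linear_output)

# a threshold at 0.5 turns the probability into a class label
prediction = "warm" if probability >= 0.5 else "cold"
```

Note that sigmoid(0) is exactly 0.5, so the threshold at 0.5 corresponds to the sign of the linear output.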

Stacking several logistic regression units (though typically with a different activation function) in a layer after the input layer, and adding one final logistic regression after that, results in a fully connected neural network with one hidden layer. But this will be taught in the next course, Introduction to Neural Networks.

The following gif depicts the whole picture:



Training and using regression models, and therefore also neural networks, requires either a lot of loops or, better, matrix and vector operations. The first exercise is intended to refresh these mathematical skills.

In Python, the most efficient way to handle vectors, matrices and n-dimensional arrays is the package numpy. Its core functions are implemented in C and Fortran, which makes it extremely fast. Numpy provides a lot of functions and features to access and manipulate n-dimensional arrays. Basic knowledge of what is possible with numpy will ease any data scientist's daily work with Python.
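As a small taste of why vectorized numpy operations beat explicit loops, the sketch below (with arbitrary example numbers) computes the same predictions both ways:

```python
import numpy as np

# a small design matrix X (3 examples, 2 features) and a weight vector w
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])

# one matrix-vector product replaces an explicit Python loop
predictions = X @ w

# the same result computed with loops, for comparison
loop_predictions = np.array(
    [sum(x_i * w_i for x_i, w_i in zip(row, w)) for row in X]
)
```

The vectorized version is not only shorter but runs in compiled C code instead of the Python interpreter.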

The simplest model to start with is linear regression. In this exercise you will build a model which predicts a floating-point value (the target) based on only one other floating-point value (the feature). You will get familiar with the concepts of hypotheses, cost functions (here the mean squared error), the gradient descent algorithm and an iterative update rule.
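The full loop of hypothesis, mean squared error and iterative update can be sketched as follows. The data, learning rate and iteration count are arbitrary choices for illustration:

```python
import numpy as np

# toy data: the target is exactly 2*x + 1 (made-up example)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # parameters of the hypothesis h(x) = w*x + b
lr = 0.05         # learning rate
n = len(x)

for _ in range(2000):
    h = w * x + b                 # hypothesis for all examples
    # cost: J = 1/(2n) * sum((h - y)^2)  (mean squared error)
    grad_w = (h - y) @ x / n      # dJ/dw
    grad_b = (h - y).sum() / n    # dJ/db
    w -= lr * grad_w              # iterative update rule
    b -= lr * grad_b

mse = ((w * x + b - y) ** 2).mean()
```

After enough iterations the parameters approach the true values w = 2 and b = 1, and the cost approaches zero.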

The next exercise extends the previous one by adding more features and showing how best to handle them mathematically.
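A common trick when moving to multiple features is to prepend a column of ones to the feature matrix, so the intercept becomes just another weight and the whole hypothesis is a single matrix-vector product. The numbers below are arbitrary:

```python
import numpy as np

# three examples with two features each (made-up values)
X = np.array([[2.0, 3.0],
              [1.0, 5.0],
              [4.0, 0.5]])

# prepend a column of ones so the intercept is just another weight
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

theta = np.array([1.0, 0.5, -0.2])  # [intercept, w1, w2], arbitrary values
h = X_b @ theta                      # hypothesis for all examples at once
```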

So far, all models have only been judged (good or bad) by examining their costs. In the field of machine learning, a wide variety of quality measures is used. In this notebook you will get to know the most important ones, like confusion matrices, accuracy, precision and recall, the F1 score and the receiver operating characteristic (ROC).
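Most of these measures derive from the four cells of the confusion matrix. A minimal sketch, with made-up labels and predictions:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # made-up ground truth
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])  # made-up predictions

# the four cells of the confusion matrix
tp = int(((y_pred == 1) & (y_true == 1)).sum())  # true positives
fp = int(((y_pred == 1) & (y_true == 0)).sum())  # false positives
fn = int(((y_pred == 0) & (y_true == 1)).sum())  # false negatives
tn = int(((y_pred == 0) & (y_true == 0)).sum())  # true negatives

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

Precision answers "of everything predicted positive, how much was right?", recall answers "of everything actually positive, how much was found?", and the F1 score is their harmonic mean.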

With the use of the sigmoid (or logistic) function, linear regression can easily be extended to predict whether an example's set of features is likely to belong to class A or class B. However, the addition of the logistic function requires another cost function, the cross-entropy cost.
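The cross-entropy cost rewards confident correct predictions and heavily penalizes confident wrong ones. A minimal sketch, with made-up labels and predicted probabilities:

```python
import numpy as np

def cross_entropy(y, p):
    """J = -1/n * sum(y*log(p) + (1-y)*log(1-p))"""
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

y = np.array([1.0, 0.0, 1.0])  # true labels (made up)
p = np.array([0.9, 0.1, 0.8])  # predicted probabilities (made up)

cost = cross_entropy(y, p)
```

The closer the probabilities get to the true labels, the lower the cost; a perfect prediction drives it toward zero.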

Until now, you have only been working with training data. But using only the training data, which the model used to learn its parameters, it is hard to tell how the model performs on unseen data. In this notebook we introduce the out-of-sample error (E_out), in contrast to the error on the training data (E_in). E_out is composed of the variance and the bias. The balance between these two key figures is determined by the complexity of the model and whether it fits the training data too little (underfitting) or too much (overfitting).
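A common way to estimate E_out is to hold back part of the data before training. The sketch below fits a linear model by least squares on synthetic data (all numbers and the 80/20 split are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
# synthetic targets with known weights plus a little noise
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=100)

# hold out 20% of the data the model never sees during training
split = 80
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# fit by least squares on the training data only
theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

e_in  = ((X_train @ theta - y_train) ** 2).mean()  # error on seen data
e_out = ((X_test  @ theta - y_test) ** 2).mean()   # estimate of out-of-sample error
```

E_in alone can be made arbitrarily small by an overly complex model; only the held-out error reveals whether the model generalizes.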

Of course, machine learning is not all about neural networks (and hence linear and logistic regression). There exist plenty of other algorithms which still have their merit and justification. Two examples are decision trees and random forests. But before learning about the latter, take a look at decision trees and the concept of entropy.
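Entropy measures how mixed a set of labels is, which is what a decision tree tries to reduce with each split. A minimal sketch of Shannon entropy over class labels:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

pure  = entropy([0, 0, 0, 0])  # one class only  -> 0 bits
mixed = entropy([0, 0, 1, 1])  # 50/50 split     -> 1 bit
```

A pure node has entropy 0, and a perfectly mixed binary node has entropy 1 bit; a good split moves child nodes toward 0.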