ML-Fundamentals - Logistic Regression and Regularization
In this exercise you will implement logistic regression. In contrast to linear regression, the purpose of this model is not to predict a continuous value (e.g. the temperature tomorrow), but to predict a class: for example, whether it will rain tomorrow or not. During this exercise you will:
- Implement the logistic function and plot it
- Implement the hypothesis using the logistic function
- Write a function to calculate the cross-entropy cost
- Implement the loss function using the hypothesis and cost
- Implement the gradient descent algorithm to train your model (optimizer)
- Visualize the decision boundary together with the data
- Calculate the accuracy of your model
- Extend your model with regularization
- Calculate the gradient for the loss function with cross-entropy cost (pen&paper)
You should have a basic knowledge of:
- Logistic regression
- Cross-entropy loss
- Gradient descent
Suitable sources for acquiring this knowledge are:
- Logistic Regression Notebook by Christian Herta and corresponding lecture slides (German)
- Regularization Notebook by Christian Herta and corresponding lecture slides (German)
- Chapter 5.1 of Deep Learning by Ian Goodfellow
- Some parts of chapter 1 and 3 of Pattern Recognition and Machine Learning by Christopher M. Bishop
- numpy quickstart
- Matplotlib tutorials
By deep.TEACHING convention, all python modules needed to run the notebook are loaded centrally at the beginning.
# External Modules
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
Exercise - Logistic Regression
For convenience and visualization, we will only use two features in this notebook, so we are still able to plot them together with the target class. But your implementation should also be capable of handling more (except the plots).
First we will create some artificial data. For each class, we will generate the features from a bivariate (2D) normal distribution:
# class 0:
# covariance matrix and mean
cov0 = np.array([[5,-4],[-4,4]])
mean0 = np.array([2.,3])
# number of data points
m0 = 1000

# class 1
# covariance matrix and mean
cov1 = np.array([[5,-3],[-3,3]])
mean1 = np.array([1.,1])
# number of data points
m1 = 1000

# generate m gaussian distributed data points with
# mean and cov.
r0 = np.random.multivariate_normal(mean0, cov0, m0)
r1 = np.random.multivariate_normal(mean1, cov1, m1)
plt.scatter(r0[...,0], r0[...,1], c='b', marker='o', label="class 0")
plt.scatter(r1[...,0], r1[...,1], c='r', marker='x', label="class 1")
plt.xlabel("x0")
plt.ylabel("x1")
plt.legend()
plt.show()

X = np.concatenate((r0,r1))
y = np.zeros(len(r0)+len(r1))
# the rows of r1 (class 1) follow the rows of r0 (class 0) in X
y[len(r0):] = 1
For logistic regression, we want the output of the hypothesis to be in the interval $(0, 1)$. This is done using the logistic function. The logistic function is a special case of the sigmoid function, though in the domain of machine learning, the term sigmoid function is often used as a synonym for logistic function:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
Implement the logistic function and plot it for 1000 points in the interval of .
def logistic_function(x):
    """ Returns f(x) with f being the logistic function. """
    raise NotImplementedError("You should implement this function")

### Insert code to plot the logistic function below
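For orientation, here is a minimal sketch of one way the logistic function could be written with numpy. The name `logistic_sketch` and the range $[-10, 10]$ are assumptions made for this illustration, not part of the exercise.

```python
import numpy as np

def logistic_sketch(x):
    """Element-wise logistic function 1 / (1 + exp(-x))."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

# The range [-10, 10] is an assumption made for this sketch.
xs = np.linspace(-10.0, 10.0, 1000)
ys = logistic_sketch(xs)

print(logistic_sketch(0.0))   # value at the midpoint of the curve
print(ys.min(), ys.max())     # all values stay strictly between 0 and 1
```

Passing `xs` and `ys` to `plt.plot` should then draw the familiar S-shaped curve.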
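The logistic hypothesis is defined as:
$$h_\theta(\vec x) = \sigma(\theta^T \vec x')$$
with $\vec x = (x_1, \ldots, x_n)^T$ and $\vec x' = (1, x_1, \ldots, x_n)^T$,
or for the whole data set $X$ and $X'$ (each row of $X$ extended by a leading 1):
$$h_\theta(X) = \sigma(X' \theta)$$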
Implement the logistic hypothesis using your implementation of the logistic function.
logistic_hypothesis should return a function which accepts the training data $X$:
>> theta = np.array([1.1, 2.0, -.9])
>> h = logistic_hypothesis(theta)
>> h(X)
array([ -0.89896965, 0.71147926, ....
You may of course also implement a helper function for transforming $X$ into $X'$ and use it inside the lambda function of logistic_hypothesis.
def logistic_hypothesis(theta):
    ''' Combines given list argument in a logistic equation and returns it as a function

    Args:
        theta: list of coefficients

    Returns:
        lambda that models a logistic function based on theta and x
    '''
    raise NotImplementedError("You should implement this function")

### Uncomment to test your implementation
#theta = np.array([1.,2.,3.])
#h = logistic_hypothesis(theta)
#print(h(X))
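The closure pattern described above can be sketched as follows. The helper `add_bias_column` and the suffix `_sketch` are hypothetical names chosen for this illustration, and the demo data is made up.

```python
import numpy as np

def add_bias_column(X):
    """Hypothetical helper: turn X (m, n) into X' (m, n + 1) with leading ones."""
    X = np.asarray(X, dtype=float)
    return np.hstack([np.ones((X.shape[0], 1)), X])

def logistic_hypothesis_sketch(theta):
    """Return h(X) = sigma(X' theta) as a function of the data X."""
    theta = np.asarray(theta, dtype=float)
    return lambda X: 1.0 / (1.0 + np.exp(-(add_bias_column(X) @ theta)))

# Made-up demo data: three examples with two features each.
X_demo = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
h = logistic_hypothesis_sketch(np.array([0.0, 1.0, -1.0]))
probs = h(X_demo)
print(probs)  # one value in (0, 1) per row of X_demo
```

Because theta is captured by the lambda, the returned function only needs the data, which is what the later cost functions rely on.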
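The cross-entropy costs are defined as:
$$loss_i = -y^{(i)} \log\left(h_\theta(\vec x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(\vec x^{(i)})\right)$$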
Implement the cross-entropy cost.
Your python function should return a function, which accepts the vector $\theta$. The returned function should return the cost for each feature vector $\vec x^{(i)}$. The returned array of costs therefore has to have the same length as the number of feature vectors (and also labels $y$):
>> J = cross_entropy_costs(logistic_hypothesis, X, y)
>> J(theta)
array([ 7.3, 9.5, ....
def cross_entropy_costs(h, X, y):
    ''' Implements cross-entropy as a function costs(theta) on given training data

    Args:
        h: the hypothesis as function
        X: features as 2D array with shape (m_examples, n_features)
        y: ground truth labels for given features with shape (m_examples)

    Returns:
        lambda costs(theta) that models the cross-entropy for each x^i
    '''
    raise NotImplementedError("You should implement this function")

### Uncomment to test your implementation
#theta = np.array([1.,2.,3.])
#costs = cross_entropy_costs(logistic_hypothesis, X, y)
#print(costs(theta))
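A minimal sketch of the per-example cross-entropy, assuming the hypothesis is passed as a factory `h(theta)` as in the test call above. The stand-in `hypothesis_sketch` and the clipping parameter `eps` are assumptions added for this illustration.

```python
import numpy as np

def hypothesis_sketch(theta):
    """Stand-in for logistic_hypothesis: theta[0] is the bias term."""
    theta = np.asarray(theta, dtype=float)
    return lambda X: 1.0 / (1.0 + np.exp(-(theta[0] + X @ theta[1:])))

def cross_entropy_costs_sketch(h, X, y, eps=1e-12):
    """Return costs(theta): one cross-entropy value per training example.

    eps is a hypothetical safeguard that keeps log() away from 0.
    """
    def costs(theta):
        p = np.clip(h(theta)(X), eps, 1.0 - eps)
        return -y * np.log(p) - (1.0 - y) * np.log(1.0 - p)
    return costs

# Made-up demo data.
X_demo = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
y_demo = np.array([0.0, 1.0, 0.0])
costs = cross_entropy_costs_sketch(hypothesis_sketch, X_demo, y_demo)
print(costs(np.zeros(3)))  # every prediction is 0.5, so every cost is log(2)
```

The clipping is a design choice, not a requirement: without it, a prediction of exactly 0 or 1 would produce `inf` or `nan` in the logarithm.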
Now implement the loss function $J(\theta)$, which calculates the mean costs for the whole training data $X$. Your python function should return a function, which accepts the vector $\theta$.
def mean_cross_entropy_costs(X, y, hypothesis, cost_func):
    ''' Implements mean cross-entropy as a function J(theta) on given training data

    Args:
        X: features as 2D array with shape (m_examples, n_features)
        y: ground truth labels for given features with shape (m_examples)
        hypothesis: the hypothesis as function
        cost_func: cost function

    Returns:
        lambda J(theta) that models the mean cross-entropy
    '''
    raise NotImplementedError("You should implement this")

### Uncomment to test your implementation
#theta = np.array([1.,2.,3.])
#J = mean_cross_entropy_costs(X,y, logistic_hypothesis, cross_entropy_costs)
#print(J(theta))
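The mean loss is simply the average of the per-example costs. The sketch below assumes the same argument order as the stub above; the two helper functions are simplified stand-ins written only for this illustration.

```python
import numpy as np

def hypothesis_sketch(theta):
    """Stand-in hypothesis factory: theta[0] is the bias term."""
    theta = np.asarray(theta, dtype=float)
    return lambda X: 1.0 / (1.0 + np.exp(-(theta[0] + X @ theta[1:])))

def costs_sketch(h, X, y):
    """Stand-in cost factory: per-example cross-entropy as a function of theta."""
    def costs(theta):
        p = h(theta)(X)
        return -y * np.log(p) - (1.0 - y) * np.log(1.0 - p)
    return costs

def mean_cross_entropy_costs_sketch(X, y, hypothesis, cost_func):
    """Return J(theta): the mean of the per-example costs."""
    per_example = cost_func(hypothesis, X, y)
    return lambda theta: np.mean(per_example(theta))

# Made-up demo data.
X_demo = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
y_demo = np.array([0.0, 1.0, 0.0])
J = mean_cross_entropy_costs_sketch(X_demo, y_demo, hypothesis_sketch, costs_sketch)
print(J(np.zeros(3)))  # mean of three identical costs of log(2)
```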
A short recap: the gradient descent algorithm is a first-order iterative optimization method for finding a minimum of a function. From the current position in a (cost) function, the algorithm steps proportional to the negative of the gradient and repeats this until it reaches a local or global minimum and terminates. Stepping proportional means that the step is not the full negative gradient, but the negative gradient scaled by a fixed value $\alpha$, also called the learning rate. Implementing the following formalized update rule is the core of the optimization process:
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
Implement the function to update all theta values.
def compute_new_theta(X, y, theta, learning_rate, hypothesis):
    ''' Updates learnable parameters theta

    The update is done by calculating the partial derivatives of
    the cost function including the logistic hypothesis. The
    gradients scaled by a scalar are subtracted from the given
    theta values.

    Args:
        X: 2D numpy array of x values
        y: array of y values corresponding to x
        theta: current theta values
        learning_rate: value to scale the negative gradient
        hypothesis: the hypothesis as function

    Returns:
        theta: updated theta values
    '''
    raise NotImplementedError("You should implement this")
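A single update step can be sketched as below. It assumes the well-known gradient of the mean cross-entropy with a logistic hypothesis, $\frac{1}{m} X'^T (h_\theta(X) - y)$, which is the result the pen & paper part asks you to derive. The function names and demo values are hypothetical.

```python
import numpy as np

def hypothesis_sketch(theta):
    """Stand-in hypothesis factory: theta[0] is the bias term."""
    theta = np.asarray(theta, dtype=float)
    return lambda X: 1.0 / (1.0 + np.exp(-(theta[0] + X @ theta[1:])))

def compute_new_theta_sketch(X, y, theta, learning_rate, hypothesis):
    """One gradient descent step for the mean cross-entropy loss."""
    m = X.shape[0]
    X_ext = np.hstack([np.ones((m, 1)), X])   # X' with a leading column of ones
    errors = hypothesis(theta)(X) - y          # h(x^i) - y^i for every example
    gradient = X_ext.T @ errors / m            # one partial derivative per theta_j
    return theta - learning_rate * gradient

# Made-up demo data: one feature, two examples.
X_demo = np.array([[1.0], [-1.0]])
y_demo = np.array([1.0, 0.0])
theta_new = compute_new_theta_sketch(X_demo, y_demo, np.zeros(2), 1.0, hypothesis_sketch)
print(theta_new)  # the bias stays at 0, the feature weight moves to 0.5
```

All theta values are updated simultaneously from the same gradient, which is what the vectorized expression guarantees.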
Using the compute_new_theta method, you can now implement the gradient descent algorithm. Iterate over the update rule to find the values for $\theta$ that minimize our cost function $J$. This process is often called training of a machine learning model.
- Implement the function for the gradient descent.
- Create a history of all theta and cost values and return them.
def gradient_descent(X, y, theta, learning_rate, num_iters):
    ''' Finds theta values of a logistic model that minimize the cross-entropy cost function

    Args:
        X: 2D numpy array of x values
        y: array of y values corresponding to x
        theta: current theta values
        learning_rate: value to scale the negative gradient
        num_iters: number of iterations updating thetas

    Returns:
        history_cost: cost after each iteration
        history_theta: updated theta values after each iteration
    '''
    raise NotImplementedError("You should implement this")
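The training loop with its two histories might look like the following self-contained sketch. It inlines the hypothesis and the gradient instead of calling the functions above, and the tiny data set, learning rate and iteration count are made-up values for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_sketch(X, y, theta, learning_rate, num_iters):
    """Repeat the update rule and record cost and theta after each iteration."""
    m = X.shape[0]
    X_ext = np.hstack([np.ones((m, 1)), X])
    theta = np.asarray(theta, dtype=float).copy()
    history_cost, history_theta = [], []
    for _ in range(num_iters):
        p = sigmoid(X_ext @ theta)
        theta = theta - learning_rate * (X_ext.T @ (p - y)) / m
        p = sigmoid(X_ext @ theta)             # predictions after the update
        cost = np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p))
        history_cost.append(cost)
        history_theta.append(theta.copy())
    return np.array(history_cost), np.array(history_theta)

# Made-up demo data: one feature, four examples.
X_demo = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_demo = np.array([0.0, 0.0, 1.0, 1.0])
history_cost, history_theta = gradient_descent_sketch(
    X_demo, y_demo, np.zeros(2), 0.1, 200)
print(history_cost[0], history_cost[-1])
```

With a sufficiently small learning rate the recorded costs should decrease from one iteration to the next; a cost that grows is usually a sign that the learning rate is too large.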
Training and Evaluation
Choose an appropriate learning rate, number of iterations and initial theta values and start the training.
# Insert your code below
Now that the training has finished we can visualize our results.
Plot the costs over the iterations. Your plot should look similar to this one:
def plot_progress(costs):
    """ Plots the costs over the iterations

    Args:
        costs: history of costs
    """
    raise NotImplementedError("You should implement this!")
plot_progress(history_cost)
print("costs before the training:\t ", history_cost[0])
print("costs after the training:\t ", history_cost[-1])
Plot Data and Decision Boundary
Now plot the decision boundary (a straight line in this case) together with the data.
# Insert your code to plot below
theta_hist[-1]
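With two features, the decision boundary is the set of points where the hypothesis equals 0.5, i.e. where $\theta_0 + \theta_1 x_0 + \theta_2 x_1 = 0$. Solving for $x_1$ gives a straight line. The sketch below computes points on that line; the function name and the theta values are hypothetical, and it assumes $\theta_2 \neq 0$.

```python
import numpy as np

def decision_boundary_x1(theta, x0):
    """Solve theta_0 + theta_1 * x0 + theta_2 * x1 = 0 for x1 (needs theta_2 != 0)."""
    theta = np.asarray(theta, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    return -(theta[0] + theta[1] * x0) / theta[2]

theta_demo = np.array([1.0, 2.0, -4.0])   # hypothetical trained parameters
x0_line = np.array([0.0, 2.0])            # two x0 values are enough for a line
x1_line = decision_boundary_x1(theta_demo, x0_line)
print(x1_line)  # x1 coordinates of the boundary at the two x0 values
```

Passing `x0_line` and `x1_line` to `plt.plot` after the two scatter plots should overlay the boundary on the data.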
- Calculate the accuracy of your final classifier. The accuracy is the proportion of the correctly classified data.
- Why will the accuracy never reach 100% using this model and this data set?
# Insert your code below
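Accuracy can be sketched as the mean of a boolean comparison between thresholded predictions and labels. The threshold of 0.5, the function name and the demo values are assumptions made for this illustration.

```python
import numpy as np

def accuracy_sketch(theta, X, y, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    theta = np.asarray(theta, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(theta[0] + X @ theta[1:])))
    predictions = (p >= threshold).astype(float)
    return np.mean(predictions == y)

# Made-up demo data: the second example lies on the "wrong" side of the boundary.
X_demo = np.array([[-2.0, 0.0], [-1.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
y_demo = np.array([0.0, 1.0, 1.0, 1.0])
theta_demo = np.array([0.0, 1.0, 0.0])
print(accuracy_sketch(theta_demo, X_demo, y_demo))  # 3 of 4 examples are correct
```

The misclassified point in the demo mirrors the situation in the exercise data: when the two classes overlap, no straight line can separate every point.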
Extend your implementation with a regularization term by adding it as argument to the functions
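As one possible reading of this task, the sketch below adds an L2 penalty $\frac{\lambda}{2m} \sum_{j \geq 1} \theta_j^2$ to the mean cross-entropy and the matching term $\frac{\lambda}{m} \theta_j$ to the gradient. The choice of L2, the exclusion of the bias $\theta_0$, the scaling by $m$, and all names are assumptions; the exercise text does not fix them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost_sketch(X, y, theta, lambda_reg):
    """Mean cross-entropy plus an L2 penalty on all theta except the bias."""
    m = X.shape[0]
    X_ext = np.hstack([np.ones((m, 1)), X])
    p = sigmoid(X_ext @ theta)
    data_cost = np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p))
    penalty = lambda_reg / (2.0 * m) * np.sum(theta[1:] ** 2)
    return data_cost + penalty

def regularized_gradient_sketch(X, y, theta, lambda_reg):
    """Gradient of regularized_cost_sketch with respect to theta."""
    m = X.shape[0]
    X_ext = np.hstack([np.ones((m, 1)), X])
    grad = X_ext.T @ (sigmoid(X_ext @ theta) - y) / m
    grad[1:] += lambda_reg / m * theta[1:]     # the bias theta_0 is not penalized
    return grad

# Made-up demo data and parameters.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
y_demo = np.array([1.0, 0.0, 0.0])
theta_demo = np.array([0.1, 0.3, -0.4])
print(regularized_cost_sketch(X_demo, y_demo, theta_demo, 2.0))
print(regularized_gradient_sketch(X_demo, y_demo, theta_demo, 2.0))
```

Setting `lambda_reg` to 0 recovers the unregularized loss and gradient, which is a convenient sanity check for an extended implementation.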
Proof - Pen&Paper
The sigmoid activation function is defined as
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Now show that:
Note that in general (because of symmetry) holds:
with the sigmoid function as hypothesis
Make use of your knowledge that:
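Assuming the statement to prove is the gradient of the mean cross-entropy loss (the pen & paper goal named in the introduction), the derivation can be sketched as follows. The notation $h^{(i)} = \sigma(\theta^T \vec x'^{(i)})$ and the use of the identity $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ are assumptions of this sketch, not a reproduction of the original formulas.

```latex
\begin{align*}
J(\theta) &= -\frac{1}{m} \sum_{i=1}^{m}
  \left[ y^{(i)} \log h^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - h^{(i)}\right) \right],
  \qquad h^{(i)} = \sigma\left(\theta^T \vec x'^{(i)}\right) \\
\frac{\partial h^{(i)}}{\partial \theta_j}
  &= h^{(i)} \left(1 - h^{(i)}\right) x'^{(i)}_j
  \qquad \text{using } \sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right) \\
\frac{\partial J(\theta)}{\partial \theta_j}
  &= -\frac{1}{m} \sum_{i=1}^{m}
  \left[ \frac{y^{(i)}}{h^{(i)}} - \frac{1 - y^{(i)}}{1 - h^{(i)}} \right]
  h^{(i)} \left(1 - h^{(i)}\right) x'^{(i)}_j \\
  &= -\frac{1}{m} \sum_{i=1}^{m}
  \left[ y^{(i)} \left(1 - h^{(i)}\right) - \left(1 - y^{(i)}\right) h^{(i)} \right] x'^{(i)}_j \\
  &= \frac{1}{m} \sum_{i=1}^{m} \left( h^{(i)} - y^{(i)} \right) x'^{(i)}_j
\end{align*}
```

The final line has the same form as the gradient of the mean squared error in linear regression, only with the logistic hypothesis in place of the linear one.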
Summary and Outlook
Notebook License (CC-BY-SA 4.0)
The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).
Exercise: Logistic Regression and Regularization
by Christian Herta, Klaus Strohmenger
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.
Code License (MIT)
The following license only applies to code cells of the notebook.
Copyright 2018 Christian Herta, Klaus Strohmenger
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.