# HTW-Berlin - Informatik und Wirtschaft - Aktuelle Trends - Machine Learning: Logistic Regression Exercise

## Introduction

The goal of this exercise is to implement Logistic Regression in Python. You can use the Python standard library and math functions from numpy. This notebook guides you through the implementation process.

This notebook implements tests using assert or np.testing.assert_almost_equal. If you run the corresponding notebook cell and no output appears, the test has passed. Otherwise an exception is raised.

General Hint:

If you have problems with the implementation (e.g. you don't know how to call a certain function or how to loop through the dataset), make use of the interactive nature of the notebook. You can add new cells to the notebook at any time to inspect defined variables or to try out small code snippets.

### Required Knowledge

This exercise is part of the course "Aktuelle Trends der Informations- und Kommunikationstechnik". The fundamentals of Logistic Regression are taught in class.

• The PDF slides used in class are available in the educational-materials repository.

### Required Python Modules

```python
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from deep_teaching_commons.data.fundamentals.iris import Iris
```

### Required Data

```python
base_data_dir = os.path.expanduser('~/deep.TEACHING/data')
dm = Iris(base_data_dir=base_data_dir)  # data manager
iris = dm.dataframe()
iris.head()

df_reduced = iris.query('species == "Iris-versicolor" | species == "Iris-virginica"')
df_reduced.head()

X = df_reduced[['petal_width', 'petal_length']].values
Y = df_reduced['species'].replace({'Iris-versicolor': 0, 'Iris-virginica': 1}).values
X[:5]
Y[:5]
X.shape, Y.shape

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled[:5]

cp = sns.color_palette()
df_scaled = pd.DataFrame(X_scaled, columns=['x1', 'x2'])
df_scaled['y'] = Y
sns.scatterplot(data=df_scaled, x='x1', y='x2', hue='y', palette=cp[1:3]);
```
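StandardScaler standardizes each feature column to zero mean and unit variance, $x' = (x - \mu) / \sigma$. The following sketch reproduces that transformation with plain numpy on a small hypothetical array. The values are made up for illustration and are not taken from the Iris data. It uses the population standard deviation (ddof=0), which is also what StandardScaler uses. The helper name standardize is illustrative.

```python
import numpy as np

def standardize(X):
    """Scale every column of X to zero mean and unit (population) variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # ddof=0, i.e. population standard deviation
    return (X - mean) / std, mean, std

# Hypothetical (petal_width, petal_length) pairs, for illustration only
X_toy = np.array([[1.4, 4.7], [1.5, 4.5], [2.5, 6.0], [1.9, 5.1]])
X_toy_scaled, mu, sigma = standardize(X_toy)
print(X_toy_scaled.mean(axis=0))  # approximately [0, 0]
print(X_toy_scaled.std(axis=0))   # approximately [1, 1]
```

Keep the statistics (mu and sigma) if you want to classify new samples later. New samples must be transformed with the training statistics, which is what scaler.transform does after scaler.fit_transform.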

## Logistic Regression

### Exercise: Sigmoid

The sigmoid function is defined as follows:

$sigmoid(t) = \frac{1}{1 + e^{-t}}$

Implement this function below. You should use np.exp to calculate $e^{-t}$.

```python
def sigmoid(t):
    raise NotImplementedError('implement this function')

sigmoid(0)

# run tests
np.testing.assert_almost_equal(sigmoid(0), 0.5)
np.testing.assert_almost_equal(sigmoid(1), 0.7310585786300049)

spacing = np.linspace(-7, 7, 100)
plt.plot(spacing, [sigmoid(t) for t in spacing]);
plt.xlabel('t')
plt.ylabel('sigmoid(t)');
```
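For large negative t, np.exp(-t) overflows and numpy emits a RuntimeWarning, although the sigmoid itself still evaluates to 0. A common remedy is to exponentiate only non-positive numbers. The following optional sketch shows that remedy; the tests above do not require it. The name stable_sigmoid is hypothetical.

```python
import numpy as np

def stable_sigmoid(t):
    """Sigmoid that never evaluates np.exp on a positive argument."""
    t = np.asarray(t, dtype=float)
    z = np.exp(-np.abs(t))  # always in [0, 1], so it cannot overflow
    # t >= 0: 1 / (1 + e^-t);  t < 0: e^t / (1 + e^t), which is the same value
    return np.where(t >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

print(stable_sigmoid(np.array([-1000.0, 0.0, 1.0, 1000.0])))
```

The function also satisfies $sigmoid(-t) = 1 - sigmoid(t)$, which is a quick sanity check for any sigmoid implementation.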

### Exercise: Hypothesis

The logistic hypothesis is defined as follows:

$h(x_1, x_2) = sigmoid(x_1 w_1 + x_2 w_2 + b)$

The hypothesis is a function of $x_1$ and $x_2$, where the parameters $w_1$, $w_2$ and $b$ are treated as constants.

Implement this function in Python as a closure.

```python
def make_logistic_hypothesis(w1, w2, b):
    # this is a closure
    def logistic_hypothesis(x1, x2):
        raise NotImplementedError('implement this function')

    return logistic_hypothesis

w1, w2, b = 3.3, 3.7, 0.3
h = make_logistic_hypothesis(w1, w2, b)
```

The hypothesis $h$ is used to predict values for $y$ given $x$.

```python
x1, x2 = -1, 1
h(x1, x2)

# run a test
np.testing.assert_almost_equal(h(-1, 1), 0.6681877721681662)
```
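As a worked check of the test value above, take $w_1 = 3.3$, $w_2 = 3.7$, $b = 0.3$ and evaluate the hypothesis at $(x_1, x_2) = (-1, 1)$:

$$\begin{aligned} z &= x_1 w_1 + x_2 w_2 + b = (-1)(3.3) + (1)(3.7) + 0.3 = 0.7\\ h(-1, 1) &= sigmoid(0.7) = \frac{1}{1 + e^{-0.7}} \approx \frac{1}{1 + 0.49659} \approx 0.66819 \end{aligned}$$

This agrees with the expected value 0.6681877721681662 in the test.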

### Exercise: Classifier

The logistic hypothesis calculates probability values in the open interval $(0, 1)$. To decide whether a sample is classified as $1$ or $0$, we need a threshold that serves as the decision boundary. The function classify applies this threshold and is defined as follows:

$$classify(x_1, x_2) = \begin{cases} 1, & \text{if } h(x_1, x_2) > threshold \\ 0, & \text{otherwise} \end{cases}$$

Implement classify in Python as a closure. Inside make_classify, you can call the already implemented function make_logistic_hypothesis to create the hypothesis $h$.

```python
def make_classify(w1, w2, b, threshold):
    def classify(x1, x2):
        raise NotImplementedError('implement this function')

    return classify

w1, w2, b = 3.3, 3.7, 0.3
threshold = 0.5
classify = make_classify(w1, w2, b, threshold)

x1, x2 = -1, -1
classify(x1, x2)

x1, x2 = 1, 1
classify(x1, x2)

# run tests
assert classify(-1, -1) == 0
assert classify(1, 1) == 1
```
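The sigmoid is strictly increasing. Therefore $h(x_1, x_2) > threshold$ holds exactly when the linear score $z = x_1 w_1 + x_2 w_2 + b$ exceeds $\ln(threshold / (1 - threshold))$. That value is $0$ for a threshold of 0.5. The following vectorized sketch checks this equivalence on a few hypothetical points. The helper names logit and classify_both_ways are illustrative and are not part of the exercise. The loop variable is deliberately called t, not threshold, so that the notebook's threshold is not overwritten.

```python
import numpy as np

def logit(p):
    """Inverse of the sigmoid: ln(p / (1 - p))."""
    return np.log(p / (1.0 - p))

def classify_both_ways(points, w1, w2, b, threshold):
    """Classify each row of points by thresholding h and, alternatively, z."""
    z = points[:, 0] * w1 + points[:, 1] * w2 + b  # linear score per point
    h = 1.0 / (1.0 + np.exp(-z))                     # logistic hypothesis
    via_probability = (h > threshold).astype(int)
    via_score = (z > logit(threshold)).astype(int)
    return via_probability, via_score

points = np.array([[-1.0, -1.0], [1.0, 1.0], [-1.0, 1.0], [0.2, -0.3]])
for t in (0.3, 0.5, 0.8):
    by_h, by_z = classify_both_ways(points, 3.3, 3.7, 0.3, t)
    print(t, by_h, by_z)
```

The same identity is what make_decision_boundary at the end of this notebook solves for $x_2$.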

### Exercise: Cost

The cost function is defined as follows:

$$J(w_1, w_2, b) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{i} \ln\left(h(x_{1}^{i}, x_{2}^{i})\right) - (1 - y^{i}) \ln\left(1 - h(x_{1}^{i}, x_{2}^{i})\right) \right]$$

Implement this function as a closure. The binary cross-entropy cost function $J$ loops over all samples in X and Y and sums the per-sample cost values. As a last step, the sum is divided by $m$, the number of samples in $X$.

$J$ is a function of $w_1$, $w_2$ and $b$, because we want to find the parameters $w_1$, $w_2$ and $b$ that yield the lowest possible cost. The data $X$ and $Y$ are therefore treated as constants. Inside make_binary_crossentropy_cost, you can call the already implemented function make_logistic_hypothesis to create a hypothesis for the given parameters. Use numpy's built-in function np.log to calculate the natural logarithm $ln$.

```python
def make_binary_crossentropy_cost(X, Y):
    def binary_crossentropy_cost(w1, w2, b):
        raise NotImplementedError('implement this function')

    return binary_crossentropy_cost

J = make_binary_crossentropy_cost(X_scaled, Y)
w1, w2, b = 3.3, 3.7, 0.3
J(w1, w2, b)

# run a test
np.testing.assert_almost_equal(J(3.3, 3.7, 0.3), 0.10736084146625315)
```
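The exercise asks for an explicit loop over the samples. For comparison, here is a vectorized sketch of the same binary cross-entropy formula. It operates on arrays of labels and predicted probabilities. The np.clip call, with a hypothetical eps of 1e-12, is an addition that the formula above does not contain. It keeps np.log away from log(0) when a prediction saturates at exactly 0 or 1. The function name binary_crossentropy is illustrative.

```python
import math
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy of labels y_true and probabilities p_pred."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p)))

y_toy = [1, 0, 1]
p_toy = [0.9, 0.2, 0.6]
# The same value computed term by term, as the loop in the exercise would do
loop_value = sum(
    -y * math.log(p) - (1 - y) * math.log(1 - p) for y, p in zip(y_toy, p_toy)
) / len(y_toy)
print(binary_crossentropy(y_toy, p_toy), loop_value)  # both approximately 0.2798
```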

The partial derivatives (gradient) are used by the Stochastic Gradient Descent optimizer and are defined as follows:

$$\begin{aligned} \frac{\partial}{\partial w_1} J(w_{1}, w_{2}, b) &= \frac{1}{m}\sum_{i=1}^{m}(h(x_{1}^{i}, x_{2}^{i}) - y^{i}) \cdot x_{1}^{i}\\ \frac{\partial}{\partial w_2} J(w_{1}, w_{2}, b) &= \frac{1}{m}\sum_{i=1}^{m}(h(x_{1}^{i}, x_{2}^{i}) - y^{i}) \cdot x_{2}^{i}\\ \frac{\partial}{\partial b} J(w_{1}, w_{2}, b) &= \frac{1}{m}\sum_{i=1}^{m}(h(x_{1}^{i}, x_{2}^{i}) - y^{i}) \end{aligned}$$

Implement make_gradient so that it returns a closure gradient(w1, w2, b). The closure calculates the partial derivatives $pd\_w1$, $pd\_w2$ and $pd\_b$ at the point $(w_1, w_2, b)$ and returns all three values.

```python
def make_gradient(X, Y):
    def gradient(w1, w2, b):
        raise NotImplementedError('implement this function')

        pd_w1 = None
        pd_w2 = None
        pd_b = None

        return pd_w1, pd_w2, pd_b

    return gradient

# note: the expected test values below were computed with the unscaled X, not X_scaled
gradient = make_gradient(X, Y)
w1, w2, b = 3.3, 3.7, 0.3
gradient(w1, w2, b)

# run a test
np.testing.assert_almost_equal(gradient(3.3, 3.7, 0.3), (0.6629999910605702, 2.129999972002591, 0.49999999158653635))
```
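A finite-difference check is a common way to verify an analytic gradient like the one above. Each partial derivative is approximated by the central difference $(J(\theta + \epsilon) - J(\theta - \epsilon)) / 2\epsilon$ and compared with the formula. The following sketch does this on a small synthetic dataset of random points, not on the Iris data, using vectorized helpers. All function names here are hypothetical.

```python
import numpy as np

def cost(w1, w2, b, X, Y):
    """Binary cross-entropy J(w1, w2, b), vectorized over all samples."""
    h = 1.0 / (1.0 + np.exp(-(X[:, 0] * w1 + X[:, 1] * w2 + b)))
    return np.mean(-Y * np.log(h) - (1 - Y) * np.log(1 - h))

def analytic_gradient(w1, w2, b, X, Y):
    """Partial derivatives from the formulas above."""
    h = 1.0 / (1.0 + np.exp(-(X[:, 0] * w1 + X[:, 1] * w2 + b)))
    error = h - Y
    return np.mean(error * X[:, 0]), np.mean(error * X[:, 1]), np.mean(error)

def numerical_gradient(f, params, eps=1e-6):
    """Central-difference approximation of the gradient of f at params."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(*up) - f(*down)) / (2 * eps))
    return tuple(grads)

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 2))
Y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)
point = (0.5, -0.3, 0.1)

numeric = numerical_gradient(lambda w1, w2, b: cost(w1, w2, b, X_toy, Y_toy), point)
analytic = analytic_gradient(*point, X_toy, Y_toy)
print(np.allclose(numeric, analytic, atol=1e-6))  # expected: True
```

You can run the same comparison against your own gradient closure. It quickly reveals a missing factor $\frac{1}{m}$ or a swapped feature column.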

The following pseudo code shows the iterative parameter updates of Stochastic Gradient Descent. Strictly speaking, each update here uses the gradient over the complete dataset, which makes it batch gradient descent; the notebook nevertheless keeps the name sgd.

Randomly initialize $w_1$, $w_2$ and $b$.

For a number of epochs repeat:

$$\begin{aligned} pd\_w1 &:= \frac{\partial}{\partial w_1} J(w_1, w_2, b)\\ pd\_w2 &:= \frac{\partial}{\partial w_2} J(w_1, w_2, b)\\ pd\_b &:= \frac{\partial}{\partial b} J(w_1, w_2, b)\\[6pt] w_1 &:= w_1 - \alpha \cdot pd\_w1\\ w_2 &:= w_2 - \alpha \cdot pd\_w2\\ b &:= b - \alpha \cdot pd\_b \end{aligned}$$

The function to be implemented is sgd(X, Y, w1, w2, b, alpha, epochs):

- X and Y are the data.
- w1, w2 and b are the randomly initialized parameters.
- alpha is the learning rate.
- epochs is the number of training iterations.

Return the final values of $w_1$, $w_2$ and $b$, as well as a list containing the cost after each training epoch.

```python
def sgd(X, Y, w1, w2, b, alpha, epochs):
    raise NotImplementedError('implement this function')

    cost_per_epoch = []

    return w1, w2, b, cost_per_epoch

alpha = 0.1
epochs = 1500
w1, w2, b = np.random.randn(3)
w1, w2, b, cost_per_epoch = sgd(X_scaled, Y, w1, w2, b, alpha, epochs)
w1, w2, b
len(cost_per_epoch)

# run tests
test_w1, test_w2, test_b, test_cost_per_epoch = sgd(X_scaled, Y, -1.58979407, 0.26957035, -1.92309864, 0.1, 1500)
print(test_w1, test_w2, test_b)
np.testing.assert_almost_equal(len(test_cost_per_epoch), 1500)
np.testing.assert_almost_equal((test_w1, test_w2, test_b), (3.234979367208197, 3.5709689275028116, 0.2978914362922387))
```
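The update loop from the pseudo code does not depend on logistic regression itself. The following sketch applies it to a hypothetical quadratic cost $J(w_1, w_2, b) = (w_1 - 1)^2 + (w_2 + 2)^2 + b^2$. Its minimum at $(1, -2, 0)$ is known, so convergence is easy to check. The helper names are illustrative; the exercise's sgd uses the logistic cost and gradient instead. The results are stored in new names (w1_opt and so on) so that the notebook's w1, w2 and b are not overwritten.

```python
def quadratic_cost(w1, w2, b):
    return (w1 - 1) ** 2 + (w2 + 2) ** 2 + b ** 2

def quadratic_gradient(w1, w2, b):
    return 2 * (w1 - 1), 2 * (w2 + 2), 2 * b

def gradient_descent(cost_fn, gradient_fn, w1, w2, b, alpha, epochs):
    """Plain gradient descent following the pseudo code above."""
    cost_history = []
    for _ in range(epochs):
        pd_w1, pd_w2, pd_b = gradient_fn(w1, w2, b)
        # update all parameters simultaneously, using the same gradient
        w1 = w1 - alpha * pd_w1
        w2 = w2 - alpha * pd_w2
        b = b - alpha * pd_b
        cost_history.append(cost_fn(w1, w2, b))
    return w1, w2, b, cost_history

w1_opt, w2_opt, b_opt, costs = gradient_descent(
    quadratic_cost, quadratic_gradient, 4.0, 3.0, -2.0, alpha=0.1, epochs=200
)
print(w1_opt, w2_opt, b_opt, len(costs))
```

Every update in this loop uses the exact gradient over the whole problem, so it is batch gradient descent. A genuinely stochastic variant would compute the gradient from a randomly drawn sample or mini-batch in each step.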

### Exercise: Plot Cost per Epoch

Plot the cost_per_epoch result of sgd.

```python
def plot_over_time(cost_per_epoch):
    raise NotImplementedError('implement this function')

plot_over_time(cost_per_epoch)
```

You can try out different alpha values and see how the training performance changes.
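The effect of alpha shows up even on a one-parameter problem. Take the hypothetical cost $J(w) = (w - 1)^2$. One gradient step maps the error $w - 1$ to $(1 - 2\alpha)(w - 1)$, so training converges for $0 < \alpha < 1$ and diverges for $\alpha > 1$. The following sketch compares a few values. The loop variable is called lr so that the notebook's alpha is not overwritten; final_cost is a hypothetical helper.

```python
def final_cost(lr, epochs=50, w=5.0):
    """Run gradient descent on J(w) = (w - 1)^2 and return the last cost."""
    for _ in range(epochs):
        w = w - lr * 2 * (w - 1)  # gradient of (w - 1)^2 is 2 * (w - 1)
    return (w - 1) ** 2

for lr in (0.01, 0.1, 0.5, 0.9, 1.1):
    print(lr, final_cost(lr))
```

The values behave as follows:

- A very small learning rate (0.01) is still far from the minimum after 50 steps.
- 0.5 lands on the minimum in a single step for this particular cost.
- 0.9 oscillates around the minimum but still converges.
- 1.1 blows up.

The logistic cost is not quadratic, so the exact bounds differ there. Look for the same qualitative pattern of slow progress, oscillation or divergence when you plot the cost per epoch.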

### Plot Boundary

This is not an exercise. Run the code below to visualize the decision boundary your implementation of Logistic Regression determined.

```python
def make_decision_boundary(w1, w2, b, threshold):
    def decision_boundary(x1):
        return (np.log(threshold / (1 - threshold)) - x1*w1 - b) * (1 / w2)

    return decision_boundary

def plot_boundary(df, decision_boundary):
    sns.scatterplot(data=df, x='x1', y='x2', hue='y', palette=sns.color_palette()[1:3])

    spacing = np.linspace(df['x1'].min(), df['x1'].max(), 10)
    boundary_values = np.array([decision_boundary(x1) for x1 in spacing])

    plt.plot(spacing, boundary_values, label='boundary')

plot_boundary(df_scaled, make_decision_boundary(w1, w2, b, threshold))
```
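The formula inside make_decision_boundary comes from setting the hypothesis equal to the threshold $t$ (with $0 < t < 1$) and solving for $x_2$, assuming $w_2 \neq 0$:

$$\begin{aligned} sigmoid(z) = t &\iff \frac{1}{1 + e^{-z}} = t \iff e^{-z} = \frac{1 - t}{t} \iff z = \ln\frac{t}{1 - t}\\ x_1 w_1 + x_2 w_2 + b = \ln\frac{t}{1 - t} &\iff x_2 = \frac{\ln\frac{t}{1 - t} - x_1 w_1 - b}{w_2} \end{aligned}$$

For a threshold of 0.5 the logarithm vanishes, and the boundary is the straight line $x_2 = -(w_1 x_1 + b) / w_2$.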

## Summary and Outlook

You have learned how to implement Logistic Regression with two input variables and one output variable to solve simple classification problems. The algorithms were implemented in Python, without the help of higher-level libraries like TensorFlow or Keras.

The next part of the course covers evaluation scores for classification tasks.

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g. images).

HTW-Berlin - Informatik und Wirtschaft - Aktuelle Trends - Machine Learning: Logistic Regression Exercise
by Christoph Jansen (deep.TEACHING - HTW Berlin)