# HTW-Berlin - Informatik und Wirtschaft - Aktuelle Trends - Machine Learning: Evaluation Exercise

## Introduction

The goal of this exercise is to implement evaluation scores for a classification task in Python. You may use the Python standard library and math functions from numpy. This notebook guides you through the implementation process.

This notebook implements tests using assert or np.testing.assert_almost_equal. If you run the corresponding notebook cell and no output appears, the test has passed; otherwise an exception is raised.

General Hint:

If you have problems with the implementation (e.g. you don't know how to call a certain function or how to loop through the dataset), make use of the interactive nature of the notebook. You can add new cells at any time to inspect defined variables or to try small code snippets.

### Required Knowledge

This exercise is part of the course "Aktuelle Trends der Informations- und Kommunikationstechnik". The fundamentals of evaluation metrics are taught in class.

• The PDF slides used in class are available in the educational-materials repository.

### Required Python Modules

import os
import socket
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from deep_teaching_commons.data.fundamentals.iris import Iris

### Required Data

base_data_dir = os.path.expanduser('~/deep.TEACHING/data')
dm = Iris(base_data_dir=base_data_dir)  # data manager
iris = dm.dataframe()
df_reduced = iris.query('species == "Iris-versicolor" | species == "Iris-virginica"')
X = df_reduced[['petal_width', 'petal_length']].values
Y = df_reduced['species'].replace({'Iris-versicolor': 0, 'Iris-virginica': 1}).values
X[:5]
Y[:5]
X.shape, Y.shape
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled[:5]
# plot data
cp = sns.color_palette()
df_scaled = pd.DataFrame(X_scaled, columns=['x1', 'x2'])
df_scaled['y'] = Y
sns.scatterplot(data=df_scaled, x='x1', y='x2', hue='y', palette=cp[1:3]);
# split data for training and test
X_train, X_test, Y_train, Y_test = train_test_split(
    X_scaled, Y, stratify=Y, test_size=0.2, random_state=42
)
train_classes = dict(zip(*np.unique(Y_train, return_counts=True)))
test_classes = dict(zip(*np.unique(Y_test, return_counts=True)))

train_classes, test_classes

## Implementation of Logistic Hypothesis

This implementation was part of the last notebook "Exercise: Logistic Regression".

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def make_logistic_hypothesis(w1, w2, b):
    def logistic_hypothesis(x1, x2):
        return sigmoid(x1 * w1 + x2 * w2 + b)

    return logistic_hypothesis

def make_decision_boundary(w1, w2, b, threshold):
    def decision_boundary(x1):
        return (np.log(threshold / (1 - threshold)) - x1 * w1 - b) * (1 / w2)

    return decision_boundary

def plot_boundary(df, decision_boundary):
    sns.scatterplot(data=df, x='x1', y='x2', hue='y', palette=sns.color_palette()[1:3])

    spacing = np.linspace(df['x1'].min(), df['x1'].max(), 10)
    boundary_values = np.array([decision_boundary(x1) for x1 in spacing])

    plt.plot(spacing, boundary_values, label='boundary')

def make_classify(w1, w2, b, threshold):
    h = make_logistic_hypothesis(w1, w2, b)

    def classify(x1, x2):
        return 1 if h(x1, x2) > threshold else 0

    return classify
# Choose some mediocre values, to demonstrate metrics
w1, w2, b = 3.6962765211562245, 2.548083316850051, 0.01089234547433182
# plot training data
df_train = pd.DataFrame(X_train, columns=['x1', 'x2'])
df_train['y'] = Y_train
plot_boundary(df_train, make_decision_boundary(w1, w2, b, 0.5))
# plot test data
df_test = pd.DataFrame(X_test, columns=['x1', 'x2'])
df_test['y'] = Y_test
plot_boundary(df_test, make_decision_boundary(w1, w2, b, 0.5))
# create classifier
classify = make_classify(w1, w2, b, 0.5)
classify(-1, -1)

## Classify Datasets

C_train = np.array([classify(x1, x2) for x1, x2 in X_train])
C_test = np.array([classify(x1, x2) for x1, x2 in X_test])
C_train[:5]
C_test[:5]
C_train.shape, C_test.shape

## Exercise: Accuracy

Accuracy is defined as

$accuracy = \frac{T}{T + F}$

where $T$ is the number of true classifications and $F$ is the number of false classifications on a dataset.

Implement the function accuracy below.

def accuracy(C, Y):
    raise NotImplementedError('implement this function')
train_accuracy = accuracy(C_train, Y_train)
train_accuracy
test_accuracy = accuracy(C_test, Y_test)
test_accuracy
np.testing.assert_almost_equal(accuracy(C_train, Y_train), 0.975)
np.testing.assert_almost_equal(accuracy(C_test, Y_test), 0.8)
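If you want to compare your solution against a reference, here is one possible vectorized sketch. It is named `accuracy_score` to avoid clashing with the `accuracy` function you implement above:

```python
import numpy as np

def accuracy_score(C, Y):
    """Fraction of classifications C that agree with the true labels Y."""
    C = np.asarray(C)
    Y = np.asarray(Y)
    # (C == Y) is a boolean array; its mean is T / (T + F)
    return np.mean(C == Y)

# small example: 3 of 4 predictions are correct
print(accuracy_score([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

A loop over `zip(C, Y)` counting matches works just as well; the vectorized form simply exploits numpy's elementwise comparison.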

## Exercise: TP, FP, TN, FN

#### True Positive (TP)

Number of true (T) classifications where the classification result is class 1 (P).

#### False Positive (FP)

Number of false (F) classifications where the classification result is class 1 (P).

#### True Negative (TN)

Number of true (T) classifications where the classification result is class 0 (N).

#### False Negative (FN)

Number of false (F) classifications where the classification result is class 0 (N).

Implement the function tp_fp_tn_fn to calculate the values TP, FP, TN and FN on a dataset.

def tp_fp_tn_fn(C, Y):
    raise NotImplementedError('implement this function')

    tp, fp, tn, fn = 0, 0, 0, 0

    return tp, fp, tn, fn
train_tp, train_fp, train_tn, train_fn = tp_fp_tn_fn(C_train, Y_train)
train_tp, train_fp, train_tn, train_fn
test_tp, test_fp, test_tn, test_fn = tp_fp_tn_fn(C_test, Y_test)
test_tp, test_fp, test_tn, test_fn
np.testing.assert_almost_equal(tp_fp_tn_fn(C_train, Y_train), (39, 1, 39, 1))
np.testing.assert_almost_equal(tp_fp_tn_fn(C_test, Y_test), (7, 1, 9, 3))
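One possible sketch, again under a different name (`tp_fp_tn_fn_ref`) so it does not collide with your own implementation, counts each of the four cases with a boolean mask:

```python
import numpy as np

def tp_fp_tn_fn_ref(C, Y):
    """Confusion-matrix counts for binary classifications C against labels Y."""
    C = np.asarray(C)
    Y = np.asarray(Y)
    tp = np.sum((C == 1) & (Y == 1))  # predicted 1, actually 1
    fp = np.sum((C == 1) & (Y == 0))  # predicted 1, actually 0
    tn = np.sum((C == 0) & (Y == 0))  # predicted 0, actually 0
    fn = np.sum((C == 0) & (Y == 1))  # predicted 0, actually 1
    return tp, fp, tn, fn

# small example: one of each case
print(tp_fp_tn_fn_ref([1, 1, 0, 0], [1, 0, 0, 1]))  # (1, 1, 1, 1)
```

Note that the four counts always sum to the dataset size, which is a handy sanity check.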

## Exercise: Precision and Recall

Precision and recall are defined as follows:

$precision = \frac{TP}{TP+FP}$
$recall = \frac{TP}{TP+FN}$

Implement a function precision_recall to calculate precision and recall.

def precision_recall(tp, fp, fn):
    raise NotImplementedError('implement this function')

    precision = None
    recall = None

    return precision, recall
train_precision, train_recall = precision_recall(train_tp, train_fp, train_fn)
train_precision, train_recall
test_precision, test_recall = precision_recall(test_tp, test_fp, test_fn)
test_precision, test_recall
np.testing.assert_almost_equal(precision_recall(39, 1, 1), (0.975, 0.975))
np.testing.assert_almost_equal(precision_recall(7, 1, 3), (0.875, 0.7))
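The two formulas translate directly into code. A possible sketch (named `precision_recall_ref` to keep it separate from your solution):

```python
def precision_recall_ref(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# values from the training set above: 39 TP, 1 FP, 1 FN
print(precision_recall_ref(39, 1, 1))  # (0.975, 0.975)
```

Be aware that both ratios are undefined when their denominator is zero (e.g. a classifier that never predicts class 1 has no defined precision); for this exercise the given counts avoid that case.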

## Exercise: F-Score

The f-score metric is defined as follows:

$F_{\beta} = (1 + \beta^2) \cdot \frac{precision \cdot recall}{(\beta^2 \cdot precision) + recall}$

Implement a function f_score to calculate the value.

def f_score(precision, recall, beta):
    raise NotImplementedError('implement this function')
train_f1_score = f_score(train_precision, train_recall, 1)
train_f1_score
train_f05_score = f_score(train_precision, train_recall, 0.5)
train_f05_score
test_f1_score = f_score(test_precision, test_recall, 1)
test_f1_score
test_f05_score = f_score(test_precision, test_recall, 0.5)
test_f05_score
np.testing.assert_almost_equal(f_score(train_precision, train_recall, 1), 0.975)
np.testing.assert_almost_equal(f_score(train_precision, train_recall, 0.5), 0.975)
np.testing.assert_almost_equal(f_score(test_precision, test_recall, 1), 0.7777777777777777)
np.testing.assert_almost_equal(f_score(test_precision, test_recall, 0.5), 0.8333333333333334)
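The $F_\beta$ formula is a single expression; one possible sketch (named `f_score_ref` to avoid shadowing your implementation):

```python
def f_score_ref(precision, recall, beta):
    """F_beta = (1 + beta^2) * (precision * recall) / (beta^2 * precision + recall)."""
    b2 = beta ** 2
    return (1 + b2) * (precision * recall) / (b2 * precision + recall)

# beta = 1 weights precision and recall equally (harmonic mean);
# beta < 1 emphasizes precision, beta > 1 emphasizes recall
print(f_score_ref(0.875, 0.7, 1))
print(f_score_ref(0.875, 0.7, 0.5))
```

With the test-set values precision = 0.875 and recall = 0.7, the F1 score lands between the two (closer to the smaller one, as a harmonic mean does), while F0.5 moves toward the precision.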

## Summary and Outlook

You have learned how to implement evaluation scores for a classification task.

This was the last part of the course. If you are interested in further topics, learn about vectorization, general forms of Linear and Logistic Regression with an arbitrary number of inputs and outputs, as well as Artificial Neural Networks.

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g. images).

HTW-Berlin - Informatik und Wirtschaft - Aktuelle Trends - Machine Learning: Evaluation Exercise
by Christoph Jansen (deep.TEACHING - HTW Berlin)