# Calculation of the Kappa Score

## Introduction

In this final notebook you will first determine the labels for the patients of the CAMELYON17 training set based on your predictions of the individual WSIs (slides) per patient.

Afterwards, you will use the evaluation.py script (provided by the challenge) to calculate the kappa score.

## Requirements

### Python-Modules

import numpy as np

### Edit if needed

## Teaching Content

Although the evaluation.py script does all the calculations for you as long as the patient labels are assigned, it is always good to know how the metrics you are using work.

### Confusion Matrix

To determine the kappa score, we first need to determine the confusion matrix. A confusion matrix is often used in binary classification tasks where we only have 2 classes (positive, negative), but it can also be constructed when we have more classes. The green elements mark the correct classifications. In some cases the classes can be more similar to one another (e.g. C1 might be less different from C2 than from C3), which here is indicated by the intensity of the red color.
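A confusion matrix for more than two classes can be built in a few lines. The following is a minimal sketch, not part of the challenge code: the helper name `confusion_matrix`, the toy labels, and the convention that rows hold the true class and columns the predicted class are all assumptions made for illustration.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count how often true class i (row) was predicted as class j (column)."""
    h = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        h[t, p] += 1
    return h


# Hypothetical toy data with three classes encoded as 0, 1, 2.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

h = confusion_matrix(y_true, y_pred, n_classes=3)
print(h)

# The diagonal holds the correct classifications (the green elements).
n_correct = int(np.trace(h))
print("correct classifications:", n_correct)
```

Everything off the diagonal is a misclassification; how far an entry lies from the diagonal only carries meaning if the classes are ordered.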

### Kappa Score

The kappa score takes into consideration that some correct predictions were made by 'accident':

$$\kappa = \frac{p_0 - p_e}{1 - p_e}$$ with $p_0$ being the accuracy and $p_e$ the proportion of examples that were classified correctly 'by accident'.

For the binary classification task $p_e$ is calculated with:

$$p_e = \frac{(TP + FN) \cdot (TP + FP)}{b^2} + \frac{(FN + TN) \cdot (FP + TN)}{b^2}$$ with $b$ the total number of examples.
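To make the binary case concrete, here is a small sketch that evaluates the accuracy, $p_e$, and the resulting kappa score from the four cells of a binary confusion matrix. The function name `binary_kappa` and the counts are made up for illustration.

```python
def binary_kappa(tp, fn, fp, tn):
    """Kappa score for a binary confusion matrix given as four counts."""
    b = tp + fn + fp + tn                      # total number of examples
    p_0 = (tp + tn) / b                        # observed accuracy
    # Chance agreement: (actual pos. * predicted pos.) + (predicted neg. * actual neg.)
    p_e = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / b**2
    return (p_0 - p_e) / (1 - p_e)


# Hypothetical counts: accuracy is 0.7 and chance agreement is 0.5.
kappa = binary_kappa(tp=40, fn=10, fp=20, tn=30)
print(kappa)
```

With these counts the accuracy of 0.7 shrinks to a kappa of about 0.4, because half of the agreement would already be expected by chance.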

And in general for $n$ different classes:

$$p_e = \frac{1}{b^2} \cdot \sum_{i=1}^{n} h_{i+} \cdot h_{+i}$$

with $h_{i+}$ the sum of row $i$ and $h_{+i}$ the sum of column $i$.
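The general formula maps directly onto NumPy row and column sums. The sketch below assumes the confusion matrix is given as a square array `h`; the function name `kappa_score` and the example matrices are hypothetical.

```python
import numpy as np


def kappa_score(h):
    """Unweighted kappa score for an n x n confusion matrix h."""
    h = np.asarray(h, dtype=float)
    b = h.sum()                      # total number of examples
    p_0 = np.trace(h) / b            # accuracy
    row_sums = h.sum(axis=1)         # h_{i+}
    col_sums = h.sum(axis=0)         # h_{+i}
    p_e = np.dot(row_sums, col_sums) / b**2
    return (p_0 - p_e) / (1 - p_e)


# Binary example: the general formula reproduces the binary one.
kappa_binary = kappa_score([[40, 10], [20, 30]])

# Three-class example.
kappa_three = kappa_score([[1, 1, 0], [0, 2, 0], [1, 0, 2]])
print(kappa_binary, kappa_three)
```

For the three-class matrix, the accuracy is 5/7 and $p_e$ is 16/49, which gives a kappa of 19/33.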

### Weighted Kappa Score

If some misclassifications are worse than others (C1 classified as C3 is worse than C1 classified as C2), it is possible to include weights in the calculation. In this case we assign a weight $w_{ij}$, from $w_{11}$ to $w_{nn}$, to each element of the confusion matrix. For the weighted kappa score we then have:

$$\kappa_w = 1 - \frac{\sum_i^n \sum_j^n w_{ij} \cdot h_{ij}}{\sum_i^n \sum_j^n w_{ij} \cdot \frac{h_{i+} \cdot h_{+j}}{b}}$$

Note that the CAMELYON17 challenge uses the weighted kappa score for scoring, since classifying a patient with the true label pN0 (no metastasis at all) as pN0(i+) (at most isolated tumor cells found) is less severe than classifying them as pN2 (macro metastasis found in four or more slides).
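The weighted formula can be sketched as follows. The weight matrix is an assumption here: the example uses quadratic weights $w_{ij} = (i - j)^2$, a common choice for ordered classes, and the weighting actually applied by the challenge should be checked against evaluation.py. The function names and the example matrix are hypothetical.

```python
import numpy as np


def quadratic_weights(n):
    """Assumed weighting: w_ij = (i - j)^2, zero on the diagonal."""
    idx = np.arange(n)
    return (idx[:, None] - idx[None, :]) ** 2


def weighted_kappa(h, w):
    """Weighted kappa score for confusion matrix h and weight matrix w."""
    h = np.asarray(h, dtype=float)
    w = np.asarray(w, dtype=float)
    b = h.sum()
    # Expected counts under chance agreement: h_{i+} * h_{+j} / b
    expected = np.outer(h.sum(axis=1), h.sum(axis=0)) / b
    return 1.0 - (w * h).sum() / (w * expected).sum()


h = np.array([[1, 1, 0], [0, 2, 0], [1, 0, 2]])
kappa_w = weighted_kappa(h, quadratic_weights(3))

# With weight 1 for every misclassification, the unweighted kappa is recovered.
kappa_plain = weighted_kappa(h, 1 - np.eye(3))
print(kappa_w, kappa_plain)
```

In this example the single C3-as-C1 error costs four times as much as the C1-as-C2 error, so the weighted score (4/9) is lower than the unweighted one (19/33).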

## Exercises

### Determine Patient Labels

Have a look at the CAMELYON17 evaluation page. Implement the code to edit your c17_train_predictions.csv, so it also includes the patient labels (pN0, pN0(i+), pN1mi, pN1 and pN2).
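As a starting point, the mapping from slide labels to a patient label could look like the sketch below. It rests on assumptions that should be verified against the CAMELYON17 evaluation page: that each slide is labeled `negative`, `itc`, `micro` or `macro`, that only micro and macro metastases count towards the number of affected nodes, and that four or more affected nodes with at least one macro metastasis give pN2. The function name `patient_stage` is hypothetical, and reading and writing your c17_train_predictions.csv is left out because its layout depends on your earlier notebooks.

```python
from collections import Counter


def patient_stage(slide_labels):
    """Derive a pN stage from the slide labels of one patient (assumed rules)."""
    counts = Counter(slide_labels)
    n_macro = counts["macro"]
    n_micro = counts["micro"]
    n_itc = counts["itc"]
    n_metastases = n_macro + n_micro   # ITCs are assumed not to count here

    if n_macro > 0:
        return "pN2" if n_metastases >= 4 else "pN1"
    if n_micro > 0:
        return "pN1mi"
    if n_itc > 0:
        return "pN0(i+)"
    return "pN0"


# Hypothetical patient with five slides.
stage = patient_stage(["macro", "micro", "negative", "negative", "itc"])
print(stage)
```

Applying such a function to the slides of each patient yields one additional row per patient for the CSV file.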

### Run the Script

Finally, run the evaluation.py script provided in the CAMELYON17/testing folder to calculate the kappa score.

## Summary and Outlook

Congratulations. You have successfully worked through all the steps necessary to take part in the CAMELYON challenge. Of course, this was only a minimal example covering the most essential steps. To achieve better results, follow the hints at the end of every notebook.

If you want, you can compare your kappa score with the official submissions on the CAMELYON challenge results page. However, note that your results are based on the training set, whereas the official submissions are based on the CAMELYON17 test set. If you apply your classifier to the test set, expect a kappa score that is roughly 5% lower.

## Literature

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

XXX
by Klaus Strohmenger