Prediction and Heatmap Generation

Introduction

Now that we have a trained (and saved) model, we can use it to predict the slides of the CAMELYON16 test dataset. From the predictions for the individual tiles, we can build a heatmap of the whole slide that shows the regions predicted to be metastatic. The steps in this notebook are:

  • Load the trained model
  • Load the CAMELYON16 test dataset with SlideManager
  • Get slides with SlideManager.get_test_slides
  • Get tiles with split_negative_slide
  • Predict the tiles and build the heatmaps
  • Visually compare your heatmaps with the tumor masks (if test slides have metastatic regions)

Note:

Chances are high that your model will not produce heatmaps of sufficient quality. Therefore, the next notebook provides high-quality heatmaps produced by a far superior CNN.

Requirements

Python-Modules

import tensorflow as tf
from tensorflow import keras

import numpy as np
import matplotlib.pyplot as plt
import random
import h5py
import math
from skimage.filters import threshold_otsu

from preprocessing.datamodel import SlideManager
from preprocessing.processing import split_negative_slide, split_positive_slide, create_tumor_mask, rgb2gray, create_otsu_mask_by_threshold
from preprocessing.util import TileMap
from cnn.tissuedataset import TissueDataset

Teaching Content

Evaluation of the CAMELYON16 Challenge

Following the original CAMELYON16 challenge, the task would now be to predict the CAMELYON16 test dataset. Back in 2016, the labels of the test set were not public. The metrics used to evaluate a model were:

1) Receiver operating characteristic (ROC) at slide level, summarized by the area under the ROC curve (AUC).

2) Free-response receiver operating characteristic (FROC) for lesion-based evaluation. Briefly, this metric measures how well the regions detected in a tumorous slide match the true regions. For each coordinate in a predicted metastatic region, a confidence score also had to be submitted.

If you are interested in evaluating your model to see how it would have performed in the CAMELYON16 challenge, you can read more about the evaluation and the scoring on the official CAMELYON16 website.
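
As a rough illustration of the first metric, the slide-level AUC can be computed from one confidence score per slide and the slide's true label. The sketch below uses the rank-based formulation (the fraction of positive/negative slide pairs in which the positive slide receives the higher score). The function name `slide_level_auc` and the example values are made up for illustration; this is not the official evaluation script.

```python
def slide_level_auc(labels, scores):
    """AUC as the fraction of (positive, negative) slide pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative slide")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical example: 1 = slide contains metastases, 0 = normal slide
example_labels = [1, 0, 1, 0, 0]
example_scores = [0.9, 0.2, 0.4, 0.5, 0.1]
example_auc = slide_level_auc(example_labels, example_scores)
print(example_auc)  # 5 of 6 pairs are ranked correctly
```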

Towards CAMELYON17

Since the labels of the CAMELYON16 challenge have already been published, it is no longer possible to hand in any results. Therefore, we will not go into detail on evaluating the model for the CAMELYON16 challenge.

Instead, we will head straight towards the CAMELYON17 challenge. The second goal of CAMELYON16 (lesion-based) also prepares for this. From the confidence scores it is straightforward to create a heatmap as the prediction for a slide (similar to the tumor mask). These heatmaps can then be used to achieve the goals of the CAMELYON17 challenge, which are:

  • Predict whether a slide contains no tumor regions, only isolated tumor cells (ITCs), micro-metastases or macro-metastases.
  • To make this possible, the CAMELYON17 dataset is labeled with 4 different classes.

In the next notebooks, we will use the heatmaps created with our model to accomplish this. So the task in this notebook is to create the heatmaps first.
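
To make the link between confidence scores and heatmaps concrete: if every tile yields one score and we know the tile's position, the scores can be written into a 2D array with one pixel per tile. The following is a minimal sketch, assuming non-overlapping square tiles and (x, y) positions given in pixels at the tiling level. `scores_to_heatmap` and the example values are hypothetical and not part of the preprocessing package.

```python
import math
import numpy as np


def scores_to_heatmap(slide_size, tile_size, tile_scores):
    """Place one confidence score per tile into a coarse 2D grid.

    slide_size:  (width, height) of the slide at the tiling level, in pixels
    tile_size:   edge length of the square tiles, in pixels
    tile_scores: iterable of ((x, y), score), (x, y) = top-left tile corner
    """
    width, height = slide_size
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    # Tiles that were never predicted (e.g. background) stay at 0.0
    heatmap = np.zeros((rows, cols), dtype=np.float32)
    for (x, y), score in tile_scores:
        heatmap[y // tile_size, x // tile_size] = score
    return heatmap


# Hypothetical 1024 x 512 pixel slide with three predicted tiles
example_heatmap = scores_to_heatmap(
    slide_size=(1024, 512),
    tile_size=256,
    tile_scores=[((0, 0), 0.1), ((512, 256), 0.9), ((768, 0), 0.4)],
)
print(example_heatmap.shape)
print(example_heatmap)
```

Note that the array is indexed as (row, column), i.e. (y, x), which is the order matplotlib expects when displaying it as an image.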

Setting the Paths

Set the paths according to the location where you store the data:

### EDIT THIS CELL:
### Assign the path to your CAMELYON16 data and create the directories
### if they do not exist yet.
CAM_BASE_DIR = '/path/to/CAMELYON/data/'
### Do not edit this cell
CAM16_DIR = CAM_BASE_DIR + 'CAMELYON16/'
GENERATED_DATA = CAM_BASE_DIR + 'tutorial/'
MODEL_FINAL = GENERATED_DATA + 'model_final.hdf5'

# Destination to store the heatmaps which we will create in this notebook
HEATMAPS_CAM16_TESTSET = GENERATED_DATA +'test_set_predictions/'

Loading the Model

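First we load our trained and saved model. The compile call below is commented out because predicting does not require an optimizer. If you trained the model with an optimizer that is not from the tf.keras package and want to evaluate it or continue training, you will have to recompile it.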

# Recreate the exact same model, including weights
model = tf.keras.models.load_model(MODEL_FINAL)

#model.compile(optimizer=tf.train.AdamOptimizer(learning_rate=0.0005), 
#              loss='binary_crossentropy',
#              metrics=['accuracy'])

Reading CAMELYON16 Test Dataset

The main purpose of creating a training dataset as a single HDF5 file was to reduce the time reading the data. This was crucial for training, because we needed to read the same data over and over again while training. Concerning the test dataset, this is not as crucial, because we only need to read every slide once after the training is finished.

So to read the CAMELYON16 test dataset, we can just use the SlideManager class with its SlideManager.test_slides attribute, together with the functions split_positive_slide and split_negative_slide.

mgr = SlideManager(cam16_dir=CAM16_DIR)

level = 0
tile_size = 256
poi_threshold = 0.9
slide = mgr.get_slide('Test_001')
print(slide)
print(slide.dimensions)
print(slide.level_dimensions[level])

When we pass a test slide to the function create_tumor_mask, a mask is always returned. If no annotation XML file exists (because the slide has no metastatic regions), the mask is simply blank. This function can be used to manually compare your generated heatmaps with the true tumor area.

mask = create_tumor_mask(slide, level=8)
print(mask.shape)
plt.imshow(mask, cmap='gray')
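
Besides looking at both images side by side, the overlap can also be quantified. The following sketch assumes that the heatmap and the tumor mask have already been brought to the same shape. It thresholds the heatmap at 0.5 (an arbitrary choice) and computes the Dice coefficient. `dice_coefficient` is a hypothetical helper and not part of the preprocessing package.

```python
import numpy as np


def dice_coefficient(heatmap, mask, threshold=0.5):
    """Dice overlap between a thresholded heatmap and a binary tumor mask."""
    predicted = np.asarray(heatmap) >= threshold
    truth = np.asarray(mask) > 0
    if predicted.shape != truth.shape:
        raise ValueError("heatmap and mask must have the same shape")
    total = predicted.sum() + truth.sum()
    if total == 0:
        # Both empty: a normal slide predicted as normal counts as perfect
        return 1.0
    intersection = np.logical_and(predicted, truth).sum()
    return 2.0 * intersection / total


# Hypothetical 2 x 3 example
example_prediction = np.array([[0.9, 0.8, 0.1],
                               [0.2, 0.7, 0.0]])
example_mask = np.array([[1, 1, 0],
                         [0, 0, 0]])
example_dice = dice_coefficient(example_prediction, example_mask)
print(example_dice)  # 2 * 2 / (3 + 2)
```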

Exercise

Normalization

Since we trained our model with normalized images, we will also need the mean and the standard deviation of the color channels we used.

Task:

Create the two variables mean_pixel and std_pixel and assign their values by looking them up in the last notebook.

### Exercise: Look up the corresponding values and save them into variables
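
Once both variables exist, normalizing a tile is a per-channel subtraction followed by a division. The sketch below only shows this mechanism: the numbers are placeholders that must be replaced by the values from your own training notebook, and `normalize_tile` is a hypothetical helper.

```python
import numpy as np

# PLACEHOLDER values, not the real statistics: use the ones from training
placeholder_mean = np.array([200.0, 150.0, 180.0])  # per-channel mean (R, G, B)
placeholder_std = np.array([40.0, 50.0, 30.0])      # per-channel std (R, G, B)


def normalize_tile(tile, mean, std):
    """Normalize an RGB tile of shape (height, width, 3) channel by channel."""
    tile = np.asarray(tile, dtype=np.float32)
    # Broadcasting applies the three channel values to every pixel
    return (tile - mean) / std


# Hypothetical 2 x 2 tile in which every pixel equals the placeholder mean
example_tile = np.tile(placeholder_mean, (2, 2, 1))
example_normalized = normalize_tile(example_tile, placeholder_mean, placeholder_std)
print(example_normalized.shape)
```

The same function also works on a whole batch of shape (n, height, width, 3), because broadcasting aligns the channel axis from the right.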

Heatmap Generation

Task:

Use your trained model to predict the individual tiles of each slide in the test dataset. From the predictions of your model (values from 0.0 to 1.0), build a heatmap for each slide. It should have the same aspect ratio as the original slide, but of course at a smaller scale.

Hints:

  • Use split_negative_slide on the test slides to obtain the tiles (you do not know whether a slide is a tumor or a normal slide). For the usage, refer to data-handling-usage-guide.ipynb.
  • When you use overlapping tiles, the resolution of your heatmap will be higher, e.g. an overlap of 128 doubles the resolution.
  • Save your created heatmaps as png files (e.g. test_001.png).
  • Optional: Save the original masks (from the *.xml files) as images so you can compare them with your heatmaps.
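
The effect of the overlap on the heatmap resolution can be checked with a little arithmetic: consecutive tiles start `tile_size - overlap` pixels apart, so this distance (the stride) determines how many predictions fit along each axis. The helper below is a hypothetical sketch. It assumes that only tiles lying completely inside the slide are counted; the actual tiling function may treat the borders differently.

```python
def heatmap_shape(slide_size, tile_size, overlap):
    """Return (rows, cols) of a heatmap with one pixel per tile position."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be in [0, tile_size)")
    width, height = slide_size
    stride = tile_size - overlap
    # Number of tile positions that fit completely into the slide
    cols = max(0, (width - tile_size) // stride + 1)
    rows = max(0, (height - tile_size) // stride + 1)
    return rows, cols


# Hypothetical slide of 4096 x 2048 pixels at the tiling level
shape_no_overlap = heatmap_shape((4096, 2048), tile_size=256, overlap=0)
shape_half_overlap = heatmap_shape((4096, 2048), tile_size=256, overlap=128)
print(shape_no_overlap)    # (8, 16)
print(shape_half_overlap)  # (15, 31)
```

With an overlap of half the tile size, the resolution is roughly (not exactly) doubled, at the price of about four times as many tiles to predict.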

Here are examples of some created heatmaps (top: heatmaps, bottom: true masks from xml files in CAMELYON17/test/lesion_annotations/):

Notes:

  • This will take a lot of time (~20+ hours).
  • The generated heatmaps will probably not be of good quality, for several reasons:
      • Low zoom (2 of 0-9, 0 being highest zoom)
      • Only one zoom level used
      • No data augmentation
      • Inferior CNN architecture
      • Weak color normalization
      • Weak hyperparameter optimization
  • Many factors add up here. However, to work through this tutorial you should not need to be a deep learning expert, nor should you need a 1000 euro GPU.
  • If you do not have the time to classify all tiles of all slides, you can just implement the code, run it to produce the first 5-10 heatmaps and proceed with the next notebook.
  • In the next notebook you will be offered some high-quality heatmaps, produced with all the missing ingredients mentioned here.

### Exercise. Your code below
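
One possible structure for the exercise is sketched below. It is deliberately independent of the real classes: `fake_tile_iterator` stands in for the tile generator (assumed here to yield a tile together with its bounds `((x, y), (width, height))`), and `fake_predict` stands in for a call to the trained model on a normalized batch. Both are assumptions for illustration only; check the usage guide for the actual return values.

```python
import numpy as np


def build_heatmap(tile_iter, predict_fn, heatmap_shape, stride, batch_size=32):
    """Predict tiles in batches and write each score into the heatmap."""
    heatmap = np.zeros(heatmap_shape, dtype=np.float32)
    tiles, positions = [], []

    def flush():
        # Predict the collected batch and store one score per tile
        if not tiles:
            return
        scores = np.asarray(predict_fn(np.stack(tiles))).reshape(-1)
        for (row, col), score in zip(positions, scores):
            heatmap[row, col] = score
        tiles.clear()
        positions.clear()

    for tile, ((x, y), _size) in tile_iter:
        tiles.append(tile)
        positions.append((y // stride, x // stride))
        if len(tiles) == batch_size:
            flush()
    flush()  # remaining tiles of the last, incomplete batch
    return heatmap


def fake_tile_iterator(n_rows, n_cols, tile_size):
    """Stand-in for the real tile generator: yields constant-valued tiles."""
    for row in range(n_rows):
        for col in range(n_cols):
            value = (row * n_cols + col) / (n_rows * n_cols)
            tile = np.full((tile_size, tile_size, 3), value, dtype=np.float32)
            yield tile, ((col * tile_size, row * tile_size), (tile_size, tile_size))


def fake_predict(batch):
    """Stand-in for the model: the mean pixel value serves as the 'score'."""
    return batch.mean(axis=(1, 2, 3)).reshape(-1, 1)


example_result = build_heatmap(
    fake_tile_iterator(n_rows=2, n_cols=3, tile_size=4),
    fake_predict,
    heatmap_shape=(2, 3),
    stride=4,
    batch_size=4,
)
print(example_result)
```

With the real model, `predict_fn` could for instance normalize the batch with mean_pixel and std_pixel and then call `model.predict`; the finished array could be written to disk with `plt.imsave`. Collecting tiles into batches is a design choice to keep the GPU busy; predicting tile by tile would also work, only more slowly.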

Summary and Outlook

So far we have managed to ...

  • ... divide our huge data set into smaller pieces (tiles) to be able to handle it at all and to use it to train a model.
  • ... build and train a CNN to predict whether a single tile contains metastatic or normal tissue.
  • ... use our CNN to predict the individual tiles of the slides of the test set.
  • ... put the predictions of a slide together in order to generate a heatmap (or mask), which looks similar to the masks provided.

In the next notebook we will extract geometric features from these heatmaps to train another classifier, which will then be able to predict the tumor class of the slides (negative, itc, micro, macro).

Literature

Licenses

Notebook License (CC-BY-SA 4.0)

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

XXX
by Klaus Strohmenger
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.

Code License (MIT)

The following license only applies to code cells of the notebook.

Copyright 2018 Klaus Strohmenger

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.