Status: Draft

Medical Image Classification Scenario

This project contains Python Jupyter notebooks to teach machine learning content in the context of medical data, e.g., automated tumor detection. The material focuses primarily on teaching basic knowledge of convolutional neural networks but also contains portions of fundamental machine learning knowledge.

1) Scenario Description

To determine the exact state of breast cancer and thus guide subsequent therapy decisions, it is essential to analyze a patient’s tissue samples under a microscope. Examining such tissue slides is a complex task that requires years of training and specialized expertise from a pathologist. Yet scientific studies show that even among highly experienced experts there can be substantial variability in the diagnoses for the same patient, which indicates the possibility of misdiagnosis [ELM15][ROB95]. This result is not surprising given the enormous amount of detail in a tissue slide at 40x magnification. To get a sense of the amount of data, imagine that an average digitized tissue slide has a size of 200,000 x 100,000 pixels and that you have to inspect every one of them to reach an accurate diagnosis. If you have to examine multiple slides per patient for several patients, this is a lot of data to cover in a usually limited amount of diagnosis time. The following image shows a scanned tissue slide (whole slide image, WSI for short) at different magnification levels.

WSI example

Under such circumstances, an automated detection algorithm can naturally complement the pathologists’ workflow and increase the likelihood of an accurate diagnosis. Such algorithms have been developed successfully in recent years, in particular models based on Convolutional Neural Networks [WAN16][LIU17]. Getting enough data to train machine learning algorithms, however, is still a challenge in the medical context. The Radboud University Medical Center (Nijmegen, the Netherlands) and the University Medical Center Utrecht (Utrecht, the Netherlands) provide an extensive dataset containing sentinel lymph nodes of breast cancer patients in the context of their CAMELYON16 challenge. These data provide a good starting point for further scientific investigations and are therefore the main data used in this scenario. You can get the data from the CAMELYON17 challenge (GoogleDrive/Baidu).

In the context of this medical scenario, you will develop a custom classifier for the given medical dataset that decides:

  1. Whether a lymph node tissue sample contains metastases.
  2. What kinds of metastases are present, e.g., micro- or macro-metastases.
  3. Which pN-stage the patient is in, based on the TNM staging system.

2) Teaching Material

Detection of metastases is a classification problem. To solve this first issue, you will implement a classification pipeline based on a Convolutional Neural Network (CNN) [LEC98]. You will then extend that pipeline with classical machine learning approaches, such as decision trees, to address the second and third issues.

To keep the teaching material modular and reusable, we divide it into four parts:

  • WSI preprocessing: global operations, such as I/O handling of WSI data and dividing WSIs into smaller tiles.
  • Tile classification: predicting whether a tile contains metastases or not.
  • WSI postprocessing: building heatmaps, extracting features and preparing the data for further classification.
  • WSI classification: machine learning approaches to solve the issues of our scenario.

The deep.TEACHING project provides educational material for students to gain basic knowledge about the problem domain, the programming, math and statistics requirements, as well as the mentioned algorithms and their evaluation. Students will also learn how to construct complex machine learning systems, which can incorporate several algorithms at once.

3) Requirements

ATTENTION: The first half of the notebooks, which covers WSI preprocessing and tile classification (4.3.1) Tiling to 4.3.4) Heatmap creation), requires at least 3.5 terabytes of disk space for the data set, a fairly strong GPU (at least a GeForce GTX 1070 is recommended), and a lot of time to run.

If you cannot meet these hardware requirements, you can jump directly to 4.3.5) Feature Extraction, where you will be provided with a sample solution to continue from. None of the subsequent notebooks requires a GPU or more than several hundred megabytes of disk space.

If you decide to start directly with 4.3.5) Feature Extraction, we still suggest reading through everything on this page, as it gives you the big picture of the data processing pipeline.

3.1) OpenSlide

To run the first half of the notebooks, you will also need to install OpenSlide, which is needed to read the WSIs in their TIFF format.

Manually on Ubuntu 18.04 (Tested):

  • Go to the OpenSlide download page and download the tar.gz archive.

    • This tutorial uses openslide version 3.4.1, which is confirmed to work (2018-11-02).
  • Install the following system packages. It is suggested to install the newest versions with:

    sudo apt-get install libjpeg-dev libtiff-dev libglib2.0-dev libcairo2-dev libgdk-pixbuf2.0-dev libxml2-dev libsqlite3-dev valgrind zlib1g-dev libopenjp2-tools libopenjp2-7-dev python3-dev
  • However, if installation fails, the following versions are confirmed to work with openslide version 3.4.1:

    sudo apt-get install libjpeg-dev=8c-2ubuntu8 libtiff-dev=4.0.9-5 libglib2.0-dev=2.56.2-0ubuntu0.18.04.2 libcairo2-dev=1.15.10-2 libgdk-pixbuf2.0-dev=2.36.11-2 libxml2-dev=2.9.4+dfsg1-6.1ubuntu1.2 libsqlite3-dev=3.22.0-1 valgrind=1:3.13.0-2ubuntu2.1 zlib1g-dev=1:1.2.11.dfsg-0ubuntu2 libopenjp2-tools=2.3.0-1 libopenjp2-7-dev=2.3.0-1 python3-dev
  • Unpack the openslide tar.gz file and inside the unpacked folder execute the following (excerpt from the README.txt):

    ./configure
    make
    make install
  • Finally, add the following to the end of your ~/.bashrc:

    ########## OpenSlide START ############
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib
    export LD_LIBRARY_PATH
    ########## OpenSlide END ##############

3.2) Python Packages

To run the notebooks, navigate to “educational-materials/medical-image-classification” and execute the following commands:

# Create a new virtual environment for this course and install dependencies from Pipfile.lock
pipenv install
# Create an ipython kernel for the virtual environment
pipenv run ipython kernel install --user --name medical_image_classification
# When opening a notebook with Jupyter Lab, select medical_image_classification (upper right corner)

It is possible that some packages do not work properly when loaded within an IPython kernel that refers to a virtual environment (medical_image_classification). In this case, install these packages for your user via pip:

pip3 install --user progress
pip3 install --user scikit-image

4) Teaching Content and Exercises

Read this section for an overview of the medical background, the data and the workflow covered in the notebooks. At the end of each subsection, exercise notebooks are linked:

  • There are exercise-notebooks, which work with the original medical data (marked with specific). These exercise-notebooks are meant to be completed in chronological order, since one might require the results of the preceding exercise.
  • There also exist more generic notebooks which are about the same machine-learning technique, but work with artificial data and can be completed as stand-alone exercises. These notebooks are marked with generic.

4.1) Medical Background

To determine the exact state of breast cancer and thus guide subsequent therapy decisions, it is essential to analyze a patient’s tissue samples under a microscope. The tissue is extracted from lymph nodes of the breast region. Examining lymph nodes is crucial in the case of breast cancer, as they are the first place breast cancer is likely to spread to. The tissue is fixed between two small glass plates, which together are called a slide. Once such slides have been digitized, we call them whole-slide images (WSIs).

4.1.1) pN-Stage

The pN-stage of a patient is determined by how many lymph nodes are affected and how severely. In other words, the pN-stage quantifies the spread of metastases to the lymph nodes.

The stage pN2, for example, means that metastases were found in at least 4 nodes, of which at least one is a macro-metastasis. The CAMELYON17 challenge distinguishes five different pN-stages. For a detailed description, see the evaluation section of the CAMELYON17 website.

4.1.2) Slide Labels

A slide can contain no tumorous tissue at all (class label negative), only a very small tumorous area (isolated tumor cells, itc for short), a small to medium area (micro-metastases) or a bigger area of tumorous tissue (macro-metastases). See also the evaluation section.

4.2) Data Set

The data set we are going to use is the CAMELYON data set. It is divided into two sub data sets. See also the data section of the CAMELYON website.

4.2.1) CAMELYON16

The training set contains 270 WSIs. They are not labeled with negative, itc, micro or macro; their labels are only positive (contains metastases) or negative. Additionally, XML files are provided for the positive slides, which contain the coordinates of polygons that describe the metastatic regions.
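The polygon coordinates make it possible to decide whether a given position on a slide lies inside an annotated metastatic region. The following is a minimal sketch using the ray-casting test. The coordinates are hypothetical, reading the XML files is left out, and labeling a tile by its center point is an assumption (a tile could also be labeled by how much it overlaps with the polygons).

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: return True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly connected back to the first one.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_tile(tile_x, tile_y, tile_size, polygons):
    """Label a tile 'positive' if its center lies inside any polygon."""
    center_x = tile_x + tile_size / 2
    center_y = tile_y + tile_size / 2
    for polygon in polygons:
        if point_in_polygon(center_x, center_y, polygon):
            return "positive"
    return "negative"


# Hypothetical annotation: one square metastatic region.
example_polygons = [[(1000, 1000), (2000, 1000), (2000, 2000), (1000, 2000)]]
print(label_tile(1024, 1024, 256, example_polygons))  # tile center (1152, 1152)
print(label_tile(0, 0, 256, example_polygons))        # tile center (128, 128)
```

The first tile lies inside the square and is labeled positive, the second one lies outside and is labeled negative.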

The test set contains 130 WSIs, also with xml files. At the beginning of the CAMELYON17 challenge, the WSIs of the test set additionally received the labels negative, itc, micro or macro, though no examples for the itc class are included.

4.2.2) CAMELYON17

The training set and the test set each contain 100 patients. Each patient is represented by 5 WSIs, which are labeled with negative, itc, micro or macro, but only the labels of the training set are publicly available.

4.3) Data Processing Pipeline

With an average size of 200,000 x 100,000 pixels per WSI at the highest zoom level, it is not feasible to train a CNN directly on whole slides to predict the labels negative, itc, micro or macro. Therefore, the problem has to be divided into subtasks. At the end of each subtask, the Jupyter notebook containing the corresponding exercise is linked.

4.3.1) Tiling

The WSIs are divided into smaller pieces (tiles) with a fixed size, e.g., 256 x 256 pixels. Each tile is labeled as positive or negative.
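To get a feeling for the numbers involved, the following sketch enumerates the tile positions for a slide of the average size mentioned above. The function name tile_origins is hypothetical, and skipping partial tiles at the border is an assumption made to keep the example short. Reading the actual pixel data and labeling the tiles are not covered here.

```python
def tile_origins(width, height, tile_size=256):
    """Yield the top-left (x, y) corner of every full tile in a slide.

    Partial tiles at the right and bottom border are skipped.
    """
    for y in range(0, height - tile_size + 1, tile_size):
        for x in range(0, width - tile_size + 1, tile_size):
            yield x, y


# Average slide size mentioned in the text (highest zoom level).
slide_width, slide_height = 200_000, 100_000

n_tiles = sum(1 for _ in tile_origins(slide_width, slide_height))
print(n_tiles)  # 781 columns * 390 rows = 304590 tiles

# Size of the uncompressed RGB data (3 bytes per pixel) in gigabytes.
print(slide_width * slide_height * 3 / 1e9)  # 60.0
```

A single slide therefore yields several hundred thousand tiles, most of which contain only background and can be discarded before classification.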

4.3.2) Color Normalization

There are two reasons why color normalization is crucial:

  • The chemicals used to colorize a WSI (staining process) can differ slightly, resulting in very different colors.
  • Different slide scanners are used to digitize the slides.

Below you can see parts of two different slides, varying greatly in color.

Figure: parts of two different slides with strongly differing colors (internet connection needed)
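One simple way to reduce such color differences is to shift and scale each color channel of a tile so that its mean and standard deviation match those of a reference tile. The sketch below illustrates this idea only. It is not necessarily the method used in the notebooks, the function names are hypothetical, and plain Python lists are used to keep the example free of dependencies (in practice an array library would be the natural choice).

```python
from statistics import mean, pstdev


def channel_stats(pixels):
    """Return per-channel (mean, std) for a list of (r, g, b) pixels."""
    channels = list(zip(*pixels))
    return [(mean(c), pstdev(c)) for c in channels]


def match_color_stats(pixels, reference_stats):
    """Shift and scale each channel so its mean/std match reference_stats."""
    source_stats = channel_stats(pixels)
    normalized = []
    for pixel in pixels:
        new_pixel = []
        for value, (src_mean, src_std), (ref_mean, ref_std) in zip(
            pixel, source_stats, reference_stats
        ):
            # Avoid division by zero for perfectly uniform channels.
            scale = ref_std / src_std if src_std > 0 else 1.0
            new_value = (value - src_mean) * scale + ref_mean
            # Clip to the valid 8-bit range.
            new_pixel.append(min(255.0, max(0.0, new_value)))
        normalized.append(tuple(new_pixel))
    return normalized


# Hypothetical tiny "tiles" with three pixels each.
source_tile = [(200, 100, 150), (220, 120, 170), (240, 140, 190)]
reference_tile = [(180, 160, 200), (190, 170, 210), (200, 180, 220)]

normalized_tile = match_color_stats(source_tile, channel_stats(reference_tile))
print(normalized_tile)
```

After normalization, the channel statistics of the source tile agree with those of the reference tile, as long as no values had to be clipped.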

4.3.2) Data Augmentation

There are far more negative tiles than positive ones, and, as usual in machine learning, additional training data rarely hurts. Common data augmentation techniques are mirroring, rotation, cropping and adding noise.
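Mirroring and rotation by multiples of 90 degrees are easy to implement, because they only rearrange pixels. The following minimal sketch represents a tile as a list of rows; the function names are hypothetical, and cropping and noise are left out.

```python
import random


def mirror_horizontal(tile):
    """Flip a tile (list of rows) left to right."""
    return [row[::-1] for row in tile]


def rotate_90_clockwise(tile):
    """Rotate a tile by 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]


def augment(tile, rng):
    """Apply a random mirroring and a random number of 90 degree rotations."""
    if rng.random() < 0.5:
        tile = mirror_horizontal(tile)
    for _ in range(rng.randrange(4)):
        tile = rotate_90_clockwise(tile)
    return tile


tile = [[1, 2],
        [3, 4]]
print(mirror_horizontal(tile))    # [[2, 1], [4, 3]]
print(rotate_90_clockwise(tile))  # [[3, 1], [4, 2]]
print(augment(tile, random.Random(0)))
```

Combining one optional mirroring with up to three rotations yields up to eight distinct variants of each tile, which is one way to increase the number of positive examples in particular.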

4.3.3) Tile Classification

With normalized and augmented tiles, we can train a CNN to predict whether a tile contains metastases or not.

For an in-depth understanding of convolutional neural networks, we recommend working through the course Convolutional Neural Network.

4.3.4) Heatmap creation

The output of the CNN is a confidence score from 0.0 to 1.0 that indicates whether a tile contains metastases. The scores of the individual tiles of a WSI can then be used to create a confidence map (or heatmap). Here are some examples of such heatmaps. The brighter a pixel, the higher the confidence for metastases, with white indicating very high confidence.

Figure: example heatmaps (internet connection needed)
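Assembling a heatmap from tile scores can be sketched as follows. The data layout (a dictionary that maps the top-left coordinate of a tile to its score) and the function name are assumptions made for illustration; each tile becomes one pixel of the heatmap.

```python
def build_heatmap(tile_scores, tile_size, slide_width, slide_height):
    """Arrange per-tile confidence scores into a 2D grid (one pixel per tile).

    tile_scores maps the top-left (x, y) pixel coordinate of a tile to the
    CNN confidence in [0.0, 1.0]. Tiles without a score (e.g. background
    that was never classified) keep the value 0.0.
    """
    n_cols = slide_width // tile_size
    n_rows = slide_height // tile_size
    heatmap = [[0.0] * n_cols for _ in range(n_rows)]
    for (x, y), score in tile_scores.items():
        heatmap[y // tile_size][x // tile_size] = score
    return heatmap


# Hypothetical scores for a tiny 1024 x 768 "slide" with 256 x 256 tiles.
scores = {(0, 0): 0.1, (256, 0): 0.9, (512, 256): 0.8}
heatmap = build_heatmap(scores, 256, 1024, 768)
for row in heatmap:
    print(row)
```

The resulting grid has 3 rows and 4 columns, which is far smaller than the original slide and therefore easy to process further.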

4.3.5) Feature Extraction

Unfortunately, the training data here is very limited, as we have only 500 slides from the CAMELYON17 training set and 130 labeled slides from the CAMELYON16 test set (labeled with negative, itc, micro or macro). So, in contrast to the task of the CAMELYON16 challenge, where we had thousands of tiles and only two different labels (normal and tumor), we will not be able to supply another CNN model with sufficient data. Even worse, for the itc class we have only 35 examples in total.

To tackle this problem, our approach is to make use of domain-specific knowledge and extract geometrical features from the heatmaps, which can be used to train a less complex model, such as a decision tree. Possible features are:

  1. Highest probability (value) on the heatmap (red)
  2. Average probability on the heatmap. Sum all values and divide by the number of values $> 0.0$ (green)
  3. Number of pixels after thresholding (pink)
  4. Length of the larger side of the biggest object after thresholding (orange)
  5. Length of the smaller side of the biggest object after thresholding (yellow)
  6. Number of pixels of the biggest object after thresholding (blue)

Figure: heatmap with the extracted features marked in the colors listed above (internet connection needed)
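The six features listed above can be computed with a few lines of plain Python. The sketch below makes several assumptions that are not fixed by the text: a threshold of 0.5, objects defined by 4-connectivity, and side lengths measured on the axis-aligned bounding box of the biggest object. All function and key names are hypothetical.

```python
from collections import deque


def connected_objects(mask):
    """Return a list of objects, each a list of (row, col) pixels.

    Uses breadth-first search with 4-connectivity on a boolean mask.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    objects = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                cr, cc = queue.popleft()
                pixels.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    in_bounds = 0 <= nr < n_rows and 0 <= nc < n_cols
                    if in_bounds and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            objects.append(pixels)
    return objects


def extract_features(heatmap, threshold=0.5):
    """Compute the six heatmap features (threshold is an assumed value)."""
    values = [v for row in heatmap for v in row]
    nonzero = [v for v in values if v > 0.0]
    mask = [[v > threshold for v in row] for row in heatmap]
    objects = connected_objects(mask)
    biggest = max(objects, key=len, default=[])
    if biggest:
        # Side lengths of the axis-aligned bounding box of the biggest object.
        height = max(r for r, _ in biggest) - min(r for r, _ in biggest) + 1
        width = max(c for _, c in biggest) - min(c for _, c in biggest) + 1
    else:
        height = width = 0
    return {
        "max_probability": max(values),
        "mean_probability": sum(nonzero) / len(nonzero) if nonzero else 0.0,
        "pixels_above_threshold": sum(len(o) for o in objects),
        "biggest_object_long_side": max(height, width),
        "biggest_object_short_side": min(height, width),
        "biggest_object_pixels": len(biggest),
    }


# Hypothetical 4 x 4 heatmap with one object of 4 pixels and one of 1 pixel.
example_heatmap = [
    [0.0, 0.9, 0.8, 0.95],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6],
    [0.2, 0.0, 0.0, 0.0],
]
features = extract_features(example_heatmap)
print(features)
```

Each WSI is thereby reduced to a short feature vector, which is small enough to train a simple classifier even with only a few hundred labeled slides.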

4.3.6) WSI Classification

With the extracted features, we can train another classifier to predict whether a WSI contains no metastases (negative), only a small area of metastases (itc), medium-sized metastases (micro) or a bigger region (macro). Possible classifiers for this task are a decision tree, a random forest, a support vector machine, naive Bayes or even a multilayer perceptron (fully connected feed-forward network).

4.3.7) pN-Stage Classification

Once all 5 slides of a patient have been classified as negative, itc, micro or macro, we can determine the patient’s pN-stage by applying the simple rules found in the evaluation section of the CAMELYON17 website. After that, you can calculate the score and compare your results with the official submissions on the CAMELYON challenge results page. Note, however, that your results are based on the training set, whereas the official submissions are based on the CAMELYON17 test set. If you apply your classifier to the test set, expect a kappa score that is roughly 5-10% lower.
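Such a rule-based mapping from slide labels to a pN-stage could look like the following minimal sketch. It assumes that the five stages are pN0, pN0(i+), pN1mi, pN1 and pN2, and that isolated tumor cells do not count towards the number of nodes with metastases. Both assumptions, as well as the exact thresholds, should be checked against the evaluation section of the CAMELYON17 website before relying on this code.

```python
def pn_stage(slide_labels):
    """Derive a patient's pN-stage from the labels of their slides.

    slide_labels is a list with one of 'negative', 'itc', 'micro' or
    'macro' per slide.
    """
    n_macro = slide_labels.count("macro")
    n_micro = slide_labels.count("micro")
    n_itc = slide_labels.count("itc")
    # Assumption: only micro- and macro-metastases count as metastases here.
    n_metastases = n_macro + n_micro

    if n_macro > 0:
        # At least one macro-metastasis: pN2 for 4 or more affected nodes.
        return "pN2" if n_metastases >= 4 else "pN1"
    if n_micro > 0:
        return "pN1mi"
    if n_itc > 0:
        return "pN0(i+)"
    return "pN0"


print(pn_stage(["negative", "negative", "itc", "negative", "negative"]))
print(pn_stage(["macro", "micro", "micro", "micro", "negative"]))
```

Under the stated assumptions, the first patient is assigned pN0(i+) and the second one pN2.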

References (ISO 690)

[ELM15] ELMORE, Joann G., et al. Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA, 2015, vol. 313, no. 11, pp. 1122-1132.
[LEC98] LECUN, Yann, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, vol. 86, no. 11, pp. 2278-2324.
[LIU17] LIU, Yun, et al. Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442, 2017.
[ROB95] ROBBINS, P., et al. Histological grading of breast carcinomas: a study of interobserver agreement. Human Pathology, 1995, vol. 26, no. 8, pp. 873-879.
[WAN16] WANG, Dayong, et al. Deep learning for identifying metastatic breast cancer. arXiv preprint arXiv:1606.05718, 2016.