Springer Fachmedien Wiesbaden, 2016. — 80 p. — ISBN10: 365812878X
Manuel Kroiss examines the differentiation of hematopoietic stem cells (HSCs) using machine learning methods. This work is based on experiments focusing on the lineage choice of CMPs, progenitor cells descended from HSCs, which become either MEP or GMP cells. The author presents a novel approach to distinguish MEP from GMP cells using machine learning on morphology features extracted from bright field images. He tests the performance of different models and focuses on Recurrent Neural Networks with the latest advances from the field of deep learning. Two improvements to recurrent networks were tested: Long Short Term Memory (LSTM) cells, which are able to remember information over long periods of time, and dropout regularization to prevent overfitting. With his method, Manuel Kroiss considerably outperforms standard machine learning methods that do not use time information, such as Random Forests and Support Vector Machines.
Publication in the Field of Organic Chemistry
Introduction to deep neural networks
Using RNNs to predict the lineage choice of stem cells
Discussion
Figures
Hierarchy of hematopoietic stem cells
Example of a bright field image from long-term microscopy
Labeled tracking tree of HSC differentiation to MEP or GMP
Structure of a Perceptron
Sigmoid activation function
Limitation of a perceptron
Structure of a Multilayer Perceptron
Structure of a Recurrent Neural Network
Problem of the vanishing gradient for RNNs
Structure of a Long Short Term Memory cell
Bidirectional RNN unfolded in time
Example of early stopping to prevent overfitting
Example of dropout with one hidden layer
Histogram of cell cycle length of the 120602PH5 movie
Backpropagation of labels
Segmented bright field images of an exemplary cell over one cycle
Exemplary features of a cell over one cycle with smoothing
RNN with dropout unfolded in time
Deep bidirectional LSTM with two recurrent layers each
Comparison of the classifier performance on set A
Performance over time with morphology or PU.1 on set B
Performance over time for unlabeled cells using morphology
Performance over time for unlabeled cells using only PU.1
Tables
Optimal initial parameter values for RNNs
Different optimization methods depending on the size of the data set
The four time-lapse movies used in this thesis and their cell counts
Results of classifiers on unseen test sets from different movies
Training time on GeForce GTX 570 of different RNN models
Feature importance using DB LSTM on set B