Learning non-linear patch embeddings with neural networks for label fusion
Introduction
Segmentation of brain structures from magnetic resonance images (MRI) is an important step in many neuroscience applications, including the discovery of morphological biomarkers, the monitoring of disease progression, and diagnosis. For example, segmentation is widely used as a basic image quantification step in studies of early brain development (Benkarim et al., 2017) and dementia (Chupin et al., 2009; Li et al., 2007).
Multi-atlas segmentation (MAS) is increasingly used for segmenting brain MRI (Sanroma et al., 2016). In MAS, a set of atlas images is first registered to the image to be segmented (i.e., the target), along with their anatomical labelmaps containing the spatial layout of the anatomical structures. Then, the so-called label fusion process labels each target point using the support of the corresponding atlas labels. Compared to using a single atlas, MAS can potentially accommodate a wider anatomical variability and is more robust to registration errors. Image intensities are often not sufficient to globally discriminate the different structures; therefore, spatial constraints are essential (Colliot et al., 2006). Such spatial constraints are usually implemented by restricting the set of feasible labels for each target point to the set of labels at neighboring atlas points.
Patch-based label fusion (PBLF) is a popular approach that computes each target label as a weighted combination of neighboring atlas labels, where atlas locations whose local image patches are more similar to the target patch receive higher weights in the combination (Artaechevarria et al., 2009; Coupé et al., 2011; Wang et al., 2013). Here, the similarity between local image patches around each target and atlas point is taken as a proxy for the local registration accuracy and, hence, for anatomical correspondence.
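As a concrete illustration, the weighted vote at a single target point can be sketched as follows. This is a minimal NumPy sketch with toy data; the Gaussian-of-SSD similarity and the patch values are illustrative assumptions, not the specific choices of any of the cited methods.

```python
import numpy as np

def pblf_vote(target_patch, atlas_patches, atlas_labels, sigma=1.0):
    """Patch-based label fusion at one target point.

    target_patch: (p,) flattened intensity patch around the target voxel.
    atlas_patches: (n, p) candidate patches from neighboring atlas locations.
    atlas_labels: (n,) label carried by the center of each atlas patch.
    Returns the label with the largest accumulated similarity weight.
    """
    # Similarity as a Gaussian of the sum of squared differences (SSD):
    # the more similar an atlas patch, the larger its voting weight.
    ssd = np.sum((atlas_patches - target_patch) ** 2, axis=1)
    w = np.exp(-ssd / (2 * sigma ** 2))
    # Accumulate weights per label and pick the best-supported label.
    labels = np.unique(atlas_labels)
    votes = np.array([w[atlas_labels == l].sum() for l in labels])
    return labels[np.argmax(votes)]

# Toy example: the two atlas patches closest to the target carry label 1.
target = np.array([1.0, 1.0, 1.0, 1.0])
patches = np.array([[1.0, 1.0, 1.0, 0.9],   # similar -> votes for label 1
                    [0.9, 1.0, 1.1, 1.0],   # similar -> votes for label 1
                    [5.0, 5.0, 5.0, 5.0]])  # dissimilar -> label 0, tiny weight
labels = np.array([1, 1, 0])
print(pblf_vote(target, patches, labels))  # -> 1
```

The Gaussian kernel here plays the role that any local similarity measure (e.g., LNCC) plays in the cited methods: it converts patch distance into a voting weight.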
PBLF can potentially be improved by increasing the discriminative power of the patch similarity measurements. For example, we proposed learning discriminative patch embeddings reflecting the latent anatomical similarity between patches (Sanroma et al., 2015a). A similar approach was recently proposed using convolutional neural networks (CNNs) (Yang et al., 2016). Such learned embeddings are then used in standard PBLF. Other supervised approaches for learning optimal fusion rules have been presented: in Sanroma et al. (2015b) we proposed a transductive learning approach, and in Benkarim et al. (2016) we integrated discriminative learning into probabilistic label fusion. Semi-supervised learning approaches have also been proposed for propagating the anatomical labels from atlases to targets (Guo and Zhang, 2012; Koch et al., 2014). Machine learning techniques such as support vector machines (SVMs) (Cortes and Vapnik, 1995) have also been used (Bai et al., 2015; Hao et al., 2013; Sdika, 2015).
In practice, most of these methods learn a different model (i.e., classifier) at each location (Bai et al., 2015; Benkarim et al., 2016; Guo and Zhang, 2012; Hao et al., 2013; Koch et al., 2014; Sanroma et al., 2015a; Sanroma et al., 2015b; Sdika, 2015). This serves two purposes: (1) it implicitly imposes spatial constraints by restricting the training samples of each model to neighboring atlas locations only; and (2) it divides the difficult problem of finding a single global model into the simpler problem of finding multiple local models. However, this increases the complexity of storing and using the method due to the high number of local models generated, which can easily reach several hundred thousand, even after restricting the modeling to the most challenging regions. Another difficulty with local models is that the training images must be in spatial correspondence in order to retrieve the training data for each local model. As a result, some methods opt for training the models in a common template space (Sanroma et al., 2015a). This implies that the target image must be segmented in the template space, incurring interpolation errors when re-sampling the resulting segmentation to the original target space. Moreover, methods that consider pairwise relationships (Benkarim et al., 2016; Sanroma et al., 2015a; Yang et al., 2016) need pairwise registrations among the training images to evaluate the similarity between the embedded patches. This entails O(N²) memory complexity during training, with N being the number of atlases, thus limiting the number of atlases that can effectively be used for training.
A related approach uses convolutional neural networks (CNNs) for segmenting cardiac images (Yang et al., 2016). For an input image, a stack of output images is obtained by applying the learned convolutional filters; the number of images in the stack corresponds to the dimensionality of the output embeddings. Thus, the memory requirement for label fusion is O(N·d), where N is the number of atlases and d the dimensionality of the output embedding. This poses serious limitations on the number of atlases at test time (in fact, the authors use only 5 atlases per target image). In brain MRI segmentation, more than 10 atlases are usually used (Aljabar et al., 2009; Lotjonen et al., 2010; Sanroma et al., 2014).
To overcome these issues, we propose a method to learn discriminative patch embeddings using neural networks, with the following contributions:
- By incorporating our method into the regular label fusion process, we focus on the problem of learning the model, thus leveraging the ability of the label fusion process to restrict the set of possible labels at each point.
- The previous contribution allows us to compute a single model per bilateral structure (i.e., one model for both the left and right parts of each structure). We take advantage of stochastic gradient descent (SGD) to process the vast amounts of data in small mini-batches. Our method therefore remains practical to store and use.
- We learn the model in the native space of each training atlas instead of in a template space. Models are thus learned in the same space in which they were annotated, avoiding interpolation artifacts during training. Another advantage is that the models are orientation-invariant, and hence target images can be segmented directly in their native space. As a consequence, the target anatomy can be quantified directly from the resulting segmentation, without the need to correct for geometric distortions caused by the transformation to the template space.
- We learn the embeddings using patch relationships within the same image, leading to an attractive O(N) storage complexity at training (with N the number of atlases), compared to more costly approaches (Benkarim et al., 2016; Sanroma et al., 2015a; Yang et al., 2016) that require pairwise atlas registrations in this phase.
- Our method embeds image patches independently rather than whole images. Therefore, we can generate output embeddings of arbitrary dimensionality without compromising the number of atlases that can reasonably be handled (the memory requirement at segmentation time is O(N)).
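The mini-batch SGD training mentioned in the second contribution can be illustrated with a toy sketch. The single linear embedding layer and the contrastive-style loss below (pulling same-label patch pairs together, pushing different-label pairs apart up to a margin) are illustrative assumptions, not the paper's exact architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d = 16, 4                       # flattened patch size, embedding dimension
W = rng.normal(scale=0.1, size=(d, p))  # toy linear "network"

def embed(X, W):
    return X @ W.T                 # (batch, d) embeddings

# Toy data: patches of class 0 and 1 scattered around two prototypes.
proto = rng.normal(size=(2, p))
X = np.vstack([proto[c] + 0.1 * rng.normal(size=(100, p)) for c in (0, 1)])
y = np.repeat([0, 1], 100)

lr, margin = 0.01, 1.0
for step in range(200):            # SGD: one small mini-batch per step
    idx = rng.choice(len(X), size=8, replace=False)
    xb, yb = X[idx], y[idx]
    E = embed(xb, W)
    grad = np.zeros_like(W)
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            diff = E[i] - E[j]
            dist2 = diff @ diff
            if yb[i] == yb[j]:     # pull same-label pairs together
                grad += 2 * np.outer(diff, xb[i] - xb[j])
            elif dist2 < margin:   # push different-label pairs apart
                grad -= 2 * np.outer(diff, xb[i] - xb[j])
    W -= lr * grad / len(idx)
```

The point of the sketch is that each step touches only a mini-batch of patch pairs, so arbitrarily large training sets can be streamed through without holding them, or any pairwise atlas registrations, in memory.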
We apply our method to segment the whole hippocampus and the hippocampal subfields (see Section 4), structures targeted by many studies on psychiatric and neurological disorders (Chupin et al., 2009; Li et al., 2007). Accurate segmentation methods are required in order to quantify the subtle morphological changes undergone by these structures, especially in the early stages of disease (Frisoni et al., 2010; West et al., 2004).
In the next section, we introduce multi-atlas segmentation and how it can be improved by using embedding techniques, before describing our method in Section 3.
Multi-atlas segmentation
Let T denote the target image to be segmented, and let {(Xi, Yi)}, i = 1, …, N, be a set of atlas images Xi along with their corresponding labelmaps Yi containing the anatomical information. Multi-atlas segmentation (MAS) aims at estimating the segmentation of the target image using the atlas images and their labelmaps.
This is implemented by (1) registering the atlas images to the target and (2) computing each target label as a combination of locally corresponding atlas labels, the so-called label fusion.
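Step (2) in its simplest form, majority voting over already-registered labelmaps, can be sketched as follows (a minimal NumPy sketch; the warped labelmaps are assumed to be the output of step (1), which would be produced by any deformable registration tool):

```python
import numpy as np

def majority_vote(warped_labelmaps):
    """Step (2): per-voxel majority vote over atlas labelmaps that have
    already been registered (warped) to the target space in step (1)."""
    stacked = np.stack(warped_labelmaps)            # (n_atlases, *image_shape)
    # For each voxel, count how many atlases propose each label ...
    labels = np.unique(stacked)
    counts = np.stack([(stacked == l).sum(axis=0) for l in labels])
    # ... and keep the most frequent label per voxel.
    return labels[np.argmax(counts, axis=0)]

# Toy 1D "images": three warped atlas labelmaps that mostly agree.
maps = [np.array([0, 1, 1, 2]),
        np.array([0, 1, 2, 2]),
        np.array([0, 0, 1, 2])]
print(majority_vote(maps))  # -> [0 1 1 2]
```

Weighted label fusion variants replace the uniform per-atlas count with patch-similarity weights, as described in the Introduction.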
Weighted
Method
We present a new method to learn non-linear patch embeddings using neural networks. The pipeline of the method is shown in Fig. 1. In the training phase, patches are sampled from atlas images near the boundary of the structures (yellow square in Fig. 1(a)). Note that the training images are in their native space (i.e., not registered to any template). A training sample is composed of a central patch and neighboring voting patches on both sides of the boundary. Neighboring patches are sampled…
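The boundary-restricted sampling described above can be sketched as follows. This is a toy 2D NumPy sketch: boundary voxels are detected via a label-transition test between face neighbors, and the patch radius is an illustrative assumption.

```python
import numpy as np

def boundary_voxels(labelmap):
    """Voxels whose label differs from at least one face neighbor."""
    b = np.zeros(labelmap.shape, dtype=bool)
    for ax in range(labelmap.ndim):
        d = np.diff(labelmap, axis=ax) != 0          # transitions along axis
        sl_lo = [slice(None)] * labelmap.ndim; sl_lo[ax] = slice(0, -1)
        sl_hi = [slice(None)] * labelmap.ndim; sl_hi[ax] = slice(1, None)
        b[tuple(sl_lo)] |= d                         # mark both sides of
        b[tuple(sl_hi)] |= d                         # each transition
    return np.argwhere(b)

def extract_patch(image, center, r=1):
    """Square patch of radius r around `center` (centers too close to the
    image border are assumed to be filtered out by the caller)."""
    sl = tuple(slice(c - r, c + r + 1) for c in center)
    return image[sl]

# Toy example: a 6x6 image with a 3x3 foreground structure.
labelmap = np.zeros((6, 6), dtype=int)
labelmap[2:5, 2:5] = 1
image = labelmap + 0.1 * np.random.default_rng(0).normal(size=(6, 6))

centers = boundary_voxels(labelmap)
patches = [extract_patch(image, c) for c in centers
           if (c - 1 >= 0).all() and (c + 1 < 6).all()]
print(len(centers), patches[0].shape)  # -> 20 (3, 3)
```

Sampling only near structure boundaries concentrates the training data where labels are ambiguous, which is exactly where a discriminative embedding is needed.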
Experiments and results
We perform several experiments applying the proposed method to the segmentation of the whole hippocampus and the hippocampal subfields. We compare the proposed method with the following state-of-the-art PBLF techniques: majority voting (MV) (Heckemann et al., 2006; Rohlfing et al., 2004), local weighted voting (LWV) (Artaechevarria et al., 2009), non-local weighted voting (NLWV) (Coupé et al., 2011) and joint label fusion (JOINT) (Wang et al., 2013). It is…
Discussion
Results show that the scaling of the similarities used in the Softmax has a higher impact on performance than the similarity metric itself. We argue that one of the reasons for the success of our method lies in the ability of the proposed affine and non-linear transformations to jointly influence the Softmax and the similarity metric in more nuanced ways than a simple scaling. Given the similar performance of LNCC and negSSD, we conclude that a more important factor than their performance for…
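The effect of scaling the similarities before the Softmax can be illustrated directly (a minimal sketch; `beta` plays the role of the scaling factor, and the similarity values are made up for illustration):

```python
import numpy as np

def softmax_weights(similarities, beta=1.0):
    """Softmax over scaled similarities: a larger beta sharpens the weight
    distribution toward the most similar atlas patch."""
    s = beta * np.asarray(similarities, dtype=float)
    e = np.exp(s - s.max())        # subtract the max for numerical stability
    return e / e.sum()

sims = [0.9, 0.8, 0.2]             # e.g. similarities of three candidate patches
print(softmax_weights(sims, beta=1.0))   # weights spread across all candidates
print(softmax_weights(sims, beta=20.0))  # weight concentrated on the best match
```

With a small beta the fused label averages over many candidates; with a large beta it approaches the nearest-neighbor decision, which is why the scaling alone can change performance substantially.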
Conclusions
We have presented a method for learning discriminative image patch embeddings for optimal PBLF. We applied it to the segmentation of brain structures such as the hippocampus and the hippocampal substructures. We used neural networks optimized via stochastic gradient descent (SGD) to learn a single model per bilateral structure. We analyzed the effectiveness of SGD in minimizing the desired objective function. We learned optimal patch embeddings using neighboring patches sampled within the same…
Acknowledgments
The first author is co-financed by the Marie Curie FP7-PEOPLE-2012-COFUND Action, Grant agreement no: 600387.
This work is partly supported by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
Part of the data used for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012).
References (58)
- Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy. NeuroImage (2009).
- Non-local statistical label fusion for multi-atlas segmentation. Med. Image Anal. (2013).
- Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. (2008).
- Multi-atlas segmentation with augmented features for cardiac MR images. Med. Image Anal. (2015).
- Training labels for hippocampal segmentation based on the EADC-ADNI harmonized hippocampal protocol. Alzheimer's Dement. (2015).
- Integration of fuzzy spatial relations in deformable models – application to brain MRI segmentation. Pattern Recognit. (2006).
- Patch-based segmentation using expert priors: application to hippocampus and ventricle segmentation. NeuroImage (2011).
- Unbiased nonlinear average age-appropriate brain templates from birth to adulthood. NeuroImage (2009).
- Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. NeuroImage (2006).
- On standardizing the MR image intensity scale. Magn. Reson. Med. (1999).
- Nonlinear dimensionality reduction by locally linear embedding. Science.
- Enhancing atlas based segmentation with multiclass linear classifiers. Med. Phys.
- A learning-based wrapper method to correct systematic errors in automatic image segmentation: consistently improved performance in hippocampus, cortex and brain segmentation. NeuroImage.
- Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans. Med. Imaging.
- Graph embedding and extensions: a general framework for dimensionality reduction. Pattern Anal. Mach. Intell.
- Deep fusion net for multi-atlas segmentation: application to cardiac MR images.
- Combination strategies in multi-atlas image segmentation: application to brain MR data. IEEE Trans. Med. Imaging.
- PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Trans. Graph.
- Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput.
- Practical recommendations for gradient-based training of deep architectures. Neural Networks: Tricks of the Trade.
- Enhanced probabilistic label fusion by estimating label confidences through discriminative learning.
- Toward the automatic quantification of in utero brain development in 3D structural MRI: a review. Hum. Brain Mapp.
- STEPS: Similarity and truth estimation for propagated segmentations and its application to hippocampal segmentation and brain parcelation. Med. Image Anal.
- Fully automatic hippocampus segmentation and classification in Alzheimer's disease and mild cognitive impairment applied on data from ADNI. Hippocampus.
- Support-vector networks. Mach. Learn.
- The clinical use of structural MRI in Alzheimer disease. Nat. Rev. Neurol.
- An optimized PatchMatch for multi-scale and multi-feature label fusion. NeuroImage.
- Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS).
- Semi-supervised sparse label fusion for multi-atlas based segmentation.
1. Part of the data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf