
Medical Image Analysis

Volume 44, February 2018, Pages 143-155

Learning non-linear patch embeddings with neural networks for label fusion

https://doi.org/10.1016/j.media.2017.11.013

Highlights

  • We present a method to improve the discriminative abilities of patch-based label fusion.

  • We use neural networks to learn optimal embeddings of image patches.

  • Our method allows for embeddings with different complexities.

  • Our method scales linearly with the number of atlases in both the training and testing phases.

  • Results show an improvement over standard PBLF even with the simplest embeddings.

Abstract

In brain structural segmentation, multi-atlas strategies are increasingly being used over single-atlas strategies because of their ability to fit a wider anatomical variability. Patch-based label fusion (PBLF) is one such multi-atlas approach; it labels each target point as a weighted combination of neighboring atlas labels, where atlas points with higher local similarity to the target contribute more strongly to label fusion. PBLF can potentially be improved by increasing the discriminative capabilities of the local image similarity measurements. We propose a framework that computes patch embeddings using neural networks so as to increase the discriminative abilities of similarity-based weighted voting in PBLF. As particular cases, our framework includes embeddings of different complexities, namely, a simple scaling, an affine transformation, and non-linear transformations. We compare our method with state-of-the-art alternatives in segmentation experiments on the whole hippocampus and the hippocampal subfields using publicly available datasets. Results show that even the simplest versions of our method outperform standard PBLF, evidencing the benefits of discriminative learning. More complex transformation models tended to achieve better results than simpler ones, obtaining a considerable increase in average Dice score compared to standard PBLF.

Introduction

Segmentation of brain structures from magnetic resonance images (MRI) is an important step in many neuroscience applications, including the discovery of morphological biomarkers, the monitoring of disease progression, and diagnosis. For example, segmentation is widely used as a basic image quantification step in studies of early brain development (Benkarim et al., 2017) and dementia (Chupin et al., 2009; Li et al., 2007).

Multi-atlas segmentation (MAS) is increasingly being used for segmenting brain MRI (Sanroma et al., 2016). In MAS, a set of atlas images is first registered to the image to be segmented (i.e., the target), along with their anatomical labelmaps containing the spatial overlay of the anatomical structures. Then, the so-called label fusion process labels each target point using the support of the corresponding atlas labels. Compared to using a single atlas, MAS can potentially fit a wider anatomical variability and is more robust to registration errors. Image intensities are often not sufficient to globally discriminate the different structures; therefore, spatial constraints are essential (Colliot et al., 2006). Such spatial constraints are usually implemented by restricting the set of feasible labels for each target point to the set of labels in neighboring atlas points.

Patch-based label fusion (PBLF) is a popular approach that computes each target label as a weighted combination of neighboring atlas labels, where atlas locations with higher local image similarity to the to-be-segmented target point have higher weight in the combination (Artaechevarria et al., 2009; Coupé et al., 2011; Wang et al., 2013). Here, the similarity between local image patches around each target and atlas point is taken as a proxy for the local registration accuracy and hence, for anatomical correspondence.
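
To make this rule concrete, the following is a minimal NumPy sketch of similarity-weighted voting at a single target point. The function name, the negative-SSD similarity, and the scaling parameter `beta` are illustrative choices, not the exact formulation of any of the cited methods.

```python
import numpy as np

def pblf_weighted_vote(target_patch, atlas_patches, atlas_labels, beta=1.0):
    """Similarity-weighted label voting at one target point (illustrative sketch).

    target_patch:  (P,) array, the flattened intensity patch around the target point.
    atlas_patches: (K, P) array, patches from neighboring atlas points.
    atlas_labels:  (K,) integer array of labels at those atlas points.
    beta:          assumed scaling of the similarity inside the softmax.
    """
    # Negative sum of squared differences as the local similarity (one common choice).
    sim = -np.sum((atlas_patches - target_patch) ** 2, axis=1)
    # Softmax over similarities: more similar atlas patches vote more strongly.
    w = np.exp(beta * sim - np.max(beta * sim))
    w /= w.sum()
    # Accumulate the weighted votes per label and return the winner.
    labels = np.unique(atlas_labels)
    votes = np.array([w[atlas_labels == l].sum() for l in labels])
    return labels[np.argmax(votes)]
```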

PBLF can potentially be improved by increasing the discriminative capabilities of patch similarity measurements. For example, we proposed to learn discriminative patch embeddings reflecting the latent anatomical similarity between patches (Sanroma et al., 2015a). A similar approach was recently proposed using convolutional neural networks (CNNs) (Yang et al., 2016). Such learned embeddings are then used in standard PBLF. Other supervised approaches for learning optimal fusion rules have been presented. For example, in Sanroma et al. (2015b) we proposed a transductive learning approach, and in Benkarim et al. (2016) we proposed to integrate discriminative learning into probabilistic label fusion. Semi-supervised learning approaches have also been proposed for propagating the anatomical labels from atlases to targets (Guo and Zhang, 2012; Koch et al., 2014). Machine learning techniques such as support vector machines (SVM) (Cortes and Vapnik, 1995) have also been used (Bai et al., 2015; Hao et al., 2013; Sdika, 2015).

In practice, most of these methods learn a different model (i.e., classifier) at each location (Bai et al., 2015; Benkarim et al., 2016; Guo and Zhang, 2012; Hao et al., 2013; Koch et al., 2014; Sanroma et al., 2015a, 2015b; Sdika, 2015). This serves two purposes: (1) it implicitly imposes spatial constraints by restricting the training samples of each model to neighboring atlas locations only; and (2) it divides the difficult problem of finding a single global model into the problem of finding multiple simpler local models. However, this makes the method more complex to store and use due to the high number of local models generated, which can easily reach several hundred thousand, even after restricting the modeling to only the most challenging regions. Another difficulty when using local models is that training images must be in spatial correspondence in order to retrieve the training data for each local model. As a result, some methods opt for training the models in a common template space (Sanroma et al., 2015a). This implies that the target image must be segmented in the template space, incurring interpolation errors when re-sampling the resulting segmentation to the original target space. Moreover, methods that consider pairwise relationships (Benkarim et al., 2016; Sanroma et al., 2015a; Yang et al., 2016) need pairwise registrations among the training images to evaluate the similarity between the embedded patches. This has memory complexity O(N²) during training, with N being the number of atlases, thus limiting the number of atlases that can effectively be used for training. A related approach uses convolutional neural networks (CNNs) for segmenting cardiac images (Yang et al., 2016). For an input image, they obtain a stack of output images by applying the learned convolutional filters. The number of images in the stack is related to the dimensionality of the output embeddings. Thus, memory requirements for label fusion are O(N×d), where N is the number of atlases and d the dimensionality of the output embedding. This poses serious limitations on the number of atlases at test time (in fact, they only use 5 atlases for each target image). In brain MRI segmentation, usually more than 10 atlases are used (Aljabar et al., 2009; Lotjonen et al., 2010; Sanroma et al., 2014).

To overcome these issues, we propose a method to learn discriminative patch embeddings using neural networks, with the following contributions:

  • By incorporating our method into the regular label fusion process, we focus on the problem of learning the model and leverage the ability of label fusion to restrict the set of feasible labels at each point.

  • The previous contribution makes it possible to compute a single model per bilateral structure (i.e., one model for both the left and right parts of each structure). We take advantage of stochastic gradient descent (SGD) to process the vast amounts of training data in small mini-batches (see the training sketch after this list). Therefore, our method remains practical to store and use.

  • We learn the model in the native space of each training atlas instead of using a template. Therefore, models are learned in the same space in which they were annotated, avoiding interpolation artifacts during training. Another advantage is that models are orientation-invariant, and hence target images can be segmented directly in their native space. As a consequence, the target anatomy can be quantified directly from the resulting segmentation, without the need to correct for geometric distortions caused by the transformation to the template space.

  • We learn the embeddings using patch relationships within the same image, leading to an attractive O(N) storage complexity at training (with N the number of atlases), compared to more costly approaches (Benkarim et al., 2016; Sanroma et al., 2015a; Yang et al., 2016) that require pairwise atlas registrations in this phase.

  • Our method embeds image patches independently rather than embedding whole images. Therefore, we can generate output embeddings of arbitrary dimensionality without compromising the number of atlases that can reasonably be handled (the memory requirement at segmentation time is O(N)).
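
As an illustration of the mini-batch SGD training mentioned in the list above, here is a short sketch. PyTorch is used for concreteness; the network architecture, the loss, and the data loader protocol are our own illustrative assumptions rather than the paper's exact design.

```python
import torch

class PatchEmbedder(torch.nn.Module):
    """Illustrative non-linear embedding z = f(x) of a flattened patch."""
    def __init__(self, patch_dim, embed_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(patch_dim, embed_dim),
            torch.nn.Tanh(),
            torch.nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x):
        return self.net(x)

def train(model, loader, epochs=10, lr=1e-2):
    """Mini-batch SGD; `loader` is assumed to yield batches of
    (central, voting, vote_labels, target_label), where `central` is (B, P),
    `voting` is (B, K, P), and the labels are integer tensors."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for central, voting, vote_labels, target_label in loader:
            B, K, P = voting.shape
            z_c = model(central)                                   # (B, d)
            z_v = model(voting.reshape(B * K, P)).reshape(B, K, -1)
            # Similarity between embedded patches: negative squared distance.
            sim = -((z_v - z_c.unsqueeze(1)) ** 2).sum(dim=-1)     # (B, K)
            w = torch.softmax(sim, dim=1)                          # voting weights
            # Mass assigned to the correct label by the weighted vote.
            correct = (vote_labels == target_label.unsqueeze(1)).float()
            p = (w * correct).sum(dim=1)
            loss = -torch.log(p.clamp_min(1e-8)).mean()            # assumed loss
            opt.zero_grad()
            loss.backward()
            opt.step()
```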

We apply our method to segment the whole hippocampus and the hippocampal subfields (see Section 4), structures targeted by many studies on psychiatric and neurological disorders (Chupin et al., 2009; Li et al., 2007). Accurate segmentation methods are required in order to quantify the subtle morphological changes undergone by these structures, especially in the early stages of disease (Frisoni et al., 2010; West et al., 2004).

In the next section, we introduce multi-atlas segmentation and how it can be improved by using embedding techniques, before describing our method in Section 3.


Multi-atlas segmentation

Let us denote $\hat{X}$ the target image to be segmented and $X_i$, $i = 1, \ldots, N$, a set of atlas images along with their corresponding labelmaps $Y_i$ containing the anatomical information. Multi-atlas segmentation (MAS) aims at estimating the segmentation on the target image using the atlas images and their labelmaps.

This is implemented by (1) registering the atlas images to the target and (2) computing each target label as a combination of locally corresponding atlas labels, the so-called label fusion.
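
In pseudocode form, the two steps can be sketched as follows, with `register` and `fuse` as hypothetical callables standing in for any concrete registration method and label fusion rule (e.g., the weighted vote sketched earlier):

```python
def multi_atlas_segmentation(target, atlases, labelmaps, register, fuse):
    """Sketch of the two MAS steps. `register(moving, fixed)` is assumed to
    return a warp function mapping an image into the target space."""
    warped = []
    for X_i, Y_i in zip(atlases, labelmaps):
        warp = register(moving=X_i, fixed=target)      # step 1: registration
        # Intensities are interpolated; labelmaps need nearest-neighbour warping.
        warped.append((warp(X_i), warp(Y_i, nearest=True)))
    return fuse(target, warped)                        # step 2: label fusion
```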

Weighted

Method

We present a new method to learn non-linear patch embeddings $z = f(x)$ using neural networks. The pipeline of the method is shown in Fig. 1. In the training phase, patches are sampled from atlas images near the boundary of the structures (yellow square in Fig. 1(a)). Note that training images are in their native space (i.e., not registered to any template). A training sample is composed of a central patch and neighboring voting patches at both sides of the boundary. Neighboring patches are sampled
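
To fix ideas, the three embedding families mentioned in the abstract (scaling, affine, and non-linear) can be written as plain functions of a flattened patch. The parameterizations below are illustrative; the exact architectures may differ from those in the paper.

```python
import numpy as np

def scaling_embedding(x, s):
    return s * x                                  # element-wise scaling

def affine_embedding(x, W, b):
    return W @ x + b                              # affine transformation

def nonlinear_embedding(x, W1, b1, W2, b2):
    return W2 @ np.tanh(W1 @ x + b1) + b2         # one hidden non-linearity (tanh assumed)

# Standard PBLF is then run on embedded patches: similarities are computed
# between f(x_target) and f(x_atlas) instead of between the raw patches.
```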

Experiments and results

We perform several experiments applying the proposed method to segmenting the whole hippocampus and the hippocampal subfields. We compare the proposed method with the following state-of-the-art PBLF techniques: majority voting (MV) (Heckemann et al., 2006; Rohlfing et al., 2004), local weighted voting (LWV) (Artaechevarria et al., 2009), non-local weighted voting (NLWV) (Coupé et al., 2011) and joint label fusion (JOINT) (Wang et al., 2013). It is

Discussion

Results show that the scaling of the similarities used in the Softmax has a higher impact on performance than the similarity metric. We argue that one of the reasons for the success of our method lies in the ability of the proposed affine and non-linear transformations to jointly influence the Softmax and the similarity metric in more nuanced ways than a simple scaling. Due to the similar performance of LNCC and negSSD, we conclude that a more important factor than their performance for
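
A toy example illustrates why this scaling matters: it controls how sharply the Softmax concentrates the voting weights. The similarity values below are made up for illustration.

```python
import numpy as np

def softmax_weights(sim, beta):
    """Voting weights from similarities; `beta` is the scaling discussed above."""
    e = np.exp(beta * (sim - sim.max()))
    return e / e.sum()

sim = np.array([-0.10, -0.12, -0.50])   # toy negSSD similarities of 3 atlas patches
for beta in (1.0, 10.0, 100.0):
    print(beta, np.round(softmax_weights(sim, beta), 3))
# Small beta -> near-uniform voting; large beta -> nearly nearest-neighbour voting.
```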

Conclusions

We have presented a method for learning discriminative image patch embeddings for optimal PBLF. We applied it to the segmentation of brain structures such as the hippocampus and hippocampal substructures. We used neural networks optimized via stochastic gradient descent (SGD) to learn a single model per bilateral structure. We analyzed the effectiveness of SGD in minimizing the desired objective function. We learned optimal patch embeddings using neighboring patches sampled within the same

Acknowledgments

The first author is co-financed by the Marie Curie FP7-PEOPLE-2012-COFUND Action, Grant agreement no: 600387.

This work is partly supported by the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).

Part of the data used for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012).

References (58)

  • S.T. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)

  • M. Sdika, Enhancing atlas based segmentation with multiclass linear classifiers, Med. Phys. (2015)

  • H. Wang et al., A learning-based wrapper method to correct systematic errors in automatic image segmentation: consistently improved performance in hippocampus, cortex and brain segmentation, NeuroImage (2011)

  • S.K. Warfield et al., Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation, IEEE Trans. Med. Imaging (2004)

  • S.C. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, Pattern Anal. Mach. Intell. (2007)

  • H. Yang et al., Deep fusion net for multi-atlas segmentation: application to cardiac MR images

  • X. Artaechevarria et al., Combination strategies in multi-atlas image segmentation: application to brain MR data, IEEE Trans. Med. Imaging (2009)

  • C. Barnes et al., PatchMatch: a randomized correspondence algorithm for structural image editing, ACM Trans. Graph. (2009)

  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003)

  • Y. Bengio, Practical recommendations for gradient-based training of deep architectures, Neural Networks: Tricks of the Trade (2012)

  • O.M. Benkarim et al., Enhanced probabilistic label fusion by estimating label confidences through discriminative learning

  • O.M. Benkarim et al., Toward the automatic quantification of in utero brain development in 3D structural MRI: a review, Hum. Brain Mapp. (2017)

  • M.J. Cardoso et al., STEPS: similarity and truth estimation for propagated segmentations and its application to hippocampal segmentation and brain parcelation, Med. Image Anal. (2013)

  • M. Chupin et al., Fully automatic hippocampus segmentation and classification in Alzheimer's disease and mild cognitive impairment applied on data from ADNI, Hippocampus (2009)

  • C. Cortes et al., Support vector networks, Mach. Learn. (1995)

  • G.B. Frisoni et al., The clinical use of structural MRI in Alzheimer disease, Nat. Rev. Neurol. (2010)

  • R. Giraud et al., An optimized PatchMatch for multi-scale and multi-feature label fusion, NeuroImage (2015)

  • X. Glorot et al., Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS) (2010)

  • Q. Guo et al., Semi-supervised sparse label fusion for multi-atlas based segmentation

1. Part of the data used in preparation of this article was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
