Concept for Automatic Multi-object Organ Detection and Segmentation in Abdominal CT Data

Research Article | DOI: https://doi.org/10.31579/2834-8761/001

  • Nico Zettler 1
  • Andre Mastmeyer 1*

1 Digital Health Management, Hochschule Aalen, Germany.

*Corresponding Author: Andre Mastmeyer, Digital Health Management, Hochschule Aalen, Germany.

Citation: Nico Zettler and Andre Mastmeyer, (2022), Concept for Automatic Multi-object Organ Detection and Segmentation in Abdominal CT Data, Clinical Endocrinology and Metabolism, 1(1); DOI: 10.31579/2834-8761/001

Copyright: © 2022, Andre Mastmeyer, this is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: 05 August 2022 | Accepted: 15 September 2022 | Published: 20 October 2022

Keywords: 3D patient models; abdominal organs; liver; kidneys; spleen; pancreas

Abstract

The ability to generate 3D patient models quickly and reliably is of great importance, e.g. for the simulation of liver punctures in virtual reality. The aim is to automatically detect and segment abdominal structures in CT scans. Among the selected organ group, the pancreas in particular poses a challenge. We use a combination of random regression forests and U-Nets to detect bounding boxes and generate segmentation masks for five abdominal organs (liver, left kidney, right kidney, spleen, and pancreas). Proof-of-concept training and testing was carried out on 50 CT scans from various public sources. Preliminary results showed Dice coefficients of up to 0.71. The proposed method can theoretically be used for any anatomical structure, as long as sufficient training data is available.

Introduction

Virtual reality (VR) based simulation of interventions for training and planning is slowly gaining importance in clinical teaching and routine. VR methods can be used for various tasks, ranging from training scenarios for medical students and staff to patient-specific simulations of planned operations [Mastmeyer et al., 2016a], [Mastmeyer et al., 2016b], [Mastmeyer et al., 2017], [Mastmeyer et al., 2018]. The necessary individual patient models should be available quickly and be accurate enough to guarantee a plausible simulation.

The first step in producing patient models is to acquire patient image data. This data will be used for the following steps in the modelling process. It is desirable to use high-quality image data, because inaccuracies will be carried over to the resulting model. The next step is the coarse localisation of the organs inside organ-specific volumes of interest (VOI) to simplify their subsequent segmentation.

The aim of our study is to automatically detect bounding boxes for organs in abdominal CT data using a learning-from-example method. The detected bounding boxes are then used for segmentation. Detection of the abdominal organs can be a challenge because of the variety of shapes these organs can have. The intensity-based features of these organs are also challenging, because the intensity ranges of neighbouring organs often overlap. We detected five abdominal organs (liver, right kidney, left kidney, pancreas, spleen) that are commonly used in simulations.

We used a random forest method developed by Criminisi et al. [Criminisi et al., 2013a] to automatically detect the bounding boxes of abdominal organs. The method uses random regression forests to predict the location of organ bounding boxes in CT data. Furthermore, we used the U-Net proposed by Ronneberger et al. [Ronneberger et al., 2015] to automatically segment the organs. Training and testing were carried out on a database of 50 abdominal CT scans that had to be segmented beforehand.

Material and Methods

The database used for training and testing consists of 50 CT scans from three different sources: SLIVER07¹, LiTS² and VISCERAL³. The data sets contain only abdominal CT data. The sets vary in quality and field of view, to capture the variability of organs and scans. The scans differ not only in the number of slices (ranging from 64 to 861), slice thickness (1-5 mm) and field of view, but also in image noise. This variability in the data sets is important to ensure robustness against typical interference factors of clinical image data during training and application. All five target organs (liver, pancreas, left kidney, right kidney, spleen) are included in the scans.

Only a small fraction of the provided segmentation maps contained all five target organs, so the missing structures frequently had to be segmented manually.

¹ http://sliver07.org
² https://competitions.codalab.org/competitions/17094
³ http://visceral.eu/closed-benchmarks/anatomy3

1.1 Definition of ground truth bounding boxes

The segmentation maps associated with the CT scans are the basis for the extraction of the ground truth bounding boxes to be learned. A bounding box can be described as an axis-aligned cuboid that completely encloses an object.

The organ bounding boxes of each scan can be created by scanning the segmentation maps for the extremes of label occurrence in each coordinate direction. To create a three-dimensional bounding box, we iterate slice-wise through the segmentation map for each coordinate direction (x, y, z) and save the first and last slice that contain the object label.
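The extraction step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the segmentation map is available as a 3D NumPy array of integer labels, and the function name `bounding_box` and the label value 1 are hypothetical.

```python
import numpy as np

def bounding_box(label_map, label):
    """Return inclusive (min, max) voxel indices per axis for one label, or None if absent."""
    mask = label_map == label
    if not mask.any():
        return None
    box = []
    for axis in range(mask.ndim):
        # Collapse the other axes: True for every slice along `axis` that contains the label.
        other_axes = tuple(a for a in range(mask.ndim) if a != axis)
        hits = np.flatnonzero(mask.any(axis=other_axes))
        box.append((int(hits[0]), int(hits[-1])))
    return box

# Toy label map in which the hypothetical label 1 marks one organ.
seg = np.zeros((6, 7, 8), dtype=np.uint8)
seg[2:4, 1:5, 3:7] = 1
print(bounding_box(seg, 1))  # [(2, 3), (1, 4), (3, 6)]
```

The result is in voxel indices; multiplying by the voxel spacing of the scan would give the coordinates in mm used later in the text.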

1.2 Training of the models

Our concept is composed of two different machine learning approaches: (1) bounding box detection and (2) segmentation with convolutional neural networks (U-Nets). Here, random regression forests (RRFs) are used to detect organ bounding boxes in CT data, although other detection approaches are available [Kern and Mastmeyer, 2022]. The U-Nets make use of the detected bounding boxes and segment the contained organs.

1.2.1 Training random regression forests for bounding box detection

The decision trees used in RRFs choose each split by minimising the variance of the label values in the resulting child nodes; each leaf node then outputs the mean of all label values it contains. We use RRFs to determine the location and extent of abdominal organs [Criminisi et al., 2013a]. The main difference between random classification forests and RRFs is the type of output that is predicted. While classification forests try to categorise objects, regression forests predict continuous values. Regression forests partition the data into manageable chunks to predict average values. As seen in Fig. 1, the method expects CT scans and ground truth bounding boxes as input.
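The split criterion can be illustrated with a small sketch. It is deliberately simplified and rests on assumptions that are not taken from the paper: a single scalar feature and a scalar target are used (the actual forest regresses six-dimensional offset vectors), and the helper names `sum_sq_dev` and `best_split` are hypothetical. Minimising the summed squared deviation of both children is equivalent to minimising their size-weighted variance.

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the mean (variance times sample count)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(features, targets):
    """Return (cost, threshold, left_mean, right_mean) for the variance-minimising split."""
    pairs = sorted(zip(features, targets))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated by a threshold
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cost = sum_sq_dev(left) + sum_sq_dev(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            # Each child would predict the mean of its targets if it became a leaf.
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    return best

print(best_split([1, 2, 3, 10, 11, 12], [5, 6, 7, 50, 52, 54]))
# (10.0, 6.5, 6.0, 52.0)
```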

Figure 1: Training of an RRF: The inputs for the training process are CT scans and ground truth bounding boxes of the targeted organs. We create one feature vector and one offset vector for each voxel that is part of a predefined medial cylinder subset in the scan [Criminisi et al., 2013a]. The trained RRF is able to predict the offset between a voxel and an organ bounding box.

A three-dimensional bounding box $b_c$ of an organ $c$ can be described by a six-dimensional vector $b_c = (b_{Left}, b_{Right}, b_{Anterior}, b_{Posterior}, b_{Head}, b_{Foot})$ with coordinates in mm [Criminisi et al., 2013a]. We iterate over all voxels $p = (x_p, y_p, z_p)$ that lie within a specified radial distance ($r = 5$ cm) of the medial axis of the scan. The offset between such a voxel and each of the bounding box walls is calculated as $d(p) = (x_p - b_{Left}, x_p - b_{Right}, y_p - b_{Anterior}, y_p - b_{Posterior}, z_p - b_{Head}, z_p - b_{Foot})$ and saved as the offset vector to be learned. In contrast to Criminisi et al. [Criminisi et al., 2013a], we use only 50 feature boxes, which are evenly distributed on three spheres (r = 5 cm, 2.5 cm, 1.25 cm), to generate the input feature vector. The feature boxes $F_j$ are intended to capture the spatial and intensity context of the current voxel. For this purpose, the mean intensities of the feature boxes are calculated and saved in the feature vector. An example feature box is shown in Fig. 2. During training, the RRF learns the offset vector (later: the output) to the reference bounding box from the feature boxes.

Figure 2: Example feature box: The feature box $F_j$ is placed relative to the current voxel, and the mean intensity of the 3D image region it covers is calculated [Criminisi et al., 2013b].
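The offset vector defined above can be written out as a short sketch. The function names, the example coordinates and the `axis_xy` parameter (the x/y position of the medial axis, assumed here to run along z) are hypothetical and serve only as illustration.

```python
import math

def offset_vector(p, box):
    """d(p): signed distances in mm from voxel p = (x, y, z) to the six box walls.

    box = (left, right, anterior, posterior, head, foot), all in mm.
    """
    x, y, z = p
    left, right, anterior, posterior, head, foot = box
    return (x - left, x - right, y - anterior, y - posterior, z - head, z - foot)

def box_from_offset(p, d):
    """Invert offset_vector: recover the box walls from a voxel and its offset."""
    x, y, z = p
    return (x - d[0], x - d[1], y - d[2], y - d[3], z - d[4], z - d[5])

def in_medial_cylinder(p, axis_xy, radius_mm=50.0):
    """True if p lies within radius_mm of a medial axis running along z (assumption)."""
    return math.hypot(p[0] - axis_xy[0], p[1] - axis_xy[1]) <= radius_mm

box = (100, 180, 60, 140, 300, 420)   # hypothetical organ box in mm
p = (150, 120, 350)                   # hypothetical voxel position in mm
d = offset_vector(p, box)
print(d)                              # (50, -30, 60, -20, 50, -70)
print(box_from_offset(p, d) == box)   # True
```

Because the mapping is invertible, every voxel for which the forest predicts an offset yields one candidate bounding box at application time.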

1.2.2 Training of a U-Net for semantic segmentation

The training data for our U-Net consists of the expert segmentations and ground truth bounding boxes, as schematised in Fig. 3. The bounding boxes (VOIs) are used to locally extract the intensity and label data from the CT scans and their corresponding label maps. As input, the U-Net receives a VOI from the intensity data, while the same region within the label data serves as the target output. We use the U-Net architecture proposed by Ronneberger et al. [Ronneberger et al., 2015], which consists of nine layers with four down- and up-scaling steps. The network was trained using batches of size 15 over 50 epochs, with Adam optimisation and a cross-entropy loss function. We trained one U-Net for each organ, using ReLU activation functions.

Figure 3: Training of a U-Net: The inputs for the training process are ground truth bounding boxes (BB), CT scans and their corresponding segmentation maps. The box is applied to the CT and segmentation data to extract the relevant image region. Inside the organ VOIs, the segmentation is learned. The process results in an organ-wise U-Net that can segment image regions.

1.3 Application of the models

In the first step, the organ-specific RRFs predict the organ bounding box candidates. Then, an offset vector is selected by majority voting and converted into a six-dimensional vector that describes the final organ-specific bounding box.
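One possible reading of this voting step is sketched below. The text does not specify how the majority vote is implemented, so the per-wall histogram with a bin width `bin_mm` is an assumption, as are the function name and the example values. Each voxel turns its predicted offset into candidate wall positions, and the most frequent bin per wall is taken as the final position.

```python
from collections import Counter

def vote_bounding_box(voxels, predicted_offsets, bin_mm=2.0):
    """Pick, for each of the six walls, the most frequently predicted position in mm."""
    votes = [Counter() for _ in range(6)]
    for (x, y, z), d in zip(voxels, predicted_offsets):
        coords = (x, x, y, y, z, z)  # left/right use x, anterior/posterior y, head/foot z
        for wall in range(6):
            candidate = coords[wall] - d[wall]          # wall position implied by this voxel
            votes[wall][round(candidate / bin_mm)] += 1  # assumed histogram binning
    return tuple(counter.most_common(1)[0][0] * bin_mm for counter in votes)

# Hypothetical predictions: two voxels agree, the third is off by 10 mm on every wall.
voxels = [(150, 120, 350), (140, 110, 340), (160, 130, 360)]
offsets = [(50, -30, 60, -20, 50, -70),
           (40, -40, 50, -30, 40, -80),
           (50, -30, 60, -20, 50, -70)]
print(vote_bounding_box(voxels, offsets))
# (100.0, 180.0, 60.0, 140.0, 300.0, 420.0)
```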

Next, the U-Net model receives the image region inside the corresponding bounding box as input. Since the U-Net only accepts a fixed number of voxels, the bounding box content is resized to a fixed size. The U-Net uses the data contained inside the given bounding box to segment the corresponding organ. The output is a segmentation map of the full target organ.
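The cropping and resizing step might look as follows. This is a sketch under stated assumptions: the target size, the nearest-neighbour resampling and the function name `crop_and_resize` are not taken from the paper, which does not state the input size or the interpolation method that was used.

```python
import numpy as np

def crop_and_resize(volume, box, out_shape=(64, 64, 64)):
    """Crop an inclusive voxel box from a 3D volume and resample it to out_shape.

    box = [(min, max), ...] per axis; out_shape is a hypothetical network input size.
    """
    voi = volume[tuple(slice(lo, hi + 1) for lo, hi in box)]
    # Nearest-neighbour lookup: map every output index back to a source index per axis.
    index = [np.arange(n_out) * n_in // n_out
             for n_in, n_out in zip(voi.shape, out_shape)]
    return voi[np.ix_(*index)]

# Toy volume and a hypothetical detected box.
volume = np.arange(6 * 7 * 8).reshape(6, 7, 8)
patch = crop_and_resize(volume, [(2, 3), (1, 4), (3, 6)], out_shape=(4, 4, 4))
print(patch.shape)  # (4, 4, 4)
```

Nearest-neighbour resampling keeps label values intact, so the same function can be applied to the label map; for the intensity data a smoother interpolation could be substituted.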

1.4 Evaluation

We used a five-fold Monte Carlo cross-validation based on a 30:20 (train:test) data split. The target bounding boxes extend along the three axes x, y and z. The resulting segmentations of the combined application are compared against the reference segmentations using the Dice coefficient [Taha and Hanbury, 2015].
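For orientation, the evaluation protocol can be sketched as follows. The Dice coefficient is 2|A∩B| / (|A| + |B|) for a predicted mask A and a reference mask B. The function names, the random seed and the convention of returning 1.0 for two empty masks are assumptions made for this illustration.

```python
import random

def dice(pred, ref):
    """Dice coefficient of two flattened binary masks of equal length."""
    intersection = sum(1 for a, b in zip(pred, ref) if a and b)
    total = sum(map(bool, pred)) + sum(map(bool, ref))
    return 2 * intersection / total if total else 1.0  # empty-vs-empty convention (assumption)

def monte_carlo_splits(n_scans=50, n_train=30, repeats=5, seed=0):
    """Yield `repeats` independent random train/test splits of the scan indices."""
    rng = random.Random(seed)
    ids = list(range(n_scans))
    for _ in range(repeats):
        shuffled = rng.sample(ids, n_scans)
        yield shuffled[:n_train], shuffled[n_train:]

print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
for train, test in monte_carlo_splits():
    print(len(train), len(test))         # 30 20, five times
```

Unlike k-fold cross-validation, the Monte Carlo variant draws every split independently, so a scan may appear in the test set of several repetitions.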

Results

Table 1 shows the Dice coefficients achieved by our conceptual method for all five target organs. We compared our values to studies that segmented the same organs with automatic methods.

Table 1: Comparison of the mean per-organ Dice coefficients and standard deviations (SD) of our general method vs. other methods focused on a single organ.

The best results were obtained for the liver and the spleen; both organs achieved a Dice coefficient of 0.71. The right kidney scored lower than the left kidney, with mean values of 0.55 and 0.67 respectively. The segmentation of the pancreas achieved the lowest Dice coefficient at only 0.32. The standard deviation is similar for all organs, ranging from 0.26 to 0.34. An example segmentation is presented in Fig. 4.

Segmentation of an unseen patient scan takes between 30 seconds and one minute, depending on the size of the data, on an Intel i7 processor with a consumer NVIDIA GTX 1050 GPU.

Discussion

We were able to detect organ bounding boxes automatically, although their quality varied. The overall position and extent of the predicted bounding boxes have to be improved to ensure a satisfactory segmentation.

In particular, the prediction along the z-axis is not accurate enough and resulted in displaced boxes. Due to this misalignment, some segmentation masks are cut off because parts of the organ lie outside the box. The standard deviations were too high, indicating highly varying results.

In summary, the results of the current state of the segmentation method are affected by displaced bounding boxes and cannot compete with the Dice coefficients of related studies. The pancreas is especially challenging because of its shape and its intensity similarity to surrounding tissue. It is important to mention, though, that most of the related studies had more training data and often focused on a single organ rather than a group of organs.

Figure 4: Example segmentations: (a) A good segmentation of the liver, with some leakage into surrounding structures. (b) A poor segmentation of the pancreas. The bounding box was shifted along the z-axis. This cut off the upper half of the pancreas, resulting in an incomplete segmentation. The segmentation also included parts of surrounding structures.

Future work should focus on the accuracy and robustness of the bounding box detection [Kern and Mastmeyer, 2022]. The quality of the target segmentation depends heavily on the quality of the detected bounding boxes. We used predicted bounding boxes to produce results for a completely automatic method.

Acknowledgements

Funding from German Research Foundation: DFG-MA 6791/1-1. 

References
