Leveraging AutoML to provide NAFLD screening diagnosis: Proposed machine learning models

Research Article | DOI: https://doi.org/10.31579/2834-8788/014

Ali Haider Bangash

Shifa College of Medicine, Shifa Tameer e Millat University, Islamabad, Pakistan.

*Corresponding Author: Ali Haider Bangash, Shifa College of Medicine, Shifa Tameer e Millat University, Islamabad, Pakistan.

Citation: Ali H.Bangash, (2023), Leveraging Auto ML to provide NAFLD screening diagnosis: Proposed machine learning models, Journal of Heart and Vasculature, 2(5); DOI:10.31579/2834-8788/014

Copyright: © 2023, Ali Haider Bangash. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Received: 05 September 2023 | Accepted: 18 September 2023 | Published: 29 September 2023

Keywords: nafld; fatty liver disease; automl; machine learning; extreme gradient boosting; xgboost; lightgbm; random forest; regularized greedy forest

Abstract

NAFLD is reported to be the only hepatic ailment whose prevalence is increasing concurrently with that of both obesity & T2DM. With the COVID-19 pandemic placing a massive strain on global health resources, NAFLD is at risk of being neglected & shelved. Screening diagnosis of NAFLD relies on abdominal ultrasonography, which carries a high monetary cost. We utilized MLjar, an autoML web platform, to propose machine learning models that require no coding whatsoever & take in only easy-to-measure anthropometric measures to produce a screening diagnosis for NAFLD with good discriminating ability (AUC of approximately 0.82 to 0.83). Further studies are suggested to validate the generalization of the presented models.

Introduction

Hepatic diseases, which take the lives of as many as 1.75 million people worldwide annually, are a menace to be reckoned with.(1) Younossi ZM et al (1) indicated that non-alcoholic fatty liver disease (NAFLD) is the only hepatic pathology growing exponentially in prevalence, coinciding with the increasing rates of both obesity & type 2 diabetes mellitus (T2DM), all of which are wreaking havoc.(2)(3)(4) With the COVID-19 pandemic undeniably encroaching on the global health system’s resources, NAFLD, along with other ailments, is bound to be ignored, which may lead to a rise in the associated morbidity & mortality.
Considerable financial resources are being directed globally to the fight against the COVID-19 pandemic, while radiological diagnosis of NAFLD also demands a considerable allocation of funds. Prioritizing resources in this way will inevitably lead to missed cases, at a cost to patients & their families, including the psychological burden. A low-cost, accurate system that screens patients for a potential NAFLD diagnosis using simple, easy-to-measure anthropometric measures is therefore urgently needed. Huang BX et al, in a cross-sectional study conducted at the Health Examination Center in Guangzhou, China, collected anthropometric measures & abdominal ultrasonography findings & concluded that neck circumference is an independent predictor of fatty liver disease.(5) The purpose of this study is to create machine learning models that take in the easy-to-measure variables from the Huang BX et al study (5) & to arrive at an autoML protocol for the initial screening diagnosis of NAFLD: models that, if they cannot replace abdominal ultrasound screening for NAFLD, should at least be comparable with it. By adopting MLjar (6), a zero-code autoML web platform that bundles feature preprocessing and engineering, algorithm training and hyperparameter selection for machine learning, we are able to create such practical models.

Materials & Methods

Study Cohort
The study takes in data (7) from Huang BX et al. (5). The authors enrolled 4053 subjects (2436 men and 1617 women) between 20 and 88 years of age, after excluding patients with a history of co-morbid conditions as well as those lacking hepatic ultrasonography data. Patients’ history records were reviewed & state-of-the-art methods were adopted to measure anthropometric & biochemical variables, leaving only negligible measurement errors. In contrast to South Asian standards, where a BMI of ≥ 25 kg/m2 is termed overweight, Huang BX et al. (5), citing Zhou B (8), termed a BMI ≥ 24 kg/m2 overweight for both genders. Graif’s criteria (9) were adopted to diagnose fatty liver disease on ultrasonography.
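The overweight cutoff described above can be illustrated with a minimal sketch. The function names and the example subject are hypothetical and are not taken from the Huang BX et al. data; only the 24 kg/m2 and 25 kg/m2 cutoffs come from the text.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_overweight(bmi_value: float, cutoff: float = 24.0) -> bool:
    """True when BMI meets or exceeds the cutoff.

    The default of 24 kg/m^2 follows the definition quoted in the text;
    pass cutoff=25.0 for the alternative standard the text mentions.
    """
    return bmi_value >= cutoff


# Hypothetical subject: 70 kg, 1.70 m (BMI of roughly 24.2 kg/m^2).
example_bmi = bmi(70.0, 1.70)
overweight_at_24 = is_overweight(example_bmi)               # cutoff used in the cohort
overweight_at_25 = is_overweight(example_bmi, cutoff=25.0)  # alternative cutoff
```

A subject close to the boundary, such as this one, is labelled differently under the two cutoffs, which is why the cohort's definition matters when the variable is reused.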
Statistical analysis
Development of Models
Homogenous Development Framework
MLjar (6), an AutoML zero-code machine learning web platform, is a complete package for data loading, pre-processing, modelling & result interpretation; it produces machine learning models of considerably high quality that can be deployed both locally & across a REST API.
A homogenous approach (Table 1) was adopted for the development of the models with respect to the preprocessing & tuning protocols as well as the system specifications, so as to keep model development bias to a minimum.
Development framework	Features
Preprocessing protocol	One-hot encoding to convert categorical features
System specifications	8 CPUs, 15 GB RAM
Tuning protocol: validation type	15-fold cross-validation, shuffling of samples & stratification of classes in folds
Tuning protocol: tuning mode	Perfect mode (25-35 models)
Tuning protocol: time limit for single model	5 minutes
Table 1. Homogenous Development Framework
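The preprocessing and validation settings in Table 1 can be sketched in plain Python. This is only an illustration of one-hot encoding and of shuffled, class-stratified 15-fold splitting; it is not MLjar's implementation, and the helper names and toy data are assumptions made for the example.

```python
import random
from collections import defaultdict


def one_hot(values):
    """Encode categorical values as 0/1 indicator vectors (sorted category order)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def stratified_kfold(labels, k=15, seed=0):
    """Yield (train_idx, test_idx) pairs with shuffled, class-stratified folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle samples within each class
        # Deal each class round-robin so every fold gets a similar class mix.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Toy data (hypothetical): 60 negatives and 30 positives.
labels = [0] * 60 + [1] * 30
splits = list(stratified_kfold(labels, k=15))

# Toy categorical feature (hypothetical).
encoded, categories = one_hot(["male", "female", "male"])
```

With these toy labels every test fold holds 4 negatives and 2 positives, so the 2:1 class ratio of the full sample is preserved in each fold.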
Models developed
The following machine learning models were created:
Extreme Gradient Boosting (Xgboost)
LightGBM (LGBM)
Random Forest (RF)
Regularized Greedy Forest (RGF)
Extra Trees (ET)
k-Nearest Neighbor (KNN)
Logistic Regression (LR)
Neural Network (NN)
Outcome variables
Since the class imbalance was considerably high, the discriminative ability of the models was the primary outcome variable. AUC-ROC analysis was adopted to measure that ability.(10) The respective values were interpreted in accordance with the schema provided by Lau L et al.(10) (Table 2)
AUC-ROC value	Interpretation
>0.9	Excellent discrimination
>0.75	Good discrimination
>0.5	Random guessing
Table 2. Interpretation of the AUC-ROC analysis (Adopted from: Lau L et al.) (AUC-ROC: Area under the receiver operating characteristic curve)
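For readers unfamiliar with the metric, the sketch below computes AUC-ROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half), and then maps the value onto the bands of Table 2. The function names and toy scores are hypothetical; the band labels are copied from Table 2 as worded there, and values of 0.5 or below are not covered by the table.

```python
def auc_roc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def interpret_auc(auc):
    """Map an AUC value onto the bands listed in Table 2."""
    if auc > 0.9:
        return "Excellent discrimination"
    if auc > 0.75:
        return "Good discrimination"
    if auc > 0.5:
        return "Random guessing"  # wording as given in Table 2
    return "Not covered by Table 2"


# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
toy_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because the pairwise definition depends only on the ranking of scores, it is not affected by the proportion of positives in the sample, which is one reason AUC-ROC is commonly reported for imbalanced data.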
Training time was analyzed as a secondary outcome. Ideally, the best model is the one that has the highest discriminating capacity & yields results in the shortest time.

Results

Algorithm performances
The study adopted a zero-code ML platform to build 8 types of machine learning models that take in easy-to-measure anthropometric measures, such as BMI & waist-to-hip ratio, in order to provide a screening diagnosis for NAFLD.
As indicated in Table 3, all of the algorithms, trained in accordance with the aforementioned Homogenous Development Framework, have good discriminating ability to designate the dichotomous variable of interest.
Among the single best models, RF had the highest discriminating ability (AUC 0.8248), with a computation time of 4 minutes 9 seconds. KNN had the lowest AUC (0.8158) but a considerably shorter computation time of only 6 seconds. LGBM required as much as 10 minutes 5 seconds to reach an AUC of 0.8235. LR completed its computation in the least amount of time (1 second).
Among the ensemble averages, RGF achieved the highest AUC (0.8264). Given that the best RGF model completed its training in only 10 seconds & a further 35 seconds elapsed for ensemble averaging, RGF outperforms all other models by achieving the highest AUC & thus exhibiting the best discriminating ability of all the models.
KNN exhibited the lowest ensemble average (0.8166). (The ensemble averages of LGBM & NN were not calculated.) Thus, it is safe to indicate that KNN underperformed the most out of all the proposed models.
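The text does not describe how the platform performs ensemble averaging. As a rough illustration only, the sketch below assumes it means an unweighted mean of the predicted probabilities from several tuned models of the same algorithm; the function name and the toy predictions are hypothetical.

```python
def average_ensemble(prediction_sets):
    """Unweighted mean of per-model probability lists (one list per model)."""
    if not prediction_sets:
        raise ValueError("at least one model is required")
    n = len(prediction_sets[0])
    if any(len(p) != n for p in prediction_sets):
        raise ValueError("all models must score the same samples")
    return [sum(column) / len(prediction_sets) for column in zip(*prediction_sets)]


# Toy predicted probabilities from two hypothetical models for four subjects.
model_a = [0.2, 0.6, 0.4, 0.9]
model_b = [0.4, 0.2, 0.8, 0.7]
blended = average_ensemble([model_a, model_b])  # approximately [0.3, 0.4, 0.6, 0.8]
```

Averaging can smooth out the disagreements of individual models, which is consistent with the ensemble AUC values in Table 3 being slightly higher than those of the corresponding single best models.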
Algorithm	AUC (best model)	Training time (h:mm:ss)	AUC (ensemble averaging)	Ensemble averaging time (h:mm:ss)
Xgboost	0.822004	0:07:45	0.822679	0:00:35
Lgbm	0.823532	0:10:05	-	-
Rf	0.824809	0:04:09	0.825137	0:00:33
Rgf	0.823396	0:00:10	0.826389	0:00:35
Et	0.823557	0:01:02	0.824575	0:00:35
Knn	0.815769	0:00:06	0.816616	0:00:28
Lr	0.821363	0:00:01	0.821632	0:00:26
Nn	0.820271	0:03:16	-	-
Table 3. AUC & training time of the proposed models
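The rankings stated in the text can be re-derived from the values in Table 3. The short sketch below copies the table's numbers; the variable and helper names are only for illustration.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


# (algorithm, best-model AUC, training time, ensemble AUC or None), from Table 3.
TABLE_3 = [
    ("Xgboost", 0.822004, "0:07:45", 0.822679),
    ("Lgbm", 0.823532, "0:10:05", None),
    ("Rf", 0.824809, "0:04:09", 0.825137),
    ("Rgf", 0.823396, "0:00:10", 0.826389),
    ("Et", 0.823557, "0:01:02", 0.824575),
    ("Knn", 0.815769, "0:00:06", 0.816616),
    ("Lr", 0.821363, "0:00:01", 0.821632),
    ("Nn", 0.820271, "0:03:16", None),
]

best_single = max(TABLE_3, key=lambda row: row[1])[0]
fastest = min(TABLE_3, key=lambda row: to_seconds(row[2]))[0]
slowest = max(TABLE_3, key=lambda row: to_seconds(row[2]))[0]

# Only algorithms with a reported ensemble average are compared here.
with_ensemble = [row for row in TABLE_3 if row[3] is not None]
best_ensemble = max(with_ensemble, key=lambda row: row[3])[0]
worst_ensemble = min(with_ensemble, key=lambda row: row[3])[0]
```

Note that the spread between the highest and lowest AUC in the table is about 0.01, so the ranking separates the algorithms by small margins.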

Discussion

Many studies have applied machine learning to the prediction of fatty liver disease. Atabaki-Pasdar N et al (11), in a major modelling & validation study, concluded that the highest AUC (0.84 in that study) is obtained by combining “-omics” data with clinical variables. Using MRI-derived proton density fat fraction as the reference, Han A et al (12) developed one-dimensional convolutional neural networks for NAFLD diagnosis from ultrasound data; the metrics against which their model was evaluated did not include AUC. Taking in all patients screened for fatty liver at the New Taipei City Hospital between the 1st and 31st of December 2009, Wu CC et al (13) developed several classification models to predict fatty liver disease and obtained the highest AUC, 0.925, with a Random Forest model; feature selection was employed to choose the variables fed into the models. The utilization of machine learning to predict hepatic pathologies in general & NAFLD in particular is thus evident.
Our proposed models are, to the best of our knowledge, the first effort to leverage autoML zero-code platforms to build machine learning models with good discriminating ability for predicting NAFLD using only anthropometric measures. The proposed models require neither costly investigations to supply input variables, such as ultrasonographic signals, nor considerably high computation time & resources.
That said, the models do require external validation using data from populations different from the training population; only thus can a machine learning model’s generalization be truly validated. Moreover, a study comparing the presented models’ diagnosis with an abdominal ultrasound diagnosis of NAFLD, with both assessed against hepatic biopsy, is proposed in order to explore the presented models’ potential to replace abdominal ultrasound as an initial diagnostic tool for NAFLD.
Since an autoML platform was adopted, the proposed models are analogous to a black box, the internal workings of which are difficult to decipher. Moreover, the computation time of the best LR model is only 1 second, which might possibly be due to overfitting of the respective model.
The presented models indicate that the fusion of machine learning & medicine is fruitful for cutting down the costs of screening and initial diagnosis of NAFLD, an ailment that has considerable morbidity and mortality associated with it.

Conclusion

By adopting an autoML zero-code platform, machine learning models with good discriminating ability are presented that require only easy-to-measure anthropometric measures as input variables to come up with an initial screening diagnosis for NAFLD. Further studies should be conducted to compare the proposed models with abdominal ultrasound for the screening diagnosis of NAFLD.

Potential Conflict of Interest

The authors report no potential conflict of interest whatsoever.

Sources of Funding

This study was not funded by any institution. We extend our token of appreciation towards mljar (https://mljar.com/).

Study Association

This study is associated neither with any thesis or dissertation work nor with any conference.

References

1. Younossi ZM, Stepanova M, Younossi Y, Golabi P, Mishra A, Rafiq N, et al. Epidemiology of chronic liver diseases in the USA in the past three decades. Gut [Internet]. 2020 Mar 1 [cited 2020 Jul 13];69(3):564–568.

2. Berry EM. The Obesity Pandemic—Whose Responsibility? No Blame, No Shame, Not More of the Same. Front Nutr [Internet]. 2020 Jan 31 [cited 2020 Jul 13];7(2):2.

3. Ghanemi A, Yoshioka M, St-Amand J. Will an obesity pandemic replace the coronavirus disease-2019 (COVID-19) pandemic? Med Hypotheses [Internet]. 2020 Jun [cited 2020 Jul 13];144:110042.

4. Ghosal S, Arora B, Dutta K, Ghosh A, Sinha B, Misra A. Increase in the risk of type 2 diabetes during lockdown for the COVID19 pandemic in India: A cohort analysis. Diabetes Metab Syndr Clin Res Rev. 2020 Sep 1;14(5):949–952.

5. Huang B, Zhu M, Wu T, Zhou J, Liu Y, Chen X, et al. Neck Circumference, along with Other Anthropometric Indices, Has an Independent and Additional Contribution in Predicting Fatty Liver Disease. Targher G, editor. PLoS One [Internet]. 2015 Feb 13 [cited 2020 Jul 13];10(2):e0118071.

6. MLJAR | Machine Learning Made Simple [Internet]. [cited 2020 Oct 20].

7. Neck Circumference, along with Other Anthropometric Indices, Has an Independent and Additional Contribution in Predicting Fatty Liver Disease [Internet]. [cited 2020 Sep 28].

8. Zhou B. Predictive values of body mass index and waist circumference to risk factors of related diseases in Chinese adult population. Zhonghua Liu Xing Bing Xue Za Zhi [Internet]. 2002 Feb 1 [cited 2020 Oct 20];23(1):5–10.

9. Graif M, Yanuka M, Baraz M, Blank A, Moshkovitz M, Kessler A, et al. Quantitative estimation of attenuation in ultrasound video images: Correlation with histology in diffuse liver disease. Invest Radiol [Internet]. 2000 May [cited 2020 Oct 20];35(5):319–324.

10. Lau L, Kankanige Y, Rubinstein B, Jones R, Christophi C, Muralidharan V, et al. Machine-Learning Algorithms Predict Graft Failure After Liver Transplantation. Transplantation [Internet]. 2017 Apr 1 [cited 2020 Oct 20];101(4):e125–132.

11. Atabaki-Pasdar N, Ohlsson M, Viñuela A, Frau F, Pomares-Millan H, Haid M, et al. Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts. PLoS Med [Internet]. [cited 2020 Jul 13].

12. Han A, Byra M, Heba E, Andre MP, Erdman JW, Loomba R, et al. Noninvasive diagnosis of nonalcoholic fatty liver disease and quantification of liver fat with radiofrequency ultrasound data using one-dimensional convolutional neural networks. Radiology [Internet]. 2020 May 1 [cited 2020 Jul 13];295(2):342–350.

13. Wu CC, Yeh WC, Hsu WD, Islam MM, Nguyen PA (Alex), Poly TN, et al. Prediction of fatty liver disease using machine learning algorithms. Comput Methods Programs Biomed. 2019 Mar 1;170:23–29.
