Predicting diabetes complications from electronic health record visits

Original research article

Front. Genet.

Sec. Computational Genomics

Volume 16 - 2025

doi:10.3389/fgene.2025.1451290

This article is part of the Research Topic: Critical Assessment of Massive Data Analysis (CAMDA) Annual Meeting

Provisionally accepted

  • 1 Al-Quds University, Jerusalem, Palestine
  • 2 Zefat Academic College, Safed, Israel
  • 3 Abdullah Gül University, Kayseri, Türkiye

The final, formatted version of the article will be published soon.

    Diabetes significantly affects millions of people worldwide, leading to substantial morbidity, disability, and mortality. Predicting diabetes-related complications from health records is crucial for early prevention and for developing effective treatment plans. This study introduces a novel feature engineering approach to predict four different complications of diabetes, i.e., retinopathy, chronic kidney disease, ischemic heart disease, and amputation. To develop the classification models, we use XGBoost-based feature selection and several supervised machine learning algorithms, including Random Forest, XGBoost, LogitBoost, AdaBoost, and Decision Tree. The models were trained on synthetic electronic health records (EHRs) generated by a dual-adversarial autoencoder. These EHRs represent nearly 1 million synthetic patients derived from a real cohort of 979,308 people with diabetes. The variables considered in the models were the patient's age range and the chronic diseases recorded during patient visits from the onset of diabetes onward. Across the experiments, XGBoost and Random Forest achieved the best overall predictive performance. The final models, each tailored to one complication and trained with the feature engineering approach, achieved accuracy between 69% and 77% and AUC between 77% and 84% under cross-validation. Under the split (hold-out) validation approach, accuracy ranged from 59% to 78% and AUC from 66% to 85%. These findings suggest that our method outperforms the traditional bag-of-features approach, highlighting its effectiveness in improving model accuracy and robustness.
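
    For readers unfamiliar with the baseline and the metrics named in the abstract, the following is a minimal sketch, not the authors' implementation. It shows one common reading of a bag-of-features encoding (counting how often each code appears across a patient's visits) and how accuracy and AUC are computed from predicted scores. The code names, the toy visits, the labels, the scores, and the 0.5 decision threshold are all hypothetical; the abstract does not describe the engineered features themselves, so none are shown.

```python
from collections import Counter


def bag_of_features(visits, vocabulary):
    """Count how often each code in `vocabulary` appears across all visits.

    Visit order is discarded, which is what makes this a "bag" encoding.
    """
    counts = Counter(code for visit in visits for code in visit)
    return [counts[code] for code in vocabulary]


def accuracy(labels, scores, threshold=0.5):
    """Fraction of patients whose thresholded score matches the 0/1 label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)


def auc(labels, scores):
    """Area under the ROC curve, computed pairwise.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy patient: two visits, each a list of recorded codes.
vocabulary = ["age_50_59", "hypertension", "obesity"]
patient_visits = [["age_50_59", "hypertension"], ["hypertension", "obesity"]]
features = bag_of_features(patient_visits, vocabulary)

# Hypothetical labels (1 = complication occurred) and model scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.6, 0.35, 0.2]

print(features)                  # [1, 2, 1]
print(accuracy(labels, scores))  # 0.5
print(auc(labels, scores))       # 0.75
```

    On this toy data accuracy is 0.5 while AUC is 0.75. Accuracy depends on the chosen threshold, whereas AUC measures only how well the scores rank positives above negatives, which is why the two are reported separately.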

    Keywords:
    diabetes, diabetes complications, machine learning, Random Forest, XGBoost, LogitBoost, AdaBoost, Decision Tree

    Received:
    June 18, 2024.
    Accepted:
    January 31, 2025.

    Copyright:
    © 2025 Voskergian, Yousef and Bakir-Gungor. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:

    Daniel Voskergian, Al-Quds University, Jerusalem, Palestine

    Malik Yousef, Zefat Academic College, Safed, Israel

    Disclaimer:
    All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
