Angélica Atehortúa, Polyxeni Gkontra, Marina Camacho, Oliver Diaz, Maria Bulgheroni, Valentina Simonetti, Marc Chadeau-Hyam, Janine F. Felix, Sylvain Sebert, Karim Lekadir, Cardiometabolic risk estimation using exposome data and machine learning, International Journal of Medical Informatics, Volume 179, 2023, 105209, ISSN 1386-5056, https://doi.org/10.1016/j.ijmedinf.2023.105209
Cardiometabolic risk estimation using exposome data and machine learning
|Author:||Atehortúa, Angélica1; Gkontra, Polyxeni1; Camacho, Marina1;|
1BCN-AIM laboratory, Facultat de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
2R&D Ab.Acus s.r.l., Milano, Italy
3Department of Epidemiology and Biostatistics, MRC-HPA Centre for Environment and Health, School of Public Health, Imperial College London, London, United Kingdom
4The Generation R Study Group, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
5Department of Pediatrics, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
6Research Unit of Population Health, Faculty of Medicine, University of Oulu, Oulu, Finland
|Persistent link:|| http://urn.fi/urn:nbn:fi-fe20231102142349
|Publish Date:|| 2023-11-02
Background: The human exposome encompasses all exposures that individuals encounter throughout their lifetime. It is now widely acknowledged that health outcomes are influenced not only by genetic factors but also by the interactions between these factors and various exposures. Consequently, the exposome has emerged as a significant contributor to the overall risk of developing major diseases, such as cardiovascular disease (CVD) and diabetes. Therefore, personalized early risk assessment based on exposome attributes might be a promising tool for identifying high-risk individuals and improving disease prevention.
Objective: Develop and evaluate a novel and fair machine learning (ML) model for CVD and type 2 diabetes (T2D) risk prediction based on a set of readily available exposome factors. We evaluated our model using internal and external validation groups from a multi-center cohort. To be considered fair, the model was required to demonstrate consistent performance across different sub-groups of the cohort.
Methods: From the UK Biobank, we identified 5,348 and 1,534 participants who were diagnosed with CVD and T2D, respectively, within 13 years of the baseline visit. An equal number of participants who did not develop these pathologies were randomly selected as the control group. A total of 109 readily available exposure variables from six categories (physical measures, environmental, lifestyle, mental health events, sociodemographics, and early-life factors), collected at the participants' baseline visit, were considered. We adopted the XGBoost ensemble model to predict individuals at risk of developing the diseases. The model's performance was compared with that of an integrative ML model based on a set of biological, clinical, physical, and sociodemographic variables and, for CVD only, with the Framingham risk score. Moreover, we assessed the proposed model for potential bias related to sex, ethnicity, and age. Lastly, we interpreted the model's results using SHAP, a state-of-the-art explainability method.
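The case-control construction described in the Methods (cases diagnosed during follow-up plus an equal number of randomly selected non-cases) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes each participant is identified by an ID and that "did not develop these pathologies" simply means absence from the case list. The function names are hypothetical, and UK Biobank field extraction and diagnosis-date handling are omitted.

```python
import random


def sample_balanced_controls(case_ids, candidate_ids, seed=0):
    """Randomly draw as many controls as there are cases, without replacement.

    Controls are taken only from participants who are not in the case set,
    i.e. who did not develop the outcome during follow-up.
    """
    cases = set(case_ids)
    # Sort the pool so the draw is reproducible for a given seed.
    pool = sorted(pid for pid in set(candidate_ids) if pid not in cases)
    if len(pool) < len(cases):
        raise ValueError("not enough non-cases for an equal-sized control group")
    rng = random.Random(seed)
    return rng.sample(pool, len(cases))


def build_case_control_dataset(case_ids, candidate_ids, seed=0):
    """Return (participant_id, label) pairs: 1 = developed the disease, 0 = control."""
    controls = sample_balanced_controls(case_ids, candidate_ids, seed)
    cases = sorted(set(case_ids))
    return [(pid, 1) for pid in cases] + [(pid, 0) for pid in controls]


# Usage with toy integer IDs: 10 cases drawn from 100 participants.
dataset = build_case_control_dataset(range(0, 50, 5), range(100), seed=1)
print(len(dataset), sum(label for _, label in dataset))  # 20 10
```

A 1:1 design like this yields a balanced training set, which keeps ROC-AUC easy to interpret. Note that it does not reproduce the true disease prevalence in the cohort.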
Results: Despite using only exposome information, the proposed ML model performs comparably to the integrative ML model, achieving ROC-AUCs of 0.78 ± 0.01 and 0.77 ± 0.01 for CVD and T2D, respectively. Additionally, for CVD risk prediction, the exposome-based model outperforms the traditional Framingham risk score. No bias with respect to key sensitive variables was identified.
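The ROC-AUC metric reported in the Results, and the idea of checking that performance stays consistent across sensitive subgroups, can be illustrated with a small pure-Python sketch. It computes ROC-AUC with the rank-based (Mann-Whitney) formulation, then reports per-subgroup AUCs and the largest gap between them. The function names, and the use of a max-min gap as the consistency summary, are assumptions for illustration. They are not the authors' exact bias analysis, and this abstract does not state which gap counts as "consistent". In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (Mann-Whitney formulation).
    Ties count as half a win. O(P*N), fine for illustration only."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_aucs(labels, scores, groups):
    """Per-subgroup ROC-AUC plus the largest gap between any two subgroups."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    aucs = {g: roc_auc(ys, ss) for g, (ys, ss) in by_group.items()}
    return aucs, max(aucs.values()) - min(aucs.values())


# Toy predicted risks with a binary sensitive attribute (e.g. sex).
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.65, 0.8, 0.1]
groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(roc_auc(labels, scores))                 # 0.9375
print(subgroup_aucs(labels, scores, groups))   # ({'F': 1.0, 'M': 0.75}, 0.25)
```

The same per-group evaluation applies unchanged to ethnicity or age bands by swapping the `groups` list.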
Conclusions: We identified exposome factors that play an important role in identifying individuals at risk of CVD and T2D, such as daytime naps, age at completion of full-time education, past tobacco smoking, frequency of tiredness/unenthusiasm, and current work status. Overall, this work demonstrates the potential of exposome-based machine learning as a fair CVD and T2D risk assessment tool.
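The abstract attributes the model interpretation to SHAP, which estimates Shapley values. To show what such a feature ranking measures, the sketch below computes exact Shapley values for a toy model by enumerating every feature coalition. Features "absent" from a coalition are replaced by a single baseline input, and features are then ranked by mean absolute Shapley value. This is a simplification under stated assumptions. The toy model, its weights, and the function names are hypothetical. Exhaustive enumeration is exponential in the number of features and impractical for 109 variables. For tree ensembles such as XGBoost, the `shap` library's `TreeExplainer` instead uses a polynomial-time algorithm.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline input.

    A feature absent from a coalition takes its baseline value. Cost grows as
    2**n model calls per feature, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def rank_by_mean_abs_shapley(model, rows, baseline, names):
    """Global importance: mean |Shapley value| per feature, largest first."""
    totals = [0.0] * len(names)
    for row in rows:
        for j, phi in enumerate(shapley_values(model, row, baseline)):
            totals[j] += abs(phi)
    means = {name: t / len(rows) for name, t in zip(names, totals)}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy risk score; the weights are made up for illustration.
def toy_model(z):
    return 2.0 * z[0] + 0.5 * z[1] - 1.0 * z[2]


ranking = rank_by_mean_abs_shapley(
    toy_model, rows=[[1, 1, 1], [1, 2, 0]], baseline=[0, 0, 0],
    names=["feature_a", "feature_b", "feature_c"])
print([name for name, _ in ranking])  # ['feature_a', 'feature_b', 'feature_c']
```

For a linear model, each Shapley value reduces to weight × (x_i − baseline_i). The values for one input always sum to model(x) − model(baseline), which is the additivity property that makes SHAP attributions readable as contributions to a predicted risk.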
International Journal of Medical Informatics
|Type of Publication:||
A1 Journal article – refereed
|Field of Science:||
3121 General medicine, internal medicine and other clinical medicine
3142 Public health care science, environmental and occupational health
This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 874739 (LongITools project). PG and KL have additionally received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 825903 (euCanSHare project).
|EU Grant Number:||
(874739) LONGITOOLS - Dynamic longitudinal exposome trajectories in cardiovascular and metabolic non-communicable diseases
© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).