Predicting Chagas Disease from ECGs Using Simulator-Augmented DNN-Derived Abnormality Scores

Yuta Hashimoto¹, Naoki Nonaka², Jun Seita¹
¹RIKEN Center for Integrative Medical Sciences, ²RIKEN


Abstract

Chagas disease is characterized by a variety of electrocardiographic (ECG) abnormalities, including bundle branch blocks, atrioventricular blocks, and atrial and ventricular arrhythmias, which result from myocardial damage and fibrosis of the cardiac conduction system. In this study, we developed a supervised deep neural network (DNN)-based multi-label classification model that predicts the presence and severity (score range: 0–1) of 15 ECG abnormalities commonly associated with Chagas disease. The model was trained on the publicly available PTB-XL and G12EC ECG datasets. Because certain disease-related ECG patterns are scarce in these datasets, we generated synthetic ECGs with a simulator informed by known abnormal waveforms and added them to the training data as supplementary samples. The predicted abnormality scores were then used as input features for a second-stage classification model, in which they were combined with two demographic features, age and sex, that previous studies have reported as risk factors for Chagas disease. A binary classifier was trained on these features to predict the presence or absence of Chagas disease. To build the abnormality scoring model, we first applied self-supervised learning to the CODE-15% dataset and then fine-tuned the model on the PTB-XL dataset and the synthetic ECGs. For the final binary classification model, we used the SaMi-Trop dataset as the positive class and the PTB-XL dataset as the negative class. To ensure both interpretability and robustness to heterogeneous data sources, we adopted a random forest classifier. When submitted to the official PhysioNet Challenge 2025 evaluation system, the model achieved a validation score of 0.037 under the Challenge-specific scoring metric.
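The two-stage design can be illustrated with a minimal sketch. It assumes that the first-stage DNN emits one raw logit per abnormality and that a sigmoid maps each logit to an independent 0–1 score. The helper names (`build_features`, `fit_forest`, `predict_proba`, `demo`), the 0/1 sex encoding, and the toy data are hypothetical and do not come from the paper. To stay dependency-free, the sketch grows a deliberately tiny random forest whose trees are depth-1 stumps, each fit on a bootstrap sample with a random feature subset. A practical pipeline would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import math
import random
from typing import List, Sequence, Tuple

N_LABELS = 15  # Chagas-related ECG abnormalities scored by the first-stage DNN

# A depth-1 tree: (feature index, threshold, P(positive) if <= threshold, P(positive) if > threshold)
Stump = Tuple[int, float, float, float]


def sigmoid(x: float) -> float:
    """Map one raw logit to an independent abnormality score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def build_features(logits: Sequence[float], age: float, is_male: bool) -> List[float]:
    """Hypothetical stage-2 input: 15 sigmoid scores followed by age and a 0/1 sex flag."""
    if len(logits) != N_LABELS:
        raise ValueError(f"expected {N_LABELS} logits, got {len(logits)}")
    return [sigmoid(z) for z in logits] + [float(age), 1.0 if is_male else 0.0]


def fit_stump(X: List[List[float]], y: List[int], features: Sequence[int]) -> Stump:
    """Pick the split that minimizes size-weighted Gini impurity (up to a factor of 2)."""
    best, best_cost = None, float("inf")
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            pl, pr = sum(left) / len(left), sum(right) / len(right)
            cost = len(left) * pl * (1 - pl) + len(right) * pr * (1 - pr)
            if cost < best_cost:
                best, best_cost = (j, t, pl, pr), cost
    if best is None:  # every sampled feature is constant: fall back to the base rate
        p = sum(y) / len(y)
        best = (features[0], float("inf"), p, p)
    return best


def fit_forest(X: List[List[float]], y: List[int], n_trees: int = 15, seed: int = 0) -> List[Stump]:
    """Bagging plus a random feature subset per tree (each tree has a single split here)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(math.sqrt(n_features)))  # common sqrt(n_features) heuristic
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest: List[Stump], row: Sequence[float]) -> float:
    """Average the per-tree positive-class probabilities."""
    votes = [pl if row[j] <= t else pr for j, t, pl, pr in forest]
    return sum(votes) / len(votes)


def demo(seed: int = 42) -> Tuple[float, float]:
    """Toy data only: positives draw every logit with mean 2 instead of 0."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(120):
        positive = i % 2
        logits = [rng.gauss(2.0 * positive, 1.0) for _ in range(N_LABELS)]
        X.append(build_features(logits, age=rng.uniform(30, 80), is_male=rng.random() < 0.5))
        y.append(positive)
    forest = fit_forest(X, y)
    p_pos = predict_proba(forest, build_features([4.0] * N_LABELS, age=55, is_male=False))
    p_neg = predict_proba(forest, build_features([-4.0] * N_LABELS, age=55, is_male=False))
    return p_pos, p_neg


if __name__ == "__main__":
    p_pos, p_neg = demo()
    print(f"toy positive: {p_pos:.2f}, toy negative: {p_neg:.2f}")
```

The sketch applies a sigmoid to each logit rather than a softmax across all 15. Abnormalities such as a bundle branch block and an atrioventricular block can co-occur, so each label needs its own independent score. The 15 scores plus age and sex then form a 17-dimensional tabular vector, which tree ensembles handle without any feature scaling.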