Detecting Chagas Disease Using a Vision Transformer-based ECG Foundation Model

Lore Van Santvliet1, Phu Xuan Nguyen1, Bert Vandenberk2, Maarten De Vos3
1STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Department of Electrical Engineering (ESAT), KU Leuven, 2KU Leuven, 3KU Leuven - Dept. of Electrical Engineering / Dept. of Development & Regeneration


Abstract

Chagas disease, a parasitic condition endemic to Latin America, is associated with cardiac abnormalities, making electrocardiograms (ECGs) a valuable tool for non-invasive and cost-effective population screening. However, its detection remains challenging because its ECG manifestations overlap with those of other cardiovascular conditions and labeled data are limited.

We demonstrate the use of a vision transformer (ViT) foundation model for Chagas detection from clinical 12-lead ECGs, developed for the George B. Moody PhysioNet Challenge 2025 under the team name Biomed-Cardio. The model was first pretrained using a masked ECG modeling task on open-source datasets (CinC2020, Chapman-Shaoxing, and CODE-test) to learn generalizable ECG representations. Task-specific training was then performed via linear probing followed by fine-tuning on the PTB-XL, CODE-15%, and SaMi-Trop databases. To address the strong class imbalance caused by the low prevalence of Chagas disease, we employed data augmentation (cropping and shifting) and weighted sampling strategies. Using 5-fold stratified cross-validation on the challenge training set, our method achieved a mean internal validation score of 0.437±0.023 for the challenge metric, 0.856±0.005 AUROC, 0.193±0.008 AUPRC, and 0.272±0.010 F1-score. On the hidden validation set, the approach scored 0.390 for the challenge metric.
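As a concrete illustration of the imbalance-handling strategy, the sketch below combines random shift/crop augmentation with inverse-frequency weighted sampling in PyTorch. It is a minimal sketch, not our implementation: identifiers such as ECGDataset, signals, crop_len, and max_shift are hypothetical, and the actual augmentation and sampling parameters may differ.

```python
# Minimal sketch (assumed names and parameters, not the authors' code):
# random shift/crop augmentation plus class-weighted sampling for a
# 12-lead ECG dataset with rare positive (Chagas) labels.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler


class ECGDataset(Dataset):
    def __init__(self, signals, labels, crop_len=2500, max_shift=250):
        self.signals = signals        # (N, 12, T) float32 array
        self.labels = labels          # (N,) binary Chagas labels
        self.crop_len = crop_len      # number of samples kept after cropping
        self.max_shift = max_shift    # maximum temporal shift in samples

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x = self.signals[idx]
        # Random circular shift followed by a random fixed-length crop.
        shift = np.random.randint(-self.max_shift, self.max_shift + 1)
        x = np.roll(x, shift, axis=-1)
        start = np.random.randint(0, x.shape[-1] - self.crop_len + 1)
        x = x[:, start:start + self.crop_len]
        return torch.from_numpy(x.copy()), torch.tensor(self.labels[idx]).float()


def make_loader(signals, labels, batch_size=64):
    # Inverse-frequency weights so the rare positive class is drawn
    # roughly as often as the negative class during training.
    labels = np.asarray(labels).astype(int)
    class_counts = np.bincount(labels)
    weights = 1.0 / class_counts[labels]
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    return DataLoader(ECGDataset(signals, labels), batch_size=batch_size, sampler=sampler)
```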

Our results highlight the value of extensive pretraining for learning robust ECG representations, applicable to downstream tasks such as rare disease detection. This approach is particularly beneficial when labeled data are scarce, as it leverages large-scale, heterogeneous ECG datasets to improve performance on low-resource classification tasks. We plan to further investigate the added value of also using the challenge datasets for pretraining the foundation model.