Evaluating Auxiliary Pretraining and Fine-Tuning Across Heterogeneous Datasets for ECG-Based Chagas Disease Detection

Bjørn-Jostein Singstad1, Amila Ruwan Guruge2, Nikolai Eidheim2, Ola Marius Lysaker3, Vimala Nunavath4
1Akershus University Hospital, 2University of South-Eastern Norway, 3University of South-Eastern Norway, 4USN


Abstract

Chagas disease (American trypanosomiasis) is a neglected tropical disease caused by the parasite Trypanosoma cruzi. It can cause cardiac damage in humans, known as chronic Chagas cardiomyopathy (CCC), which manifests as conduction blocks, arrhythmias, heart failure, and sudden death. The CODE-15% dataset contains more than 300,000 12-lead electrocardiogram (ECG) recordings, but its Chagas labels are mostly weak, relying heavily on self-reported medical histories. We introduce auxiliary pretraining on more dependable labels from CODE-15%, followed by fine-tuning on SaMi-Trop, which includes serologically verified Chagas patients, and PTB-XL, which is assumed to contain non-Chagas patients. The proposed model, pretrained on CODE-15% and then fine-tuned on SaMi-Trop and PTB-XL, attained an AUROC of 0.69 and an AUPRC of 0.22 on internal validation, and a challenge metric of 0.040 on hidden validation. In contrast, training only on CODE-15% and SaMi-Trop yielded an AUROC of 0.81 and an AUPRC of 0.41 on internal validation, and a challenge metric of 0.316. These findings highlight a key limitation: the proposed strategy of pretraining on auxiliary labels from CODE-15% and fine-tuning on PTB-XL and SaMi-Trop offered no benefit and underperformed this conventional training approach.
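
For readers less familiar with the two internal-validation metrics reported in the abstract, the following is a minimal sketch of how AUROC and AUPRC can be computed from binary labels and model scores. It is not the authors' evaluation code: the function names `auroc` and `average_precision` and the toy data are hypothetical, and AUPRC is estimated here as average precision, which is an assumption about the estimator used.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with tied scores counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimated as average precision: the mean of the precision
    measured at the rank of each positive, in descending score order.
    This simple version assumes there are no tied scores."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy example (made-up values, not data from the paper): 1 = Chagas-positive.
labels = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.30, 0.50, 0.70]
print(f"AUROC = {auroc(labels, scores):.3f}")
print(f"AUPRC = {average_precision(labels, scores):.3f}")
```

When the positive class is rare, the two metrics have different chance levels: a random classifier scores about 0.5 on AUROC, whereas its expected AUPRC is roughly the prevalence of the positive class, so the two numbers should not be compared on the same scale.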