Interpretable Detection of Chagas Disease from 12-Lead ECGs Using LightGBM with Adaptive Class Balancing

Clinton Mwangi Kuya, David Wachira Warutumo, Paul K Bett
DSAIL


Abstract

Aims: Chagas disease, a leading cause of cardiomyopathy in endemic regions, remains underdiagnosed due to limited access to specialized healthcare. We aimed to develop an interpretable, lightweight machine learning framework that uses 12-lead electrocardiograms (ECGs) to prioritize high-risk patients for confirmatory testing while addressing extreme class imbalance (2% prevalence).

Methods: We trained LightGBM, a gradient-boosting decision tree algorithm, on morphological (QRS duration and T-wave amplitude), temporal (RR intervals), and spectral ECG features from the PhysioNet 2025 Challenge dataset. To mitigate class imbalance, adaptive synthetic oversampling (ADASYN) was applied to the minority class, and cost-sensitive learning via class-weighted loss penalized misclassification of Chagas-positive cases. Model performance was evaluated using 5-fold cross-validation.
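
The two balancing steps and the cross-validation split can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: the function names, the toy label vector, and the "balanced" weighting formula (n_samples / (n_classes * class_count)) are assumptions chosen for illustration. In practice the oversampling step would likely come from a library such as imbalanced-learn (ADASYN) and the weights would be passed to the classifier (for example, through LightGBM's class_weight parameter).

```python
from collections import Counter
import random


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


# Toy labels (hypothetical): 20 positives in 1000 records, i.e. 2% prevalence.
labels = [1] * 20 + [0] * 980
weights = balanced_class_weights(labels)
folds = stratified_folds(labels, n_folds=5)

for i, val_idx in enumerate(folds):
    val_set = set(val_idx)
    train_idx = [j for j in range(len(labels)) if j not in val_set]
    # Oversampling (e.g. ADASYN) would be applied to train_idx only,
    # so that synthetic samples never leak into the validation fold.
    n_pos_val = sum(labels[j] for j in val_idx)
    print(f"fold {i}: train={len(train_idx)} val={len(val_idx)} val_pos={n_pos_val}")

print("class weights:", weights)
```

The point of the design is that resampling happens inside each training fold, while every validation fold keeps the original class ratio, so the reported metrics reflect the true prevalence.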

Results: Initial training without hyperparameter tuning yielded a Challenge score of 0.23 for Chagas-positive cases on our validation dataset. After integrating ADASYN and class-weighted loss, performance improved to a score of 0.58 and a sensitivity of 82%, with specificity maintained at 74%. Feature importance analysis identified prolonged QRS (Q, R, and S wave) duration and reduced T-wave amplitude as the top discriminators, consistent with known markers of Chagas cardiomyopathy. Inference required less than 2 seconds per ECG.
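
For reference, sensitivity and specificity follow directly from the confusion counts: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The short sketch below shows the calculation on a small, made-up set of labels and predictions. The data and function names are hypothetical and do not reproduce the figures reported above.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = Chagas-positive."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example (made-up data): 5 positives, 10 negatives.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In a screening setting, the decision threshold applied to the model's predicted probabilities trades one quantity against the other, so both are best reported together.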

Conclusion: Despite initial performance limitations, our framework demonstrates feasibility for Chagas screening in resource-constrained settings. The interpretable features and computational efficiency enable deployment on low-power devices, while adaptive balancing strategies improve sensitivity to rare cases. This approach offers a pragmatic pathway to early detection, potentially reducing morbidity in underserved populations.