Automated ECG-based detection of Chagas disease, the focus of the 2025 PhysioNet Challenge, presents two major difficulties: noisy supervision due to the self-reported weak labels in the CODE-15% dataset and severe class imbalance (roughly 2% prevalence). We address both issues through large-scale pretraining and dataset-asymmetric finetuning.
We combine the complementary strengths of attention-based and recurrent architectures by pretraining multiple foundation models: masked autoencoding transformers and xLSTMs trained with simDINOv2. This self-supervised pretraining allows the models to learn low-level ECG representations without relying on labels. During finetuning, we exploit the known disparity in label noise between datasets by applying label smoothing to the CODE-15% dataset, where labels are self-reported, but not to the PTB-XL or SaMi-Trop datasets, where labels are more reliable. To mitigate class imbalance, we oversample positives during training to enforce a 5% prevalence.
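The dataset-asymmetric label smoothing and prevalence-targeted oversampling described above can be sketched as follows. This is a minimal illustration under assumed conventions (binary 0/1 labels, a per-sample dataset-name array); the function names, smoothing factor, and weighting scheme are hypothetical and not taken from the paper.

```python
import numpy as np

def smooth_labels(y, source, noisy_sources=("CODE-15%",), eps=0.1):
    # Apply label smoothing only to samples from noisy (self-reported)
    # datasets; labels from reliable datasets are kept hard.
    # eps is an assumed smoothing factor, not the paper's value.
    y = np.asarray(y, dtype=float)
    smoothed = y * (1 - eps) + 0.5 * eps
    noisy = np.array([s in noisy_sources for s in source])
    return np.where(noisy, smoothed, y)

def oversample_weights(y, target_prev=0.05):
    # Per-sample sampling weights chosen so that the expected fraction
    # of positives drawn under weighted sampling equals target_prev.
    y = np.asarray(y, dtype=float)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    w_pos = target_prev / n_pos
    w_neg = (1 - target_prev) / n_neg
    return np.where(y == 1, w_pos, w_neg)
```

In a PyTorch pipeline, weights like these would typically be passed to a `WeightedRandomSampler` so each training batch sees the enforced 5% prevalence on average.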
Our team (DlaskaLabMUI) ranked 3rd on the leaderboard with a score of 0.440 on the hidden validation set.