Explainable Multimodal Fusion with Vision Transformer and Wave2Vec for ECG-Based Chagas Disease Detection

Aditya Nagori1, Junseob Kim2, Tilendra Choudhary2
1Duke, 2Duke University


Abstract

Introduction: Early, noninvasive detection of Chagas disease is critical for improving outcomes and for screening in resource-limited settings. In this work, we propose a robust framework for predicting Chagas infection from 6-10 second 12-lead ECGs, represented as two complementary modalities (multidimensional time series and derived images) to capture diagnostic features that are inaccessible through a single modality alone.
Method: ECGs were resampled, bandpass filtered, and normalized. For the time-series modality, signal quality was assessed, and fiducial features (R-peaks; lead-II diagnostic intervals including P, QRS, and T durations; PR, ST, and QT/QTc; atrial and ventricular systole), interlead eigenvalues, and 19 HRV features were extracted. Additionally, 783 TSFRESH features capturing time-domain statistics, FFT coefficients (centroid, entropy), CWT representations, and complexity metrics were derived. An ECG foundation model using Fairseq signals with Wav2Vec 2.0 produced 768-dimensional embeddings. For the image modality, ECGs were converted to time-frequency images via CWT and processed with a pretrained Vision Transformer, yielding a 1536-dimensional vector. All features were concatenated as input to an XGBoost model. Development and evaluation were performed on 47,982 samples drawn to preserve prevalence, using stratified splits, 5-fold cross-validation, a 20% held-out set, hyperparameter search, and SHAP analysis.
Results: On the sampled cohort, ECG features and HRV metrics performed best, with a held-out challenge score of 0.43 and a cross-validation score of 0.37 (95% CI [0.35, 0.38]). SHAP analysis revealed that conventional features (age, HRV metrics, ECG intervals, eigenvalues) drive predictions by modulating risk scores, while TSFRESH features capture subtle, dynamic ECG patterns that complement these markers, highlighting the potential of our approach for noninvasive detection of Chagas infection.
Future work: We plan to explore advanced modality-adapter and fusion techniques, ECG-FM fine-tuning, attention-based fusion, end-to-end deep learning strategies, and diffusion-based signal augmentation to build more robust and clinically effective predictive models for Chagas identification.
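
The evaluation protocol described in the abstract (a 20% held-out set plus 5-fold cross-validation, with stratified splits that preserve prevalence) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the random seed, and the toy cohort of 1000 records with 5% positives are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_holdout_and_folds(labels, holdout_frac=0.2, n_folds=5, seed=0):
    """Split sample indices into a held-out set and n_folds CV folds,
    keeping the class ratio roughly equal in every part.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    holdout = []
    folds = [[] for _ in range(n_folds)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)  # shuffle within each class
        n_hold = round(len(idxs) * holdout_frac)
        holdout.extend(idxs[:n_hold])
        # Deal the remaining indices round-robin so each fold
        # receives its share of this class.
        for pos, idx in enumerate(idxs[n_hold:]):
            folds[pos % n_folds].append(idx)
    return holdout, folds


def prevalence(indices, labels):
    """Fraction of positive (label == 1) samples among the given indices."""
    return sum(labels[i] for i in indices) / len(indices)


# Toy cohort (assumed, NOT the paper's data): 1000 records, 5% positive.
labels = [1] * 50 + [0] * 950
holdout, folds = stratified_holdout_and_folds(labels)

print(len(holdout), prevalence(holdout, labels))
for fold in folds:
    print(len(fold), prevalence(fold, labels))
```

In practice the same split would more likely be produced with scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold`; the hand-rolled version above only makes the prevalence-preserving logic explicit.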