GIRAFFE: Crafting Deep Learning Ensembles for Classifying ECG Paper Printouts

Damian Marek Kucharski1, Arkadiusz Paweł Czerwiński1, Agata Maria Wijata1, Jacek Kawa1, Yalin Zheng2, Gregory Lip2, Jakub Nalepa1
1Silesian University of Technology, 2University of Liverpool


Abstract

Ensemble methods are among the most effective ways to improve the robustness of machine learning models while maintaining high predictive performance. Differences in measuring equipment, data quality, and patient populations pose a significant challenge to the development of highly generalizable machine learning models of clinical relevance. We propose a genetic programming-based algorithm (dubbed GIRAFFE) that builds a tree-like classifier architecture from a set of base learners. The aggregation steps are derived in the form of a computational graph, which preserves differentiability and thus allows for gradient-based model explanations and further optimization. In this study, we generated a dataset of synthetic ECG images from the PTB-XL dataset (which contains ECGs captured as time series). We split the dataset into training, validation, and test subsets to perform rigorous internal validation of the classification models. The split was stratified according to the class distribution of normal vs. abnormal ECG printouts, with 7596 (44.5%), 955 (44.5%), and 955 (44.6%) normal printouts in the training, validation, and test subsets, respectively; the entire dataset contains 21388 ECG printouts (9514 normal ECGs, 44.5%). The training set was further split into 5 non-overlapping folds and used to train 5 base models, which were then combined into an ensemble with the GIRAFFE algorithm. In our preliminary results, this method obtained a test set F1 score of 0.888 (95% CI: 0.874-0.901) in the task of classifying ECGs as normal or abnormal. GIRAFFE thus achieved the highest F1 score among the compared methods, ahead of the best base model with 0.872 (95% CI: 0.857-0.886) and a majority voting ensemble with 0.884 (95% CI: 0.870-0.898).
Our current efforts focus on expanding the set of base learners and validating GIRAFFE on real-world ECGs. Our team name in the PhysioNet/CinC Challenge is GIRAFFE.
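
To make the comparison baseline concrete, the following is a minimal Python sketch of a majority voting ensemble over binary base-model predictions, together with an F1 score and a confidence interval for it. It is not the GIRAFFE algorithm and not the authors' evaluation code. The function names, the toy labels, the choice of label 1 as the positive class, and the use of a percentile bootstrap for the interval are all assumptions made for illustration; the abstract does not state how the reported 95% CIs were computed.

```python
import random


def majority_vote(predictions):
    """Combine 0/1 predictions from several models by majority vote.

    predictions: one list of 0/1 labels per model, all of equal length.
    With an even number of models, ties resolve to 0 (an arbitrary choice).
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample test cases with replacement and rescore.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy data (hypothetical): 8 test cases, 3 base models.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
model_preds = [
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]
ensemble_pred = majority_vote(model_preds)
ensemble_f1 = f1_score(y_true, ensemble_pred)
ensemble_ci = bootstrap_ci(y_true, ensemble_pred, n_resamples=200)
```

In this toy example each base model makes two errors on different cases, so the vote corrects all of them. Majority voting weighs every base model equally with a fixed rule, whereas GIRAFFE, as described above, searches for a tree-like aggregation of the base learners.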