An Ensemble of Machine Learning Models for Multilabel Classification of Cardiovascular Diseases by ECGs

Svyatoslav Khamzin¹, Anastasia Bazhutina¹, Mikhail Chmelevsky², Stepan Zubarev¹, Margarita Budanova¹, Werner Rainer¹, Aleksandr Sinitca¹
¹XSpline S.p.A., ²Division of Cardiology, Fondazione Cardiocentro Ticino


Abstract

The electrocardiogram (ECG) is an integral measure of cardiac electrophysiology. Automatic analysis of ECG signals, and in particular deriving a diagnosis from the ECG, is therefore an important practical task when choosing a treatment strategy. Modern machine learning (ML) methods and neural networks are now widely used for automatic ECG classification. This study introduces two beat-to-beat classifiers for the automatic detection of cardiovascular disease from the 12-lead ECG: the first detects the type of heart conduction disorder, and the second detects the type of infarction. Three open datasets were used to train the classifiers: Chapman-Shaoxing, PTB-XL, and the dataset of Shandong Provincial Hospital. The datasets contain between 11 and 72 classes each. Before training, the ECGs were preprocessed. First, all ECG records were filtered to remove noise from the signals. Next, a single QRS complex was selected for each lead. For each QRS complex, we calculated statistical, time-domain, and frequency-domain features. For the first classifier (10 classes), we selected: normal ECG, left bundle branch block (LBBB), incomplete LBBB, ventricular premature complex, left anterior fascicular block (FB), left posterior FB, right bundle branch block (RBBB), incomplete RBBB, nonspecific intraventricular conduction disturbance, and ventricular preexcitation. For the second classifier (5 classes), we selected: non-infarcted ECG, anteroseptal myocardial infarction (MI), lateral MI, inferior MI, and anterior MI. The backbone of our solution is an ensemble of ML models whose output is a weighted combination of the individual model predictions. The main advantage of this approach is that it extracts information from the signal in several complementary ways and uses it for efficient prediction with ML models. The table presents the mean weighted values on the validation samples for the 10-class and 5-class classifiers.
The current results may be improved by balancing the data and tuning the model weights in the ensemble.
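
The weighted ensemble prediction described above can be illustrated with a minimal sketch. The abstract does not specify the model types, the weight values, or the decision rule, so everything here is an assumption made for illustration: the function names `weighted_ensemble` and `to_labels`, the normalization of weights to sum to one, and the 0.5 decision threshold are all hypothetical.

```python
import numpy as np


def weighted_ensemble(probas, weights):
    """Weighted average of per-model class probabilities.

    probas  : list of arrays, each of shape (n_samples, n_classes),
              one array per model in the ensemble
    weights : one non-negative weight per model (hypothetical values;
              the abstract does not report the weights actually used)
    """
    stacked = np.stack(probas, axis=0)  # (n_models, n_samples, n_classes)
    w = np.asarray(weights, dtype=float)
    if w.shape != (stacked.shape[0],):
        raise ValueError("need exactly one weight per model")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalize so the weights sum to 1
    # Contract the model axis: result has shape (n_samples, n_classes)
    return np.tensordot(w, stacked, axes=1)


def to_labels(avg_proba, threshold=0.5):
    """Multilabel decision: each class is thresholded independently.

    The 0.5 threshold is an assumption, not a value from the abstract.
    """
    return (np.asarray(avg_proba) >= threshold).astype(int)


# Toy example: two models, two beats, three classes (made-up numbers)
model_a = np.array([[0.9, 0.2, 0.1],
                    [0.3, 0.8, 0.4]])
model_b = np.array([[0.7, 0.6, 0.1],
                    [0.1, 0.6, 0.9]])

avg = weighted_ensemble([model_a, model_b], weights=[3, 1])
labels = to_labels(avg)
print(avg)
print(labels)
```

In this sketch, tuning the ensemble amounts to changing the `weights` argument, for example by searching for the values that maximize a validation metric.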