Background Left ventricular hypertrophy (LVH) is an established independent predictor of cardiovascular morbidity and mortality. Indices derived from the electrocardiogram (ECG) have been used to infer the presence of LVH, but with limited sensitivity. The aim of this study was to assess how well a combination of ECG biomarkers discriminates LVH when classified with a random forest, and to address class imbalance using downsampling.
Methods We extracted ECG biomarkers with a known physiological association with LVH from the 10-second 12-lead ECGs of 38,382 participants (37,731 with normal LV, 615 with LVH) in the UK Biobank. LVH was defined based on parameters from cardiac magnetic resonance imaging. The chi-squared test was used for ECG biomarker selection, and classification was performed using a random forest. The dataset was split into an 80% training set and a 20% testing set for performance measurement. Additionally, ten-fold cross-validation was applied when training the random forest algorithm. To address class imbalance, the majority normal LV group was downsampled in the training set.
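The split-then-downsample step described in the Methods can be sketched as follows. This is a minimal illustration only, not the study's code: the function names, the stratified split, the 1:1 target ratio, and the toy label counts are all assumptions, since the abstract does not state them. The point it shows is that downsampling touches only the training indices, leaving the test set at its original class ratio.

```python
import random

def split_indices(labels, test_fraction=0.2, seed=0):
    """Hold out test_fraction of each class for testing (stratification is an assumption)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def downsample_majority(train_idx, labels, majority_label=0, seed=0):
    """Keep every minority sample and an equal-size random subset of the majority class.

    The 1:1 ratio is an assumption; the source does not give the ratio used.
    """
    rng = random.Random(seed)
    majority = [i for i in train_idx if labels[i] == majority_label]
    minority = [i for i in train_idx if labels[i] != majority_label]
    kept = rng.sample(majority, min(len(majority), len(minority)))
    return sorted(kept + minority)

# Toy labels (hypothetical, not UK Biobank data): 0 = normal LV, 1 = LVH.
labels = [0] * 1000 + [1] * 20
train_idx, test_idx = split_indices(labels, test_fraction=0.2, seed=0)
balanced_idx = downsample_majority(train_idx, labels, majority_label=0, seed=0)
```

Feature selection and the random forest would then be fitted on `balanced_idx` only, with performance measured on the untouched `test_idx`.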
Results The top 40 ranked ECG biomarkers were used in combination in the random forest classifier. On the imbalanced dataset, classification of LVH had 98% accuracy but only 4% specificity (sensitivity 99%, F1 score 99%, AUC 0.76). Following downsampling, accuracy decreased to 45% and specificity rose to 89% (sensitivity 45%, F1 score 62%, AUC 0.80).
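To make the relationship between these metrics concrete, the sketch below computes them from a confusion matrix. The counts are hypothetical and chosen only to show how a heavily imbalanced test set can yield high accuracy alongside very low specificity; they are not the study's confusion matrix. Treating the majority class as the positive class is likewise an assumption made for the illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts: 1000 majority-class (positive) and 20 minority-class
# (negative) test samples, with nearly everything predicted as the majority class.
metrics = classification_metrics(tp=990, fp=19, tn=1, fn=10)
```

With these counts, accuracy and F1 are dominated by the 1000 majority-class samples, while specificity depends only on the 20 minority-class samples, of which a single one is classified correctly.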
Conclusions A combination of extracted morphological ECG biomarkers in the random forest classifier was able to discriminate between the normal LV group and the LVH group. We addressed class imbalance in our data using downsampling, which improved the specificity of the classifier. Our findings provide support for the ECG as an inexpensive screening tool to identify LVH. In future work we will use other classifiers, such as support vector machines and deep learning, for comparison, and explore heterogeneous patient populations with more cases of LVH for optimisation.