Influence of the Training Set Size on the Subject-to-Subject Variability of the Estimation Performance of Linear ECG-Lead Transformations

Daniel Guldenring1, Dewar Finlay2, Raymond Bond2, Alan Kennedy3, Peter Doggart4, James McLaughlin2
1HS Kempten, 2Ulster University, 3PulseAI Ltd, 4PulseAI


Linear ECG-lead transformations (LELTs) estimate unrecorded leads by applying a LELT matrix to a number of recorded leads. Such LELT matrices are commonly developed using a training dataset and linear regression analysis. An important performance metric of LELTs is the subject-to-subject variability (SSV) of their estimation performance. In this research, we assess the relationship between the training set size (increased from n=10 to n=370 subjects) and the SSV of LELTs.
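
As an illustration of how a LELT matrix can be obtained by linear regression, the following is a minimal sketch that assumes an ordinary least-squares fit without an intercept term. The function name `fit_lelt`, the use of eight input leads and the synthetic data are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def fit_lelt(recorded, target):
    """Least-squares LELT matrix A such that target ≈ A @ recorded.

    recorded: array of shape (n_input_leads, n_samples)
    target:   array of shape (n_output_leads, n_samples)
    """
    # lstsq solves recorded.T @ A.T ≈ target.T in the least-squares sense
    coeffs, *_ = np.linalg.lstsq(recorded.T, target.T, rcond=None)
    return coeffs.T

# Synthetic, noise-free example (hypothetical data, not ECG recordings):
# 8 input leads are mapped to 3 output leads (e.g. Frank X, Y and Z).
rng = np.random.default_rng(0)
true_matrix = rng.normal(size=(3, 8))
recorded = rng.normal(size=(8, 500))
target = true_matrix @ recorded

lelt = fit_lelt(recorded, target)      # shape (3, 8)
estimated = lelt @ recorded            # estimated output leads
```

In practice the training samples of all subjects in the training set would be pooled before the fit, so that the training set size determines how many subjects contribute to `recorded` and `target`.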

A total of 200 LELT matrices were developed for each training set size. The developed LELT matrices and 12-lead ECG data of a testing dataset (n=123 subjects) were used for the estimation of Frank VCGs. Root-mean-squared-error (RMSE) values between recorded and estimated Frank VCG leads were used for the quantification of the estimation performance. The SSV associated with each LELT matrix was quantified as the standard deviation of the corresponding RMSE values. This was followed by an analysis of the relationship between the training set size and the associated SSV values.
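
The RMSE and SSV computation described above can be sketched as follows. This is a minimal example under stated assumptions: the array layout (subjects, leads, samples), the function names and the use of the sample standard deviation (`ddof=1`) are choices made for illustration, since the text does not specify them.

```python
import numpy as np

def rmse_per_subject(recorded_vcg, estimated_vcg):
    """RMSE per subject and lead.

    Both inputs have shape (n_subjects, n_leads, n_samples).
    Returns an array of shape (n_subjects, n_leads).
    """
    return np.sqrt(np.mean((recorded_vcg - estimated_vcg) ** 2, axis=2))

def ssv(rmse):
    """SSV per lead: standard deviation of the RMSE values across subjects.

    ddof=1 (sample standard deviation) is an assumption of this sketch.
    """
    return np.std(rmse, axis=0, ddof=1)

# Hypothetical data: 4 subjects, 3 leads, 100 samples. Each subject's
# estimate is offset from the recording by a constant, so the RMSE of a
# subject equals that subject's offset.
recorded_vcg = np.zeros((4, 3, 100))
offsets = np.array([1.0, 2.0, 3.0, 4.0])
estimated_vcg = recorded_vcg + offsets[:, None, None]

rmse = rmse_per_subject(recorded_vcg, estimated_vcg)   # shape (4, 3)
ssv_values = ssv(rmse)                                 # one value per lead
```

Repeating this computation for every LELT matrix at a given training set size would yield one set of SSV values per matrix, which can then be compared across training set sizes.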

Increasing the training set size from 10 subjects to 180, 160 and 200 subjects for Frank VCG leads X, Y and Z, respectively, was associated with a reduction of the observed SSV. Further increases in training set size had only a marginal effect on the observed SSV.