Linear ECG-lead transformations (LELTs) are used to estimate unrecorded leads by applying a LELT matrix to a number of recorded leads. LELT matrices are commonly developed using a training dataset and linear regression analysis. The subject-to-subject variability (SSV) of the estimation performance is an important performance metric of LELT matrices. In this research, we assess the relationship between training set size and the SSV of the estimation performance of LELTs.

First, 12-lead ECGs and Frank VCGs of n=493 subjects were divided into one testing dataset (DTest, n=123 subjects) and 37 training sets of varying size (from n=10 to n=370 in steps of 10 subjects). Random sampling with replacement was used to generate 100 different instances for each training set size. Second, the different training sets were used to generate LELT matrices. Third, the LELT matrices and the 12-lead ECG data in DTest were used to estimate the Frank VCGs. Fourth, the root-mean-squared-error values between the QRS-T complexes of recorded and estimated Frank VCG leads were calculated. Fifth, the SSV of the estimation performance was quantified by calculating the standard deviation of the root-mean-squared-error values associated with each LELT. Sixth, a right-tailed t-test (significance level alpha = 0.05) was used to determine the minimal training set size at which it could not be ruled out that 95% of the mean reduction in the SSV observed between training set sizes of n=10 and n=370 subjects had already been reached.
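
The first five steps above can be illustrated with a minimal Python sketch. It is not the implementation used in this research, and it rests on several assumptions that the text does not state: each subject's QRS-T complex is stored as a samples-by-8 array of independent ECG leads with a matching samples-by-3 array of Frank leads, the LELT matrix is fitted by ordinary least squares without an intercept on the pooled samples of the selected training subjects, and the SSV is taken as the sample standard deviation (ddof=1). The function names fit_lelt, rmse_per_subject and ssv_for_size are hypothetical.

```python
import numpy as np


def fit_lelt(ecg, vcg):
    """Least-squares LELT matrix A such that vcg ~ ecg @ A.

    ecg: (n_samples, 8) recorded leads (assumed layout)
    vcg: (n_samples, 3) Frank leads X, Y, Z
    """
    A, *_ = np.linalg.lstsq(ecg, vcg, rcond=None)
    return A  # shape (8, 3)


def rmse_per_subject(A, ecgs, vcgs):
    """RMSE between recorded and estimated Frank leads, per subject and lead."""
    return np.array([
        np.sqrt(np.mean((ecg @ A - vcg) ** 2, axis=0))
        for ecg, vcg in zip(ecgs, vcgs)
    ])  # shape (n_subjects, 3)


def ssv_for_size(train_ecgs, train_vcgs, test_ecgs, test_vcgs,
                 size, n_instances=100, seed=0):
    """SSV (SD of per-subject RMSE) for each resampled training set instance."""
    rng = np.random.default_rng(seed)
    ssv = np.empty((n_instances, 3))
    for i in range(n_instances):
        # Draw `size` training subjects with replacement.
        idx = rng.integers(0, len(train_ecgs), size=size)
        A = fit_lelt(np.vstack([train_ecgs[j] for j in idx]),
                     np.vstack([train_vcgs[j] for j in idx]))
        # One SSV value per Frank lead for this LELT matrix.
        ssv[i] = rmse_per_subject(A, test_ecgs, test_vcgs).std(axis=0, ddof=1)
    return ssv  # shape (n_instances, 3)
```

Calling ssv_for_size once per training set size (10, 20, ..., 370) would yield the 100 SSV values per size and lead on which the sixth step operates; the t-test itself is not sketched here.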

The minimal size of the training set was found to be 100, 110 and 110 subjects for Frank VCG leads X, Y and Z, respectively.

Our findings suggest that a training set of at least 110 subjects should be used when aiming to minimize the SSV of LELTs that are applied to the 12-lead ECG.