Aims: Two deep learning models are trained and evaluated on one dataset (A) and then applied for myocardium segmentation to a separate dataset (B) comprising Late Gadolinium Enhanced Cardiac Magnetic Resonance Images (LGE-CMRI). Because the pixel value distributions differ between the two datasets, different scaling strategies for preprocessing the images of dataset B before applying the models are explored.
Methods: Two deep learning models are trained on dataset A: a modified U-Net with added residual connections (RES_MOD) and a multiresolution U-Net (MUL_MOD). For each patient-specific image set, the maximum pixel value over all slices, pv_m, is determined. To standardize the pixel values in dataset B, each pixel value is divided by one of five scaling factors: s_std=256, s_max=pv_m, s_floor=2^⌊log2(pv_m)⌋, s_round=2^⌊log2(pv_m)+1/2⌋ and s_ceil=2^⌈log2(pv_m)⌉. Both models are applied to each set of scaled images. Training and evaluation are conducted with s_std scaling on dataset A, which comprises MRI scans from 51 patients divided into training (30), test (13), and validation (8) sets. Dataset B includes MRI scans from 58 patients.
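The five scaling factors defined in Methods can be illustrated with a minimal Python sketch. The function names `scaling_factors` and `scale_patient_volume`, the NumPy array layout (slices, height, width), and the float32 output type are assumptions made for illustration; they are not taken from the study's implementation.

```python
import math

import numpy as np


def scaling_factors(pv_m):
    """Return the five candidate divisors for a maximum pixel value pv_m > 0."""
    log2_max = math.log2(pv_m)
    return {
        "s_std": 256.0,                                # fixed divisor
        "s_max": float(pv_m),                          # patient-specific maximum
        "s_floor": 2.0 ** math.floor(log2_max),        # largest power of two <= pv_m
        "s_round": 2.0 ** math.floor(log2_max + 0.5),  # power of two nearest in log2
        "s_ceil": 2.0 ** math.ceil(log2_max),          # smallest power of two >= pv_m
    }


def scale_patient_volume(volume, strategy):
    """Divide every pixel of one patient's image set by the chosen factor.

    `volume` is assumed to hold all slices of one patient, so its maximum
    is the patient-specific pv_m described in Methods.
    """
    pv_m = float(volume.max())
    divisor = scaling_factors(pv_m)[strategy]
    return volume.astype(np.float32) / divisor
```

For a hypothetical pv_m of 600, this gives s_floor = 512, s_round = 512 and s_ceil = 1024, so s_floor scaling leaves the brightest pixels slightly above 1 while s_ceil scaling keeps all values at or below 1.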
Results: The Jaccard scores (mean(SD)) on the test data of dataset A were consistent across both models at 0.73(0.01). For dataset B, MUL_MOD with s_floor scaling achieved the highest score, 0.60(0.14), outperforming all other combinations of scaling method and model; with s_std scaling, MUL_MOD scored 0.58(0.19). The corresponding scores for RES_MOD were 0.56(0.18) and 0.54(0.18) for s_std and s_floor scaling, respectively.
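For reference, the Jaccard score reported above is the ratio of the intersection to the union of the predicted and reference masks. The sketch below assumes binary masks stored as NumPy arrays; the function name `jaccard_score` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details given in the study.

```python
import numpy as np


def jaccard_score(pred, target):
    """Jaccard index |P ∩ T| / |P ∪ T| for two binary segmentation masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)
```

A mean(SD) summary like the one reported would then follow from applying this function to each case and taking the mean and standard deviation of the resulting scores.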
Conclusion: This study highlights how strongly the results depend on the dataset when a model trained and tested on one dataset is assessed on another with a different pixel value distribution. The findings indicate that scaling could be effectively employed in semi-automated segmentation processes.