Cardiovascular diseases (CVDs) are the leading cause of death worldwide, so there is a great need for early and accurate detection of CVDs. To improve the accuracy of CVD detection from 12-lead electrocardiograms (ECGs), we propose a novel Hand-Crafted Convolutional Neural Network-Transformer Network (HC-CNN-TN) model. We used PTB-XL, a large public dataset of ECG recordings labeled with 5 superclasses and 23 subclasses. It contains thousands of 10-second 12-lead ECGs uniformly sampled at 100 Hz. Assigning diagnostic categories to ECGs in this dataset is a multi-label classification task with imbalanced classes. We used a CNN-Transformer to extract high-dimensional features and time-dependent patterns. To further enhance model performance, we also extracted hand-crafted features from the 12-lead ECGs: the QRS complex, RR interval, heart rate, T wave and P wave. A weighted loss function is used to handle the class imbalance. We achieved a precision of 0.75, recall of 0.76, F1-score of 0.76, and macro Area Under the Curve (AUC) of 0.933. Applying the same evaluation metrics to subclass classification yielded similar results. Thus, the proposed model, which integrates hand-crafted and deep learning features, provides a promising way to classify multiple CVD categories from imbalanced samples.
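The abstract names two ingredients without giving formulas: RR-interval/heart-rate features and a weighted loss for imbalanced multi-label data. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes R-peak sample indices have already been detected (detection is not shown), and it uses per-class positive weights equal to (#negatives / #positives) inside a binary cross-entropy, which is one common weighting scheme. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

FS_HZ = 100  # sampling rate of the PTB-XL recordings used here (100 Hz)


def rr_intervals_and_heart_rate(
    r_peaks: Sequence[int], fs: float = FS_HZ
) -> Tuple[List[float], float]:
    """Hypothetical helper: RR intervals (s) and mean heart rate (bpm).

    Assumes `r_peaks` are already-detected R-peak sample indices in
    ascending order; R-peak detection itself is not shown.
    """
    if len(r_peaks) < 2:
        raise ValueError("need at least two R-peaks")
    # RR interval = distance between consecutive R-peaks, converted to seconds
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    mean_rr = sum(rr) / len(rr)
    # Heart rate in beats per minute from the mean RR interval
    return rr, 60.0 / mean_rr


def positive_class_weights(labels: List[List[int]]) -> List[float]:
    """Assumed weighting: weight_c = #negatives_c / #positives_c per class.

    `labels` is a multi-hot matrix (one row per ECG, one column per class).
    Classes with no positives fall back to a weight of 1.0.
    """
    n = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        pos = sum(row[c] for row in labels)
        weights.append((n - pos) / pos if pos > 0 else 1.0)
    return weights


def weighted_bce(
    probs: List[List[float]],
    targets: List[List[int]],
    pos_weights: List[float],
    eps: float = 1e-7,
) -> float:
    """Mean weighted binary cross-entropy over all samples and classes.

    Positive terms are scaled by the per-class weight so that rare
    classes contribute more to the loss.
    """
    total = 0.0
    count = 0
    for p_row, y_row in zip(probs, targets):
        for p, y, w in zip(p_row, y_row, pos_weights):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(w * y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count
```

For example, R-peaks at samples 0, 80, 160 and 240 at 100 Hz give RR intervals of 0.8 s and a mean heart rate of 75 bpm. In a deep learning framework the same weighting is usually passed to a built-in weighted BCE loss rather than computed by hand.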