Congenital heart disease (CHD) is a major cause of death in newborns, especially in low-resource countries where limited access to heart specialists delays diagnosis. Developing a machine-learning model for automatic CHD detection is a promising path toward scalable, consistent, and affordable screening. In this work, to generate an auscultation-like analysis, we used the PhysioNet 2022 Challenge dataset of 5282 heart sound recordings collected from 1568 children in Brazil, recorded at multiple auscultation locations and annotated by specialists as having present, absent, or unknown murmurs. Our analysis found that Mel spectrograms of normal and abnormal heart sounds show distinguishable visual cues. To mimic the human visual cortex, we used a two-stage deep learning framework that produces a patient-level classification. After audio preprocessing, in Stage 1 we deploy a convolutional neural network to classify the Mel spectrogram of each 2-second audio segment as containing a murmur or not. In Stage 2, to obtain a final classification for each patient, we use the Stage 1 model as a feature extractor and concatenate multiple embeddings from its hidden layer to form the input of a feed-forward neural network. On the hidden test set of the challenge, our approach reached a weighted accuracy of 0.699. In our internal 5-fold cross-validation experiments, it reached a sensitivity of 0.76 ± 0.10 and a specificity of 0.85 ± 0.11, and it performed 77% better than a random forest model trained on aggregated audio data and patient demographics. We acknowledge that the current algorithm is not as accurate as expert clinicians in detecting congenital heart disease. Still, we demonstrate that automatic auscultation for CHD detection is feasible and has considerable potential to improve diagnosis in low-resource settings.
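The Stage-2 aggregation described above can be sketched as follows. This is a minimal, illustrative numpy sketch, not the paper's actual implementation: the embedding size, segment count, hidden width, and random weights (`EMB_DIM`, `N_SEGMENTS`, `HIDDEN`, `W1`, `W2`) are all assumptions standing in for a trained Stage-1 CNN and Stage-2 network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (assumptions, not the paper's architecture).
EMB_DIM = 32      # size of a Stage-1 hidden-layer embedding per 2-s segment
N_SEGMENTS = 4    # number of segment embeddings pooled per patient
HIDDEN = 16       # width of the Stage-2 feed-forward hidden layer

def stage2_forward(segment_embeddings, W1, b1, W2, b2):
    """Concatenate per-segment embeddings and run a small feed-forward
    network to produce a patient-level murmur probability."""
    x = np.concatenate(segment_embeddings)   # (N_SEGMENTS * EMB_DIM,)
    h = np.maximum(0.0, W1 @ x + b1)         # ReLU hidden layer
    logit = W2 @ h + b2                      # scalar logit
    return 1.0 / (1.0 + np.exp(-logit))      # sigmoid -> probability

# Randomly initialised weights stand in for a trained Stage-2 network.
W1 = rng.normal(scale=0.1, size=(HIDDEN, N_SEGMENTS * EMB_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=HIDDEN)
b2 = 0.0

# Fake Stage-1 embeddings for one patient (one vector per audio segment).
embeddings = [rng.normal(size=EMB_DIM) for _ in range(N_SEGMENTS)]
p_murmur = stage2_forward(embeddings, W1, b1, W2, b2)
print(f"patient-level murmur probability: {p_murmur:.3f}")
```

In the actual system, the embeddings would come from the hidden layer of the trained Stage-1 CNN, and the feed-forward weights would be learned from the patient-level labels rather than sampled at random.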