Purpose
This paper aims to identify a suitable convolutional neural network (CNN) model for analysing where voids form during the underfilling of asymmetrical flip-chips with large ball-grid arrays (BGA).

Design/methodology/approach
A set of through-scan acoustic microscope (TSAM) images of BGA underfill containing voids is collected, labelled and used to train two CNN models: You Only Look Once version 5 (YOLOv5) and Mask R-CNN. Otsu's thresholding method is used to calculate the void percentage, and each model is evaluated on how accurately its results match the real-scale images.

Findings
All findings were validated against previous studies on CNN models developed to capture the shape of detected voids and calculate the void percentage. Mask R-CNN is the more suitable of the two models for the image segmentation analysis: it matches the voids present in the TSAM image samples with an accuracy of up to 94.25% of the entire void region. The overall accuracy of the Mask R-CNN model is 96.40%, and it displays the void percentage in 2.65 s on average, 96.50% faster than the manual checking process.

Practical implications
The study gives manufacturers a feasible, automated means of improving quality control in flip-chip underfilling production. Leveraging an optimised CNN model enables an expedited manufacturing process that will reduce lead costs.

Originality/value
BGA void formation in a flip-chip underfilling process can be captured quantitatively with advanced image segmentation.
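
The void-percentage step named in the methodology can be illustrated with a minimal sketch of Otsu's thresholding. It is not the authors' implementation. It assumes an 8-bit grey-scale image given as a list of pixel rows, and it assumes that voids appear darker than the surrounding underfill; the `void_is_dark` flag, the function names and the input layout are hypothetical choices made for illustration only. How the threshold is combined with the CNN output (for example, applied inside each predicted mask) is not specified here.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the grey level t that maximises the between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    # Build the grey-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class
    sum0 = 0    # sum of grey levels in the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def void_percentage(image: List[List[int]], void_is_dark: bool = True) -> float:
    """Percentage of pixels classified as void after Otsu thresholding.

    ASSUMPTION: voids are the darker class unless void_is_dark is False.
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    dark = sum(1 for p in flat if p <= t)
    void_pixels = dark if void_is_dark else len(flat) - dark
    return 100.0 * void_pixels / len(flat)
```

Called on a 4 × 4 image in which four pixels have grey level 20 and the other twelve have grey level 200, `void_percentage` reports 25.0, since the threshold settles at 20 and exactly a quarter of the pixels fall in the dark class. With OpenCV available, the same threshold is typically obtained with `cv2.threshold` and the `cv2.THRESH_OTSU` flag rather than a hand-written loop.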