Acute myeloid leukemia (AML) is a highly aggressive form of cancer that affects myeloid cells, leading to the excessive growth of immature white blood cells (WBCs) in both bone marrow and peripheral blood. Timely detection of AML is crucial for effective treatment and patient well-being. Currently, AML diagnosis relies on the manual recognition of immature WBCs through peripheral blood smear analysis, which is time-consuming, error-prone, and subject to interobserver variation. This study aimed to develop a computer-aided diagnostic framework for AML, called "CAE-ResVGG FusionNet", that accurately identifies immature WBCs and classifies them into their respective subtypes. The proposed framework combines a convolutional autoencoder (CAE) with fine-tuned adaptations of the VGG19 and ResNet50 architectures, which extract features from CAE-derived embeddings. Classification proceeds in two steps: a binary classifier first distinguishes mature from immature WBCs, and a multiclass classifier then assigns each immature cell to one of four subtypes (myeloblasts, monoblasts, erythroblasts, or promyelocytes). The CAE-ResVGG FusionNet workflow comprises four primary stages: data preprocessing, feature extraction, classification, and validation. In the preprocessing stage, data augmentation through geometric transformations and CAE-based synthetic image generation addresses the imbalance in the WBC distribution. Feature extraction involves image embedding and transfer learning, in which CAE-derived image representations are passed to a custom integrated model built from pretrained VGG19 and ResNet50 networks. The classification stage employs a weighted ensemble of VGG19 and ResNet50, with the optimal weights selected by grid search.
Model performance was assessed in the validation stage using overall accuracy, precision, and sensitivity, and the area under the receiver operating characteristic curve (AUC) was used to evaluate the model's discriminatory capability. The proposed framework achieved an average accuracy of 99.9%, a sensitivity of 91.7%, and a precision of 98.8%. The model also demonstrated strong discriminatory ability, as evidenced by an AUC of 99.6%. Notably, the proposed system outperformed previous methods, indicating superior diagnostic ability.
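The weighted ensemble with grid-searched weights described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each network outputs per-class probabilities, that the ensemble is a convex blend controlled by a single weight `w`, and that validation accuracy is the selection criterion. The function name `grid_search_weight`, the step size, and the toy probabilities are all hypothetical.

```python
import numpy as np

def grid_search_weight(p_vgg, p_res, y_true, step=0.05):
    """Search w in [0, 1] for the blend w * p_vgg + (1 - w) * p_res.

    Assumption: accuracy on held-out labels is the selection criterion;
    the abstract does not state which criterion was actually used.
    """
    best_w, best_acc = 0.0, -1.0
    for w in np.arange(0.0, 1.0 + 1e-9, step):
        blended = w * p_vgg + (1.0 - w) * p_res
        acc = float(np.mean(blended.argmax(axis=1) == y_true))
        if acc > best_acc:  # strict comparison keeps the first best weight
            best_w, best_acc = float(w), acc
    return best_w, best_acc

# Toy validation data (made up): 4 cells, 4 immature-WBC subtypes.
y_true = np.array([0, 1, 2, 3])
p_vgg = np.array([[0.90, 0.04, 0.03, 0.03],   # confident and correct
                  [0.04, 0.90, 0.03, 0.03],   # confident and correct
                  [0.15, 0.10, 0.35, 0.40],   # narrowly wrong
                  [0.15, 0.10, 0.40, 0.35]])  # narrowly wrong
p_res = np.array([[0.35, 0.40, 0.15, 0.10],   # narrowly wrong
                  [0.40, 0.35, 0.15, 0.10],   # narrowly wrong
                  [0.03, 0.03, 0.90, 0.04],   # confident and correct
                  [0.03, 0.03, 0.04, 0.90]])  # confident and correct

best_w, best_acc = grid_search_weight(p_vgg, p_res, y_true)
print(f"best weight for VGG19 branch: {best_w:.2f}, accuracy: {best_acc:.2f}")
```

In this toy case each model alone classifies only half of the cells correctly, while an intermediate weight recovers all four, which is the motivation for tuning the weights rather than fixing them at 0.5.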