The rapid expansion of e-commerce, particularly in the clothing sector, has created significant demand for effective automated recognition of garments and their attributes. This study presents a novel two-stage image recognition method that combines human keypoint detection, object detection, and classification. In the first stage, we use the open-source libraries OpenPose and Dlib for human keypoint detection, followed by custom cropping logic that extracts bounding boxes for body parts. In the second stage, we combine Harris corner detection, Canny edge detection, and skin pixel detection with VGG16 and support vector machine (SVM) models. Applied to the cropped bounding boxes, this configuration identifies ten attributes, covering facial features and detailed aspects of clothing. In our experiments, the method achieved an overall recognition accuracy of 81.4% for tops and 85.72% for bottoms, supporting its effectiveness for garment categorization.
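To make the first-stage cropping step concrete, the following is a minimal sketch of how a body-part box could be derived from detected keypoints. It is an illustration under stated assumptions, not the cropping logic used in this study: the function name `crop_box`, the keypoint names, the padding ratio, and the confidence threshold are all hypothetical, and keypoints are assumed to arrive as (x, y, confidence) triples in pixel coordinates.

```python
from typing import Dict, Iterable, Optional, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence), pixel coordinates
Box = Tuple[int, int, int, int]        # (left, top, right, bottom)


def crop_box(
    keypoints: Dict[str, Keypoint],
    names: Iterable[str],
    image_size: Tuple[int, int],
    pad: float = 0.1,        # hypothetical padding ratio, not from the study
    min_conf: float = 0.3,   # hypothetical confidence threshold
) -> Optional[Box]:
    """Return a padded box around the named keypoints, clipped to the image.

    Returns None when fewer than two sufficiently confident keypoints exist
    or when the resulting box would be empty.
    """
    pts = [
        keypoints[n]
        for n in names
        if n in keypoints and keypoints[n][2] >= min_conf
    ]
    if len(pts) < 2:
        return None

    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    box_w = max(xs) - min(xs)
    box_h = max(ys) - min(ys)
    width, height = image_size

    # Expand the tight box by a fraction of its size, then clip to the image.
    left = max(0, round(min(xs) - pad * box_w))
    top = max(0, round(min(ys) - pad * box_h))
    right = min(width, round(max(xs) + pad * box_w))
    bottom = min(height, round(max(ys) + pad * box_h))

    if right <= left or bottom <= top:
        return None
    return (left, top, right, bottom)


# Hypothetical keypoints for one person in a 640x480 image.
example_keypoints = {
    "LShoulder": (100.0, 100.0, 0.9),
    "RShoulder": (200.0, 100.0, 0.9),
    "LHip": (110.0, 300.0, 0.8),
    "RHip": (190.0, 300.0, 0.8),
}
TOP_NAMES = ["LShoulder", "RShoulder", "LHip", "RHip"]
top_box = crop_box(example_keypoints, TOP_NAMES, image_size=(640, 480))
```

In a pipeline of this shape, a box such as `top_box` would be used to crop the image region passed to the second-stage feature extraction and classification; the box for bottoms could be built the same way from hip, knee, and ankle keypoints.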