Melanoma is a skin cancer that spreads quickly and carries serious risks. Early diagnosis is essential, but because the symptoms of early-stage skin lesions are vague and similar to one another, they can be difficult for specialists to detect. Machine learning-based diagnostic methods can therefore be developed to complement existing ones. This study proposes three methods for the early-stage diagnosis of skin lesions: a new deep learning model, a modified lightweight vision transformer (ViT) architecture, and a hybrid framework that integrates a deep learning model with an Ensemble Learning (EL) model. The proposed deep learning model, called the multi-head attention block depthwise separable convolution network (MABSCNET), is built from convolution layers and transformers. The proposed hybrid framework combines the MABSCNET model with modern deep learning and EL models pre-trained on the ImageNet dataset. The effectiveness of the proposed methods was evaluated on the ISIC 2020 dataset. Further experiments were conducted on ISIC 2018 and a Kaggle dataset to analyze the classification performance of the hybrid framework. Image enhancement techniques were applied to the datasets. On the ISIC 2020 dataset, the MABSCNET model reached 78.63 % accuracy, the ViT model 76.50 %, and the hybrid framework 92.74 %. Moreover, the proposed hybrid framework achieved 100 % on the ISIC 2018 dataset and 94.24 % on the Kaggle dataset.
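As a rough illustration of why depthwise separable convolutions suit a lightweight design, the sketch below compares the weight count of a standard convolution with that of its depthwise separable counterpart. The kernel size and channel widths are hypothetical values chosen for the example and are not taken from MABSCNET; bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, assumed only for illustration.
    k, c_in, c_out = 3, 64, 128
    std = standard_conv_params(k, c_in, c_out)        # 73728
    sep = depthwise_separable_params(k, c_in, c_out)  # 8768
    print(f"standard: {std}, depthwise separable: {sep}")
    print(f"reduction factor: {std / sep:.2f}")
```

In general the ratio of the two counts is 1/c_out + 1/k², so with a 3 x 3 kernel the separable form needs roughly one eighth to one ninth of the weights once the output width is large.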