Lung cancer is a prevalent disease, with nearly 238,000 new cases diagnosed in 2023. This study uses clinical predictors from a Kaggle dataset of 309 observations and 15 variables to aid lung cancer diagnosis. The variables are swallowing difficulty, peer pressure, gender, allergy, yellow fingers, anxiety, wheezing, alcohol consumption, chronic disease, chest pain, coughing, fatigue, smoking, age, and shortness of breath. The research aims to develop and compare supervised machine learning models for classifying and predicting lung cancer, and to identify key clinical tests and parameters using unsupervised statistical models. The dataset was divided into training and test sets, balanced, and preprocessed to avoid biased training. Feature selection and machine learning models were then applied to identify the most important predictors. The study explored tree models, logistic regression, Naïve Bayes, support vector machine (SVM), ensemble, neural network, and kernel models. Among these, the linear SVM achieved the highest 5-fold cross-validation accuracy, 93.75%, but it overfit: its test accuracy dropped to 82.55%. The Gaussian Naïve Bayes model emerged as the best choice because its performance was consistent between validation and test data. Using only 9 variables (swallowing difficulty, peer pressure, gender, allergy, yellow fingers, anxiety, wheezing, alcohol consumption, and chronic disease), it achieved the highest cross-validation classification accuracy, 82.81%. This model allows effective training with fewer predictors without compromising classification