Optimizing breast cancer diagnosis: Harnessing the power of nature-inspired metaheuristics for feature selection with soft voting classifiers

Similar Papers
  • Research Article
  • Cited by 18
  • 10.1007/s44196-024-00428-5
Improving Breast Cancer Diagnosis Accuracy by Particle Swarm Optimization Feature Selection
  • Mar 13, 2024
  • International Journal of Computational Intelligence Systems
  • Reihane Kazerani

Breast cancer is one of the leading causes of death among women worldwide. Early detection of this disease can save patients' lives and reduce mortality. Because many features are involved in diagnosing this disease, the breast cancer diagnosis process can be time-consuming. To reduce cost and time and to improve the accuracy of breast cancer diagnosis, this paper proposes a feature selection algorithm based on particle swarm optimization (PSO), combined with machine learning methods, for selecting the most effective features for breast cancer diagnosis from the full feature set. To evaluate the efficiency of the proposed feature selection method, it was tested on the three most common breast cancer datasets available in the University of California, Irvine (UCI) repository: the Coimbra dataset (CD), the Wisconsin Diagnostic Breast Cancer dataset (WDBC), and the Wisconsin Prognostic Breast Cancer dataset (WPBC). On the Coimbra dataset with all 9 features and without PSO feature selection, the highest accuracy obtained was 87%, by the Support Vector Machine method; with PSO feature selection, accuracy reached 91% and the number of features was reduced from 9 to 4. On the WDBC dataset with all 30 features and without PSO feature selection, the highest accuracy obtained was 99%, by the Random Forest method; with PSO feature selection, accuracy reached 100% and the number of features was reduced from 30 to 19. On the WPBC dataset with all 33 features and without PSO feature selection, the highest accuracy obtained was 94%, by the Support Vector Machine method; with PSO feature selection, accuracy reached 96% and the number of features was reduced from 33 to 17. These results indicate that the proposed PSO-based feature selection algorithm can improve the accuracy of breast cancer diagnosis while selecting fewer, more effective features than the total number of features in the original datasets.
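The abstract does not specify the PSO variant or its fitness function. The sketch below assumes a common wrapper formulation: binary PSO with a sigmoid transfer function. Each particle is a 0/1 feature mask, and the caller supplies the fitness. In practice that fitness would be the cross-validated accuracy of a classifier on the selected columns. The `toy_fitness` function, the `INFORMATIVE` set and all hyperparameter values are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def binary_pso(
    n_features: int,
    fitness: Callable[[Mask], float],
    n_particles: int = 20,
    n_iter: int = 50,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    v_max: float = 4.0,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Maximize `fitness` over 0/1 feature masks with sigmoid-transfer binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # velocity clamping
                # Sigmoid maps the velocity to the probability that bit d is set
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical toy fitness: features 0, 2 and 5 are "informative" and every
# other selected feature costs a small penalty. A real wrapper would train and
# cross-validate a classifier on the selected columns instead.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[j] for j in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.1 * extras


if __name__ == "__main__":
    best_mask, best_fit = binary_pso(n_features=9, fitness=toy_fitness)
    print(best_mask, best_fit)
```

The size penalty inside the fitness is what pushes the search toward smaller subsets, such as the 9-to-4 reduction reported for the Coimbra dataset. Its weight is a design choice, not something the abstract reports.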

  • Research Article
  • Cited by 24
  • 10.1038/s41598-021-00854-x
Application of information theoretic feature selection and machine learning methods for the development of genetic risk prediction models
  • Dec 1, 2021
  • Scientific Reports
  • Farideh Jalali-Najafabadi + 16 more

In view of the growth of clinical risk prediction models using genetic data, there is an increasing need for studies that use appropriate methods to select the optimum number of features from a large number of genetic variants with a high degree of redundancy between features due to linkage disequilibrium (LD). Filter feature selection methods based on information-theoretic criteria are well suited to this challenge and identify a subset of the original variables that should result in more accurate prediction. However, data collected from cohort studies are often high-dimensional genetic data with potential confounders, which present challenges to feature selection and to machine learning risk prediction models. Patients with psoriasis are at high risk of developing a chronic arthritis known as psoriatic arthritis (PsA). The prevalence of PsA in this patient group can be up to 30%, and identifying high-risk patients is an important clinical research goal that would allow early intervention and a reduction in disability. This also provides an ideal scenario for developing clinical risk prediction models and an opportunity to explore the application of information-theoretic criteria. In this study, we developed feature selection and PsA risk prediction models that were applied to a cross-sectional genetic dataset of 1462 PsA cases and 1132 cutaneous-only psoriasis (PsC) cases, using 2-digit HLA alleles imputed with the SNP2HLA algorithm. We also developed a stratification method to mitigate the impact of potential confounder features and illustrate that confounding features affect feature selection. The mitigated dataset was used to train seven supervised machine learning methods: 80% of the data was randomly assigned to training with stratified nested cross-validation, and 20% was randomly held out for internal validation.
The risk prediction models were then further validated in a UK Biobank dataset containing data on 1187 participants and a set of features overlapping with the training dataset. Performance was evaluated using the area under the curve (AUC), accuracy, precision, recall, F1 score and decision curve analysis (net benefit). The best model was selected on three criteria: the smallest feature subset, the maximal average AUC over nested cross-validation, and good generalisability to the UK Biobank dataset. In the original dataset, across over 100 bootstraps and seven feature selection (FS) methods, HLA_C_*06 was selected as the most informative genetic variant. After mitigation, the seven FS methods identified HLA_B_*27 as the single most important genetic feature by rank, consistent with previous regression-based analyses of these data. However, the predictive accuracy of this single feature after mitigation was moderate (AUC = 0.54, internal cross-validation; AUC = 0.53, internal hold-out set; AUC = 0.55, external dataset). Sequentially adding further HLA features by rank improved the performance of the Random Forest classifier: 20 2-digit features selected by Interaction Capping (ICAP) gave AUC = 0.61 (internal cross-validation), 0.57 (internal hold-out set) and 0.58 (external dataset). The stratification method for mitigating confounding features, together with filter information-theoretic feature selection, can be applied to high-dimensional datasets with potential confounders.
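Interaction Capping (ICAP), one of the information-theoretic filters named above, scores a candidate feature X_k against an already-selected set S:

J(X_k) = I(X_k;Y) − Σ_{X_j∈S} max(0, I(X_k;X_j) − I(X_k;X_j|Y))

The sketch below estimates these quantities from empirical counts on discrete features, such as allele indicators, and adds features greedily. It is a minimal illustration under those assumptions, not the study's pipeline. It includes no bootstrapping and no confounder stratification, and the toy data are invented.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(*cols: Sequence) -> float:
    """Joint Shannon entropy (bits) of one or more discrete columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_info(x: Sequence, y: Sequence) -> float:
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x: Sequence, z: Sequence, y: Sequence) -> float:
    """I(X;Z|Y) = H(X,Y) + H(Z,Y) - H(X,Z,Y) - H(Y)."""
    return entropy(x, y) + entropy(z, y) - entropy(x, z, y) - entropy(y)


def icap_score(i: int, selected: List[int], features: List[Sequence], y: Sequence) -> float:
    relevance = mutual_info(features[i], y)
    # Penalize redundancy with already-selected features, capped at zero so
    # that positive (complementary) interactions are not rewarded.
    penalty = sum(
        max(0.0, mutual_info(features[i], features[j])
            - cond_mutual_info(features[i], features[j], y))
        for j in selected
    )
    return relevance - penalty


def icap_select(features: List[Sequence], y: Sequence, k: int) -> List[int]:
    """Greedy forward selection of k feature indices by the ICAP criterion."""
    selected: List[int] = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda i: icap_score(i, selected, features, y))
        selected.append(best)
        remaining.remove(best)
    return selected


# Invented toy data: the label is f0 AND f2, and f1 is an exact copy of f0.
y = [0, 0, 0, 1, 0, 0, 0, 1]
f0 = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = list(f0)
f2 = [0, 1, 0, 1, 0, 1, 0, 1]

if __name__ == "__main__":
    print(icap_select([f0, f1, f2], y, k=2))
```

All three toy features are equally relevant on their own. Once one copy of f0 is selected, the duplicate's redundancy penalty cancels its relevance, so ICAP picks the complementary f2 instead. This is the LD-style redundancy the abstract is concerned with.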

  • Research Article
  • Cited by 173
  • 10.1016/j.aej.2021.03.048
Deep learning in mammography images segmentation and classification: Automated CNN approach
  • Apr 5, 2021
  • Alexandria Engineering Journal
  • Wessam M Salama + 1 more

  • Research Article
  • Cited by 28
  • 10.1016/j.health.2023.100218
A Light Gradient-Boosting Machine algorithm with Tree-Structured Parzen Estimator for breast cancer diagnosis
  • Jun 23, 2023
  • Healthcare Analytics
  • Temidayo Oluwatosin Omotehinwa + 2 more

  • Conference Article
  • 10.1109/iccta37466.2015.9513449
Accuracy Improvement of WPBC Dataset-Based Breast Cancer Diagnosis
  • Oct 24, 2015
  • Magdy Abd-Elghany Zeid + 2 more

This paper discusses two methods for improving the accuracy of breast cancer diagnosis, both applied to the Wisconsin Prognostic Breast Cancer (WPBC) dataset. The first method derives a reduced feature vector from the available WPBC feature vector. The second generates a new dataset that combines the Wisconsin Diagnostic Breast Cancer (WDBC) and WPBC datasets. Different classifier systems are then applied to the generated dataset and evaluated by classification accuracy and Area Under the Curve (AUC), computed from the confusion matrix of 10-fold cross-validation. In addition, we introduce fusion at the classification level between these classifiers to find the most suitable multi-classifier approach for each dataset. Reducing the number of features in the WPBC dataset gives better results. The experiments used the following classifiers, individually and in their possible combinations: decision tree (J48), Multi-Layer Perceptron (MLP), Naive Bayes (NB), Sequential Minimal Optimization (SMO), and Instance-Based k-nearest neighbor (IBk). The results were very promising; for example, the Probabilistic Unification of Heterogeneous Recurrent Malignancy (PUHRM) dataset shows up to 86.17% improvement in accuracy. All experiments were conducted in the WEKA data mining tool.
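The abstract does not say which rule was used for the classifier-level fusion. One common choice, and the one named in this page's title, is soft voting. Soft voting averages the class-probability estimates of the base classifiers, optionally weighted, and predicts the class with the highest mean probability. The minimal sketch below assumes each base model already outputs per-class probabilities. The three probability tables are invented and only labeled after the classifiers listed above.

```python
from typing import List, Optional, Sequence


def soft_vote(prob_lists: Sequence[Sequence[Sequence[float]]],
              weights: Optional[Sequence[float]] = None) -> List[int]:
    """Fuse classifiers by (weighted) averaging of predicted probabilities.

    prob_lists[m][i][c] is classifier m's probability that sample i is class c.
    Returns the index of the class with the highest mean probability per sample.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total_w = sum(weights)
    n_samples, n_classes = len(prob_lists[0]), len(prob_lists[0][0])
    preds = []
    for i in range(n_samples):
        avg = [sum(w * probs[i][c] for w, probs in zip(weights, prob_lists)) / total_w
               for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


# Invented outputs for two samples; class 0 = benign, class 1 = malignant.
tree = [[0.55, 0.45], [0.40, 0.60]]
nb = [[0.55, 0.45], [0.45, 0.55]]
svm = [[0.05, 0.95], [0.30, 0.70]]

print(soft_vote([tree, nb, svm]))  # → [1, 1]
```

For the first sample, a hard majority vote would say benign, because two of three models lean that way. Soft voting lets the very confident third model tip the average to malignant. That sensitivity to confidence is the main behavioral difference between the two rules.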

  • Research Article
  • Cited by 1
  • 10.22044/jadm.2018.6489.1763
A New Knowledge-Based System for Diagnosis of Breast Cancer by a combination of the Affinity Propagation and Firefly Algorithms
  • Jan 1, 2019
  • Journal of AI and Data Mining
  • Nasibeh Emami + 1 more

Breast cancer has become a widespread disease among young women around the world. Expert systems developed with data mining techniques are valuable tools for diagnosing breast cancer and can help physicians in the decision-making process. This paper presents a new hybrid data mining approach to classify breast cancer patients into two groups (malignant and benign). The proposed approach, AP-AMBFA, consists of two phases. In the first phase, Affinity Propagation (AP) clustering is used as an instance reduction technique that finds noisy instances and eliminates them. In the second phase, feature selection and classification are performed: the Adaptive Modified Binary Firefly Algorithm (AMBFA) selects the predictor variables most related to the target variable, and a Support Vector Machine (SVM) serves as the classifier. This reduces computational complexity and speeds up the data mining process. Experimental results on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset show higher predictive accuracy. The obtained classification accuracy is 98.606%, a very promising result compared with current state-of-the-art classification techniques applied to the same database. Hence this method can help physicians diagnose breast cancer more accurately.
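The abstract does not give AMBFA's adaptive and modified update rules. The sketch below therefore shows only the standard firefly move, which is not AMBFA. In that move, a dimmer firefly is pulled toward a brighter one with attractiveness β0·exp(−γr²) plus a small random step. Continuous positions are then binarized into feature masks using a fixed sigmoid threshold, one common way to adapt the firefly algorithm to feature selection. The fitness function and parameter values are hypothetical, and for brevity the brightest firefly does not take a random walk.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]


def to_mask(x: List[float]) -> Mask:
    # sigmoid(v) >= 0.5 exactly when v >= 0, so this is a fixed-threshold transfer
    return [1 if v >= 0.0 else 0 for v in x]


def binary_firefly(n_features: int, fitness: Callable[[Mask], float],
                   n_fireflies: int = 15, n_iter: int = 40, beta0: float = 1.0,
                   gamma: float = 1.0, alpha: float = 0.2,
                   seed: int = 0) -> Tuple[Mask, float]:
    """Generic binary firefly search that maximizes `fitness` over feature masks."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_fireflies)]
    masks = [to_mask(x) for x in pos]
    light = [fitness(m) for m in masks]  # brightness = fitness of the binarized position
    b = max(range(n_fireflies), key=light.__getitem__)
    best_mask, best_fit = masks[b][:], light[b]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:  # firefly i moves toward brighter firefly j
                    r2 = sum((a - c) ** 2 for a, c in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pos[i] = [a + beta * (c - a) + alpha * (rng.random() - 0.5)
                              for a, c in zip(pos[i], pos[j])]
                    masks[i] = to_mask(pos[i])
                    light[i] = fitness(masks[i])
                    if light[i] > best_fit:
                        best_mask, best_fit = masks[i][:], light[i]
    return best_mask, best_fit


# Hypothetical toy fitness: features 1 and 3 are useful, extras cost 0.1 each.
INFORMATIVE = {1, 3}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[j] for j in INFORMATIVE)
    return hits - 0.1 * (sum(mask) - hits)


if __name__ == "__main__":
    print(binary_firefly(8, toy_fitness))
```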

  • Research Article
  • 10.3389/fmed.2025.1644857
A robust stacked neural network approach for early and accurate breast cancer diagnosis
  • Oct 16, 2025
  • Frontiers in Medicine
  • Xinkang Li + 8 more

Timely and accurate diagnosis of breast cancer remains a critical clinical challenge. In this study, we propose the Stacked Artificial Neural Network (StackANN), a robust stacking ensemble framework that integrates six classical machine learning classifiers with an Artificial Neural Network (ANN) meta-learner to enhance diagnostic precision and generalization. The framework incorporates the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance and employs SHapley Additive exPlanations (SHAP) for model interpretability. StackANN was comprehensively evaluated on the Wisconsin Diagnostic Breast Cancer (WDBC), Ljubljana Breast Cancer (LBC) and Wisconsin Breast Cancer (WBCD) datasets, as well as on the METABRIC2 dataset for multi-subtype classification. Experimental results demonstrate that StackANN consistently outperforms individual classifiers and existing hybrid models, achieving near-perfect recall and area under the curve (AUC) values while maintaining balanced overall performance. Importantly, feature attribution analysis confirmed strong alignment with clinical diagnostic criteria, emphasizing tumor malignancy, size and morphology as key determinants. These findings highlight StackANN as a reliable, interpretable and clinically relevant tool with significant potential for early screening, subtype classification and personalized treatment planning in breast cancer care.
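SMOTE, used above to address class imbalance, creates synthetic minority samples by interpolation. It picks a minority sample x and one of its k nearest minority-class neighbors x_nn, then emits x + u·(x_nn − x) with u drawn uniformly from [0, 1). The standard-library sketch below shows only that core step. Real pipelines would normally use a maintained implementation such as imbalanced-learn's `SMOTE`, and the toy points are invented.

```python
import math
import random
from typing import List

Vector = List[float]


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples with basic SMOTE interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute each sample's k nearest minority-class neighbors (Euclidean)
    neighbors = []
    for i, x in enumerate(minority):
        dists = sorted((math.dist(x, z), j) for j, z in enumerate(minority) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x, nn = minority[i], minority[rng.choice(neighbors[i])]
        u = rng.random()  # position along the segment from x to its neighbor
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Invented two-feature minority class
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]

if __name__ == "__main__":
    for point in smote(minority, n_new=3, k=2):
        print(point)
```

Every synthetic point lies on a segment between two real minority samples. SMOTE therefore fills in the minority region rather than duplicating rows, but it should be applied only to training folds so that synthetic points do not leak into evaluation.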

  • Research Article
  • Cited by 61
  • 10.1007/s00521-021-05997-6
A hybrid artificial bee colony with whale optimization algorithm for improved breast cancer diagnosis
  • May 7, 2021
  • Neural Computing and Applications
  • Punitha Stephan + 3 more

Breast cancer is the most common cancer among women and leads to death if not diagnosed at an early stage. Early diagnosis plays a vital role in decreasing mortality globally. Manual methods for diagnosing breast cancer are time-consuming and prone to human error and inaccuracy. Computer-aided diagnosis (CAD) can overcome these disadvantages and help radiologists make accurate decisions. A CAD system based on an artificial neural network (ANN) optimized with a swarm-based approach can improve the accuracy of breast cancer diagnosis owing to its strong prediction capabilities. Artificial bee colony (ABC) and whale optimization are metaheuristic search algorithms used to solve combinatorial optimization problems. This paper proposes a hybrid artificial bee colony with whale optimization algorithm (HAW). HAW integrates the exploitative employee bee phase of ABC with the bubble-net attacking method of whale optimization to form an employee bee attacking phase. In this phase, employee bees use the exploitation strategy of humpback whales to find better food source positions. The weak exploration of standard ABC is improved by a proposed mutative initialization phase, which forms the explorative phase of the HAW algorithm. HAW is used for simultaneous feature selection (FS) and parameter optimization of an ANN model. HAW is implemented with three backpropagation learning variants: resilient backpropagation (HAW-RP), Levenberg–Marquardt (HAW-LM) and momentum-based gradient descent (HAW-GD). These hybrid variants are evaluated on various breast cancer datasets in terms of accuracy, complexity and computational time. Compared with HAW-LM and HAW-GD, the HAW-RP variant achieved higher accuracy with a low-complexity ANN model: 99.2%, 98.5%, 96.3%, 98.8%, 98.7% and 99.1% for WBCD, WDBC, WPBC, DDSM, MIAS and INbreast, respectively.
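The bubble-net attacking method borrowed from whale optimization is, in the standard whale optimization algorithm (WOA), a logarithmic-spiral move around the best solution X* found so far. The move is X(t+1) = D′·e^{bℓ}·cos(2πℓ) + X*, where D′ = |X* − X(t)| and ℓ is drawn uniformly from [−1, 1]. WOA alternates this move with a shrinking-encircling step. The sketch below implements plain WOA on a continuous stand-in objective (the sphere function). It does not reproduce the HAW hybrid, its employee-bee or mutative initialization phases, or the ANN encoding. Parameter values are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def woa_minimize(f: Callable[[Sequence[float]], float], dim: int,
                 bounds: Tuple[float, float] = (-5.0, 5.0), n_whales: int = 20,
                 n_iter: int = 100, b: float = 1.0, seed: int = 0) -> Tuple[Vector, float]:
    """Standard whale optimization algorithm (minimization) with scalar A and C."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_whales)]
    best = min(X, key=f)[:]
    best_f = f(best)
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # shrinks linearly from 2 towards 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            ell = rng.uniform(-1.0, 1.0)
            if rng.random() < 0.5:
                # |A| < 1: close in on the best whale (exploitation);
                # |A| >= 1: move relative to a random whale (exploration).
                ref = best if abs(A) < 1.0 else X[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - x) for r, x in zip(ref, X[i])]
            else:
                # Bubble-net attack: logarithmic spiral around the best whale
                new = [abs(r - x) * math.exp(b * ell) * math.cos(2.0 * math.pi * ell) + r
                       for r, x in zip(best, X[i])]
            X[i] = [min(hi, max(lo, v)) for v in new]  # clip to the search box
            fx = f(X[i])
            if fx < best_f:
                best, best_f = X[i][:], fx
    return best, best_f


def sphere(x: Sequence[float]) -> float:
    """Stand-in objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    print(woa_minimize(sphere, dim=5))
```

In a feature-selection setting such as HAW's, the position vector would additionally have to encode a feature mask and ANN parameters. That encoding is not described in the abstract and is not attempted here.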

  • Research Article
  • Cited by 5
  • 10.1007/s10238-022-00944-8
Improving the malignancy prediction of breast cancer based on the integration of radiomics features from dual-view mammography and clinical parameters.
  • Nov 21, 2022
  • Clinical and Experimental Medicine
  • Chenyi Zhou + 5 more

Radiomics has been a promising imaging biomarker for many malignant diseases. We developed a novel radiomics strategy that incorporates radiomics features extracted from dual-view mammograms and clinical parameters to distinguish benign from malignant breast lesions. We then validated whether radiomics assessment could improve the accuracy of breast cancer diagnosis. In this retrospective study, 380 patients (mean age, 52 ± 7 years) with 621 breast lesions imaged on craniocaudal (CC) and mediolateral oblique (MLO) mammograms were randomly allocated into training (n = 486) and testing (n = 135) sets. A total of 1184 and 2368 radiomics features were extracted from the single-position region of interest (ROI) and the position-paired ROI, respectively. Clinical parameters were then added for better prediction. Recursive feature elimination and the least absolute shrinkage and selection operator (LASSO) were applied to select optimal predictive features, and a random forest was used to build the predictive model. The intraclass correlation coefficient test was used to assess the repeatability and reproducibility of features. After preprocessing, 467 radiomics features and clinical parameters remained in the single-view and dual-view models. Model performance and significance were quantified by the area under the curve (AUC), sensitivity, specificity and accuracy. Correlations between variables were evaluated using the correlation ratio and the Pearson correlation coefficient. The model combining dual-view radiomics and clinical parameters achieved favorable performance (AUC: 0.804, 95% CI: 0.668–0.916) and outperformed both the single-view model and the model without clinical parameters. Incorporating radiomics features from dual-view (CC and MLO) mammograms with age, breast density and type of suspicious lesion can provide a noninvasive approach to evaluating the malignancy of breast lesions and facilitate clinical decision-making.
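LASSO, one of the two selection methods named above, adds an L1 penalty that drives the coefficients of uninformative features to exactly zero. The features with non-zero coefficients are the ones kept. The minimal coordinate-descent sketch below minimizes (1/2n)‖y − Xβ‖² + λ‖β‖₁. It assumes standardized features and a centered continuous target, with no intercept. For a binary benign/malignant label, a logistic-loss variant would normally be used instead. The study does not report its LASSO settings, and the data and λ are illustrative.

```python
from typing import List, Sequence


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |b|: shrink z towards zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # rho = (1/n) * x_j . (residual with feature j's contribution added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Invented example: three orthogonal +/-1 features, y = 3*x0 + 0.5*x1
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [3.5, -2.5, 2.5, -3.5]

if __name__ == "__main__":
    beta = lasso_cd(X, y, lam=1.0)
    print(beta, [j for j, b in enumerate(beta) if b != 0.0])
```

With λ = 1 the weak feature x1 is dropped and only x0 survives, shrunk from 3 to 2. With a smaller λ both are kept. In practice, λ is chosen by cross-validation.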

  • Research Article
  • Cited by 11
  • 10.32604/csse.2023.030463
A Framework of Deep Learning and Selection-Based Breast Cancer Detection from Histopathology Images
  • Jan 1, 2023
  • Computer Systems Science and Engineering
  • Muhammad Junaid Umer + 5 more

Breast cancer (BC) is among the most widespread and deadly cancers. It is mostly diagnosed in middle-aged women worldwide and affects more than half a million people every year. Newly diagnosed BC cases reached 2.1 million worldwide in 2018, with a death rate of 11.6% of total cases. Early diagnosis and detection of breast cancer with proper treatment may reduce the number of deaths. The gold standard for BC detection is biopsy analysis, which requires an expert for correct diagnosis. Manual diagnosis of BC is a complex and challenging task. This work proposes a deep learning (DL)-based solution for the early detection of this deadly disease from histopathology images. To evaluate the robustness of the proposed method, a large publicly available breast histopathology image database containing 277,524 images is used. The proposed automatic BC detection and classification involves three main steps. First, a DL model is proposed for feature extraction. Second, the extracted feature vector (FV) is passed to a novel feature selection (FS) framework that selects the best features. Finally, different machine learning (ML) algorithms classify the images as invasive ductal carcinoma (IDC) or normal. The proposed methodology achieved a highest accuracy of 92.7%, which shows that the technique can be used for BC detection to aid pathologists in the early and accurate diagnosis of BC.

  • Research Article
  • Cited by 46
  • 10.3390/cancers13061249
Breast Tumor Characterization Using [18F]FDG-PET/CT Imaging Combined with Data Preprocessing and Radiomics.
  • Mar 12, 2021
  • Cancers
  • Denis Krajnc + 13 more

Simple Summary: Breast cancer is the second most commonly diagnosed malignancy in women worldwide. In this study, we examine the feasibility of breast tumor characterization based on [18F]FDG-PET/CT images using machine learning (ML) approaches in combination with data preprocessing techniques. ML prediction models for breast cancer detection and for identifying breast cancer receptor status, proliferation rate and molecular subtypes were established and evaluated. Furthermore, the importance of the most repeatable features was investigated. Results showed high performance for the malignant/benign tumor differentiation and triple-negative tumor subtype ML models. We observed high repeatability of radiomic features for both high-performing predictive models.
Background: This study investigated the performance of ensemble learning holomic models for the detection of breast cancer, receptor status, proliferation rate and molecular subtypes from [18F]FDG-PET/CT images, with and without data preprocessing algorithms. Additionally, ML models were compared with conventional data analysis using standardized uptake value (SUV) lesion classification.
Methods: A cohort of 170 patients with 173 breast cancer tumors (132 malignant, 38 benign) was examined with [18F]FDG-PET/CT. Breast tumors were segmented, and radiomic features were extracted following the Imaging Biomarker Standardization Initiative (IBSI) guidelines combined with optimized feature extraction. Ensemble learning including five supervised ML algorithms was used in a 100-fold Monte Carlo (MC) cross-validation scheme. Data preprocessing methods were applied before machine learning, including outlier and borderline noisy sample detection, feature selection and class imbalance correction. Feature importance in each model was assessed by calculating feature occurrence by the R-squared method across MC folds.
Results: Cross-validation demonstrated high performance of the cancer detection model (80% sensitivity, 78% specificity, 80% accuracy, 0.81 area under the curve (AUC)) and of the triple-negative tumor identification model (85% sensitivity, 78% specificity, 82% accuracy, 0.82 AUC). The individual receptor status and luminal A/B subtype models yielded low performance (0.46–0.68 AUC). The SUVmax model yielded 0.76 AUC in cancer detection and 0.70 AUC in predicting the triple-negative subtype.
Conclusions: Predictive models based on [18F]FDG-PET/CT images in combination with advanced data preprocessing steps aid in breast cancer diagnosis and in ML-based prediction of the aggressive triple-negative breast cancer subtype.
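Monte Carlo cross-validation, as used above with 100 repetitions, draws a fresh random train/test split each time and averages the metric, instead of partitioning the data into k fixed folds. The minimal sketch below uses stratified splits and a caller-supplied `evaluate(train_idx, test_idx)` callback. The 20% test fraction, the stratification and the majority-class baseline are assumptions for illustration, not the study's settings. The class counts only echo those in the abstract, and no features are involved.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_frac: float,
                     rng: random.Random) -> Tuple[List[int], List[int]]:
    """One random train/test split that approximately preserves class proportions."""
    by_class: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    train: List[int] = []
    test: List[int] = []
    for idx in by_class.values():
        idx = idx[:]
        rng.shuffle(idx)
        n_test = max(1, round(test_frac * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def monte_carlo_cv(labels: Sequence[int],
                   evaluate: Callable[[List[int], List[int]], float],
                   n_splits: int = 100, test_frac: float = 0.2,
                   seed: int = 0) -> Tuple[float, List[float]]:
    """Average `evaluate` over n_splits independent random stratified splits."""
    rng = random.Random(seed)
    scores = [evaluate(*stratified_split(labels, test_frac, rng)) for _ in range(n_splits)]
    return mean(scores), scores


labels = [1] * 132 + [0] * 38  # 1 = malignant, 0 = benign (labels only)


def majority_baseline(train_idx: List[int], test_idx: List[int]) -> float:
    """Predict the training majority class; return test accuracy."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


if __name__ == "__main__":
    avg, _ = monte_carlo_cv(labels, majority_baseline)
    print(round(avg, 4))
```

With this class imbalance, a model that always predicts "malignant" already scores about 76% accuracy. This is one reason the abstract's sensitivity, specificity and AUC figures are more informative than accuracy alone.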

  • Research Article
  • 10.36108/jrrslasu/0202.70.0160
Random Forest Classifier for Diagnosis of Breast Cancer in African Women
  • Dec 1, 2020
  • JOURNAL OF RESEARCH AND REVIEW IN SCIENCE
  • Babafemi Macaulay

Introduction: Breast cancer is the leading cause of cancer-related mortality among women globally. It is documented that 15% of all female cancers are breast cancer. Diagnosis and treatment of breast cancer at its earliest stage remain the only way to improve outcomes and reduce mortality, so early and accurate diagnosis is important. Early detection of breast cancer among women in Sub-Saharan Africa (SSA) is very challenging. Low knowledge of breast cancer, lack of awareness of early detection and treatment, treatment cost, poor perception of the disease, and socio-cultural factors such as beliefs, traditions and fears all affect the health-seeking behaviour of African women. Yet research on computational approaches to breast cancer diagnosis in SSA is limited.
Aim: Here, we propose a novel diagnosis model for African women using the Random Forest (RF) machine learning technique.
Methods: Study data comprised technical indicators for breast cancer diagnosis, collected from breast cancer patients attending the oncology clinic at Lagos State University Teaching Hospital. A total of 180 subjects were studied, of whom 90 were confirmed cases of breast cancer and 90 were benign. Nine diagnostic parameters were included: clump thickness, marginal adhesion, uniformity of cell size, uniformity of cell shape, single epithelial cell, bare nuclei, bland chromatin, normal nucleoli and mitosis. Principal Component Analysis (PCA) was used for feature selection, and an RF model was used for classification.
Results: The RF model gave an accuracy of 98.23%, sensitivity of 95.24%, specificity of 100.00% and area under the curve (AUC) of 98%.
Conclusion: The proposed Random Forest model shows good potential for classifying breast cancer in African women. Adopting a computational diagnosis approach in SSA can lead to early diagnosis and a reduced mortality rate.
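The abstract applies PCA to the nine diagnostic parameters. Strictly speaking, PCA builds new composite axes (linear combinations of all parameters) rather than selecting a subset of the original ones. The minimal standard-library sketch below computes the first principal component by power iteration on the sample covariance matrix. Real analyses would use a linear-algebra library and usually standardize the parameters first. The toy points are invented.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def covariance_matrix(data: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance (n - 1 denominator) of the columns of `data`."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def first_principal_component(data: Sequence[Sequence[float]], n_iter: int = 1000,
                              tol: float = 1e-12) -> List[float]:
    """Unit-length leading eigenvector of the covariance matrix via power iteration."""
    cov = covariance_matrix(data)
    p = len(cov)
    v = [1.0] * p  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(x - y) for x, y in zip(w, v)) < tol:
            return w
        v = w
    return v


# Invented points lying on the line y = 2x
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]

if __name__ == "__main__":
    print(first_principal_component(data))
```

For points on the line y = 2x, the component points along (1, 2)/√5, the direction of all the variance. In a nine-parameter setting, each component would likewise mix all nine inputs.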

  • Research Article
  • Cited by 13
  • 10.1097/md.0000000000025878
Identify the triple-negative and non-triple-negative breast cancer by using texture features of medical ultrasonic image: A STROBE-compliant study.
  • Jun 4, 2021
  • Medicine
  • Qingyu Chen + 2 more

The study aimed to explore the value of ultrasound (US) texture analysis in the differential diagnosis of triple-negative breast cancer (TNBC) and non-TNBC. A retrospective analysis was performed on 93 patients with breast cancer (35 with TNBC and 38 with non-TNBC) admitted to Taizhou People's Hospital from July 2015 to June 2019. All lesions were pathologically proven at surgery. US images of all patients were collected, and texture analysis was performed using the MaZda software package. Differences in textural features between TNBC and non-TNBC were assessed. Receiver operating characteristic (ROC) curve analysis was then used to compare the diagnostic performance of the textural parameters that differed significantly. Five optimal texture feature parameters were extracted from the gray-level run-length matrix: gray-level non-uniformity (GLNU) in the horizontal direction, GLNU in the vertical direction, GLNU in the 45° direction, run-length non-uniformity in the 135° direction, and GLNU in the 135° direction. All these texture parameters were statistically higher in TNBC than in non-TNBC (P < .05). ROC curve analysis indicated that, at a threshold of 268.9068, GLNU in the horizontal direction showed the best diagnostic performance for differentiating TNBC from non-TNBC. A logistic regression model based on all these parameters showed a sensitivity of 69.3%, specificity of 91.4% and area under the curve of 0.834. US texture features differed significantly between TNBC and non-TNBC, so US texture analysis can be used for preliminary differentiation of TNBC from non-TNBC.
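Both the single-threshold reading (sensitivity and specificity at a cut-off such as 268.9068) and the AUC can be computed directly from scores and labels. The minimal sketch below treats TNBC as the positive class and higher feature values as more TNBC-like, in line with the finding that the parameters were higher in TNBC. It computes the AUC through its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting half. The scores and labels are invented.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[float, float]:
    """Call score >= threshold positive; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented feature values; 1 = TNBC, 0 = non-TNBC
scores = [310.0, 295.0, 280.0, 262.0, 250.0, 240.0, 230.0, 275.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(sens_spec(scores, labels, 268.9068))  # → (0.75, 0.75)
print(auc(scores, labels))                  # → 0.9375
```

A single threshold gives one point on the ROC curve, while the AUC summarizes all possible thresholds. This is why a model can report a high AUC alongside a modest sensitivity at its chosen cut-off.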

  • Research Article
  • Cited by 8
  • 10.11591/ijece.v13i3.pp3359-3366
Multivariate sample similarity measure for feature selection with a resemblance model
  • Jun 1, 2023
  • International Journal of Electrical and Computer Engineering (IJECE)
  • Tsehay Admassu Assegie + 3 more

Feature selection improves the classification performance of machine learning models. It also identifies important features, eliminates those of little significance, and reduces the dimensionality of training and testing data. This study proposes a feature selection method based on a multivariate sample similarity measure that selects features with significant contributions using a machine learning model. The method is evaluated on the University of California, Irvine (UCI) heart disease dataset and compared with existing feature selection methods. Evaluation metrics include the minimum subset selected, accuracy, F1-score and area under the curve (AUC). The results show that the proposed method, using the chest pain, thallium scan and major vessels (scanned using X-rays) features, distinguishes healthy patients from heart disease patients with 99.6% accuracy.

  • Research Article
  • 10.1158/1538-7445.am2022-3683
Abstract 3683: Identification of optimal set of genetic variants from a previously reported polygenic risk score for breast cancer risk prediction in Latin American women
  • Jun 15, 2022
  • Cancer Research
  • Valentina A Zavala + 11 more

Around 10% of genetic predisposition to breast cancer is explained by mutations in high/moderate-penetrance genes. The remaining proportion is explained by multiple common variants of relatively small effect. A subset of these variants has been identified, mostly in Europeans and Asians, and combined into polygenic risk scores (PRS) to predict breast cancer risk. Our aim is to identify a subset of variants that improves breast cancer risk prediction in Hispanics/Latinas (H/Ls). Breast cancer patients were recruited at the Instituto Nacional de Enfermedades Neoplásicas in Peru as part of The Peruvian Genetics and Genomics of Breast Cancer Study (PEGEN). Women without a diagnosis of breast cancer from a pregnancy outcomes study conducted in Peru were included as controls. After quality control filters, genome-wide genotypes were available for 1,809 cases and 3,334 controls. Missing genotypes were imputed with the Michigan Imputation Server, using individuals from the 1000 Genomes Project as reference. Genotypes for 313 previously reported breast cancer-associated variants and 2 Latin American-specific single nucleotide polymorphisms (SNPs) were extracted from the data using an imputation r² filter of 30%. Feature selection techniques were used to identify the best subset of SNPs for breast cancer prediction in Peruvian women. We randomly split the PEGEN data in a 4:1 ratio into training/validation and testing sets. The training/validation data were resampled and split in a 3:1 ratio into training and validation sets. SNP ranking and selection were done by bootstrapping results from 100 resampled training and validation sets. PRS were built by adding counts of risk alleles weighted by previously reported beta coefficients. The area under the curve (AUC) was used to estimate the prediction accuracy of SNP subsets selected with different techniques.
Logistic regression was used to test the association between standardized PRS residuals (after adjustment for genetic ancestry) and breast cancer risk. Of the 315 reported variants, 274 were available in the imputed dataset. The full 274-SNP PRS was associated with an AUC of 0.63 (95% CI = 0.59–0.66) in the PEGEN study. Using different feature selection methods, we found subsets of SNPs associated with AUC values between 0.65 and 0.69. The best method (AUC = 0.69, 95% CI = 0.66–0.72) included a subset of 98 SNPs. Sixty-eight SNPs were selected by all methods. These included the protective SNP rs140068132 in the 6q25 region, which is associated with Indigenous American ancestry and made the largest contribution to the AUC. We identified a subset of 98 SNPs from a previously identified breast cancer PRS that improves breast cancer risk prediction compared with the full set in women of high Indigenous American ancestry from Peru. Replication in women from Mexico and Colombia, and in H/Ls from the U.S., will allow us to confirm these results. Citation Format: Valentina A. Zavala, Tatiana Vidaurre, Xiaosong Huang, Sandro Casavilca, Jeannie Navarro, Michelle A. Williams, Sixto Sanchez, Elad Ziv, Luis Carvajal-Carmona, Susan L. Neuhausen, Bizu Gelaye, Laura Fejerman. Identification of optimal set of genetic variants from a previously reported polygenic risk score for breast cancer risk prediction in Latin American women [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 3683.
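The PRS construction described above sums risk-allele counts, each weighted by a previously reported effect size. The minimal sketch below assumes genotypes coded as 0, 1 or 2 copies of the risk allele and betas on the log-odds scale. Restricting the score to a selected subset, such as the 98-SNP subset, amounts to passing only those SNPs' weights. The SNP names, weights and genotypes are hypothetical, and the ancestry adjustment applied before the association test is not shown.

```python
from statistics import mean, stdev
from typing import Dict, List


def polygenic_risk_score(genotypes: Dict[str, int], betas: Dict[str, float]) -> float:
    """Weighted sum of risk-allele counts over the SNPs present in `betas`.

    Raises KeyError if a weighted SNP has no genotype (no silent imputation).
    """
    return sum(beta * genotypes[snp] for snp, beta in betas.items())


def standardize(scores: List[float]) -> List[float]:
    """Z-score the PRS values (mean 0, sample standard deviation 1)."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]


# Hypothetical weights: a positive beta raises risk, a negative one is protective
betas = {"snp_a": 0.10, "snp_b": -0.20, "snp_c": 0.05}
people = [
    {"snp_a": 2, "snp_b": 0, "snp_c": 1},
    {"snp_a": 0, "snp_b": 2, "snp_c": 2},
    {"snp_a": 1, "snp_b": 1, "snp_c": 0},
]

if __name__ == "__main__":
    full = [polygenic_risk_score(g, betas) for g in people]
    subset = [polygenic_risk_score(g, {"snp_a": betas["snp_a"]}) for g in people]
    print(full, subset, standardize(full))
```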
