Abstract
Background: To realize the promise of precision medicine for cancer, it is essential to assess the genetic variation present in rare cells and to understand the role these cells play in tumor evolution and progression. High-throughput single-cell targeted DNA sequencing enables the detection of rare mutations in individual cells and the identification of subclones defined by co-occurring mutations. A major challenge with multiplexed sequencing at the single-cell level is non-uniform amplification of the targeted regions during PCR. This leaves some mutations of interest in the panel with inadequate coverage and makes genotyping difficult. To address this challenge, we developed a machine learning engine that optimizes amplicon design for uniform amplification by making reliable performance predictions.

Methods: Multiple panels of various sizes were designed with amplicons spanning a wide range of design properties, such as amplicon GC content, length, predicted secondary structure, and primer specificity. These panels were synthesized and processed on the Tapestri single-cell DNA platform. The tested amplicons were classified as low performers, OK performers, or high flyers based on their normalized reads-per-cell values. The features were the design properties of each amplicon and the distributions of those properties across the panel. We used a random forest classifier to calculate feature importance, then analyzed the range of each top feature within each class and the significance of the variance between classes. These ranges were then used as parameters in the assay design pipeline. Next, we trained machine learning models on the performance data to develop a performance prediction engine.

Results: To test the design pipeline with the new parameters, we designed small (31-amplicon), medium (128-amplicon), and large (287-amplicon) panels. Multiple runs were conducted for each panel with different cell types. We achieved high panel performance of 97%, 92%, and 88% across the three panels.
The new parameters resulted in an approximately 10-20% improvement in panel uniformity. We are working to further optimize the performance prediction engine by evaluating different machine learning classification models with K-fold cross-validation, training on a larger set of amplicons, and optimizing features using combinations of properties.

Citation Format: Shu Wang, Saurabh Gulati, Dong Kim, Sombeet Sahu, Saurabh Parikh, Nianzhen Li, Manimozhi Manivannan, Nigel Beard. Amplicon design algorithm for single cell targeted DNA sequencing using machine learning [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 2109.
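The amplicon classification and per-class feature analysis described under Methods can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' pipeline. The cut-offs `LOW_CUTOFF` and `HIGH_CUTOFF`, normalization to the panel mean, all function names, and the toy data are hypothetical, because the abstract does not report them. The random forest feature-importance step is omitted; in Python it is commonly done with scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute. The "significance of variance between classes" is assumed here to be tested with a one-way ANOVA, shown as its F statistic.

```python
from statistics import mean

# Hypothetical cut-offs on normalized reads-per-cell (amplicon RPC / panel mean RPC).
# The abstract does not state the thresholds actually used.
LOW_CUTOFF = 0.2
HIGH_CUTOFF = 2.0


def classify_amplicons(reads_per_cell):
    """Map amplicon name -> 'low', 'ok' or 'high' from normalized reads-per-cell."""
    panel_mean = mean(reads_per_cell.values())
    labels = {}
    for name, rpc in reads_per_cell.items():
        norm = rpc / panel_mean
        if norm < LOW_CUTOFF:
            labels[name] = "low"
        elif norm > HIGH_CUTOFF:
            labels[name] = "high"
        else:
            labels[name] = "ok"
    return labels


def group_by_class(feature, labels):
    """Collect one design feature (e.g. GC fraction) into per-class value lists."""
    groups = {}
    for name, cls in labels.items():
        groups.setdefault(cls, []).append(feature[name])
    return groups


def class_ranges(feature, labels):
    """Per-class (min, max) range of one design feature."""
    return {cls: (min(v), max(v)) for cls, v in group_by_class(feature, labels).items()}


def anova_f(groups):
    """One-way ANOVA F statistic: between-class mean square / within-class mean square.

    Requires more values than groups and non-zero within-class variation.
    """
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    rpc = {"A1": 10, "A2": 100, "A3": 120, "A4": 400, "A5": 90, "A6": 5}
    gc = {"A1": 0.70, "A2": 0.50, "A3": 0.45, "A4": 0.35, "A5": 0.55, "A6": 0.72}
    labels = classify_amplicons(rpc)
    print(class_ranges(gc, labels))
    # {'low': (0.7, 0.72), 'ok': (0.45, 0.55), 'high': (0.35, 0.35)}
    print(round(anova_f(list(group_by_class(gc, labels).values())), 2))  # 28.43
```

The F statistic alone only ranks how strongly a feature separates the classes; `scipy.stats.f_oneway` returns both the F statistic and its p-value if a significance level is needed.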
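The abstract reports panel uniformity and plans K-fold cross-validation without defining either. The sketch below assumes uniformity is the percentage of amplicons whose reads-per-cell reach at least 0.2× the panel mean (the 0.2 default is an assumption for illustration), and shows an unshuffled K-fold split of the kind used to estimate a classifier's accuracy on held-out amplicons. The function names and toy values are hypothetical; scikit-learn's `KFold` and `cross_val_score` provide equivalent, more complete functionality.

```python
from statistics import mean


def panel_uniformity(rpc_values, fraction_of_mean=0.2):
    """Percent of amplicons with reads-per-cell >= fraction_of_mean x panel mean.

    Assumed definition for illustration; the abstract does not define uniformity.
    """
    values = list(rpc_values)
    threshold = fraction_of_mean * mean(values)
    return 100.0 * sum(v >= threshold for v in values) / len(values)


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for plain K-fold cross-validation (no shuffling).

    The first n_samples % k folds get one extra sample, so every index is tested once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


if __name__ == "__main__":
    # Toy reads-per-cell values for illustration only.
    print(round(panel_uniformity([10, 100, 120, 400, 90, 5]), 1))  # 66.7
    for train, test in k_fold_indices(5, 2):
        print(train, test)
    # [3, 4] [0, 1, 2]
    # [0, 1, 2] [3, 4]
```

In a cross-validation loop, a classifier would be fit on the amplicons in `train` and scored on those in `test`, and the scores averaged over the K folds; in practice the indices are usually shuffled or stratified by class first, since amplicons from one panel tend to be listed in genomic order.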