Abstract
Large phylogenies derived from publicly available genetic sequences are becoming a popular and indispensable tool for addressing core questions in ecology and evolution, as well as for tackling challenging conservation issues. Optimizing taxonomic coverage and data quality is essential for improving the precision and reliability of phylogenetic reconstructions and evolutionary inferences. Here we present PyNCBIminer, a user-friendly software tool that automates the assembly of large DNA data sets from GenBank for phylogenetic reconstruction using the supermatrix method. PyNCBIminer uses an iterative BLAST procedure to retrieve genetic sequences accurately and efficiently from GenBank. It also applies state-of-the-art strategies to improve taxon coverage and the quality of the target DNA markers. PyNCBIminer is designed to handle large data sets efficiently, but it is also suitable for medium and small data sets. It is open source and freely available on GitHub (https://github.com/Xiaoting-Xu/PyNCBIminer) and Gitee (https://gitee.com/xiaotingxu/PyNCBIminer). Its utility and performance are demonstrated through the assembly of phylogenetic data sets encompassing several genetic markers of varying sizes for the angiosperm order Dipsacales. PyNCBIminer holds an advantage over similar programs in that it performs the majority of computations on the NCBI server, eliminating the need for users to build and maintain large local databases and reducing the demands on their computers. In addition, it integrates other commonly used phylogenetic analysis software, providing users from various backgrounds with convenient options for retrieving and assembling GenBank sequence data, along with flexible features that allow for user-defined parameters and strategies.
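To make the supermatrix idea concrete, the following minimal Python sketch concatenates per-marker alignments into a single matrix and pads taxa that lack a marker with a missing-data symbol. It is an illustration only, not PyNCBIminer's own code: the function name `build_supermatrix`, the `missing` parameter, and the toy sequences are all assumptions made for this example.

```python
from typing import Dict, List


def build_supermatrix(
    markers: Dict[str, Dict[str, str]], missing: str = "?"
) -> Dict[str, str]:
    """Concatenate aligned markers into one sequence per taxon.

    `markers` maps a marker name to {taxon: aligned sequence}.
    Hypothetical helper for illustration; not part of PyNCBIminer.
    """
    # Collect every taxon that appears in at least one marker.
    taxa = sorted({taxon for aln in markers.values() for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}

    for name, aln in markers.items():
        # All sequences within one marker must share the same aligned length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {name!r} is not aligned")
        width = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this marker with the missing-data symbol.
            rows[taxon].append(aln.get(taxon, missing * width))

    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Toy data: the sequences are invented, not real GenBank records.
example = {
    "rbcL": {"Lonicera": "ATGC", "Viburnum": "ATGA"},
    "matK": {"Lonicera": "GGT"},
}
supermatrix = build_supermatrix(example)
```

In this sketch, markers are concatenated in the order they appear in the input dictionary, so a taxon that is missing `matK` still receives a row of full length.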