Abstract

The rapid development of next-generation sequencing (NGS) has dramatically broadened the application of metagenomics. Functional annotation is a major step in metagenomic studies, but fast annotation of functional genes remains a challenge because of the deluge of NGS data and ever-expanding databases. In this study, a hybrid annotation pipeline previously proposed for taxonomic assignment was evaluated for annotating metagenomic sequences of specific functional genes, such as antibiotic resistance genes, arsenic resistance genes and key genes in nitrogen metabolism. When a small protein database for the specific functional genes is used, the hybrid approach using UBLAST and BLASTX is 44–177 times faster than direct BLASTX, at the cost of missing a small portion (<1.8%) of the target sequences found by direct BLASTX. Unlike direct BLASTX, the time required by the hybrid pipeline depends on the abundance of the target genes. This hybrid annotation pipeline is therefore better suited to annotating specific functional genes than to comprehensive functional gene annotation.
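The dependence of runtime on target-gene abundance can be illustrated with a toy cost model. This is a minimal sketch, not the study's implementation: it assumes the hybrid pipeline runs a fast search over every read and then a slow search only over the reads that pass, and the per-read costs (`fast_cost`, `slow_cost`) and function names are hypothetical values chosen for illustration, not measurements from the paper.

```python
def direct_cost(n_reads, slow_cost=100.0):
    """Cost (arbitrary units) of running the slow search on every read."""
    return n_reads * slow_cost


def hybrid_cost(n_reads, n_candidates, fast_cost=1.0, slow_cost=100.0):
    """Cost of a two-stage search: the fast search screens all reads,
    then the slow search is run only on the candidate reads.

    All cost values are illustrative assumptions, not benchmark results.
    """
    return n_reads * fast_cost + n_candidates * slow_cost


def speedup(n_reads, n_candidates, fast_cost=1.0, slow_cost=100.0):
    """Ratio of direct cost to hybrid cost under the toy model."""
    return direct_cost(n_reads, slow_cost) / hybrid_cost(
        n_reads, n_candidates, fast_cost, slow_cost
    )


if __name__ == "__main__":
    # Same number of reads, increasing number of reads that hit target genes.
    for n_candidates in (10, 100, 500):
        print(n_candidates, round(speedup(1000, n_candidates), 2))
```

Under this model the direct cost is fixed by the number of reads, while the hybrid cost grows with the number of candidate reads, so the speedup shrinks as the target genes become more abundant.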

Highlights

  • In recent years, the rapid development of next-generation sequencing (NGS) has broadened the application of metagenomics in various aspects of biological research [1]

  • RAPSearch2 is one of the ultra-fast database search tools and misses only a small portion of sequences compared with direct BLASTX [9,10]

  • We first compared the annotation results from RAPSearch2 and UBLAST to evaluate their speed and annotation accuracy



Introduction

The rapid development of next-generation sequencing (NGS) has broadened the application of metagenomics in various aspects of biological research [1]. The cost of DNA sequencing has fallen faster than the rate predicted by Moore’s law [2]. More NGS sequence data were generated by the 1000 Genomes Project in its first 6 months than had accumulated in the NCBI GenBank database over two decades [3]. This deluge of NGS data places greater demands on computational resources for data analysis, which, rather than sequencing cost, has become the bottleneck in metagenomic analysis. It may take months to analyze these data to annotate the overall functions of the genes they contain. Besides the time cost of metagenomic analysis, the cost of computational resources is rising with the overwhelming increase in data generated, not to mention the hardly quantifiable human resources currently needed for metagenomic data analysis [2].

