Efficient COI barcoding using high throughput single-end 400 bp sequencing

Chentao Yang, David G Bourne, Wei Rao, Guanliang Meng, Ai-Bing Zhang, Shanlin Liu, Ao Chen, Xinrui Jia, Caiqing Yang, Shangjin Tan, Sha Liao, Junqiang Xu, Paul A O’Brien, Yuxuan Zheng, Xiaowei Chen

doi:10.1186/s12864-020-07255-w

Abstract

Background

Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, current high-throughput DNA barcoding methods either cannot obtain full-length barcode sequences because of read length limitations (e.g. a maximum read length of 300 bp for Illumina’s MiSeq system) or are hindered by relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio’s SEQUEL II system).

Results

Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400 bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5′ and 3′ ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and comprises four functional modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and delivered a total of 747 fully assembled COI barcodes as well as 70 Wolbachia and fungal symbionts. Compared with their corresponding Sanger sequences (72 sequences available), nearly all samples (71/72) were correctly and accurately assembled, including 46 samples with a similarity score of 100% and 25 with a score of ca. 99%.

Conclusions

The HIFI-SE pipeline represents an efficient way to produce standard full-length barcodes, and the reasonable cost and high sensitivity of our method allow considerably more DNA barcodes to be produced under the same budget. Our method thereby advances DNA-based species identification from diverse ecosystems and increases the number of relevant applications.
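The idea of joining a 5′-end read and a 3′-end read into one full-length barcode can be illustrated with a simple overlap merge. The sketch below is not the HIFI-SE assembly module, whose algorithm is not described here. It assumes that the 3′-end read comes from the opposite strand and that the two reads share an overlapping stretch. The function names, `min_overlap` and `max_mismatch_rate` are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(complement)[::-1]


def merge_by_overlap(read_5p, read_3p, min_overlap=20, max_mismatch_rate=0.05):
    """Merge two end reads on their longest acceptable overlap.

    read_5p is read from the 5' end of the barcode; read_3p is assumed to be
    read from the 3' end on the opposite strand, so it is reverse-complemented
    first. Returns the merged sequence, or None if no overlap is found.
    """
    rc = reverse_complement(read_3p)
    longest = min(len(read_5p), len(rc))
    # Try the longest candidate overlap first and shrink until one fits.
    for k in range(longest, min_overlap - 1, -1):
        tail, head = read_5p[-k:], rc[:k]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * k:
            # Keep the 5' read in full and append the non-overlapping remainder.
            return read_5p + rc[k:]
    return None
```

With two 400 bp reads and a barcode of roughly 650 bp, the overlap would be on the order of 150 bp, which is why a single SE400 read from each end can in principle span the full region. A production assembler would also need to weigh base qualities and resolve disagreements inside the overlap, which this sketch ignores.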

Highlights

Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding
We explore the potential of MGISEQ single-end 400 bp (SE400) sequencing for DNA barcode reference construction and quick species identification, and provide an updated HIFI-SE barcode software package that can generate cytochrome c oxidase subunit I (COI) barcode assemblies using high-throughput sequencing (HTS) reads of 400 bp length
For the same 96 samples, our pipeline produced a total of 12,745,067 HTS SE400 reads that were retained after quality control, and around 77.9% (9,870,823) of these reads were assigned to their corresponding samples at either the 5′ or 3′ end
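Assigning pooled reads back to samples, as in the last highlight, can be sketched as a lookup on a short index sequence. The sketch below is only an illustration. It assumes that every read starts with a sample-specific tag of fixed length and that only exact matches count. The tag layout, `assign_reads` and `assigned_fraction` are hypothetical and do not describe the HIFI-SE assign module.

```python
from collections import defaultdict


def assign_reads(reads, tag_to_sample):
    """Group reads by the sample tag found at the start of each read.

    Assumes all tags have the same length and must match exactly.
    Returns (reads per sample with the tag trimmed off, count of unassigned reads).
    """
    tag_len = len(next(iter(tag_to_sample)))
    assigned = defaultdict(list)
    unassigned = 0
    for read in reads:
        sample = tag_to_sample.get(read[:tag_len])
        if sample is None:
            unassigned += 1
        else:
            assigned[sample].append(read[tag_len:])
    return assigned, unassigned


def assigned_fraction(assigned, unassigned):
    """Fraction of all reads that were assigned to some sample."""
    n_assigned = sum(len(group) for group in assigned.values())
    total = n_assigned + unassigned
    return n_assigned / total if total else 0.0
```

For example, calling `assign_reads(reads, {"AACC": "sample_1", "GGTT": "sample_2"})` would group reads beginning with `AACC` or `GGTT` and count the rest as unassigned. Real demultiplexing usually tolerates a small number of tag mismatches, which this exact-match version does not.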

Summary

Introduction

The rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. Current high-throughput DNA barcoding methods either cannot obtain full-length barcode sequences because of read length limitations (e.g. a maximum read length of 300 bp for Illumina’s MiSeq system) or are hindered by relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio’s SEQUEL II system). Since it was first proposed by Hebert et al. [1], DNA barcoding has attracted global synergistic efforts, resulting in well-curated and centralized reference databases. Tissues sampled by minimal or non-invasive methods cannot be identified morphologically, and an efficient method for species identification will benefit sample pre-treatment and selection for large-scale genome resequencing studies.



Journal: BMC Genomics	Publication Date: Dec 1, 2020
Citations: 22	License type: open-access


