Machine Learned Replacement of N-Labels for Basecalled Sequences in DNA Barcoding.

Eddie Y T Ma, Stefan C Kremer, Sujeevan Ratnasingham

doi:10.1109/tcbb.2016.2598752

Abstract

This study presents a machine learning method that increases the number of identified bases in Sanger sequencing. The system post-processes a chromatogram basecalled by KB and selects a recoverable subset of the N-labels in it to replace with basecalls (A, C, G, T). An N-label correction is defined using an additional read of the same sequence and a human-finished sequence. A correction is added to the dataset only when an alignment shows that the additional read and the human-finished sequence agree on the identity of the N-label, and KB must also assign the replacement base a quality value of … in the additional read. Corrections are only available during system training. To develop the system, nearly 850,000 N-labels were obtained from the Barcode of Life Data Systems (BOLD), the premier database of the genetic markers called DNA barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and ensures analysis correctness. In keeping with barcoding standards, our system maintains an error rate of … percent: it applies a correction only when it estimates a low rate of error. Tested on this data, our automation selects and recovers 79 percent of N-labels from COI (the animal barcode), 80 percent from matK and rbcL (the plant barcodes), and 58 percent from non-protein-coding sequences (across eukaryotes).
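The labelling rule for training corrections can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the primary read, the additional read and the human-finished sequence have already been aligned, and that each N-label is summarised by a hypothetical `AlignedN` record. The names `AlignedN`, `correction_label`, `build_training_set` and `min_qv` are all hypothetical. The abstract does not reproduce the quality-value threshold, so `min_qv` is left as a parameter. The `>=` comparison and the value 20 in the usage example are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BASES = frozenset("ACGT")


@dataclass
class AlignedN:
    """Evidence for one N-label in the primary KB read (hypothetical structure)."""
    position: int        # index of the N in the primary read
    extra_base: str      # aligned base in the additional read ('-' for a gap)
    extra_qv: int        # KB quality value of that base in the additional read
    finished_base: str   # aligned base in the human-finished sequence ('-' for a gap)


def correction_label(n: AlignedN, min_qv: int) -> Optional[str]:
    """Return the corrected base for an N-label, or None if no correction is defined."""
    if n.extra_base not in BASES:
        return None                      # additional read has a gap or another N here
    if n.extra_base != n.finished_base:
        return None                      # additional read and human finisher disagree
    if n.extra_qv < min_qv:              # assumed comparison; threshold value not given
        return None                      # KB is not confident enough in the additional read
    return n.extra_base


def build_training_set(ns: List[AlignedN], min_qv: int) -> List[Tuple[int, str]]:
    """Keep only the N-labels that have a defined correction, paired with that label."""
    labelled = []
    for n in ns:
        label = correction_label(n, min_qv)
        if label is not None:
            labelled.append((n.position, label))
    return labelled


if __name__ == "__main__":
    evidence = [
        AlignedN(3, "A", 40, "A"),    # agree and confident -> labelled "A"
        AlignedN(7, "C", 40, "G"),    # disagree -> no label
        AlignedN(9, "T", 5, "T"),     # agree, but low quality value -> no label
        AlignedN(12, "-", 40, "-"),   # gap in both -> no label
    ]
    # min_qv=20 is an arbitrary illustrative value, not the paper's threshold
    print(build_training_set(evidence, min_qv=20))  # [(3, 'A')]
```

Because labels exist only where independent evidence agrees, they can be computed during training but not at prediction time. This matches the abstract's note that corrections are only available during system training.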
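The abstract says the system replaces an N only when it estimates a low error rate. One common way to realise that kind of selective prediction is a confidence cutoff chosen on held-out data, so that the error rate among accepted replacements stays at or below a target. The sketch below assumes a classifier that emits a per-N confidence score. The abstract does not describe the model or how its error estimate is computed, so this threshold scheme is a swapped-in illustration. `choose_threshold`, `apply_selectively`, `recovery_rate` and `max_error` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def choose_threshold(scores: Sequence[float], correct: Sequence[bool],
                     max_error: float) -> Optional[float]:
    """Lowest confidence cutoff whose accepted set keeps the error rate <= max_error.

    A cutoff t accepts every prediction with score >= t, so a cutoff is only
    evaluated after the last member of a group of tied scores.
    """
    pairs = sorted(zip(scores, correct), key=lambda p: p[0], reverse=True)
    best = None
    errors = 0
    for i, (score, ok) in enumerate(pairs):
        errors += 0 if ok else 1
        accepted = i + 1
        last_of_tie = accepted == len(pairs) or pairs[accepted][0] != score
        if last_of_tie and errors / accepted <= max_error:
            best = score          # lower cutoff = more N-labels replaced
    return best


def apply_selectively(predictions: List[Tuple[str, float]],
                      threshold: Optional[float]) -> List[str]:
    """Replace an N with the predicted base only when its confidence clears the cutoff."""
    if threshold is None:
        return ["N"] * len(predictions)   # no safe cutoff: leave every N in place
    return [base if conf >= threshold else "N" for base, conf in predictions]


def recovery_rate(scores: Sequence[float], correct: Sequence[bool],
                  threshold: Optional[float]) -> float:
    """Fraction of all N-labels that are both replaced and replaced correctly."""
    if threshold is None or not scores:
        return 0.0
    recovered = sum(1 for s, ok in zip(scores, correct) if s >= threshold and ok)
    return recovered / len(scores)


if __name__ == "__main__":
    # Toy held-out data: model confidence per N and whether its prediction was right
    scores = [0.99, 0.95, 0.9, 0.8, 0.6, 0.4]
    correct = [True, True, True, False, True, False]
    t = choose_threshold(scores, correct, max_error=0.1)
    print(t, recovery_rate(scores, correct, t))  # 0.9 0.5
```

Taking the lowest admissible cutoff maximises the share of N-labels replaced while respecting the error target on the held-out set. The recovery rates reported in the abstract (79, 80 and 58 percent) are the paper's results on its own data. They are not reproduced by this toy example.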

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics | Publication Date: Aug 11, 2016
Citations: 43

Similar Papers

Gap analysis of DNA barcoding in ERMS reference libraries for ascidians and cnidarians
Guy Paz ... Baruch Rinkevich
Environmental Sciences Europe | VOL. 33 | 09 Jan 2021

Enhancing DNA barcode reference libraries by harvesting terrestrial arthropods at the Smithsonian's National Museum of Natural History.
Biodiversity Data Journal | VOL. 11 | 24 Apr 2023

The identity of Argyria lacteella (Fabricius, 1794) (Lepidoptera, Pyraloidea, Crambinae), synonyms, and related species revealed by morphology and DNA capture in type specimens.
Bernard Landry ... Julia Bilat
ZooKeys | VOL. 1146 | 07 Feb 2023

Utility of GenBank and the Barcode of Life Data Systems (BOLD) for the identification of forensically important Diptera from Belgium and France
Gontran Sonet ... Marc De Meyer
ZooKeys | VOL. 365 | 30 Dec 2014