Abstract

With the increasing demand for low-cost, high-throughput sequencing of large genomes, next-generation sequencing (NGS) technology has developed rapidly. NGS is used not only in basic scientific research but also in clinical diagnostics and healthcare. Numerous software systems and tools have been developed to analyze NGS data, and a variety of data formats have emerged to accommodate different sequencing equipment providers and analytical software. However, limited data interoperability between these tools poses a major challenge to researchers. A generic format shared by most software and tools in the NGS field would make data exchange and sharing easier. In this paper, we define a general XML-based NGS markup language (NGSML) format for the representation and exchange of NGS data. We also developed a user-friendly GUI tool, NGSMLEditor, for presenting, creating, editing, and converting NGSML files. With NGSML, various types of NGS data can be saved in one unified format. Compared with unstructured plain-text files, a structured data format based on XML resolves the incompatibility among the various NGS data formats. The NGSML specifications are freely available from http://www.sysbio.org.cn/NGSML. NGSMLEditor is open source under the GNU GPL and can be downloaded from the same website.
