Abstract

Many publications have demonstrated the huge potential of NGS methods in terms of new species discovery, environmental monitoring, ecological studies, etc. [24,35,92,97,103]. Undoubtedly, NGS will become one of the major tools for species identification and for routine diagnostic use. While read lengths are still quite short for most existing systems, ranging between 50 bp and 800 bp, they are likely to improve soon. This will enable easier, faster, and more reliable contig assembly and subsequent matching against reference databases. As data generation ceases to be a bottleneck, the storage, speed of analysis, and interpretation of DNA sequence data become the major challenges. The integration of data originating from diverse datasets and a variety of data providers is another serious issue that needs to be addressed. Poor sequence record annotations and incorrect species name assignments are known problems; addressing them promptly would allow the creation of reference databases suitable for routine NGS-based diagnostics. Samples with huge amounts of short DNA fragments need to be analyzed and compared against reference databases efficiently and quickly. Although industry has proposed a number of solutions, offering commercial software, several hurdles remain. One challenge that we need to address is data upload from clients' computers to central or distributed data storage and analysis services. Another is the efficient parallelization of analyses using cloud or grid solutions. The reliability and up-time of storage and analysis facilities is a further important problem that needs to be addressed if they are to be used for routine diagnostics. Finally, the management, reporting, and visualization of analysis results are among the last issues, but not the least challenging ones.
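To make the matching step above concrete, the following is a minimal, purely illustrative sketch of comparing short reads against a reference database using a k-mer index. The function names, the toy sequences, and the k-mer voting scheme are all assumptions for illustration; production tools use far more sophisticated indexing and alignment.

```python
from collections import defaultdict

def build_kmer_index(references, k=5):
    """Build a simple k-mer -> reference-id index (illustrative only)."""
    index = defaultdict(set)
    for ref_id, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(ref_id)
    return index

def match_read(read, index, k=5):
    """Assign a read to the reference sharing the most k-mers with it."""
    votes = defaultdict(int)
    for i in range(len(read) - k + 1):
        for ref_id in index.get(read[i:i + k], ()):
            votes[ref_id] += 1
    return max(votes, key=votes.get) if votes else None

# Hypothetical two-entry reference database and one short read
refs = {"speciesA": "ACGTACGTGGCCTTAAGGCCAACGTTACGTAGCTAGCTA",
        "speciesB": "TTGGCCAATTGGCCAATTGGCCAATTGGCCAATTGGCCA"}
idx = build_kmer_index(refs)
print(match_read("ACGTACGTGGCCTTA", idx))  # -> speciesA
```

Real reference databases hold millions of records, which is precisely why the storage, upload, and parallelization challenges discussed here matter.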
Considering the constant growth of computational power and storage capacity needed by different bioinformatics applications, working with a single server or a limited number of servers is no longer realistic. Using a cloud environment and grid computing is becoming a must. Even a single cloud service provider can be restrictive for bioinformatics applications, and working with more than one cloud can make the workflow more robust in the face of failures and ever-growing capacity needs. In this white paper we review the current state of the art in this field. We discuss the main limitations and challenges that we need to address: data upload from clients' computers to central or distributed data storage and analysis services; efficient parallelization of analyses using grid solutions; reliability and up-time of storage and analysis facilities for routine diagnostics; and management, retrieval, and visualization of analysis results.
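The parallelization pattern described here, splitting a read set into chunks and distributing them across workers, can be sketched in a few lines. This is a local worker-pool toy using threads; in the cloud/grid setting discussed above, each chunk would instead be dispatched to a remote node, and the per-chunk analysis (here a trivial GC count) would be a real alignment or classification step. All names and the toy analysis are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_chunk(reads):
    """Stand-in for a per-chunk analysis step (e.g. alignment or
    k-mer classification against a reference database)."""
    return [r.count("G") + r.count("C") for r in reads]  # toy GC count

def parallel_analyze(reads, workers=4, chunk_size=2):
    """Split reads into chunks and analyze them concurrently,
    then flatten the per-chunk results back into one list."""
    chunks = [reads[i:i + chunk_size]
              for i in range(0, len(reads), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(analyze_chunk, chunks)
    return [x for chunk in results for x in chunk]

if __name__ == "__main__":
    print(parallel_analyze(["ACGT", "GGCC", "ATAT", "GCGC"]))  # [2, 4, 0, 4]
```

The same chunk-scatter/result-gather shape carries over to grid schedulers and multi-cloud deployments; only the dispatch mechanism changes.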


