Computational Approach for Mining Simple Sequence Repeats in Expressed Sequence Tags

Akhtar Husain, P K Bharti, Umang Saini

doi:10.18090/samriddhi.v14i03.03

Abstract

Expressed sequence tags (ESTs), short sequences of cDNA, are mined to identify and characterize simple sequence repeats (SSRs) for studying genetic variation. Web-based tools often become unusable when their servers are no longer maintained, and the few available stand-alone tools offer limited processing capability. Therefore, a simple, fast, and portable stand-alone tool has been developed to process multiple expressed sequence tag files without size limitations, with proper input validation and the ability to retrieve additional genome-related features. The tool is implemented in Java and performs the microsatellite search with the dictionary-based MISA Perl script, which is called via the command line for data mining. A parallel module retrieves additional information from GenBank files. Within the pipeline, Primer3 is invoked to design primers in batches. With its extended interface built in Java NetBeans, the tool gives novice users a simple, interactive stand-alone application for microsatellite mining, statistical analysis, and primer design on one platform. The parameters for the number of repeats and interruptions can be reset through the graphical interface. The tool offers interactive modules with proper validation, batch processing, and cost-effective analysis of tandem repeats compared with peer tools, and its source code can be upgraded in the future.
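
To make the microsatellite search step concrete, the following is a minimal Java sketch of a regular-expression search for perfect repeats, in the spirit of what MISA does. It is not the tool's code, and it does not reproduce MISA itself, which is a Perl script. The class name SsrSketch, the Ssr holder class, and the minimum repeat counts (10, 6, 5, 5, 5, 5 for motif lengths 1 to 6) are assumptions chosen for illustration; the abstract states only that such parameters are user-adjustable. Compound repeats and interruptions are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SsrSketch {

    /** One perfect repeat: motif, copy number, 0-based start, exclusive end. */
    public static final class Ssr {
        public final String motif;
        public final int repeats;
        public final int start;
        public final int end;

        Ssr(String motif, int repeats, int start, int end) {
            this.motif = motif;
            this.repeats = repeats;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed thresholds: index = motif length, value = minimum copy number.
    static final int[] MIN_REPEATS = {0, 10, 6, 5, 5, 5, 5};

    /** True if the motif is itself a repetition of a shorter unit, e.g. "AA" or "ATAT". */
    static boolean isRedundant(String motif) {
        int len = motif.length();
        for (int d = 1; d < len; d++) {
            if (len % d != 0) {
                continue;
            }
            boolean periodic = true;
            for (int i = d; i < len; i++) {
                if (motif.charAt(i) != motif.charAt(i - d)) {
                    periodic = false;
                    break;
                }
            }
            if (periodic) {
                return true;
            }
        }
        return false;
    }

    /** Finds perfect repeats with motif lengths 1 to 6 in a single sequence. */
    public static List<Ssr> find(String sequence) {
        String seq = sequence.toUpperCase();
        List<Ssr> hits = new ArrayList<>();
        for (int k = 1; k < MIN_REPEATS.length; k++) {
            // A motif of length k followed by at least (minimum - 1) further copies.
            Pattern p = Pattern.compile(
                    "([ACGT]{" + k + "})\\1{" + (MIN_REPEATS[k] - 1) + ",}");
            Matcher m = p.matcher(seq);
            while (m.find()) {
                String motif = m.group(1);
                if (isRedundant(motif)) {
                    continue; // e.g. a poly-A run matched as an "AA" dinucleotide
                }
                hits.add(new Ssr(motif, (m.end() - m.start()) / k, m.start(), m.end()));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical EST fragment: an (AT)6 repeat and an (A)12 run.
        String est = "GGC" + "ATATATATATAT" + "CCG" + "AAAAAAAAAAAA" + "T";
        for (Ssr s : find(est)) {
            System.out.println("(" + s.motif + ")" + s.repeats
                    + " at " + s.start + ".." + s.end);
        }
    }
}
```

The redundancy check matters because a mononucleotide run also satisfies the dinucleotide pattern; without it, the same stretch would be reported under several motif lengths.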

Full Text