Abstract

Several alignment-free sequence comparison methods compute similarity from a particular numerical descriptor of biological sequences. Any information lost when a sequence is transformed into a numerical descriptor affects the results. A pool of descriptors computed with different algorithms is expected to minimize this loss, and this work applies such a pool to study the similarity of DNA sequences. A number of descriptors based on information theory and connectivity were computed for the DNA sequences. Principal component analysis (PCA) was used to extract the minimum number (N) of orthogonal descriptors, the principal components (PCs). Similarity/dissimilarity clustering of the DNA sequences was carried out in the N-dimensional similarity space constructed from the PCs extracted from the DNA descriptors. The paper describes the extension of quantitative molecular similarity analysis (QMSA) from the prediction of physicochemical properties and toxicity of chemicals to bioinformatics, for the classification of DNA sequences.
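The descriptor → PCA → similarity-space pipeline can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' implementation. The descriptor set (Shannon entropies of overlapping k-mers for k = 1 to 3, plus GC fraction) is a hypothetical stand-in for the paper's information-theoretic and connectivity descriptors. The rule for choosing N (the fewest PCs whose cumulative explained variance reaches 90%) is also an assumption. The function names `kmer_entropy`, `descriptors`, `principal_components` and `distance_matrix` are illustrative.

```python
import math
from collections import Counter

import numpy as np


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (bits) of the overlapping k-mer frequencies in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def descriptors(seq: str) -> list[float]:
    """Hypothetical descriptor vector: k-mer entropies (k = 1..3) and GC fraction."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return [kmer_entropy(seq, 1), kmer_entropy(seq, 2), kmer_entropy(seq, 3), gc]


def principal_components(X: np.ndarray, var_threshold: float = 0.9) -> np.ndarray:
    """Project standardized descriptors onto the fewest PCs reaching var_threshold.

    The 90% cumulative-variance cut-off is an assumed criterion for N.
    """
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant descriptors centered instead of dividing by 0
    Z = (X - X.mean(axis=0)) / std
    # Covariance of standardized data is proportional to the correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # eigh returns ascending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    n = int(np.searchsorted(explained, var_threshold)) + 1
    return Z @ eigvecs[:, :n]  # scores in the N-dimensional similarity space


def distance_matrix(P: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances (dissimilarities) between sequences in PC space."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy sequences, made up for illustration only.
seqs = {
    "s1": "ATGCGCGCATATGCGCGCAT",
    "s2": "ATGCGCGCATATGCGCGCAA",
    "s3": "AAAAATTTTTAAAAATTTTT",
}
X = np.array([descriptors(s) for s in seqs.values()])
P = principal_components(X)
D = distance_matrix(P)
print(np.round(D, 3))
```

Standardizing first keeps descriptors with large numeric ranges from dominating the PCs. Clustering can then be run on `D`, for example by passing `scipy.spatial.distance.squareform(D)` to `scipy.cluster.hierarchy.linkage`.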
