Abstract

Several alignment-free sequence comparison methods are available, each measuring similarity on the basis of a particular numerical descriptor of biological sequences. Any information lost when a sequence is transformed into a numerical descriptor affects the results. A pool of descriptors computed by different algorithms is expected to lose the least information, and this work takes that approach to study the similarity of DNA sequences. A number of descriptors based on information theory and connectivity were computed for DNA sequences. Principal component analysis (PCA) was used to extract the minimum number (N) of orthogonal descriptors, the principal components (PCs). Similarity/dissimilarity clustering of the DNA sequences was carried out in the N-dimensional similarity space constructed from the PCs extracted from the DNA descriptors. The paper explains the extension of quantitative molecular similarity analysis (QMSA) from the prediction of physicochemical properties and toxicity of chemicals to bioinformatics, for the classification of DNA sequences.
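
The workflow described above (descriptor pool, then PCA, then distances in the N-dimensional PC space) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the descriptor pool is reduced to Shannon entropies of k-mer frequencies (no connectivity descriptors are computed), the function names are hypothetical, the toy sequences are invented, and PCA is approximated with power iteration on the covariance matrix so that the example needs only the standard library.

```python
import math
from collections import Counter


def kmer_entropy(seq, k):
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def descriptors(seq, ks=(1, 2, 3)):
    """Hypothetical descriptor pool: one k-mer entropy per value of k."""
    return [kmer_entropy(seq, k) for k in ks]


def center(rows):
    """Subtract the column mean from every descriptor column."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=500):
    """Approximate the dominant eigenpair of a symmetric matrix by power iteration."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # deterministic, non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # (near-)zero matrix: no variance left to extract
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
    return eigenvalue, v


def pca_scores(rows, n_components):
    """Project centered rows onto the first n_components approximate PCs."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(n_components):
        eigenvalue, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


def distance_matrix(scores):
    """Pairwise Euclidean distances in the PC (similarity) space."""
    return [[math.dist(a, b) for b in scores] for a in scores]


# Invented toy sequences, for illustration only (not data from the paper).
toy_sequences = [
    "ACGTACGTACGTACGT",
    "ACGTACGTACGTACGT",  # exact copy of the first sequence
    "AAAAAAAAAAAAAAAA",
    "AACCGGTTAACCGGTT",
]
scores = pca_scores([descriptors(s) for s in toy_sequences], n_components=2)
dist = distance_matrix(scores)
```

Here `dist[i][j]` is the dissimilarity between sequences `i` and `j`, and the matrix could be passed to any standard clustering routine. In practice a numerical library would normally replace the hand-written power iteration, and the number of retained components N would be chosen from the variance explained rather than fixed at 2 as assumed here.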
