Abstract
The RNA world hypothesis, supported by a growing number of research results from various scientific areas, is the most plausible explanation for the origin of life on Earth. It is frequently supposed that a hypothetical species once existed on Earth whose base RNA sequence was probably dissimilar from any genome known today. It is hard to distinguish hypothetical sequences obtained by computer simulation from biological sequences and, hence, to decide which characteristics provide biological functionality. In this study, biological sequences obtained from RNA viruses are compared with computationally generated sequences (artificial life probes). The task is to discriminate the samples by origin, biological or artificial. We use learning vector quantization (LVQ) as the classifier. LVQ is a dissimilarity-based classifier that imposes only weak requirements on the underlying dissimilarity measure. This makes it possible to investigate how well several dissimilarity measures discriminate in this task. In particular, we consider information-theoretic dissimilarities such as the normalized compression distance (NCD) and divergences based on bag-of-words (BoW) vectors generated from nucleotide codons. Additionally, the geodesic path distance (GPD) is applied, using a unary coding of sequences for their representation in the underlying Grassmann manifold. Both BoW and GPD allow continuous updates of prototypes, in the feature space and in the Grassmann manifold, respectively, whereas NCD restricts the application of LVQ methods to median variants.
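To make two of the dissimilarities named in the abstract concrete, the following Python sketch computes an NCD between two sequences and a divergence between their codon-based BoW vectors. It is only an illustration under stated assumptions, not the authors' implementation: `zlib` stands in for whatever compressor was actually used, codons are read as non-overlapping triplets from the first position over the alphabet A, C, G, U, a pseudocount keeps the BoW vectors strictly positive, and the Kullback-Leibler divergence is one possible choice of divergence. All function names are hypothetical.

```python
import itertools
import math
import zlib

ALPHABET = "ACGU"  # assumed RNA alphabet
# All 64 possible codons, in a fixed order that defines the BoW dimensions.
CODONS = ["".join(p) for p in itertools.product(ALPHABET, repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def codon_bow(seq: str, pseudocount: float = 1.0) -> list[float]:
    """Relative codon frequencies of seq as a 64-dimensional vector.

    Assumption: non-overlapping triplets starting at position 0;
    triplets containing symbols outside ALPHABET are skipped.
    """
    counts = [pseudocount] * len(CODONS)
    for i in range(0, len(seq) - 2, 3):
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence D(p || q) for strictly positive vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The sketch also shows why the two measures lead to different LVQ variants. The BoW vectors live in a real vector space, so a prototype can be moved continuously toward or away from a sample. The NCD is defined only between concrete sequences, so prototypes have to be chosen from the data, as in the median variants.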