Abstract

DNA, or deoxyribonucleic acid, carries the genetic information of every living organism. The study of bacterial DNA extracted from human bones excavated at archaeological and anthropological sites aims to trace the evolution of the microorganisms inhabiting the human body and to yield new insight into the health, diet and even migration of our ancestors. This paper offers a solution for discriminating between ancient and modern bacterial DNA in dental calculus. We propose three internal representations for the considered DNA sequences and analyse which of them captures the most information and is most informative for classification models. Two of these are text-based, while the third takes advantage of several physical and chemical properties of the nucleotides in DNA. We use a data set containing both ancient and modern dental calculus bacterial DNA and apply two supervised models, namely artificial neural networks and support vector machines, to distinguish between the two types of sequences. The obtained results indicate two main conclusions: the representation based on physical and chemical properties seems to best capture relevant information for the task at hand; and, for the considered data set and DNA encoding proposals, support vector machines outperform artificial neural networks, although the results obtained by both models are promising.
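
To make the distinction between a text-based and a property-based representation concrete, the following minimal Python sketch shows one possible form of each. It is an illustration under stated assumptions, not the encodings used in the paper: the abstract does not specify them. The k-mer counting scheme, the function names `kmer_counts` and `property_encoding`, and the choice of three binary nucleotide classes (purine or pyrimidine, strong or weak hydrogen bonding, amino or keto group) are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_counts(seq, k=2):
    """Text-based encoding (assumed for illustration): count every k-mer.

    Returns a fixed-length vector of 4**k counts, ordered lexicographically
    over the alphabet A, C, G, T.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(NUCLEOTIDES, repeat=k)]


# Hypothetical property table, not taken from the paper. Each nucleotide maps
# to three binary chemical classes:
#   (purine=1 / pyrimidine=0, strong=1 / weak=0 hydrogen bonding, amino=1 / keto=0)
PROPERTIES = {
    "A": (1, 0, 1),
    "C": (0, 1, 1),
    "G": (1, 1, 0),
    "T": (0, 0, 0),
}


def property_encoding(seq):
    """Property-based encoding (assumed): three values per nucleotide.

    Raises KeyError on symbols other than A, C, G, T (for example N).
    """
    return [value for base in seq.upper() for value in PROPERTIES[base]]


if __name__ == "__main__":
    example = "ACGTTGCA"  # made-up sequence, not from the data set
    print(kmer_counts(example, k=2))
    print(property_encoding(example))
```

Either vector could then be passed to a supervised classifier such as a support vector machine or a neural network. The k-mer vector has a fixed length regardless of sequence length, whereas the property vector grows with the sequence and would need padding or truncation before being used as model input.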
