Graph theory is an effective bioinformatics tool that provides a systematic method for modelling and analysing complex biological data. The number of sequences in DNA databases is growing rapidly, and detecting similarities between DNA sequences is crucial for determining the origin of a species and identifying homologous sequences. Accurate measurement of sequence similarity, which requires alignment-free techniques, has been one of the main issues facing computational biologists. The current study presents a mathematical technique, grounded in graph theory, for comparing DNA sequences. Each DNA sequence was divided into pairs of nucleotides, from which weighted loop digraphs and the corresponding weighted vectors were computed. Sequence similarity was then assessed with the Cosine, Correlation, and Jaccard distance measures. To verify the method, DNA segments from the genomes of ten cotton species were tested, and K-means clustering was performed to evaluate the efficacy of the proposed methodology. In the realm of bioinformatics, this paper highlights graph theory as an effective tool for studying biological data and comparing sequences. It presents a proof-of-concept implementation of the proposed distance-matrix technique on a given data set; the outcomes are promising, and further optimisation of the proposed solution is expected to bring the results closer to full accuracy.
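The pipeline summarised above (nucleotide pairs, weighted loop digraph, weighted vector, distance matrix) can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so this sketch assumes that every overlapping pair `seq[i]seq[i+1]` adds weight 1 to the directed edge between those two nucleotides (identical pairs such as `AA` become loops), and that the 16 edge weights are normalised to sum to 1. The weighted (Ruzicka) form of the Jaccard distance is likewise an assumption. All function names and the toy sequences are hypothetical illustrations, not the authors' implementation or the cotton data used in the study.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 ordered pairs; AA, CC, GG and TT correspond to loops in the digraph.
PAIRS = [a + b for a, b in product(NUCLEOTIDES, repeat=2)]


def pair_digraph(seq: str) -> dict[tuple[str, str], int]:
    """Weighted loop digraph on vertices A, C, G, T (assumed weighting:
    each overlapping pair adds 1 to its directed edge; pairs containing
    non-ACGT characters are skipped)."""
    edges = {(a, b): 0 for a in NUCLEOTIDES for b in NUCLEOTIDES}
    s = seq.upper()
    for a, b in zip(s, s[1:]):
        if a in NUCLEOTIDES and b in NUCLEOTIDES:
            edges[(a, b)] += 1
    return edges


def weighted_vector(seq: str) -> list[float]:
    """16-dimensional vector of edge weights, normalised to sum to 1."""
    edges = pair_digraph(seq)
    counts = [edges[(p[0], p[1])] for p in PAIRS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def cosine_distance(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (nu * nv)


def correlation_distance(u: list[float], v: list[float]) -> float:
    # 1 - Pearson correlation, i.e. the cosine distance of mean-centred vectors.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine_distance([x - mu for x in u], [y - mv for y in v])


def jaccard_distance(u: list[float], v: list[float]) -> float:
    # Weighted (Ruzicka) Jaccard distance for non-negative vectors.
    num = sum(min(x, y) for x, y in zip(u, v))
    den = sum(max(x, y) for x, y in zip(u, v))
    return 1.0 - num / den if den else 0.0


def distance_matrix(seqs: list[str], dist) -> list[list[float]]:
    """Pairwise distance matrix between the weighted vectors of seqs."""
    vecs = [weighted_vector(s) for s in seqs]
    return [[dist(a, b) for b in vecs] for a in vecs]


if __name__ == "__main__":
    # Toy sequences for illustration only (not from the study).
    toy = ["ATGCGATACG", "ATGCGATACC", "TTTTAAAATT"]
    for dist in (cosine_distance, correlation_distance, jaccard_distance):
        print(dist.__name__)
        for row in distance_matrix(toy, dist):
            print(["%.3f" % d for d in row])
```

With these toy inputs the first two sequences, which differ only in their last base, come out much closer to each other than to the third under all three measures. To mirror the clustering step, the rows of such a matrix could be passed to any K-means implementation (for example scikit-learn's `KMeans`); that choice is likewise an assumption, not the paper's setup.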