Abstract

We present UNIREP, a program written in PowerBASIC for IBM PCs that identifies repetitive and unique nucleotide sequences in genomes or parts of genomes. A key feature of the algorithm is the representation of oligonucleotides in a numerical code, which makes it possible to compare all pairs of oligonucleotides (including overlapping ones) that occur in the analyzed sequence. This comparison assigns each oligonucleotide a score reflecting its similarity or dissimilarity to the other oligonucleotides of the same length in the analyzed sequence. The score is plotted along the sequence, so that peaks in the plot indicate repetitive regions and very low values indicate unique sequences. The scores can be filtered to suppress or enhance either the unique or the repetitive sequences, as the user chooses. UNIREP is complemented by two auxiliary programs, HIGHER and LOWER, which list the nucleotide sequences whose scores lie above or below given limits. The potential of UNIREP is demonstrated on several long nucleotide sequences, including the complete genomic sequence of Epstein-Barr virus (EBV).
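
The idea of coding oligonucleotides numerically and scoring each one against all others of the same length can be illustrated with a minimal Python sketch. It is not the UNIREP implementation: the abstract does not give the actual numerical code or scoring rule, so the 2-bit code, the exact-match score (number of other occurrences of the same oligonucleotide), and the names `encode_oligos`, `repeat_scores`, and `oligos_above` are all assumptions made for illustration.

```python
from collections import Counter

# Assumed 2-bit code per nucleotide (hypothetical; UNIREP's own code is not
# described in the abstract). Any other character raises KeyError.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_oligos(seq, k):
    """Return the integer code of every overlapping oligonucleotide of length k."""
    mask = (1 << (2 * k)) - 1  # keeps only the last k nucleotides (2 bits each)
    value = 0
    codes = []
    for i, base in enumerate(seq):
        value = ((value << 2) | CODE[base]) & mask
        if i >= k - 1:  # a full window of k nucleotides has been read
            codes.append(value)
    return codes


def repeat_scores(seq, k):
    """Score each position by how many other identical k-mers the sequence holds.

    Assumed scoring rule: exact matches only. 0 means the oligonucleotide is
    unique; larger values mean it is repeated.
    """
    codes = encode_oligos(seq, k)
    counts = Counter(codes)
    return [counts[c] - 1 for c in codes]


def oligos_above(seq, k, limit):
    """List (position, oligonucleotide) pairs scoring above limit (cf. HIGHER)."""
    scores = repeat_scores(seq, k)
    return [(i, seq[i:i + k]) for i, s in enumerate(scores) if s > limit]


if __name__ == "__main__":
    example = "ACGTACGTTT"
    print(repeat_scores(example, 4))
    print(oligos_above(example, 4, 0))
```

Because equal oligonucleotides map to equal integers, counting codes replaces an explicit comparison of all pairs in this simplified exact-match setting; a similarity score that tolerates mismatches would need a different comparison step.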

