The UCSC repeat browser allows discovery and visualization of evolutionary conflict across repeat families

Jason D. Fernandes, Maximilian Haeussler, W. James Kent, Sofie R. Salama, Armando Zamudio-Hurtado, David Haussler, Hiram Clawson

doi:10.1186/s13100-020-00208-w

Abstract

Background: Nearly half the human genome consists of repeat elements, most of which are retrotransposons, and many of which play important biological roles. However, repeat elements pose several unique challenges to current bioinformatic analyses and visualization tools, because short repeat sequences can map to multiple genomic loci, which leads to their misclassification and misinterpretation. In fact, sequence data mapping to repeat elements are often discarded from analysis pipelines. Therefore, there is a continued need for standardized tools and techniques to interpret genomic data of repeats.

Results: We present the UCSC Repeat Browser, which consists of a complete set of human repeat reference sequences derived from annotations made by the commonly used program RepeatMasker. The UCSC Repeat Browser also provides an alignment from the human genome to these references, uses it to map the standard human genome annotation tracks, and presents all of them in a comprehensive interface to facilitate work with repetitive elements. It also provides processed tracks of multiple publicly available datasets of particular interest to the repeat community, including ChIP-seq datasets for KRAB Zinc Finger Proteins (KZNFs), a family of proteins known to bind and repress certain classes of repeats. We used the UCSC Repeat Browser in combination with these datasets, as well as RepeatMasker annotations in several non-human primates, to trace the independent trajectories of species-specific evolutionary battles between LINE-1 retroelements and their repressors. Furthermore, we document at https://repeatbrowser.ucsc.edu how researchers can map their own human genome annotations to these reference repeat sequences.

Conclusions: The UCSC Repeat Browser allows easy and intuitive visualization of genomic data on consensus repeat elements, circumventing the problem of multi-mapping, in which sequencing reads of repeat elements map to multiple locations on the human genome. Because all data are projected onto a reference consensus, multiple datasets and annotation tracks can easily be overlaid to reveal complex evolutionary histories of repeats in a single interactive window. Specifically, we use this approach to retrace the history of several primate-specific LINE-1 families across apes, and discover several species-specific routes of evolution that correlate with the emergence and binding of KZNFs.
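The abstract describes projecting human genome annotations onto consensus repeat sequences through a genome-to-consensus alignment. The sketch below illustrates only the basic coordinate arithmetic, under simplifying assumptions: each genomic repeat copy lies on the plus strand, and its alignment to the consensus is given as a sorted list of gap-free blocks (similar in spirit to the aligned blocks of a UCSC chain file). The `Block` type and the `map_position` and `map_interval` helpers are hypothetical names for illustration; they are not the Repeat Browser's actual pipeline, which is documented at https://repeatbrowser.ucsc.edu.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """Hypothetical gap-free aligned block between a genomic repeat copy and its consensus."""
    genome_start: int  # 0-based start on the genomic copy (assumed plus strand)
    cons_start: int    # 0-based start on the consensus sequence
    size: int          # number of aligned bases in this block


def map_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map one 0-based genomic position to the consensus, or return None if it
    falls outside every block (e.g. in an insertion relative to the consensus)."""
    # Blocks are assumed sorted by genome_start and non-overlapping.
    starts = [b.genome_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    if pos < block.genome_start + block.size:
        return block.cons_start + (pos - block.genome_start)
    return None


def map_interval(blocks: List[Block], start: int, end: int) -> Optional[Tuple[int, int]]:
    """Map a half-open genomic interval [start, end) to the smallest half-open
    consensus interval covering all of its mappable bases, or None."""
    mapped = [p for p in (map_position(blocks, x) for x in range(start, end)) if p is not None]
    if not mapped:
        return None
    return min(mapped), max(mapped) + 1


# Toy alignment: the genomic copy carries a 10-base insertion (1050-1059)
# that has no counterpart in the consensus.
blocks = [Block(1000, 0, 50), Block(1060, 50, 40)]
print(map_position(blocks, 1055))          # inside the insertion
print(map_interval(blocks, 1040, 1070))    # annotation spanning the insertion
```

In this toy case, position 1055 has no consensus counterpart, and an annotation covering genomic bases 1040 to 1069 lands on the consensus interval [40, 60). Collecting such projections from every genomic copy of a family is what allows many loci to be overlaid on one consensus coordinate system.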

Highlights

Transposable elements are significant drivers of eukaryotic genome evolution
Masking or filtering these reads is often considered a “conservative” approach in that it avoids mis-assigning the genomic location of a read, but it prevents the discovery of potentially important biological functions of repeat elements [1]
Repbase began in 1992 as a hand-curated list of 53 prototypic repeat sequences identified in the human genome [8]

Summary

Introduction

Transposable elements are significant drivers of eukaryotic genome evolution. In humans and other primates, transposons constitute nearly half the genome; the majority of these repeat elements are retrotransposons, although some DNA transposons are also present. Despite the high repeat content of the human genome, many genomic analyses struggle with these regions, because sequencing reads can often be assigned nearly equally well to multiple locations in the genome. Masking or filtering these reads is often considered a “conservative” approach in that it avoids mis-assigning the genomic location of a read, but it prevents the discovery of potentially important biological functions of repeat elements [1]. Because many repeat elements play important biological roles, there is a continued need for standardized tools and techniques to interpret genomic data of repeats.
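The multi-mapping problem can be made concrete with a toy example. The sketch assumes that each read comes with a list of candidate alignments, each annotated with its repeat family and the corresponding position on that family's consensus; the `Hit` record and both helper functions are hypothetical. A conservative genome-level rule discards a read whose best score is tied across loci, whereas a consensus-level rule can still keep it when every tied locus projects to the same family and consensus position. Requiring exact agreement on the consensus position is a simplification chosen for clarity.

```python
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """Hypothetical record for one candidate alignment of a read."""
    locus: str     # genomic location, e.g. "chr1:1000"
    score: int     # alignment score; higher is better
    family: str    # repeat family of the locus (e.g. from RepeatMasker)
    cons_pos: int  # corresponding 0-based position on the family consensus


def best_hits(hits: List[Hit]) -> List[Hit]:
    """Return all hits sharing the top alignment score."""
    if not hits:
        return []
    top = max(h.score for h in hits)
    return [h for h in hits if h.score == top]


def genome_assignment(hits: List[Hit]) -> Optional[str]:
    """Conservative genome-level rule: keep the read only if a single locus scores best."""
    best = best_hits(hits)
    return best[0].locus if len(best) == 1 else None


def consensus_assignment(hits: List[Hit]) -> Optional[Tuple[str, int]]:
    """Consensus-level rule: keep the read if all tied best hits agree on
    (family, consensus position), even when they come from different loci."""
    targets = {(h.family, h.cons_pos) for h in best_hits(hits)}
    return targets.pop() if len(targets) == 1 else None


# Toy read that aligns equally well to two copies of the same family.
hits = [
    Hit("chr1:1000", 95, "L1PA2", 120),
    Hit("chr7:5000", 95, "L1PA2", 120),
    Hit("chr9:42", 60, "AluY", 10),
]
print(genome_assignment(hits))     # tied loci, so the read is discarded
print(consensus_assignment(hits))  # both copies agree on the consensus
```

The family names and coordinates here are invented for illustration only. In real data, reads from different copies may land at slightly different consensus positions because of indels, so a practical rule would likely tolerate small discrepancies rather than demand exact equality.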

Results

Discussion

Conclusion