Building the Chordata Olfactory Receptor Database using more than 400,000 receptors annotated by Genome2OR.

Wei Han, Suwen Zhao, Liting Zeng, Yiran Wu

doi:10.1007/s11427-021-2081-6

Abstract

Olfactory receptors are poorly annotated for most genome-sequenced chordates. To address this deficiency, we developed an nhmmer-based olfactory receptor annotation tool, Genome2OR (https://github.com/ToHanwei/Genome2OR.git), and used it to process the 1,695 sequenced chordate genomes available in the NCBI Assembly database as of January 2021. In total, 765,248 olfactory receptor genes were annotated, comprising 404,426 functional genes and 360,822 pseudogenes, which represents a four-fold increase in the number of annotated olfactory receptors. Based on the annotation data, we built the Chordata Olfactory Receptor Database (CORD, https://cord.ihuman.shanghaitech.edu.cn) for archiving, analysing and disseminating the data. Beyond the primary data, CORD offers derivative information, including pictures of species, cross-references to public databases, structural models, sequence similarity networks and sequence profiles. Furthermore, we performed brief analyses of these receptors: we built a large protein sequence similarity network covering all receptors in the database, clustered the receptors into 20 communities, and classified those communities into three categories based on their presence or absence in ray-finned fish and/or lobe-finned fish. From analyses of their sequences and structural models, we infer that olfactory receptors are likely to have unique activation and desensitization mechanisms. We believe CORD can benefit researchers and members of the general public who are interested in olfaction.
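The presence/absence classification of communities mentioned in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the category labels, the function names, the community names and the toy input are hypothetical, and the abstract does not specify how the three categories are defined or computed.

```python
from typing import Dict, Set

# Hypothetical lineage labels; the abstract names the two groups
# but does not say how they are encoded.
RAY_FINNED = "ray-finned"
LOBE_FINNED = "lobe-finned"


def classify_community(lineages: Set[str]) -> str:
    """Assign a category from the lineages in which a community occurs.

    The three labels returned for non-empty communities are illustrative
    assumptions, not the category names used by the authors.
    """
    has_ray = RAY_FINNED in lineages
    has_lobe = LOBE_FINNED in lineages
    if has_ray and has_lobe:
        return "both"
    if has_ray:
        return "ray-finned only"
    if has_lobe:
        return "lobe-finned only"
    # Fallback for communities found in neither group.
    return "neither"


def classify_all(communities: Dict[str, Set[str]]) -> Dict[str, str]:
    """Classify every community in a name -> lineages mapping."""
    return {name: classify_community(found_in)
            for name, found_in in communities.items()}


# Toy input with made-up community names, for demonstration only.
toy_communities = {
    "community_A": {RAY_FINNED, LOBE_FINNED},
    "community_B": {RAY_FINNED},
    "community_C": {LOBE_FINNED},
}

if __name__ == "__main__":
    for name, category in classify_all(toy_communities).items():
        print(name, category)
```

In this sketch each community is reduced to the set of lineages in which at least one member receptor occurs, so the category follows from two boolean checks; a real analysis would first have to derive those sets from the species annotations in the database.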

Full Text