QSP: An open sequence database for quorum sensing related gene analysis with an automatic annotation pipeline

Chunxiao Dai, Yuanyuan Qu, Weize Wu, Shuzhen Li, Zhuo Chen, Shengyang Lian, Jiawei Jing

doi:10.1016/j.watres.2023.119814

Abstract

Quorum sensing (QS) has attracted great attention because of its important role in bacterial interactions and its relevance to water management. With the development of high-throughput sequencing technology, a dedicated database for QS-related sequence annotation is urgently needed. Here, Hidden Markov Model (HMM) profiles for 38 types of QS-related proteins were built from a total of 4024 collected seed sequences. Based on both homolog search and keyword confirmation against the non-redundant database, we established a QS-related protein (QSP) database that includes 809,721 protein sequences and 186,133 nucleotide sequences, available for download at: https://github.com/chunxiao-dcx/QSP. The entries were classified into 38 types and 315 subtypes across 91 bacterial phyla. Furthermore, an automatic annotation pipeline, named QSAP, was developed for rapid annotation, classification and abundance quantification of QSP-like sequences from sequencing data. The pipeline offers two homolog alignment strategies, Diamond (Blastp) or HMMER (Hmmscan), as well as a data cleansing function that keeps either a subset or the union set of the hits. The pipeline was tested using 14 metagenomic samples from various water environments, including activated sludge, deep-sea sediments, estuary water, and reservoir water. The QSAP pipeline is freely available for academic use in the code repository at: https://github.com/chunxiao-dcx/QSAP. The establishment of this database and pipeline provides a useful tool for QS-related sequence annotation in a wide range of projects, and will increase our understanding of QS communication in aquatic environments.
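
The choice between a subset and the union set of hits from the two alignment strategies can be pictured with a minimal Python sketch. This is not QSAP's code: the function name `combine_hits`, the `mode` values, the sequence IDs, and the type labels are all hypothetical, and reading "subset" as "hits reported by both tools" is an assumption made here for illustration only.

```python
from typing import Dict


def combine_hits(
    diamond_hits: Dict[str, str],
    hmmer_hits: Dict[str, str],
    mode: str = "union",
) -> Dict[str, str]:
    """Merge per-sequence type assignments from two homolog searches.

    Each input maps a query sequence ID to the QS protein type it was
    assigned by one tool (hypothetical, simplified representation).
    """
    if mode == "union":
        # Keep every query that either tool reported.
        keys = diamond_hits.keys() | hmmer_hits.keys()
    elif mode == "intersection":
        # Assumed reading of "subset": keep only queries both tools reported.
        keys = diamond_hits.keys() & hmmer_hits.keys()
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # When both tools report a query, the Diamond label is kept.
    # This tie-break is an arbitrary choice for the sketch.
    return {
        key: diamond_hits[key] if key in diamond_hits else hmmer_hits[key]
        for key in sorted(keys)
    }


# Hypothetical hit tables; IDs and type labels are placeholders.
diamond = {"seq1": "type_A", "seq2": "type_B"}
hmmer = {"seq2": "type_B", "seq3": "type_C"}

union_hits = combine_hits(diamond, hmmer, mode="union")
shared_hits = combine_hits(diamond, hmmer, mode="intersection")
```

In this sketch the union favors sensitivity (three sequences kept), while the intersection favors stringency (only `seq2`, found by both searches, is kept).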

Full Text