Abstract

Background: The ability to generate recombinant drug target proteins is important for drug discovery research as it facilitates the investigation of drug-target interactions in vitro. To accomplish this, the target’s exact protein sequence is required. Public databases, such as Ensembl, UniProt and RefSeq, are extensive protein and nucleotide sequence repositories. However, many sequences for non-human organisms are predicted by computational pipelines and may thus be incomplete or incorrect. This could lead to misinterpreted experimental outcomes due to gaps or errors in orthologous drug target sequences. Transcriptome analysis by RNA-Seq has been established as a standard method for gene expression analysis. Apart from this common application, paired-end RNA-Seq data can also be used to obtain full-coverage cDNA sequences via de novo transcriptome assembly.

Methods: To assess whether de novo transcriptome assemblies can be used to determine a protein’s sequence by searching the assembly for a known orthologous sequence, we generated 3 × 6 = 18 tissue-specific assemblies (three organs: brain, kidney and liver; six species: human, mouse, rat, dog, pig and cynomolgus monkey). These assemblies and the manually curated human protein sequences from UniProtKB/Swiss-Prot were used in a reciprocal BLAST search to identify best-matching hits. We automated and generalised our approach and present the a&o-tool, a workflow which exploits de novo assemblies of paired-end RNA-Seq data and orthology information for target sequence validation and refinement across related species. Furthermore, the a&o-tool extracts the sequences of the best hits from a reciprocal BLAST search, translates them into protein sequences, computes a multiple sequence alignment and quantifies the refinement.

Results: For the three human assemblies we observed a hit rate greater than 60% with 100% sequence coverage and identity. For assemblies from the other species we observed similar hit rates and coverage, with the highest identities for cynomolgus monkey.

Conclusions: In summary, we show how to refine protein sequences using RNA-Seq data and sequence information from closely related species. With the a&o-tool we provide a fully automated pipeline to perform refinement, including cDNA translation and multiple sequence alignment for visual inspection. The major prerequisite for applying the a&o-tool is high-quality sequencing data.
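
The reciprocal BLAST search described in the abstract can be illustrated with a minimal Python sketch. It is not the a&o-tool's implementation: it assumes that both search directions were exported in BLAST+ tabular format (`-outfmt 6` with the default twelve columns), that "best hit" means highest bit score per query, and all accession and contig names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to its highest-bit-score subject.

    Assumes default BLAST+ tabular columns:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in rows:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(
    forward: Iterable[str], backward: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (protein, contig) pairs that are each other's best hit."""
    fwd = best_hits(forward)
    bwd = best_hits(backward)
    return sorted(
        (query, subject)
        for query, (subject, _) in fwd.items()
        if subject in bwd and bwd[subject][0] == query
    )


# Hypothetical example data: human proteins searched against an assembly ...
FORWARD = [
    "P12345\tcontig_1\t98.5\t200\t3\t0\t1\t200\t10\t609\t1e-120\t410",
    "P12345\tcontig_2\t60.0\t150\t60\t0\t1\t150\t1\t450\t1e-30\t120",
    "Q99999\tcontig_2\t95.0\t100\t5\t0\t1\t100\t1\t300\t1e-50\t200",
]
# ... and the assembly contigs searched back against the human proteins.
BACKWARD = [
    "contig_1\tP12345\t98.5\t200\t3\t0\t10\t609\t1\t200\t1e-120\t410",
    "contig_2\tP12345\t60.0\t150\t60\t0\t1\t450\t1\t150\t1e-30\t120",
    "contig_2\tQ11111\t99.0\t180\t2\t0\t1\t540\t1\t180\t1e-90\t250",
]

if __name__ == "__main__":
    for protein, contig in reciprocal_best_hits(FORWARD, BACKWARD):
        print(protein, contig)
```

In this toy data only `P12345` and `contig_1` select each other; `Q99999` picks `contig_2`, but `contig_2` prefers a different protein, so that pair is not reported.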

Highlights

  • The ability to generate recombinant drug target proteins is important for drug discovery research as it facilitates the investigation of drug-target interactions in vitro

  • Biochemical and cellular in vitro assays can be used throughout the drug discovery process to assess a compound’s activity on the target protein. This can be accomplished by using cell lines or bacteria expressing the recombinant protein, which requires a DNA template of the known target sequence to be introduced into the production system

  • We investigated the sequence identity reported by the Basic Local Alignment Search Tool (BLAST) and the percentage of the human protein sequence covered by the alignment
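
The two quantities named in the last highlight, identity and coverage of the human protein, can be sketched as follows. This is an illustrative calculation only, under the assumption that coverage is the percentage of query residues covered by at least one aligned segment (1-based, inclusive coordinates as in BLAST tabular output); the function names are hypothetical and the source does not state how coverage was computed.

```python
from typing import List, Tuple


def query_coverage(segments: List[Tuple[int, int]], query_length: int) -> float:
    """Percentage of the query covered by at least one aligned segment.

    `segments` holds (qstart, qend) pairs, 1-based and inclusive.
    Overlapping segments are counted once.
    """
    covered = 0
    current_end = 0  # last query position already counted
    for start, end in sorted(segments):
        start = max(start, current_end + 1)  # skip positions counted before
        if end >= start:
            covered += end - start + 1
            current_end = end
    return 100.0 * covered / query_length


def is_full_match(percent_identity: float, coverage: float) -> bool:
    """True for a hit with 100% identity and 100% query coverage."""
    return percent_identity >= 100.0 and coverage >= 100.0


if __name__ == "__main__":
    # Two overlapping segments on a hypothetical 200-residue protein.
    print(query_coverage([(1, 100), (51, 150)], 200))
    # A single segment spanning the whole protein at 100% identity.
    print(is_full_match(100.0, query_coverage([(1, 200)], 200)))
```

Merging overlapping segments before counting avoids reporting more than 100% coverage when several local alignments hit the same region of the protein.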


Summary

Introduction

The ability to generate recombinant drug target proteins is important for drug discovery research as it facilitates the investigation of drug-target interactions in vitro. Many sequences for non-human organisms are predicted by computational pipelines and may be incomplete or incorrect. This could lead to misinterpreted experimental outcomes due to gaps or errors in orthologous drug target sequences. Biochemical and cellular in vitro assays can be used throughout the drug discovery process to assess a compound’s activity on the target protein. This can be accomplished by using cell lines or bacteria expressing the recombinant protein, which requires a DNA template of the known target sequence to be introduced into the production system. Erroneous target proteins lead to an over- or underestimation of the compound’s activity or to wrong dose selection, and subsequently to misinterpretation of in vivo experiments.

