Abstract

Scientific advancement is hindered without proper genome annotation because biologists lack a complete understanding of cellular protein functions. In bacterial cells, hypothetical proteins (HPs) are open reading frames with unknown functions. HPs result from either an outdated database or insufficient experimental evidence (i.e., indeterminate annotation). While automated annotation reviews help keep genome annotation up to date, manual reviews are often needed to verify proper annotation. Students can provide the manual review necessary to improve genome annotation. This paper outlines an innovative classroom project that determines whether HPs have outdated or indeterminate annotation. The Hypothetical Protein Characterization Project uses multiple well-documented, freely available, web-based bioinformatics resources that analyze an amino acid sequence to (1) detect sequence similarities to other proteins, (2) identify domains, (3) predict tertiary structure, including active site characterization and potential binding ligands, and (4) determine cellular location. Enough evidence can be generated from these analyses to support re-annotation of HPs or to prioritize HPs for experimental examinations such as structural determination via X-ray crystallography. Additionally, this paper details several approaches for selecting HPs to characterize using the Hypothetical Protein Characterization Project. These approaches include student- and instructor-directed random selection, selection using differential gene expression from mRNA expression data, and selection based on phylogenetic relations. This paper also provides additional resources to support instructional use of the Hypothetical Protein Characterization Project, such as example assignment instructions with grading rubrics, links to training videos on YouTube, and several step-by-step example projects to demonstrate and interpret the range of achievable results that students might encounter.
Educational use of the Hypothetical Protein Characterization Project provides students with an opportunity to learn and apply knowledge of bioinformatic programs to address scientific questions. The project is highly customizable in that HP selection and analysis can be specifically formulated based on the scope and purpose of each student’s investigations. Programs used for HP analysis can be easily adapted to course learning objectives. The project can be used in both online and in-seat instruction for a wide variety of undergraduate and graduate classes as well as undergraduate capstone, honors, and experiential learning projects.
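
The random-selection approach mentioned in the abstract can be illustrated with a short Python sketch. It is only one possible reading, not the project's documented procedure: it assumes the proteome is available as FASTA text and that HPs can be recognized by the phrase "hypothetical protein" in the header line. The function names `parse_fasta` and `select_hypothetical` are hypothetical helpers introduced here for illustration.

```python
import random


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_hypothetical(text, k, seed=None):
    """Randomly pick up to k records whose header mentions 'hypothetical protein'.

    Assumption: the annotation source labels HPs with that exact phrase.
    """
    hps = [(h, s) for h, s in parse_fasta(text)
           if "hypothetical protein" in h.lower()]
    rng = random.Random(seed)  # a fixed seed makes a class assignment reproducible
    return rng.sample(hps, min(k, len(hps)))


# Made-up example records, for illustration only.
EXAMPLE_FASTA = """>prot1 hypothetical protein [Example organism]
MKTAYIAKQR
QISFVKSHFS
>prot2 DNA gyrase subunit A [Example organism]
MSDLAREITP
>prot3 Hypothetical protein [Example organism]
MNNQRKKTAR
"""

if __name__ == "__main__":
    for header, seq in select_hypothetical(EXAMPLE_FASTA, k=2, seed=1):
        print(header, len(seq))
```

An instructor could fix the seed to hand out a reproducible list, while students choosing their own HPs could leave it unset.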

Highlights

  • Nucleic acid sequencing has become so inexpensive that researchers are generating a plethora of fully sequenced genomes annually through massive initiatives such as the Earth BioGenome Project, which aims to sequence the genomes of 1.5 million eukaryotic species by 2050 (Yandell and Ence, 2012; Lewin et al, 2018)

  • This paper introduces a Hypothetical Protein Characterization Project, based on resources commonly referenced in previously reported in silico hypothetical protein (HP) characterization studies, that students use while learning interdisciplinary concepts in bioinformatics, microbiology, biochemistry, and genetics (Figure 1)

  • We found that a PSI-BLAST (Position-Specific Iterated Basic Local Alignment Search Tool) search with the multidrug ABC transporter Sav1866 from S. aureus (PDB accession: 2ONJ) identified HPs

Summary

INTRODUCTION

Nucleic acid sequencing has become so inexpensive that researchers are generating a plethora of fully sequenced genomes annually through massive initiatives such as the Earth BioGenome Project, which aims to sequence the genomes of 1.5 million eukaryotic species by 2050 (Yandell and Ence, 2012; Lewin et al, 2018). Several previously reported studies have used computational approaches to assign functional annotation to HPs in a wide range of bacterial and viral species, including but not limited to Staphylococcus aureus (Mohan and Venugopal, 2012; School et al, 2016), Mycobacterium tuberculosis (Raj et al, 2017; Yang et al, 2019), Vibrio cholerae (Islam et al, 2015), Klebsiella pneumoniae (Pranavathiyani et al, 2020), Mycoplasma pneumoniae (Shahbaaz et al, 2015), Orientia tsutsugamushi (Imam et al, 2019), Corynebacterium pseudotuberculosis (Araujo et al, 2020), human adenovirus (Dorden and Mahadevan, 2015; Naveed et al, 2017), and vaccinia virus (Mahmood et al, 2016). These studies use some combination of the various computational tools and databases available to analyze the physicochemical, functional, and structural properties of an HP (Table 1), since results generated from a single server cannot currently provide a complete functional determination (Dorden and Mahadevan, 2015). Gene enrichment analysis comparing a group of the most differentially expressed HPs to a gene signature (i.e., gene list ranked by differential expression based on a statistical method)
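
To make the idea of comparing a group of HPs against a ranked gene signature more concrete, the following Python sketch computes a simple unweighted running-sum enrichment score, in the spirit of gene set enrichment analysis. The text above does not specify which statistic or software is used, so this is an assumed simplification for illustration only; the function name `enrichment_score` and the gene identifiers are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score of gene_set within ranked_genes.

    ranked_genes: gene identifiers ordered from most up-regulated to most
                  down-regulated (the "gene signature").
    gene_set:     identifiers of the genes of interest (here, a group of HPs).

    Walking down the ranked list, the running sum rises by 1/n_hit at each
    gene in the set and falls by 1/n_miss at each gene outside it. The score
    is the running-sum value with the largest absolute deviation from zero.
    """
    hits = set(gene_set) & set(ranked_genes)
    n_hit = len(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0  # no contrast between set members and non-members

    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in hits else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    # Made-up identifiers: hp1 and hp2 sit at the top of the ranking.
    signature = ["hp1", "hp2", "geneA", "geneB", "geneC", "geneD"]
    print(enrichment_score(signature, {"hp1", "hp2"}))
```

A score near +1 means the set is concentrated at the top of the ranking and a score near -1 means it is concentrated at the bottom. A real analysis would also need a significance estimate (for example by permutation), which this sketch omits.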
