Abstract
Massively parallel sequencing coupled with saturation mutagenesis has provided new, global insights into gene function. At its simplest, the frequency of mutations within a gene can indicate its degree of essentiality. However, this approach does not account for the positional significance of mutations: a mutation close to either end of a gene is less likely to disrupt its function. A systematic bioinformatics approach that improves the reliability of essential gene identification is therefore desirable. We report here a parametric model that introduces a novel mutation feature, together with a noise-trimming approach, to predict the biological significance of Tn5 mutations. We show improved performance of essential gene prediction in the bacterium Yersinia pestis, the causative agent of plague. The method should be broadly applicable to other organisms and to identifying genes that are essential for competitiveness or survival under a wide range of stresses.
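To make the contrast between raw mutation frequency and positional significance concrete, the sketch below computes a naive insertion index (unique insertion sites per base pair) and a variant that ignores insertions near both ends of the gene. It is a minimal illustration only: the function name `insertion_index`, the inclusive gene coordinates and the example `trim_fraction` of 10% are assumptions, not the noise-trimming method or the novel mutation feature reported in this work.

```python
from typing import Sequence


def insertion_index(sites: Sequence[int], gene_start: int, gene_end: int,
                    trim_fraction: float = 0.0) -> float:
    """Unique insertion sites per base pair within a gene (inclusive coordinates).

    trim_fraction > 0 discards insertions in that fraction of the gene at
    each end. This end-trimming heuristic is an illustrative assumption.
    """
    if gene_end < gene_start:
        raise ValueError("gene_end must not precede gene_start")
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    length = gene_end - gene_start + 1
    margin = trim_fraction * length
    lo = gene_start + margin
    hi = gene_end - margin
    # Count each insertion coordinate once, keeping only the untrimmed core.
    kept = {s for s in sites if lo <= s <= hi}
    effective_length = hi - lo + 1  # equals length * (1 - 2 * trim_fraction)
    return len(kept) / effective_length
```

For a hypothetical 1,000 bp gene with insertions at positions 5, 12, 500, 990 and 995, the naive index is 5/1000, whereas trimming 10% at each end leaves only the insertion at 500 over an 800 bp core, giving 1/800: most of the apparent "mutation" came from insertions that probably do not disrupt the gene.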
Highlights
Whilst transposon mutagenesis approaches have a broad range of applications, the most common relies on negative selection of a mutant population under stress
A Type II essential gene is a gene with an insertion count below a threshold determined by a tight clustering algorithm
A Type III essential gene is a gene whose insertion count exceeds the threshold, but whose transposon insertions are found mainly near its proximal and distal ends
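The two definitions above can be sketched as a simple classification rule. This is a hypothetical illustration: the insertion-count threshold, which the work derives by tight clustering, is simply passed in here, and "mainly near the ends" is operationalised with two assumed parameters, `end_fraction` (size of each end region) and `end_share` (minimum share of insertions falling in those regions). The function name and parameter values are not taken from the paper.

```python
from typing import Sequence


def classify_gene(sites: Sequence[int], gene_start: int, gene_end: int,
                  threshold: int, end_fraction: float = 0.1,
                  end_share: float = 0.8) -> str:
    """Return 'Type II', 'Type III' or 'not called essential' for one gene.

    sites: unique insertion coordinates inside the gene (inclusive bounds).
    threshold: insertion-count cutoff; in the paper it comes from tight
        clustering, here it is supplied by the caller.
    end_fraction, end_share: hypothetical parameters; a gene counts as
        "mainly hit at the ends" if at least end_share of its insertions lie
        in the first or last end_fraction of its length.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    count = len(sites)
    if count < threshold:
        return "Type II"
    length = gene_end - gene_start + 1
    margin = end_fraction * length
    near_ends = sum(1 for s in sites
                    if s < gene_start + margin or s > gene_end - margin)
    # count >= threshold >= 1 here, so the division is safe.
    if near_ends / count >= end_share:
        return "Type III"
    return "not called essential"
```

With a threshold of 5 on a 1,000 bp gene, two insertions give a Type II call; six insertions of which five lie in the outer 10% at either end give a Type III call; six insertions spread through the body of the gene give no essentiality call.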
Summary
Whilst transposon mutagenesis approaches have a broad range of applications, the most common relies on negative selection of a mutant population under stress. Measuring the reduction in fitness before and after exposure to a stress has been used to compare populations of mutants, for example to identify genes required for survival and growth in vivo compared with those required in vitro [15,16,17]. This method is more difficult to apply to the identification of essential genes, because there is no comparator group. Each insertion contributes a significance value to the mutation level of its gene, and this value depends on two factors: how close the insertion lies to the ends of the gene, and how often that insertion site has been hit by a transposon. We have developed a parametric model that integrates all insertions within each gene and used it to predict gene essentiality.
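As a rough illustration of combining the two factors into one per-gene value, the sketch below weights each insertion by its position and by how often its site was hit, then sums over the gene. The weighting functions are assumptions chosen for simplicity (a tent-shaped positional weight that is 0 at either end and 1 at the midpoint, and log2(1 + read count) for hit frequency); they are not the parametric model developed in this work, and `weighted_insertion_score` is a hypothetical name.

```python
import math
from typing import Mapping


def weighted_insertion_score(insertions: Mapping[int, int],
                             gene_start: int, gene_end: int) -> float:
    """Sum of per-insertion significance values for one gene.

    insertions maps an insertion coordinate to its read count (how often
    that site was hit). Hypothetical weighting, not the paper's model:
      positional weight = 1 - |2x - 1|, x = relative position in the gene
                          (0 at either end, 1 at the midpoint)
      frequency weight  = log2(1 + read count)
    """
    if gene_end <= gene_start:
        raise ValueError("gene_end must be greater than gene_start")
    span = gene_end - gene_start
    score = 0.0
    for site, reads in insertions.items():
        if not gene_start <= site <= gene_end:
            continue  # ignore insertions outside the gene
        x = (site - gene_start) / span
        positional = 1.0 - abs(2.0 * x - 1.0)
        frequency = math.log2(1 + reads)
        score += positional * frequency
    return score
```

Under this weighting, an insertion at the midpoint of a gene spanning positions 0 to 100 with three reads contributes log2(4) = 2, while insertions exactly at the gene ends contribute nothing however many reads they carry. Genes with a low total would be candidate essential genes, although a cutoff would still have to be chosen.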