Abstract

Background: Molecular characterization of highly diverse gene families can be time consuming, expensive, and difficult, especially when considering the potential for relatively large numbers of paralogs and/or pseudogenes. Here we investigate the utility of Pacific Biosciences single molecule real-time (SMRT) circular consensus sequencing (CCS) as an alternative to traditional cloning and Sanger sequencing of PCR amplicons for gene family characterization. We target vomeronasal gene receptors, one of the most diverse gene families in mammals, with the goal of better understanding intra-specific V1R diversity of the gray mouse lemur (Microcebus murinus). Our study compares intragenomic variation for two V1R subfamilies found in the mouse lemur. Specifically, we compare gene copy variation within and between two individuals of M. murinus as characterized by different methods for nucleotide sequencing. By including the same individual animal from which the M. murinus draft genome was derived, we are able to cross-validate gene copy estimates from Sanger sequencing versus CCS methods.

Results: We generated 34,088 high quality circular consensus sequences of two diverse V1R subfamilies (here referred to as V1RI and V1RIX) from two individuals of Microcebus murinus. Using a minimum threshold of 7× coverage, we recovered approximately 90% of V1RI sequences previously identified in the draft M. murinus genome (59% being identical at all nucleotide positions). When low coverage sequences were considered (i.e., < 7× coverage), 100% of V1RI sequences identified in the draft genome were recovered. At least 13 putatively novel V1R loci were also identified using CCS technology.

Conclusions: Recent upgrades to the Pacific Biosciences RS instrument have improved the CCS technology and offer an alternative to traditional sequencing approaches. Our results suggest that the Microcebus murinus V1R repertoire has been underestimated in the draft genome. In addition to providing an improved understanding of V1R diversity in the mouse lemur, this study demonstrates the utility of CCS technology for characterizing complex regions of the genome. We anticipate that long-read sequencing technologies such as PacBio SMRT will allow for the assembly of multigene family clusters and serve to more accurately characterize patterns of gene copy variation in large gene families, thus revealing novel micro-evolutionary patterns within non-model organisms.

Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-720) contains supplementary material, which is available to authorized users.

Highlights

  • Molecular characterization of highly diverse gene families can be time consuming, expensive, and difficult, especially when considering the potential for relatively large numbers of paralogs and/or pseudogenes

  • A bimodal distribution of sequence lengths was observed for each single molecule real-time (SMRT) cell, corresponding to the V1RI (~725 bp) and V1RIX (~800 bp) amplicon sizes (Additional file 1: Figure S1)

  • Based on sequence length, 12,625 and 11,814 reads were classified as V1RI and 4,289 and 5,360 reads were classified as V1RIX for M. murinus 1 and M. murinus 2, respectively (Table 2)
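
The length-based split described in the highlights can be illustrated with a minimal sketch. It assumes a single cutoff halfway between the two reported amplicon sizes (~725 bp and ~800 bp); the cutoff value and the function names are hypothetical and are not taken from the study, which may have used a different rule.

```python
from collections import Counter

# Hypothetical cutoff: roughly midway between the ~725 bp (V1RI) and
# ~800 bp (V1RIX) modes of the read-length distribution. This value is
# an assumption for illustration, not one reported by the authors.
LENGTH_CUTOFF_BP = 762


def classify_read(seq: str, cutoff: int = LENGTH_CUTOFF_BP) -> str:
    """Assign a read to a V1R subfamily using its length alone."""
    return "V1RI" if len(seq) < cutoff else "V1RIX"


def count_by_subfamily(reads):
    """Tally how many reads fall into each length class."""
    return Counter(classify_read(seq) for seq in reads)


# Toy reads standing in for circular consensus sequences.
reads = ["A" * 725, "C" * 730, "G" * 800]
counts = count_by_subfamily(reads)
```

A fixed cutoff only works when the two length modes are well separated, as the bimodal distribution suggests here; reads near the boundary would need a sequence-based check instead.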


Summary

Introduction

Molecular characterization of highly diverse gene families can be time consuming, expensive, and difficult, especially when considering the potential for relatively large numbers of paralogs and/or pseudogenes. Given the intrinsic interest of accurate gene copy representation, it follows that methods of molecular characterization should be highly sensitive both to low levels of nucleotide diversity and to regions of high complexity. Such is not presently the case for organisms that lack a well-characterized genome, i.e., non-model organisms. Although low-coverage “draft” genomes are increasingly available for non-model organisms, these draft genomes are notoriously unreliable for accurate gene calling, especially in regions of high genomic complexity [6,14]. Until such time that high-coverage, fully assembled and annotated genomes are available for all species of interest, alternative molecular methods are desirable.

