Abstract

Total RNA sequencing (RNA-seq) is an important tool in the study of mosquitoes and the RNA viruses they vector as it allows assessment of both host and viral RNA in specimens. However, there are two main constraints. First, as with many other species, abundant mosquito ribosomal RNA (rRNA) serves as the predominant template from which sequences are generated, meaning that the desired host and viral templates are sequenced far less. Second, mosquito specimens captured in the field must be correctly identified, in some cases to the sub-species level. Here, we generate mosquito rRNA datasets which will substantially mitigate both of these problems. We describe a strategy to assemble novel rRNA sequences from mosquito specimens and produce an unprecedented dataset of 234 full-length 28S and 18S rRNA sequences of 33 medically important species from countries with known histories of mosquito-borne virus circulation (Cambodia, the Central African Republic, Madagascar, and French Guiana). These sequences will allow both physical and computational removal of rRNA from specimens during RNA-seq protocols. We also assess the utility of rRNA sequences for molecular taxonomy and compare phylogenies constructed using rRNA sequences versus those created using the gold standard for molecular species identification of specimens, the mitochondrial cytochrome c oxidase I (COI) gene. We find that rRNA- and COI-derived phylogenetic trees are incongruent and that 28S and concatenated 28S+18S rRNA phylogenies reflect evolutionary relationships that are more aligned with contemporary mosquito systematics.
This significant expansion to the current rRNA reference library for mosquitoes will improve mosquito RNA-seq metagenomics by permitting the optimization of species-specific rRNA depletion protocols for a broader range of species and streamlining species identification by rRNA sequence and phylogenetics.

Editor's evaluation

Mosquitoes are an important vector for viruses and other pathogens worldwide. However, significant genomic resources are scarce for the study of these species. In this work, the authors create a significant genomic resource that will enable the study of mosquitoes and the pathogens that they carry. https://doi.org/10.7554/eLife.82762.sa0

Introduction

Mosquitoes top the list of vectors for arthropod-borne diseases, being implicated in the transmission of many human pathogens responsible for arboviral diseases, malaria, and lymphatic filariasis (WHO, 2017). Mosquito-borne viruses circulate in sylvatic (between wild animals) or urban (between humans) transmission cycles driven by different mosquito species with their own distinct host preferences. Although urban mosquito species are chiefly responsible for amplifying epidemics in dense human populations, sylvatic mosquitoes maintain the transmission of these viruses among forest-dwelling animal reservoir hosts and are involved in spillover events when humans enter their ecological niches (Valentine et al., 2019). Given that mosquito-borne virus emergence is preceded by such spillover events, continuous surveillance and virus discovery in sylvatic mosquitoes is integral to designing effective public health measures to pre-empt or respond to mosquito-borne viral epidemics. Metagenomics on field specimens is a powerful method in our toolkit to understand mosquito-borne disease ecology through the One Health lens (Webster et al., 2016).
With next-generation sequencing becoming more accessible, such studies have provided unprecedented insights into the interfaces among mosquitoes, their environment, and their animal and human hosts. As mosquito-associated viruses are mostly RNA viruses, RNA sequencing (RNA-seq) is especially informative for surveillance and virus discovery. However, working with lesser-studied mosquito species poses several problems. First, metagenomics studies based on RNA-seq are bedevilled by overabundant ribosomal RNAs (rRNAs). These non-coding RNA molecules comprise at least 80% of the total cellular RNA population (Gale and Crampton, 1989). Due to their length and abundance, they are a sink for precious next-generation sequencing reads, decreasing the sensitivity of pathogen detection unless depleted during library preparation. Yet the most common rRNA depletion protocols require prior knowledge of rRNA sequences of the species of interest as they involve hybridizing antisense oligos to the rRNA molecules prior to removal by ribonucleases (Fauver et al., 2019; Phelps et al., 2021) or by bead capture (Kukutla et al., 2013). Presently, reference sequences for rRNAs are limited to only a handful of species from three genera: Aedes, Culex, and Anopheles (Ruzzante et al., 2019). The lack of reliable rRNA depletion methods could deter mosquito metagenomics studies from expanding their sampling diversity, resulting in a gap in our knowledge of mosquito vector ecology. The inclusion of lesser-studied yet medically relevant sylvatic species is therefore imperative.

Second, species identification based on morphology is notoriously complicated for members of certain species subgroups. This is especially the case among Culex subgroups. Sister species are often sympatric and show at least some competence for a number of viruses, such as Japanese encephalitis virus, St Louis encephalitis virus, and Usutu virus (Nchoutpouen et al., 2019).
Although they share many morphological traits, each of these species has distinct ecologies and host preferences, thus the challenge of correctly identifying vector species can affect epidemiological risk estimation for these diseases (Farajollahi et al., 2011). DNA molecular markers are often employed with limited success to distinguish between sister species (Batovska et al., 2017; Zittra et al., 2016).

To address the lack of full-length rRNA sequences in public databases, we sought to determine the 28S and 18S rRNA sequences of a diverse set of Old and New World sylvatic mosquito species from four countries representing three continents: Cambodia, the Central African Republic, Madagascar, and French Guiana. These countries, due to their proximity to the equator, contain high mosquito biodiversity (Foley et al., 2007) and have had long histories of mosquito-borne virus circulation (Desdouits et al., 2015; Halstead, 2019; Héraud et al., 2022; Jacobi and Serie, 1972; Ratsitorahina et al., 2008; Saluzzo et al., 2017; Zeller et al., 2016). Increased and continued surveillance of local mosquito species could lead to valuable insights into mosquito virus biogeography. Using a unique score-based read filtration strategy to remove interfering non-mosquito rRNA reads for accurate de novo assembly, we produced a dataset of 234 novel full-length 28S and 18S rRNA sequences from 33 mosquito species, for 30 of which no rRNA sequences had previously been recorded. We also explored the functionality of 28S and 18S rRNA sequences as molecular markers by comparing their performance to that of the mitochondrial cytochrome c oxidase subunit I (COI) gene for molecular taxonomic and phylogenetic investigations. The COI gene is the most widely used DNA marker for molecular species identification and forms the basis of the Barcode of Life Data System (BOLD) (Hebert et al., 2003; Ratnasingham and Hebert, 2007).
Presently, full-length rRNA sequences are much less represented compared to other molecular markers. However, given the availability of relevant reference sequences, 28S and concatenated 28S+18S rRNA sequences can be the better approach for molecular taxonomy and phylogenetic studies. We hope that our sequence dataset, with its species diversity and eco-geographical breadth, and the assembly strategy we describe will further facilitate the use of rRNA as markers. In addition, this dataset enables the design of species-specific oligos for cost-effective rRNA depletion for a broader range of mosquito species and streamlined molecular species identification during RNA-seq.

Results

Poor rRNA depletion using a non-specific depletion method

During library preparations of mosquito samples for RNA-seq, routinely used methods for depleting rRNA are commercial kits optimised for human or mouse samples (Belda et al., 2019; Bishop-Lilly et al., 2010; Chandler et al., 2015; Kumar et al., 2012; Weedall et al., 2015; Zakrzewski et al., 2018) or 80–100 base pair antisense probe hybridisation followed by ribonuclease digestion (Fauver et al., 2019; Phelps et al., 2021). In cases where the complete reference rRNA sequence of the target species is not known, oligos would be designed based on the rRNA sequence of the closest related species (25, this study). These methods should deplete reads from the conserved regions of rRNA sequences. However, reads from the variable regions remain at abundances high enough to compromise RNA-seq output. In our hands, using probes designed for the Ae. aegypti rRNA sequence followed by RNase H digestion according to the protocol published by Morlan et al., 2012, produced poor depletion in Aedes albopictus, and in Culicine and Anopheline species (Figure 1), in which between 46% and 94% of reads post-depletion were ribosomal.
Additionally, the lack of full-length reference rRNA sequences compromises the in silico clean-up of remaining rRNA reads from sequencing data, as reads belonging to variable regions would not be removed. To solve this and to enable RNA-seq metagenomics on a broader range of mosquito species, we performed RNA-seq to generate reference rRNA sequences for 33 mosquito species representing 10 genera from Cambodia, the Central African Republic, Madagascar, and French Guiana. Most of these species are associated with vector activity for various pathogens in their respective ecologies (Table 1). In parallel, we sequenced the mitochondrial COI gene to perform molecular species identification of our samples and to comparatively evaluate the use of rRNA as a molecular marker (Figure 2).

Figure 1. Percentage of rRNA reads in mosquito total RNA sequencing (RNA-seq) data after depletion using probes antisense to Aedes aegypti sequences. Pools of five individual mosquitoes from genera Aedes (Ae), Culex (Cx), Mansonia (Ma), and Anopheles (An) were ribodepleted by probe hybridisation followed by RNase H digestion according to the protocol by Morlan et al., 2012. Y-axis depicts percentages of remaining rRNA reads calculated as the number of rRNA reads over total reads per sample pool. Depletion efficiency decreases with taxonomic distance from Ae. aegypti, underlining the need for reference sequences for species of interest.

Table 1. Mosquito species represented in this study and their vector status.
| Mosquito taxonomy‡ | Origin† | Collection site (ecosystem type) | Vector for* | Reference |
|---|---|---|---|---|
| Aedes (Fredwardsius) vittatus | CF | Rural (village) | ZIKV, CHIKV, YFV | Diallo et al., 2020 |
| Aedes (Ochlerotatus) scapularis | GF | Rural (village) | YFV | Vasconcelos et al., 2001 |
| Aedes (Ochlerotatus) serratus | GF | Rural (village) | YFV, OROV | Cardoso et al., 2010; Romero-Alvarez and Escobar, 2018 |
| Aedes (Stegomyia) aegypti | CF | Urban | DENV, ZIKV, CHIKV, YFV | Kraemer et al., 2019 |
| Aedes (Stegomyia) albopictus | CF, KH | Rural (village, nature reserve) | DENV, ZIKV, CHIKV, YFV, JEV | Auerswald et al., 2021; Kraemer et al., 2019 |
| Aedes (Stegomyia) simpsoni | CF | Rural (village) | YFV | Mukwaya et al., 2000 |
| Anopheles (Anopheles) baezai | KH | Rural (nature reserve) | Unreported | – |
| Anopheles (Anopheles) coustani | MG, CF | Rural (village) | RVFV, malaria | Mwangangi et al., 2013; Nepomichene et al., 2018; Ratovonjato et al., 2011 |
| Anopheles (Cellia) funestus | MG, CF | Rural (village) | ONNV, malaria | Lutomiah et al., 2013; Tabue et al., 2017 |
| Anopheles (Cellia) gambiae | MG, CF | Rural (village) | ONNV, malaria | Brault et al., 2004 |
| Anopheles (Cellia) squamosus | MG | Rural (village) | RVFV, malaria | Ratovonjato et al., 2011; Stevenson et al., 2016 |
| Coquillettidia (Rhynchotaenia) venezuelensis | GF | Rural (village) | OROV | Travassos da Rosa et al., 2017 |
| Culex (Culex) antennatus | MG | Rural (village) | RVFV | Nepomichene et al., 2018; Ratovonjato et al., 2011 |
| Culex (Culex) duttoni | CF | Rural (village) | Unreported | – |
| Culex (Culex) neavei | MG | Rural (village) | USUV | Nikolay et al., 2011 |
| Culex (Culex) orientalis | KH | Rural (nature reserve) | JEV | Kim et al., 2015 |
| Culex (Culex) perexiguus | MG | Rural (village) | WNV, USUV | Vezenegho et al., 2022 |
| Culex (Culex) pseudovishnui | KH | Rural (nature reserve) | JEV | Auerswald et al., 2021 |
| Culex (Culex) quinquefasciatus | MG, CF, KH | Rural (village, nature reserve) | ZIKV, JEV, WNV, DENV, SLEV, RVFV, Wuchereria bancrofti | Bhattacharya and Basu, 2016; Maquart et al., 2021; Ndiaye et al., 2016; Serra et al., 2016 |
| Culex (Culex) tritaeniorhynchus | MG, KH | Rural (village, nature reserve) | JEV, WNV, RVFV | Auerswald et al., 2021; Hayes et al., 1980; Jupp et al., 2002 |
| Culex (Melanoconion) spissipes | GF | Rural (village) | VEEV | Weaver et al., 2004 |
| Culex (Melanoconion) portesi | GF | Rural (village) | VEEV, TONV | Talaga et al., 2021; Weaver et al., 2004 |
| Culex (Melanoconion) pedroi | GF | Rural (village) | EEEV, VEEV, MADV | Talaga et al., 2021; Turell et al., 2008 |
| Culex (Oculeomyia) bitaeniorhynchus | MG, KH | Rural (village, nature reserve) | JEV | Auerswald et al., 2021 |
| Culex (Oculeomyia) poicilipes | MG | Rural (village) | RVFV | Ndiaye et al., 2016 |
| Eretmapodites intermedius | CF | Rural (village) | Unreported | – |
| Limatus durhamii | GF | Rural (village) | ZIKV | Barrio-Nuevo et al., 2020 |
| Mansonia (Mansonia) titillans | GF | Rural (village) | VEEV, SLEV | Hoyos-López et al., 2015; Turell, 1999 |
| Mansonia (Mansonioides) indiana | KH | Rural (nature reserve) | JEV | Arunachalam et al., 2004 |
| Mansonia (Mansonioides) uniformis | MG, CF, KH | Rural (village, nature reserve) | RVFV, Wuchereria bancrofti | Lutomiah et al., 2013; Ughasi et al., 2012 |
| Mimomyia (Etorleptiomyia) mediolineata | MG | Rural (village) | Unreported | – |
| Psorophora (Janthinosoma) ferox | GF | Rural (village) | ROCV | Mitchell et al., 1986 |
| Uranotaenia (Uranotaenia) geometrica | GF | Rural (village) | Unreported | – |

* Dengue virus, DENV; Zika virus, ZIKV; chikungunya virus, CHIKV; Yellow Fever virus, YFV; Oropouche virus, OROV; Japanese encephalitis virus, JEV; Rift Valley Fever virus, RVFV; O’Nyong Nyong virus, ONNV; Usutu virus, USUV; West Nile virus, WNV; St Louis encephalitis virus, SLEV; Venezuelan equine encephalitis virus, VEEV; Tonate virus, TONV; Eastern equine encephalitis virus, EEEV; Madariaga virus, MADV; Rocio virus, ROCV.
† Origin countries are listed as their ISO alpha-2 codes: Central African Republic, CF; Cambodia, KH; Madagascar, MG; French Guiana, GF.
‡ Subgenus indicated in brackets.

Figure 2. Novel mosquito rRNA sequences were obtained using a unique reads filtering method. (A) Schematic of sequencing and bioinformatics analyses performed in this study to obtain full-length 18S and 28S rRNA sequences as well as cytochrome c oxidase I (COI) DNA sequences.
Nucleic acids were isolated from mosquito specimens for next-generation (for rRNA) or Sanger (for COI) sequencing. Two in-house libraries were created from the SILVA rRNA gene database: Insecta and Non-Insecta, which comprise 8,585 sequences and 558,185 sequences, respectively. Following BLASTn analyses against these two libraries, each RNA-sequencing (RNA-seq) read is assigned a ratio of BLASTn scores to describe its relative nucleotide similarity to insect rRNA sequences. Based on these ratios of scores, RNA-seq reads can then be filtered to remove non-mosquito reads prior to assembly with SPAdes to give full-length 18S and 28S rRNA sequences. Image created with https://biorender.com/. (B) Based on their ratio of scores, reads can be segregated into four categories, as shown on this ratio of scores versus number of reads plot for the representative specimen ‘CF S27’: (i) reads with hits only in the Insecta library (shaded in green), (ii) reads with a higher score against the Insecta library (shaded in blue), (iii) reads with a higher score against the Non-Insecta library (shaded in yellow), and (iv) reads with no hits in the Insecta library (shaded in red). We applied a conservative threshold at 0.8, indicated by the black horizontal line, where only reads at or above this threshold are used in the assembly with SPAdes. For this given specimen, 175,671 reads (96.3% of total reads) passed the ≥0.8 cut-off, 325 reads (0.18% of total reads) had ratios of scores <0.8, while 6,423 reads (3.52%) did not have hits against the Insecta library.

rRNA reads filtering and sequence assembly

Assembling Illumina reads to reconstruct rRNA sequences from total mosquito RNA is not a straightforward task. Apart from host rRNA, total RNA samples also contain rRNA from other organisms associated with the host (microbiota, external parasites, or ingested diet).
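The read categories and the 0.8 cut-off described in the Figure 2B caption can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: the function names, the use of `None` to mean "no BLASTn hit", the decision to treat Insecta-only reads as passing, the handling of ties, and the example scores are all assumptions made here. The Materials and methods section remains the authoritative description.

```python
import math
from typing import Dict, Optional, Tuple

THRESHOLD = 0.8  # conservative cut-off quoted in the Figure 2B caption


def ratio_of_scores(insecta: Optional[float],
                    non_insecta: Optional[float]) -> Optional[float]:
    """Insecta score divided by Non-Insecta score for one read.

    Assumptions (not from the source): None means "no BLASTn hit in that
    library", scores of real hits are positive, an Insecta-only read gets
    an infinite ratio, and a read with no Insecta hit gets no ratio.
    """
    if insecta is None:
        return None
    if non_insecta is None:
        return math.inf
    return insecta / non_insecta


def categorise(insecta: Optional[float], non_insecta: Optional[float]) -> str:
    """Map a read to one of the four categories (i)-(iv) of Figure 2B."""
    ratio = ratio_of_scores(insecta, non_insecta)
    if ratio is None:
        return "iv: no Insecta hit"
    if math.isinf(ratio):
        return "i: Insecta only"
    # Assumption: a tie (ratio == 1) is grouped with category (ii).
    return "ii: Insecta higher" if ratio >= 1 else "iii: Non-Insecta higher"


def keep_read(insecta: Optional[float], non_insecta: Optional[float],
              threshold: float = THRESHOLD) -> bool:
    """Keep reads whose ratio is at or above the threshold."""
    ratio = ratio_of_scores(insecta, non_insecta)
    return ratio is not None and ratio >= threshold


# Hypothetical scores: read id -> (Insecta score, Non-Insecta score)
reads: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "read_a": (250.0, None),   # hit only in the Insecta library
    "read_b": (240.0, 200.0),  # ratio 1.2
    "read_c": (180.0, 200.0),  # ratio 0.9, below 1 but above the cut-off
    "read_d": (100.0, 250.0),  # ratio 0.4
    "read_e": (None, 260.0),   # no Insecta hit
}
kept = [rid for rid, (ins, non) in reads.items() if keep_read(ins, non)]

# Read counts quoted for specimen 'CF S27' in the caption; treating them as
# a partition of the total is an assumption that the percentages support.
passed, below, no_insecta_hit = 175_671, 325, 6_423
total = passed + below + no_insecta_hit
percentages = [round(100 * n / total, 2) for n in (passed, below, no_insecta_hit)]
```

Under these assumptions, a read that scores slightly better against the Non-Insecta library (a ratio between 0.8 and 1) is still retained, which fits the stated aim of tolerating gaps in the SILVA database.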
As rRNA sequences share high homology in conserved regions, Illumina reads (150 bp) from non-host rRNA can interfere with the contig assembly of host 28S and 18S rRNA. Our score-based filtration strategy, described in detail in the Materials and methods section, allowed us to bioinformatically remove interfering rRNA reads and achieve successful de novo assembly of 28S and 18S rRNA sequences for all our specimens. Briefly, for each Illumina read, we computed a ratio of BLAST scores against an Insecta library over scores against a Non-Insecta library (Figure 2A). Based on their ratio of scores, reads could be segregated into four categories (Figure 2B): (i) reads mapping only to the Insecta library, (ii) reads mapping better to the Insecta library relative to the Non-Insecta library, (iii) reads mapping better to the Non-Insecta library relative to the Insecta library, and (iv) reads mapping only to the Non-Insecta library. By applying a conservative threshold at 0.8 to account for the non-exhaustiveness of the SILVA database, we removed reads that likely do not originate from mosquito rRNA. Notably, 15 of our specimens were engorged with vertebrate blood, a rich source of non-mosquito rRNA (Appendix 1—table 1). The successful assembly of complete 28S and 18S rRNA sequences for these specimens demonstrates that this strategy performs as expected even with high amounts of non-host rRNA reads. This is particularly important in studies on field-captured mosquitoes as females are often sampled after having imbibed a blood meal or are captured using the human landing catch technique.

We encountered challenges for three specimens morphologically identified as Mansonia africana (Specimen ID S33–S35) (Appendix 1—table 1). COI amplification by PCR did not produce any product, hence COI sequencing could not be used to confirm species identity.
In addition, the genome assembler SPAdes (Bankevich et al., 2012) was only able to assemble partial-length rRNA contigs, despite the high number of reads with high scores against the Insecta library. Among other Mansonia specimens, these partial-length contigs shared the highest similarity with contigs obtained from sample ‘Ma uniformis CF S51’. We then performed a guided assembly using the 28S and 18S sequences of this specimen as references, which successfully produced full-length contigs. In two of these specimens (Specimen ID S34 and S35), our assembly initially produced two sets of 28S and 18S rRNA sequences, one of which was similar to mosquito rRNA with low coverage and another with 10-fold higher coverage and 95% nucleotide sequence similarity to a water mite of genus Horreolanus known to parasitize mosquitoes. Our success in obtaining rRNA sequences for mosquito and water mite shows that our strategy can be applied to metabarcoding studies where the input material comprises multiple insect species, provided that appropriate reference sequences of the target species or of a close relative are available.

Altogether, we were able to assemble 122 28S and 114 18S full-length rRNA sequences for 33 mosquito species representing 10 genera sampled from four countries across three continents. This dataset contains, to our knowledge, the first records for 30 mosquito species and for seven genera: Coquillettidia, Mansonia, Limatus, Mimomyia, Uranotaenia, Psorophora, and Eretmapodites. Individual GenBank accession numbers for these sequences and specimen information are listed in Appendix 1—table 1.

Comparative phylogeny of novel rRNA sequences relative to existing records

To verify the assembly accuracy of our rRNA sequences, we constructed a comprehensive phylogenetic tree from the full-length 28S rRNA sequences generated from our study and included relevant rRNA sequences publicly available from GenBank (Figure 3).
We applied a search criterion for GenBank sequences with at least 95% coverage of our sequence lengths (~4000 bp), aiming to represent as many species or genera as possible. Although we rarely found records for the same species included in our study, the resulting tree showed that our 28S sequences generally clustered according to their respective species and subgenera, with moderate to good bootstrap support at terminal nodes. Species taxa generally formed monophyletic clades, with the exception of An. gambiae and Cx. quinquefasciatus. An. gambiae 28S rRNA sequences formed a clade with closely related sequences from Anopheles arabiensis, Anopheles merus, and Anopheles coluzzii, suggesting unusually high interspecies homology for Anophelines or other members of subgenus Cellia (Figure 3, in purple, subgenus Cellia). Meanwhile, Cx. quinquefasciatus 28S rRNA sequences formed a taxon paraphyletic to sister species Culex pipiens (Figure 3, in coral, subgenus Culex).

Figure 3. 28S sequences generated from this study clustered with conspecifics or congenerics from existing GenBank records. A rooted phylogenetic tree based on full-length 28S sequences (3,900 bp) from this study and from GenBank was inferred using the maximum-likelihood method and constructed to scale in MEGA X (Kumar et al., 2018) using an unknown Horreolanus species found among our samples as an outgroup. Values at each node indicate bootstrap support (%) from 500 replications. Sequences from GenBank are annotated with filled circles and their accession numbers are shown. For sequences from this study, each specimen label contains information on taxonomy, origin (in two-letter country codes), and specimen ID number. Some specimens produced up to two consensus 28S sequences; this is indicated by the numbers 1 or 2 at the beginning of the specimen label.
Specimen genera are indicated by colour: Culex in coral, Anopheles in purple, Aedes in dark blue, Mansonia in dark green, Culiseta in maroon, Limatus in light green, Coquillettidia in light blue, Psorophora in yellow, Mimomyia in teal, Uranotaenia in pink, and Eretmapodites in brown. Scale bar at 0.05 is shown.

Figure 3—source data 1. Multiple sequence alignment of 169 28S rRNA sequences from this study and from GenBank (FASTA). https://cdn.elifesciences.org/articles/82762/elife-82762-fig3-data1-v2.zip

28S rRNA sequence-based phylogenetic reconstructions (Figure 3, with GenBank sequences; Figure 4—figure supplement 1, this study only) showed marked incongruence with those based on 18S rRNA sequences (Figure 4—figure supplement 2). Although all rRNA trees show the bifurcation of family Culicidae into subfamilies Anophelinae (genus Anopheles, in purple) and Culicinae (all other genera), the recovered intergeneric phylogenetic relationships vary between the 28S and 18S rRNA trees and are weakly supported. The 18S rRNA tree also exhibited several taxonomic anomalies: (i) the lack of definitive clustering by species within the Culex subgenus (in coral); (ii) the lack of distinction between 18S rRNA sequences of Cx. pseudovishnui and Cx. tritaeniorhynchus (in coral); (iii) the placement of Ma sp.3 CF S35 (in dark green) within a Culex clade; and (iv) the lack of a monophyletic Mimomyia clade (in teal) (Figure 4—figure supplement 2). However, 28S and 18S rRNA sequences are encoded by linked loci in rDNA clusters and should not be analysed separately.
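As an illustration of what concatenating the two markers involves, the sketch below joins aligned 28S and 18S sequences specimen by specimen. It assumes that each marker has already been aligned separately and is held as a mapping from a specimen label to an aligned sequence; the function name, labels, and toy sequences are hypothetical, and this is not the procedure used in MEGA X. Specimens missing either marker are simply skipped here.

```python
from typing import Dict


def concatenate_markers(aln_28s: Dict[str, str],
                        aln_18s: Dict[str, str]) -> Dict[str, str]:
    """Concatenate aligned 28S and 18S sequences for specimens present in both."""
    # Each input must be a proper alignment: all sequences of equal length.
    for name, aln in (("28S", aln_28s), ("18S", aln_18s)):
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) > 1:
            raise ValueError(f"{name} sequences are not aligned to a common length")
    # Only specimens with both markers can contribute a concatenated sequence.
    shared = sorted(set(aln_28s) & set(aln_18s))
    return {label: aln_28s[label] + aln_18s[label] for label in shared}


# Toy alignments with hypothetical labels and sequences (gaps shown as '-').
aln_28s = {"Cx_sp_S01": "ACGT-A", "An_sp_S02": "ACGTTA", "Ae_sp_S03": "ACGATA"}
aln_18s = {"Cx_sp_S01": "GG-C", "An_sp_S02": "GGTC"}
combined = concatenate_markers(aln_28s, aln_18s)
```

Because every specimen contributes both markers in the same order, the concatenated sequences remain aligned column by column and can be passed to a tree-building tool as a single alignment.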
Indeed, when concatenated 28S+18S rRNA sequences were generated from the same specimens (Figure 4), the phylogenetic tree resulting from these sequences more closely resembles the 28S tree (Figure 3) with regard to the basal position of the Mimomyia clade (in teal) within the Culicinae subfamily with good bootstrap support in either tree (84% in 28S rRNA tree, 100% in concatenated 28S+18S rRNA tree). For internal nodes, bootstrap support values were higher in the concatenated tree compared to the 28S tree. Interestingly, the 28S+18S rRNA tree formed an Aedini tribe-clade encompassing taxa from genera Psorophora (in yellow), Aedes (in dark blue), and Eretmapodites (in brown), possibly driven by the inclusion of 18S rRNA sequences. Concatenation also resolved the anomalies found in the 18S rRNA tree and added clarity to the close relationship between Culex (in coral) and Mansonia (in dark green) taxa. Of note, relative to the 28S tree (Figure 3), the Culex and Mansonia genera are no longer monophyletic in the concatenated 28S+18S rRNA tree (Figure 4). Genus Culex is paraphyletic with respect to subgenus Mansonioides of genus Mansonia (Figure 3). Ma. titillans and Ma sp.4, which we suspect to be Mansonia pseudotitillans, always formed a distinct branch in 28S or 18S rRNA phylogenies, thus possibly representing a clade of subgenus Mansonia.

Figure 4. Concatenating 28S and 18S rRNA sequences produces phylogenetic relationships that are concordant with classical Culicidae systematics with higher bootstrap support than 28S sequences alone. This phylogenetic tree based on concatenated 28S+18S rRNA sequences (3,900+1,900 bp) generated from this study was inferred using the maximum-likelihood method and constructed to scale using MEGA X (Kumar et al., 2018) using an unknown Horreolanus species found among our samples as an outgroup. Values at each node indicate bootstrap support (%) from 500 replications.
Each specimen label contains information on taxonomy, origin (as indicated in two-letter country codes), and specimen ID number. Some specimens produced up to two consensus 28S+18S rRNA sequences; this is indicated by the numbers 1 or 2 at the beginning of the specimen label. Specimen genera are indicated by colour: Culex in coral, Anopheles in purple, Aedes in dark blue, Mansonia in dark green, Limatus in light green, Coquillettidia in light blue, Psorophora in yellow, Mimomyia in teal, Uranotaenia in pink, and Eretmapodites in brown. Scale bar at 0.05 is shown.

Figure 4—source data 1. Multiple sequence alignment of 122 28S rRNA sequences, including two sequences from Horreolanus sp. (FASTA). https://cdn.elifesciences.org/articles/82762/elife-82762-fig4-data1-v2.zip

Figure 4—source data 2. Multiple sequence alignment of 114 18S rRNA sequences, including two sequences from Horreolanus sp. (FASTA). https://cdn.elifesciences.org/articles/82762/elife-82762-fig4-data2-v2.zip

The concatenated 28S+18S rRNA tree (Figure 4) recapitulates what is classically known about the systematics of our specimens, namely (i) the early divergence of subfamily Anophelinae from subfamily Culicinae, (ii) the division of genus Anopheles (in purple) into two subgenera, Anopheles and Cellia, (iii) the division of genus Aedes (in dark blue) into subgenera Stegomyia and Ochlerotatus, and (iv) the divergence of the monophyletic subgenus Melanoconion within the Culex genus (in coral) (Harbach, 2007; Harbach and Kitching, 2016).

rRNA as a molecular marker for taxonomy and phylogeny

We sequenced a 621 bp region of the COI gene to confirm morphological species identification of our specimens and to compare the functionality of rRNA and COI sequences as molecular markers for taxonomic and phylogenetic investigations.
COI sequences unequivocally determined the species identity of most specimens, except in the following cases. An. coustani COI sequences from our study, regardless of specimen origin, shared remarkably high nucleotide similarity (>98%) with several other Anopheles species such as An. rhodesiensis, An. rufipes, An. ziemanni, and An. tenebrosus, although An. coustani remained the most frequent and closest match. In the case of Ae. simpsoni, three specimens had been morphologically identified as Ae. opok although their COI sequences showed 97–100% similarity to that of Ae. simpsoni. As GenBank held no records of Ae. opok COI at the time of this study, we instead aligned the putative Ae. simpsoni COI sequences against two sister species of Ae. opok: Ae. luteocephalus and Ae. africanus. We found they shared only 90% and 89% similarity, respectively. Given this significant divergence, we concluded these specimens to be Ae. simpsoni. Ambiguous results were especially frequent among Culex specimens belonging to the Cx. pipiens or Cx. vishnui subgroups, where the query sequence differed from either of the top two hits by a single nucleotide. This occurred, for example, between Cx. quinquefasciatus and Cx. pipiens of the Cx. pipiens subgroup, and between Cx. vishnui and Cx. tritaeniorhynchus of the Cx. vishnui subgroup. Among our three specimens of Ma. titillans, two appeared to belong to a single species that is different from but closely related to Ma. titillans. We surmised that these specimens could instead be Ma. pseudotitillans based on morphological similarity but were not able to verify this by molecular means as no COI reference sequence is available for this species. These specimens are hence putatively labelled as ‘Ma sp.4
