Advanced Computational Frameworks for High-Accuracy Taxonomic Classification of Arowana Fish: Comparative Analysis of Asian and Australian Variants

Similar Papers
  • Research Article
  • Citations: 9
  • 10.1186/s12955-021-01774-0
A comparative performance analysis of the International Classification of Functioning, Disability and Health and the Item-Perspective Classification framework for classifying the content of patient reported outcome measures
  • Apr 23, 2021
  • Health and quality of life outcomes
  • Derek Rosa + 2 more

Background: Standardized coding of the content presented in patient reported outcome measures can be achieved using classification frameworks, and the resulting data can be used for ascertaining content validity or comparative analyses. The International Classification of Functioning (ICF) is a framework with a detailed conceptual structure that has been successfully utilized for such purposes through established coding procedures. The Item Perspective Classification (IPC) framework is a newly developed relational coding system that classifies the respondent perspective and conceptual domains addressed in items. The purpose of this study was to compare and describe the performance of these two frameworks when used alone, and in conjunction, for the generation of data pertaining to the content of patient reported outcome measures. Methods: Six health-related quality of life questionnaires with a total of 159 items were classified by two raters using the Item Perspective Classification framework in conjunction with the International Classification of Functioning. Framework performance indicators included: classification capacity (percent of items amenable to successful classification), coding efficiency (number of codes required to classify items), and content overlap detection (percent of items sharing identical classification codes with at least one other item). Inter-rater reliability of item coding was determined using Krippendorff's alpha. Results: Classification capacity of the IPC framework was 97%, coding efficiency 26, and content overlap detection was 95%; whereas respective values for the ICF were 68%, 114, and 58%. When used in conjunction values were 63%, 129, and 30%. Krippendorff's alpha exceeded 0.97 for all 3 classification indices. Conclusion: Inter-rater agreement on classification data was excellent.
The IPC framework provided a unique classification of the respondent’s judgment during item response and classified more items using fewer categories, indicated greater content overlap across items and was able to describe the relationship between multiple concepts presented within the context of a single item. The ICF provided a unique classification of item content relating to aspects of disability and generated more detailed and precise descriptions. A combined approach provided a rich description (detailed codes) with each framework providing complementary information. The benefits of this approach in instrument development and content validation require further investigation.

  • Research Article
  • Citations: 3
  • 10.1097/olq.0000000000000274
Human papillomavirus type 16 sequence variation in concurrent vulvar and penile carcinoma in a married couple.
  • Jun 1, 2015
  • Sexually transmitted diseases
  • Takashi Mitamura + 9 more

We encountered an elderly married couple with concurrent vulvar and penile carcinoma with an Asian variant of human papillomavirus type 16. Asian variants might have an elevated risk of concurrent external genital carcinomas of a male and a female, and analysis of human papillomavirus variants might be important to understand the mechanism of carcinogenesis.

  • Research Article
  • Citations: 15
  • 10.4149/av_2010_04_247
Variation of Human papillomavirus 16 in cervical and lung cancers in Sichuan, China
  • Dec 1, 2010
  • Acta Virologica
  • J Zhang + 8 more

Although the crucial role of human papillomaviruses (HPVs), especially HPV-16, in various cancers has been confirmed, the variation of HPV-16 among different cancers has not been investigated in a specific geographic location. In order to elucidate whether similar HPV-16 variants are involved in different kinds of cancers in the same geographic location, the analysis of sequence variants of E6 and E7 oncogenes and L1 gene of HPV-16 in cervical and lung cancers in Sichuan, China, was carried out. Tissue samples from 122 cervical cancers, 104 lung cancers, and 138 controls were subjected to RT-PCR or PCR, sequencing, and sequence analysis. The infection rates of HPV-16 in cervical cancers, lung cancers, and non-malignant controls were 68.9%, 17.3%, and 37.0%, respectively. Asian prototype variants prevailed in cervical and lung cancers, while European prototype variants prevailed in non-malignant controls. In comparison to lung cancer, cervical cancer showed a much higher diversity of HPV-16 oncogenes. These results indicate that in Sichuan, China, Asian prototype variants of HPV-16 are more pathogenic than their European counterparts.

  • Research Article
  • Citations: 20
  • 10.1073/pnas.2202727119
Homologous recombination–deficient mutation cluster in tumor suppressor RAD51C identified by comprehensive analysis of cancer variants
  • Sep 13, 2022
  • Proceedings of the National Academy of Sciences
  • Rohit Prakash + 19 more

Mutations in homologous recombination (HR) genes, including BRCA1, BRCA2, and the RAD51 paralog RAD51C, predispose to tumorigenesis and sensitize cancers to DNA-damaging agents and poly(ADP ribose) polymerase inhibitors. However, ∼800 missense variants of unknown significance have been identified for RAD51C alone, impairing cancer risk assessment and therapeutic strategies. Here, we interrogated >50 RAD51C missense variants, finding that mutations in residues conserved with RAD51 strongly predicted HR deficiency and disrupted interactions with other RAD51 paralogs. A cluster of mutations was identified in and around the Walker A box that led to impairments in HR, interactions with three other RAD51 paralogs, binding to single-stranded DNA, and ATP hydrolysis. We generated structural models of the two RAD51 paralog complexes containing RAD51C, RAD51B-RAD51C-RAD51D-XRCC2 and RAD51C-XRCC3. Together with our functional and biochemical analyses, the structural models predict ATP binding at the interface of RAD51C interactions with other RAD51 paralogs, similar to interactions between monomers in RAD51 filaments, and explain the failure of RAD51C variants in binding multiple paralogs. Ovarian cancer patients with variants in this cluster showed exceptionally long survival, which may be relevant to the reversion potential of the variants. This comprehensive analysis provides a framework for RAD51C variant classification. Importantly, it also provides insight into the functioning of the RAD51 paralog complexes.

  • Research Article
  • Citations: 20
  • 10.1038/s41431-019-0440-3
Variability in gene-based knowledge impacts variant classification: an analysis of FBN1 missense variants in ClinVar.
  • Jun 21, 2019
  • European Journal of Human Genetics
  • Linnea M Baudhuin + 3 more

Gene-specific knowledge can enhance genetic variant classification, but may not be routinely incorporated into clinical laboratory practice. For example, FBN1 variants associated with Marfan syndrome may be variably classified depending on knowledge of FBN1-specific critical regions. In order to assess variability in classification of FBN1 variants, 674 FBN1 missense variants from 18 ClinVar submitters were compared and reanalyzed using FBN1-specific criteria and ACMG/AMP 2015 guidelines for variant interpretation. Conflicting variant classifications occurred in 30.7% of the missense variants that had multiple submitters. There were 451 classifications of 361 critical residue missense variants, with 80.0% (361/451) classified as likely pathogenic or pathogenic [(L)P]. Non-cysteine critical residue variants were less likely to be classified as (L)P [55.3% (78/141)] than cysteine variants [91.3% (283/310)] and were more likely to lack evidence citing the functional significance of the amino acid impacted. Application of FBN1-specific knowledge allowed for reclassification or discrepancy resolution in 65/361 (18.0%) critical residue variants. There were 522 classifications of 313 unique missense variants not known to impact a critical residue. Of these, 31.6% (165/522) were likely overclassified as either (L)P or uncertain significance (VUS), especially when minor allele frequency (MAF) was taken into account, and we reclassified or resolved classification discrepancies in 128/313 (40.9%) of these variants. Our results provide a refined framework and resource for FBN1 variant classification, and further support the more global implications of combining gene-based knowledge with ACMG/AMP criteria and appropriate MAF cutoffs for variant classification that extend beyond FBN1.

  • Research Article
  • Citations: 141
  • 10.1016/j.gastro.2005.06.005
Functional Significance and Clinical Phenotype of Nontruncating Mismatch Repair Variants of
  • Aug 1, 2005
  • Gastroenterology
  • T Raevaara + 9 more

  • Research Article
  • Citations: 2
  • 10.1093/bioadv/vbae016
HiTaxon: a hierarchical ensemble framework for taxonomic classification of short reads.
  • Jan 5, 2024
  • Bioinformatics Advances
  • Bhavish Verma + 1 more

Whole microbiome DNA and RNA sequencing (metagenomics and metatranscriptomics) are pivotal to determining the functional roles of microbial communities. A key challenge in analyzing these complex datasets, typically composed of tens of millions of short reads, is accurately classifying reads to their taxa of origin. While still performing worse relative to reference-based short-read tools in species classification, ML algorithms have shown promising results in taxonomic classification at higher ranks. A recent approach exploited to enhance the performance of ML tools, which can be translated to reference-dependent classifiers, has been to integrate the hierarchical structure of taxonomy within the tool's predictive algorithm. Here, we introduce HiTaxon, an end-to-end hierarchical ensemble framework for taxonomic classification. HiTaxon facilitates data collection and processing, reference database construction and optional training of ML models to streamline ensemble creation. We show that databases created by HiTaxon improve the species-level performance of reference-dependent classifiers, while reducing their computational overhead. In addition, through exploring hierarchical methods for HiTaxon, we highlight that our custom approach to hierarchical ensembling improves species-level classification relative to traditional strategies. Finally, we demonstrate the improved performance of our hierarchical ensembles over current state-of-the-art classifiers in species classification using datasets comprised of either simulated or experimentally derived reads. HiTaxon is available at: https://github.com/ParkinsonLab/HiTaxon.

  • Research Article
  • Citations: 43
  • 10.1093/bioinformatics/bti1052
Automatic detection of subsystem/pathway variants in genome analysis
  • Jun 1, 2005
  • Bioinformatics
  • Y Ye + 3 more

Proteins work together in pathways and networks, collectively comprising the cellular machinery. A subsystem (a generalization of pathway concept) is a group of related functional roles (such as enzymes) jointly involved in a specific aspect of the cellular machinery. Subsystems provide a natural framework for comparative genome analysis and functional annotation. A subsystem may be implemented in a number of different functional variants in individual species. In order to reliably project functional assignments across multiple genomes, we have to be able to identify the variants implemented in each genome. The analysis of such variants across diverse species is an interesting problem by itself and may provide new evolutionary insights. However, no computational techniques are presently available for an automated detection and analysis of subsystem variants. Here we formulate the subsystem variant detection problem as finding the minimum number of subgraphs of a subsystem, which is represented as a graph, and solve the optimization problem by integer programming approach. The performance of our method was tested on subsystems encoded in the SEED, a genomic integration platform developed by the Fellowship for Interpretation of Genomes as a component of a large-scale effort on comparative analysis and annotation of multiple diverse genomes. Here we illustrate the results obtained for two expert-encoded subsystems of the biosynthesis of Coenzyme A and FMN/FAD cofactors. Applications of variant detection, to support genomic annotations and to assess divergence of species, are briefly discussed in the context of these universally conserved and essential metabolic subsystems. The details of the variant detection results are available at http://ffas.burnham.org/svar/supp.html.

  • Research Article
  • Citations: 6
  • 10.1016/j.giq.2020.101500
From secrecy privilege to information management: A comparative analysis of classification reforms
  • Jul 8, 2020
  • Government Information Quarterly
  • Marlen Heide + 1 more

  • Research Article
  • Citations: 68
  • 10.3389/fmicb.2022.1032186
Phage family classification under Caudoviricetes: A review of current tools using the latest ICTV classification framework.
  • Dec 16, 2022
  • Frontiers in Microbiology
  • Yilin Zhu + 3 more

Bacteriophages, which are viruses infecting bacteria, are the most ubiquitous and diverse entities in the biosphere. There is accumulating evidence revealing their important roles in shaping the structure of various microbiomes. Thanks to (viral) metagenomic sequencing, a large number of new bacteriophages have been discovered. However, lacking a standard and automatic virus classification pipeline, the taxonomic characterization of new viruses seriously lags behind the sequencing efforts. In particular, according to the latest version of ICTV, several large phage families in the previous classification system are removed. Therefore, a comprehensive review and comparison of taxonomic classification tools under the new standard are needed to establish the state-of-the-art. In this work, we retrained and tested four recently published tools on newly labeled databases. We demonstrated their utilities and tested them on multiple datasets, including the RefSeq, short contigs, simulated metagenomic datasets, and low-similarity datasets. This study provides a comprehensive review of phage family classification in different scenarios and practical guidance for choosing appropriate taxonomic classification pipelines. To the best of our knowledge, this is the first review conducted under the new ICTV classification framework. The results show that the new family classification framework overall leads to better conserved groups and thus makes family-level classification more feasible.

  • Research Article
  • Citations: 10
  • 10.1128/msphere.00077-22
Comparative Analysis and Data Provenance for 1,113 Bacterial Genome Assemblies
  • May 2, 2022
  • mSphere
  • David A Yarmosh + 14 more

ABSTRACT: The availability of public genomics data has become essential for modern life sciences research, yet the quality, traceability, and curation of these data have significant impacts on a broad range of microbial genomics research. While microbial genome databases such as NCBI’s RefSeq database leverage the scalability of crowd sourcing for growth, genomics data provenance and authenticity of the source materials used to produce data are not strict requirements. Here, we describe the de novo assembly of 1,113 bacterial genome references produced from authenticated materials sourced from the American Type Culture Collection (ATCC), each with full genomics data provenance relating to bioinformatics methods, quality control, and passage history. Comparative genomics analysis of ATCC standard reference genomes (ASRGs) revealed significant issues with regard to NCBI’s RefSeq bacterial genome assemblies related to completeness, mutations, structure, strain metadata, and gaps in traceability to the original biological source materials. Nearly half of RefSeq assemblies lack details on sample source information, sequencing technology, or bioinformatics methods. Deep curation of these records is not within the scope of NCBI’s core mission in supporting open science, which aims to collect sequence records that are submitted by the public. Nonetheless, we propose that gaps in metadata accuracy and data provenance represent an “elephant in the room” for microbial genomics research. Effectively addressing these issues will require raising the level of accountability for data depositors and acknowledging the need for higher expectations of quality among the researchers whose research depends on accurate and attributable reference genome data. IMPORTANCE: The traceability of microbial genomics data to authenticated physical biological materials is not a requirement for depositing these data into public genome databases.
This creates significant risks for the reliability and data provenance of these important genomics research resources, the impact of which is not well understood. We sought to investigate this by carrying out a comparative genomics study of 1,113 ATCC standard reference genomes (ASRGs) produced by ATCC from authenticated and traceable materials using the latest sequencing technologies. We found widespread discrepancies in genome assembly quality, genetic variability, and the quality and completeness of the associated metadata among hundreds of reference genomes for ATCC strains found in NCBI’s RefSeq database. We present a comparative analysis of de novo-assembled ASRGs, their respective metadata, and variant analysis using RefSeq genomes as a reference. Although assembly quality in RefSeq has generally improved over time, we found that significant quality issues remain, especially as related to genomic data and metadata provenance. Our work highlights the importance of data authentication and provenance for the microbial genomics community, and underscores the risks of ignoring this issue in the future.

  • Research Article
  • 10.47183/mes.2024-26-3-40-50
Molecular and genetic characteristics from phylogenetic analysis of Russian and foreign variants of measles virus 2020–2024
  • Oct 23, 2024
  • Extreme Medicine
  • E N Chernyaeva + 9 more

Introduction. Over the past two years, there has been an increase in measles morbidity in Russia and other countries. In order to assess the heterogeneity of clinically significant strains of Measles morbillivirus to reveal the sources of infection and transmission routes strains, a molecular genetic study thus becomes an urgent task. In this paper, a genetic identification of clinical strains of measles virus detected in 2023–2024 is compared with global variants as well as Russian strains detected in previous years. Materials and methods. Forty measles virus genome sequences isolated from nasopharyngeal swab samples obtained in Moscow and Novosibirsk in 2023–2024 were included in the study. The data were then compared with strains collected in Russia in May 2023 and deposited at the Central Research Institute of Epidemiology, as well as strains collected in 2020 and 2021 in Moscow. Results. The nucleotide sequences of studied Measles morbillivirus strains were categorized into different phylogenetic groups within genotype D8. For samples of genotype B3 collected in 2020 and 2023, a comparative analysis was performed to identify the region of origin. Phylogenetic analysis of Russian and foreign variants of measles virus suggests that strains currently circulating in Russia may be a variety of strains that had previously circulated in other countries and independently spread to Russia in 2023. After analyzing the most frequent nucleotide substitutions in various measles virus genes, the most variable genes were identified to provide a basis for the extension of phylogenetic analysis. Conclusions. The proposed approach to molecular genetic testing of complete and partial genome sequences of clinical isolate of measles virus detected in 2023–2024 in Moscow and Novosibirsk made it possible to identify strain subgroups that differ in origin.
The comparison of the Measles morbillivirus strains sequenced in the present research with global sequences allowed us to detect similar sequences identified both in 2023 and in previous years in various countries of the world. The analysis of epidemiologically significant strains of Measles morbillivirus shows that N gene can be used to reliably determine the main genotype; however, this approach is not sufficient for studying the transmission pathways of the virus.

  • Research Article
  • Citations: 11
  • 10.1038/s41541-024-00846-8
Genomic analysis of lumpy skin disease virus asian variants and evaluation of its cellular tropism
  • Mar 21, 2024
  • npj Vaccines
  • Shijie Xie + 13 more

Lumpy skin disease virus (LSDV) is a poxvirus that mainly affects cattle and can lead to symptoms such as severe reduction in milk production as well as infertility and mortality, which has resulted in dramatic economic loss in affected countries in Africa, Europe, and Asia. In this study, we successfully isolated two strains of LSDV from different geographical regions in China. Comparative genomic analyses were performed by incorporating additional LSDV whole genome sequences reported in other areas of Asia. Our analyses revealed that LSDV exhibited an ‘open’ pan-genome. Phylogenetic analysis unveiled distinct branches of LSDV evolution, signifying the prevalence of multiple lineages of LSDV across various regions in Asia. In addition, a reporter LSDV expressing eGFP directed by a synthetic poxvirus promoter was generated and used to evaluate the cell tropism of LSDV in various mammalian and avian cell lines. Our results demonstrated that LSDV replicated efficiently in several mammalian cell lines, including human A549 cells. In conclusion, our results underscore the necessity for strengthening LSD outbreak control measures and continuous epidemiological surveillance.

  • Research Article
  • Citations: 22
  • 10.1093/ve/vez056
Application of a sequence-based taxonomic classification method to uncultured and unclassified marine single-stranded RNA viruses in the order Picornavirales
  • Jul 1, 2019
  • Virus Evolution
  • Marli Vlok + 2 more

Metagenomics has altered our understanding of microbial diversity and ecology. This includes its applications to viruses in marine environments that have demonstrated their enormous diversity. Within these are RNA viruses, many of which share genetic features with members of the order Picornavirales; yet, very few of these have been taxonomically classified. The only recognized family of marine RNA viruses is the Marnaviridae, which was founded based on discovery and characterization of the species Heterosigma akashiwo RNA virus. Two additional genera of marine RNA viruses, Labyrnavirus (one species) and Bacillarnavirus (three species), were subsequently defined within the order Picornavirales but not assigned to a family. We have defined a sequence-based framework for taxonomic classification of twenty marine RNA viruses into the family Marnaviridae. Using RNA-dependent RNA polymerase (RdRp) phylogeny and distance-based analyses, we assigned the genera Labyrnavirus and Bacillarnavirus to the family Marnaviridae and created four additional genera in the family: Locarnavirus (four species), Kusarnavirus (one species), Salisharnavirus (four species) and Sogarnavirus (six species). We used pairwise capsid protein comparisons to delineate species within families, with 75 per cent identity as the species demarcation threshold. The family displays high sequence diversities and Jukes–Cantor distances for both the RdRp and capsid genes, suggesting that the classified viruses are not representative of all of the virus diversity within the family and that there are many more extant taxa. Our proposed taxonomic framework provides a sound classification system for this group of viruses that will have broadly applicable principles for other viral groups. It is based on sequence data alone and provides a robust taxonomic framework to include viruses discovered via metagenomic studies, thereby greatly expanding the realm of viruses subject to taxonomic classification.

  • Research Article
  • Citations: 6
  • 10.1016/j.imu.2021.100762
A comprehensive in-silico computational analysis of twenty cancer exome datasets and identification of associated somatic variants reveals potential molecular markers for detection of varied cancer types
  • Jan 1, 2021
  • Informatics in Medicine Unlocked
  • P Padmavathi + 3 more
