Open Source Development Research Articles

Context: In efforts to mitigate anthropogenic impacts on floodplain biodiversity, restoration measures that enhance habitat connectivity have been applied. However, these approaches have either neglected the spatial position of water bodies or the dynamic nature of the floodplain ecosystem.

Objectives: This study focuses on the novel application of the multilayer network framework to assess changes in the aquatic habitat connectivity in floodplains, showcasing its application in the context of aquatic passive dispersal (drift) of two indicator groups of benthic macroinvertebrates (Oligochaetes and Chironomids).

Methods: Our case study is located in the Donau-Auen National Park in Austria and follows floodplain restoration measures (side-channel reconnection) applied in the mid-1990s. Multilayer networks were constructed to represent the conditions before, short-term, and long-term after restoration to quantify habitat connectivity across inundation frequencies. Our network analyses involved multilayer correlation, static and dynamic monolayer centralities (centrality profiles), and multilayer centrality assessments. We used a Partial Least Squares Regression analysis as a variable selection tool to identify which centrality measures better explained the variance in α diversity and Local Contributions to Beta Diversity (LCBD) of benthic macroinvertebrates.

Results: In the short-term, our connectivity analysis indicated an increase in habitat connectivity. However, centrality profiles, multilayer correlation, and multilayer centrality techniques identified a long-term decrease in connectivity. Multilayer centralities had higher Variable Importance in the Projection scores (VIP) than their monolayer counterpart in explaining variations in α diversity and LCBD for strict aquatic dispersers. Meanwhile, for flying dispersers, monolayer centralities had the highest VIP scores for explaining α diversity.

Conclusions: This study underscores the relevance of integrating dynamic aspects of water-mediated transport beyond traditional pairwise distances. Although in this study we apply this tool by showcasing the aquatic passive dispersal mode, the application of this method can be extended to other dispersal modes and representative abilities for diverse groups of aquatic organisms. The expanding cross-disciplinary applications and open-source tool development for multilayer networks offer practical implications for planning and evaluating management measures.

Abstract: Clinical data storage in unstructured notes and siloed datasets presents a major challenge for large-scale cancer informatics. Whether natural language processing (NLP) combined with multimodal integration across datasets can produce a mineable resource and improve discovery of relationships between tumor genomics and clinical phenotypes is unknown. We hypothesized that NLP could automatically annotate a pan-cancer corpus of 82,464 patients with tumor genomic sequencing. To develop algorithms to annotate free-text reports, we leveraged the AACR Project GENIE Biopharma Collaborative (BPC), a structured curation of EMR from five cancer types (non-small cell lung (NSCLC), breast, colorectal, prostate, and pancreatic cancer), to train and validate several Transformer and rule-based NLP models. After automating the generation of NLP annotations alongside medication, demographic, tumor registry, survival, and tumor genomic sequencing data, we tested whether clinicogenomic relationships not apparent in the smaller BPC cohort might be discoverable in the larger cohort. In 5-fold cross-validation, NLP Transformers accurately annotated the presence of cancer (AUC=0.99), cancer progression (AUC=0.97), and sites of disease (AUC=0.99) from radiology reports, and presence of prior outside treatment (AUC=0.98) and hormone receptor (HR) and HER2 receptor status (AUC=0.98, 0.98) from clinician notes. In addition, rule-based models, trained on non-BPC data and validated on the whole BPC cohort, annotated smoking status from clinician notes (ACC=0.95), and Gleason score (ACC=1.0), PD-L1 status (ACC=0.98), and mismatch repair deficiency (ACC=0.98) from histopathology reports. NLP annotations were merged with genomic and other structured clinical data to create a Clinicogenomic, Harmonized Oncologic Real-world Dataset (MSK-CHORD). Finally, we tested if associations not apparent in the BPC might be discoverable in MSK-CHORD. We found positive associations between Gleason score and gene-level alterations in prostate cancer including TP53, PTEN and BRCA2 (q<0.1), none of which were adequately powered for detection in the BPC. We found PD-L1 status was associated with better survival following immunotherapy treatment in NSCLC, but only in the larger MSK-CHORD was this association statistically significant. In breast cancer, NF1 mutations were associated with prior therapy in both cohorts, but this association was only significant in MSK-CHORD. The infrastructure generating MSK-CHORD uses a combination of on-premise and cloud computing resources and open-source development operation applications to automate processes. Once annotations are created, data is imported into a local instance of cBioPortal, where researchers can visualize data and perform analyses. The system generating MSK-CHORD demonstrates how large-scale data delivery and integration can fuel cancer research.

Citation Format: Christopher J. Fong, Karl Pichotta, Thinh Tran, Michele Waters, Tom Fu, Mono Pirun, Mirella Altoe, Brooke Mastrogiacomo, Anisha Luthra, Mehnaj Ahmed, Arfath Pasha, Armaan Kohli, Raymond Lim, Tom Pollard, Darin Moore, Benjamin Gross, Avery Wang, Calla Chennault, Ritika Kundra, Ramyasree Madupuri, Ino de Bruijn, Aaron Lisman, Walid K. Chatila, Subhiksha Nandakumar, Anika Begum, Doori Rose, Kenneth L. Kehl, Deborah Schrag, Michael Berger, Jian Carrot-Zhang, Pedram Razavi, Bob Li, Peter Stetson, Nikolaus Schultz, Justin Jee. Systematic generation of a clinicogenomic harmonized oncologic real-world dataset (MSK-CHORD) [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3892.

Articles published on Open Source Development

"A Lot of Moving Parts": A Case Study of Open-Source Hardware Design Collaboration in the Thingiverse Community

Collaborative development of open-source outbreak detection tools: Needs, concept and implementation

Multilayer networks in landscape ecology: a case study to assess changes in aquatic habitat connectivity for flying and non-flying benthic macroinvertebrates in a Danube floodplain

Generated power forecast of dye-sensitized solar plant with deep neural network

Decent deepfakes? Professional deepfake developers’ ethical considerations and their governance potential

Multifaceted formal methods and their interdisciplinary role — From the cathedral of ‘components as coalgebras’ to the HCI context and the open source software bazaar

A data science pipeline applied to Australia's 2022 COVID-19 Omicron waves

Automated, Near Real-Time Ground-Motion Processing at the U.S. Geological Survey

Low-cost, portable, easy-to-use kiosks to facilitate home-cage testing of nonhuman primates during vision-based behavioral tasks.

Design and Implementation of a 4WD Mobile Robot and Smartphone Application as a Robotics Learning Medium

OPEN SOURCE DEVELOPER SUPPORT TOOL

Development and Application of a New Open-Source Integrated Surface–Subsurface Flow Model in Plain Farmland

FAIR compliant database development for human microbiome data samples.

Characterizing Developers' Linguistic Behaviors in Open Source Development across Their Social Statuses

Community-Led Development and Participatory Design in Open Source: Empowering Collaboration for Sustainable Solutions

Development of a microfluidic-assisted open-source 3D bioprinting system (MOS3S) for the engineering of hierarchical tissues

Abstract 3892: Systematic generation of a clinicogenomic harmonized oncologic real-world dataset (MSK-CHORD)

Phybers: a package for brain tractography analysis.

A Novel Design of a Low-Cost SCADA System for Monitoring Standalone Photovoltaic Systems

How to Find the Right Scope for Open Source Developments?
