Use Of Files Research Articles

Abstract: Phylogenies with extensive taxon sampling have become indispensable for many types of ecological and evolutionary studies. Many large‐scale trees are based on a ‘supermatrix’ approach, which involves amalgamating thousands of published sequences for a group. Constructing up‐to‐date supermatrices can be challenging, especially as new sequences may become available almost constantly. Additionally, genomic datasets (composed of thousands of loci) are becoming common in phylogenetics and phylogeography, and present novel challenges for constructing such datasets. Here we present SuperCRUNCH, a Python toolkit for assembling large phylogenetic datasets. It can be applied to GenBank sequences, unpublished sequences or combinations of GenBank and unpublished data. SuperCRUNCH constructs local databases and uses them to conduct rapid searches for user‐specified sets of taxa and loci. Sequences are parsed into putative loci and passed through rigorous filtering steps. A post‐filtering step allows for selection of one sequence per taxon (i.e. species‐level supermatrix) or retention of all sequences per taxon (i.e. population‐level dataset). Importantly, SuperCRUNCH can generate ‘vouchered’ population‐level datasets, in which voucher information is used to generate multi‐locus phylogeographic datasets. SuperCRUNCH offers many options for taxonomy resolution, similarity filtering, sequence selection, alignment and file manipulation. We demonstrate the range of features available in SuperCRUNCH by generating a variety of phylogenetic datasets. Output datasets include traditional species‐level supermatrices, large‐scale phylogenomic matrices and phylogeographic datasets. Finally, we briefly compare the ability of SuperCRUNCH to construct species‐level supermatrices relative to alternative approaches.
SuperCRUNCH generated a large‐scale supermatrix (1,400 taxa and 66 loci) from 16 GB of GenBank data in ~1.5 hr, and generated population‐level datasets (<350 samples, <10 loci) in <1 min. It outperformed alternative methods for supermatrix construction in terms of taxa, loci and sequences recovered. SuperCRUNCH is a modular bioinformatics toolkit that can be used to assemble datasets for any taxonomic group and scale (kingdoms to individuals). It allows rapid construction of supermatrices, greatly simplifying the process of updating large phylogenies with new data. It is also designed to produce population‐level datasets. SuperCRUNCH streamlines the major tasks required to process phylogenetic data, including filtering, alignment, trimming and formatting. SuperCRUNCH is open‐source, documented and available at https://github.com/dportik/SuperCRUNCH.
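
The post-filtering choice described in the abstract (one sequence per taxon versus all sequences per taxon) can be illustrated with a minimal Python sketch. This is not SuperCRUNCH code and does not use its interface: the function names, the (taxon, locus, sequence) record layout, the example records and the "keep the longest sequence" criterion are all assumptions made here for illustration. The abstract only states that several sequence selection options exist.

```python
from collections import defaultdict


def select_one_per_taxon(records):
    """Keep a single sequence per (taxon, locus) pair.

    Assumption: the longest sequence wins; ties keep the first one seen.
    """
    best = {}
    for taxon, locus, seq in records:
        key = (taxon, locus)
        if key not in best or len(seq) > len(best[key]):
            best[key] = seq
    return best


def keep_all_per_taxon(records):
    """Retain every sequence per (taxon, locus) pair, in input order."""
    grouped = defaultdict(list)
    for taxon, locus, seq in records:
        grouped[(taxon, locus)].append(seq)
    return dict(grouped)


# Hypothetical records: (taxon, locus, sequence). Not real data.
records = [
    ("Taxon_a", "locus1", "ACGTAC"),
    ("Taxon_a", "locus1", "ACGTACGTAA"),
    ("Taxon_a", "locus2", "TTGACA"),
    ("Taxon_b", "locus1", "ACGTTC"),
]

species_level = select_one_per_taxon(records)    # one sequence per taxon and locus
population_level = keep_all_per_taxon(records)   # all sequences per taxon and locus

print(len(species_level), "taxon/locus pairs kept at species level")
```

The two functions differ only in what they store per key, which mirrors the distinction the abstract draws between a species-level supermatrix and a population-level dataset.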

Background: The Stroke Encounter Quality Improvement Project (SEQIP) is a collaboration between certified stroke centers, the AHA/ASA and the Kentucky Department for Public Health (KDPH) to implement statewide QI initiatives to improve the care of stroke patients. From 2009 to 2018, 23 hospitals in Kentucky participating in SEQIP entered 76,222 stroke patient records into Get With The Guidelines® (GWTG) / Patient Management Tool™ (PMT). Purpose: Geographic information systems (GIS) tools can expand our understanding of care and outcomes based on patient location. The purpose of this project was to demonstrate the methods of linking a disease management registry with a GIS mapping and analysis program, to understand the challenges of performing this link, and to derive meaningful insight on stroke care and outcomes by zip code. Methods: Stroke data from GWTG and PMT were compiled and downloaded by KDPH. The information was converted to a database file for use in ArcGIS. After excluding records with missing or incomplete zip codes, records were geocoded annually from 2009 to 2018. The data were then matched to one of 945 zip codes in Kentucky. Data were summarized by zip code and calendar year as: the number of ischemic strokes; the number of IV alteplase administrations; the rate of ischemic strokes receiving IV alteplase; the number and rate of ischemic stroke patients arriving at the hospital by EMS, privately owned vehicle or transfer; the median time from last known well to hospital arrival; and medical history of hypertension. Additional data, including hospitals, certified stroke centers and drive time analysis, were added to maps. Results: Mapping GWTG and PMT stroke data is feasible and may allow for additional analysis by location. Conclusion: Using GIS mapping and methodology can assist hospital stroke coordinators and public health officials in developing and implementing interventions to improve stroke care and outcomes.
Further analysis including socioeconomic, demographic and marketing/consumer preference data is planned to better understand variations by zip code. This feasibility project provides an example of a useful application of GIS analyses to data registries, including GWTG and PMT.
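
The summarization step in the Methods (excluding records with missing or incomplete zip codes, then computing counts and rates by zip code and calendar year) might look roughly like the Python sketch below. It does not reproduce the project's ArcGIS workflow or the GWTG/PMT export format: the field names (`zip`, `year`, `ischemic`, `iv_alteplase`), the five-digit validity rule and the sample records are hypothetical and chosen only to show the arithmetic.

```python
from collections import defaultdict


def summarize_by_zip_year(records):
    """Count ischemic strokes and IV alteplase administrations per (zip, year).

    Assumption: a zip code is usable only if it is exactly five digits;
    anything else is treated as missing or incomplete and excluded.
    """
    counts = defaultdict(lambda: {"ischemic": 0, "alteplase": 0})
    for rec in records:
        zip_code = (rec.get("zip") or "").strip()
        if len(zip_code) != 5 or not zip_code.isdigit():
            continue  # missing or incomplete zip code
        if not rec["ischemic"]:
            continue  # only ischemic strokes enter this summary
        cell = counts[(zip_code, rec["year"])]
        cell["ischemic"] += 1
        if rec["iv_alteplase"]:
            cell["alteplase"] += 1
    # Rate = alteplase administrations / ischemic strokes in the same cell.
    return {
        key: {**cell, "alteplase_rate": cell["alteplase"] / cell["ischemic"]}
        for key, cell in counts.items()
    }


# Made-up records for illustration only; not registry data.
records = [
    {"zip": "40001", "year": 2018, "ischemic": True, "iv_alteplase": True},
    {"zip": "40001", "year": 2018, "ischemic": True, "iv_alteplase": False},
    {"zip": "40002", "year": 2018, "ischemic": True, "iv_alteplase": False},
    {"zip": "4000", "year": 2018, "ischemic": True, "iv_alteplase": True},   # incomplete zip
    {"zip": None, "year": 2017, "ischemic": True, "iv_alteplase": False},    # missing zip
]

summary = summarize_by_zip_year(records)
for (zip_code, year), cell in sorted(summary.items()):
    print(zip_code, year, cell)
```

A table keyed this way (one row per zip code and year) is the kind of output that could then be joined to zip code boundaries in a GIS program for mapping.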

Articles published on Use Of Files

Hydraulic Model Database for Applied Water Distribution Systems Research

Nsink: An R package for flow path nitrogen removal estimation.

In-IDE Code Generation from Natural Language: Promise and Challenges

A study on command block collection and restoration techniques through detection of project file manipulation on engineering workstation of industrial control system

FUSTA: leveraging FUSE for manipulation of multiFASTA files at scale.

Role of 3D Printing and Modeling to Aid in Neuroradiology Education for Medical Trainees.

Twelve years of SAMtools and BCFtools.

PandAna: A Python Analysis Framework for Scalable High Performance Computing in High Energy Physics

Extraordinary Command Line: Basic Data Editing Tools for Biologists Dealing with Sequence Data

A Critical Guide to Unix

SEDA: A Desktop Tool Suite for FASTA Files Processing.

SuperCRUNCH: A bioinformatics toolkit for creating and manipulating supermatrices and other large phylogenetic datasets

Extracting shape features from a surface mesh using geometric reasoning

Digital protocol for creating a virtual gingiva adjacent to teeth with subgingival dental preparations

PyGMQL: scalable data extraction and analysis for heterogeneous genomic datasets

Using the Object-Oriented PowerShell for Simple Proteomics Data Analysis.

A Validated Assessment Scale for Asian Chin Projection.

IRS county-to-county migration data, 1990‒2010

Abstract 141: Using Geographic Information Systems (GIS) to Analyze Statewide Regional Data - A Feasibility Project from the Kentucky Stroke Encounter Quality Improvement Project (SEQIP)

Polyphonic pitch perception in rooms using deep learning networks with data rendered in auditory virtual environments
