File Download Research Articles

Abstract: Whole genome sequencing (WGS) is increasingly used in both research and clinical settings. The Variant Call Format (VCF) specification is a widely adopted file format for exchanging genetic variation data, partly because VCF files are smaller than raw WGS BAMs. Each variant in a typical VCF file records its chromosome position, reference/alternative alleles and corresponding allele counts, which makes it possible to identify copy number alterations (CNAs). To this end, we developed VCF2CNA (http://vcf2cna.stjude.org), a web interface tool for CNA analysis from VCF files. A user of VCF2CNA uploads a VCF file via the provided web interface. The entire analysis runs remotely, with an average run time of 23 minutes. Results are emailed to the user as either a downloadable link or file attachments. VCF2CNA also accepts input in the Mutation Annotation Format (MAF) and the variant file format produced by the Bambino program. To evaluate VCF2CNA’s performance, we analyzed 22 TCGA glioblastoma tumor/normal pairs sequenced with Illumina technology. VCF2CNA achieved high consistency (average F1-score: 0.952 ± 0.082) with CONSERTING, a tool that incorporates read-depth and SV data from raw BAMs for CNA detection. A segment-by-segment comparison between results from CONSERTING and VCF2CNA indicated that the latter was less sensitive to focal CNAs. This is expected because the VCF input contains less information than raw BAMs. Further analysis using samples with a “fractured genome” pattern revealed that VCF2CNA was more robust to library artifacts and produced relatively clean CNA profiles (on average, a 76.2-fold reduction in the number of segments compared with CONSERTING). Finally, we analyzed 137 pediatric neuroblastoma samples from the TARGET project, sequenced with Complete Genomics, Inc. (CGI) technology. MYCN amplification has been clinically validated in 33 of these samples. VCF2CNA identified high-amplitude MYCN gains in 32 of the 33, and the remaining sample carried a low-level broad gain covering MYCN. For comparison, CGI’s HMM-based method reported MYCN gains in only 15 of the 33 samples. VCF2CNA also identified two additional MYCN amplifications among the other samples in the cohort. Collectively, our analysis suggests that VCF2CNA is a platform-independent, efficient, robust and accurate tool for general WGS-based CNA analysis. It further complements CONSERTING, which produces more accurate results for focal CNAs at the cost of a significantly higher computational burden.

Citation Format: Daniel K. Putnam, Xiaotu Ma, Stephen V. Rice, Yu Liu, Jinghui Zhang, Xiang Chen. VCF2CNA: a tool for efficiently detecting copy number alteration using VCF genotype data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 2587. doi:10.1158/1538-7445.AM2017-2587
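The abstract notes that per-variant allele counts in a VCF file carry enough signal to infer copy number. The Python sketch below illustrates that idea only; it is not VCF2CNA's algorithm, which the abstract does not describe. It assumes a two-sample VCF (normal first, tumor second) whose FORMAT column includes the standard `AD` (allelic depths) key. The helper names, the pseudocount, and the example record are all hypothetical.

```python
import math


def parse_allele_depths(vcf_line, sample_index):
    """Return (chrom, pos, ref_count, alt_count) for one sample of a VCF data line.

    Assumes tab-separated VCF columns and an AD entry in the FORMAT field.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    format_keys = fields[8].split(":")          # e.g. ["GT", "AD"]
    sample_values = fields[9 + sample_index].split(":")
    allelic_depths = sample_values[format_keys.index("AD")]
    ref_count, alt_count = (int(x) for x in allelic_depths.split(",")[:2])
    return chrom, pos, ref_count, alt_count


def log2_depth_ratio(tumor_depth, normal_depth, pseudocount=1.0):
    """Log2 ratio of tumor to normal depth; the pseudocount avoids log(0)."""
    return math.log2((tumor_depth + pseudocount) / (normal_depth + pseudocount))


# Made-up record for illustration: sample 0 = normal, sample 1 = tumor.
line = "chr2\t1000\t.\tA\tG\t.\tPASS\t.\tGT:AD\t0/1:15,16\t0/1:63,64"

_, _, normal_ref, normal_alt = parse_allele_depths(line, 0)
_, _, tumor_ref, tumor_alt = parse_allele_depths(line, 1)
ratio = log2_depth_ratio(tumor_ref + tumor_alt, normal_ref + normal_alt)
print(ratio)  # a clearly positive value suggests a copy number gain at this site
```

A real caller would also normalize for overall library depth and segment the ratios along each chromosome rather than interpreting single sites.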


Objective: LANL has built a software program that automatically collects global notifiable disease data, particularly data stored in files, and makes it available and shareable within the Biosurveillance Ecosystem (BSVE) as a new data source. This will improve the prediction and early warning of disease events and other applications.

Introduction: Most countries do not report national notifiable disease data in a machine-readable format. Data are often in the form of a file that contains text, tables and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data-intensive approaches to epidemiology, biosurveillance and public health, as exemplified by the Biosurveillance Ecosystem (BSVE).

While most nations likely do store their data in a machine-readable format, governments are often hesitant to share data openly for a variety of reasons that include technical, political, economic, and motivational issues [1]. For example, an attempt by LANL to obtain a weekly version of openly available monthly data reported by the Australian government resulted in an onerous bureaucratic reply. The obstacles to obtaining data included paperwork to request data from each of the Australian states and territories, a long delay to obtain data (up to 3 months) and extensive limitations on the data’s use that prohibit collaboration and sharing. This type of experience is not uncommon when contacting public health departments or ministries of health for data.

A survey conducted by LANL of notifiable disease data reporting in 52 countries found that only 10 report in a machine-readable format and 42 report in PDF files on a regular basis. Of the 42 nations that report in PDF files, 32 report in a structured, tabular format and 10 in a non-structured way.

As a result, LANL has developed a tool, Epi Archive (formerly known as EPIC), to automatically and continuously collect global notifiable disease data and make it readily accessible.

Methods: We conducted a survey of national notifiable disease reporting systems, noting how the data are reported along two important dimensions: date standards and case definitions.

Developing software that regularly ingests notifiable disease data and makes it available involved four main steps: scraping, extracting, parsing and persisting.

For scraping: we examined website designs and determined the reporting mechanisms for each country/website, as well as what varies across the reporting mechanisms. We then designed and wrote code to automate the downloading of report PDF files for each country. We stored the report PDFs along with appropriate metadata for extracting and parsing.

For extracting: we developed software that can extract notifiable disease data presented in tabular form from a PDF file. We combined the methodology of figure placement detection with in-house table extraction and annotation heuristics.

For parsing: we used the survey to determine what to extract from each PDF data set. We then parsed the extracted data into uniform data structures that correctly accommodate the dimensions surveyed and the various human languages. This task involved ingesting notifiable disease data in many disparate formats extracted from PDF files and coalescing the data into a standardized format.

For persisting: we store the data in the Epi Archive PostgreSQL database and make it available through the BSVE.

Results: The Epi Archive tool currently contains subnational notifiable disease data from 10 nations. When a user accesses the Epi Archive site, they are prompted with four fields: country, region, disease, and date duration. These fields allow the user to specify the location (down to the state level), the disease of interest, and the duration of interest. Upon form submission, a time series is generated from the user’s specifications. The generated time series can then be downloaded as a CSV file if the user wants to perform their own analysis. Additionally, the data from Epi Archive can be reached through an API.

Conclusions: LANL has built this tool as part of a currently funded DTRA effort so that it will automatically and continuously collect global notifiable disease data, particularly data stored in PDF files, and make it available and shareable within the Biosurveillance Ecosystem (BSVE) as a new data source. This will provide data to analytics and users that will improve the prediction and early warning of disease events and other applications.
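The parsing, persisting and time-series steps described in the Methods and Results above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Epi Archive implementation: the scraping and PDF extraction stages are replaced by hand-written rows, the standard-library `sqlite3` module stands in for the PostgreSQL database named in the abstract, and every name here (`CaseCount`, `parse_rows`, `persist`, `time_series_csv`, the `case_counts` table, the country "Exampleland") is hypothetical.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CaseCount:
    """Uniform record that every country's report is parsed into."""
    country: str
    region: str
    disease: str
    period_start: date
    cases: int


def parse_rows(country, period_start, table_rows):
    """Parse (region, disease, count-as-text) rows extracted from a report table."""
    records = []
    for region, disease, count_text in table_rows:
        cleaned = count_text.replace(",", "").strip()
        if not cleaned.isdigit():
            continue  # skip placeholder cells such as "-" or "n/a"
        records.append(CaseCount(country, region.strip(),
                                 disease.strip().lower(), period_start, int(cleaned)))
    return records


def persist(conn, records):
    """Store parsed records (SQLite here, as a stand-in for PostgreSQL)."""
    conn.execute("CREATE TABLE IF NOT EXISTS case_counts "
                 "(country TEXT, region TEXT, disease TEXT, period_start TEXT, cases INTEGER)")
    conn.executemany("INSERT INTO case_counts VALUES (?, ?, ?, ?, ?)",
                     [(r.country, r.region, r.disease, r.period_start.isoformat(), r.cases)
                      for r in records])
    conn.commit()


def time_series_csv(conn, country, region, disease, start, end):
    """Return the time series for the four query fields as CSV text."""
    rows = conn.execute(
        "SELECT period_start, cases FROM case_counts "
        "WHERE country = ? AND region = ? AND disease = ? "
        "AND period_start BETWEEN ? AND ? ORDER BY period_start",
        (country, region, disease, start.isoformat(), end.isoformat())).fetchall()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["period_start", "cases"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows standing in for two weekly reports.
conn = sqlite3.connect(":memory:")
persist(conn, parse_rows("Exampleland", date(2017, 1, 2),
                         [("North", "Measles", "1,204"), ("North", "Mumps", "-")]))
persist(conn, parse_rows("Exampleland", date(2017, 1, 9), [("North", "Measles", "37")]))
csv_text = time_series_csv(conn, "Exampleland", "North", "measles",
                           date(2017, 1, 1), date(2017, 1, 31))
print(csv_text)
```

Storing dates as ISO 8601 text keeps the range query correct in SQLite, because ISO dates sort lexicographically in chronological order.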


Articles published on File Download

Improving File Download Speed in BitTorrent Based on Healthy Peer Selection

Current Clinical Practice Patterns of Self-Identified Interventional Radiologists.

Design and Implementation of an S-Tracker to Guarantee File Download Availability in BitTorrent

The Mycology Collections Portal (MyCoPortal)

Intersections of History, Media, and Culture

Feedback-Based Online Network Coding

Optimized device centric aggregation mechanisms for mobile devices with multiple wireless interfaces

The effect of complement inhibition on erythrocyte destruction in AIHA

Reputation Management and Content Control: An Analysis of Radiation Oncologists' Digital Identities

BHL: A Source for Big Data Analysis

Problematic internet use among high school students: Prevalence, associated factors and gender differences

Abstract 2587: VCF2CNA: a tool for efficiently detecting copy number alteration using VCF genotype data

Cloud-Assisted Cooperative Mechanism for File Download in Mobile Peer-to-Peer Networks

A Primer to the Structure, Content and Linkage of the FDA's Manufacturer and User Facility Device Experience (MAUDE) Files.

Epi Archive: automated data collection of notifiable disease data

PUBLIC INTEGRITY AUDITING FOR SHARED DYNAMIC CLOUD DATA WITH GROUP USER REVOCATION

DSRC versus 4G-LTE for Connected Vehicle Applications: A Study on Field Experiments of Vehicular Communication Performance

Demographic factors contributing to online movie piracy of Hindi films produced in Mumbai

A smart storage optimisation technique on the cloud

