Abstract

BackgroundData extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available. In this work, we focus on The Cancer Genome Atlas, a comprehensive archive of tumoral data containing the results of high-throughout experiments, mainly Next Generation Sequencing, for more than 30 cancer types.ResultsWe propose TCGA2BED a software tool to search and retrieve TCGA data, and convert them in the structured BED format for their seamless use and integration. Additionally, it supports the conversion in CSV, GTF, JSON, and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We also provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1,V2) experimental data of TCGA converted into the BED format, and their associated clinical and biospecimen meta data in attribute-value text format.ConclusionsThe availability of the valuable TCGA data in BED format reduces the time spent in taking advantage of them: it is possible to efficiently and effectively deal with huge amounts of cancer genomic data integratively, and to search, retrieve and extend them with additional information. The BED format facilitates the investigators allowing several knowledge discovery analyses on all tumor types in TCGA with the final aim of understanding pathological mechanisms and aiding cancer treatments.

Highlights

  • Data extraction and integration methods are becoming essential to effectively access and take advantage of the huge amounts of heterogeneous genomics and clinical data increasingly available

  • TCGA2BED data repository We created and are maintaining an open access File transfer protocol (FTP) repository, which contains the original The Cancer Genome Atlas (TCGA) data sets [14] addressed by TCGA2BED and the corresponding data converted into the Browser Extensible Data (BED) format

  • Each tumor folder includes a set of directories, each for a specific experiment type, containing the publicly available TCGA data of that experiment type for that tumor converted in BED format, and named, i.e., cnv, dnamethylation, dnaseq, mirnaseq, rnaseq, rnaseqv2

Read more

Summary

Results

We propose TCGA2BED a software tool to search and retrieve TCGA data, and convert them in the structured BED format for their seamless use and integration. It supports the conversion in CSV, GTF, JSON, and XML standard formats. TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC, and miRBase). We provide and maintain an automatically updated data repository with publicly available Copy Number Variation, DNA-methylation, DNA-seq, miRNA-seq, and RNA-seq (V1,V2) experimental data of TCGA converted into the BED format, and their associated clinical and biospecimen meta data in attribute-value text format

Conclusions
Background
Results and discussion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.