Database Subsets Research Articles

Background: Recent studies have shown that long non-coding RNAs (lncRNAs) may play key regulatory roles in many malignant tumors. This study investigated the use of novel lncRNA biomarkers in the diagnosis and prognosis of breast cancer.

Materials and Methods: RNA-seq database subsets of The Cancer Genome Atlas (TCGA) were downloaded for comparative analysis of tissue samples from breast cancer and normal control groups. Additionally, anticoagulated peripheral blood samples were collected and used in this cohort study. Extracellular vesicles (EVs) were extracted from the plasma and sequenced, then analyzed to determine the expression profiles of the lncRNAs, and the cancer-related differentially expressed lncRNAs were screened out. The expression profiles and associated downstream mRNAs were assessed using bioinformatics methods (such as weighted correlation network analysis (WGCNA), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment, receiver operating characteristic (ROC) curve analysis, and survival analysis) to investigate the diagnostic and prognostic values of these EV lncRNAs and their effectors.

Results: In this study, 41 breast cancer-related lncRNAs were screened out from two datasets, of tissue samples and of freshly collected plasma samples of breast cancer, using transcriptomic and bioinformatics techniques. A total of 19 gene modules were identified with WGCNA, of which five modules were significantly correlated with the clinical stage of breast cancer, including 28 lncRNA candidates. The ROC curves of these lncRNAs revealed that the area under the curve (AUC) of all candidates was greater than 70%. However, eight lncRNAs had an AUC >70%, indicating that the combined one has a good diagnostic value. In addition, the results of survival analysis suggested that low expression levels of two lncRNAs may indicate a poor prognosis of breast cancer. By tissue sample verification, C15orf54, AL157935.1, LINC01117, and SNHG3 were determined to have good diagnostic ability in breast cancer lesions; however, there was no significant difference in the plasma EVs of patients. Moreover, survival analysis data also showed that AL355974.2 may serve as an independent prognostic factor and as a protective factor.

Conclusion: A total of five lncRNAs found in this study could be developed as biomarkers for breast cancer patients, including four diagnostic markers (C15orf54, AL157935.1, LINC01117, and SNHG3) and a potential prognostic marker (AL355974.2).
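
The abstract above judges diagnostic value by the area under the ROC curve. As a minimal sketch of what that number measures (not the authors' pipeline), the AUC can be computed as the fraction of tumor/normal pairs in which the tumor sample has the higher marker value, with ties counted as one half. The function name `roc_auc` and the expression values are hypothetical and chosen only for illustration.

```python
def roc_auc(scores, labels):
    """Rank-based AUC: probability that a random positive outscores a random negative.

    `labels` uses 1 for positive (e.g. tumor) and 0 for negative (e.g. normal).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical marker levels for six samples (illustrative values, not study data).
expression = [2.1, 3.4, 1.0, 4.2, 3.6, 3.9]
is_tumor = [0, 1, 0, 1, 0, 1]
auc = roc_auc(expression, is_tumor)
print(f"AUC = {auc:.3f}")
```

A marker whose AUC is below 0.5 under this convention is lower in tumors than in controls; negating the score gives the complementary AUC.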

Breast cancer is a major global threat to women's health. Screening and adequate follow-up can significantly reduce mortality from breast cancer. Human second reading of screening mammograms can increase breast cancer detection rates, whereas this has not been proven for current computer-aided detection systems used as a "second reader". Critical factors include the detection accuracy of the systems and the radiologist's screening experience and training with the system. When assessing the performance of systems and system components, the choice of evaluation methods is particularly critical. Core assets here are reference image databases and statistical methods. We analyzed the characteristics and usage of the currently largest publicly available mammography database, the Digital Database for Screening Mammography (DDSM) from the University of South Florida, in literature indexed in Medline, IEEE Xplore, SpringerLink, and SPIE, with respect to type of computer-aided diagnosis (CAD) (detection, CADe, or diagnostics, CADx), selection of database subsets, choice of evaluation method, and quality of descriptions. 59 publications presenting 106 evaluation studies met our selection criteria. In 54 studies (50.9%), the selection of test items (cases, images, regions of interest) extracted from the DDSM was not reproducible. Only 2 CADx studies, and no CADe studies, used the entire DDSM. The number of test items varied from 100 to 6000. Different statistical evaluation methods were chosen; the most common were train/test (34.9% of the studies), leave-one-out (23.6%), and N-fold cross-validation (18.9%). Database-related terminology tends to be imprecise or ambiguous, especially regarding the term "case". Overall, both the use of the DDSM as a data source for the evaluation of mammography CAD systems and the application of statistical evaluation methods were found to be highly diverse. Results reported from different studies are therefore hardly comparable. Drawbacks of the DDSM (e.g. varying quality of lesion annotations) may contribute to this, but a larger bias seems to be caused by the authors' own decisions on study design. Recommendations/Conclusion: For future evaluation studies, we derive a set of 13 recommendations concerning the construction and usage of a test database, as well as the application of statistical evaluation methods.
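
The abstract above reports that test-item selection was often not reproducible and that studies mixed train/test, leave-one-out, and N-fold cross-validation. The sketch below illustrates one way a subset selection and its folds could be made reproducible; it is an assumption-laden illustration, not a procedure taken from the paper. The helpers `select_subset` and `k_fold_splits`, the seed, and the `case_NNNN` identifiers are all hypothetical.

```python
import random


def select_subset(case_ids, n_items, seed):
    """Draw a reproducible subset of identifiers.

    Sorting first makes the result independent of input order; the fixed
    seed makes the draw repeatable, so the selection can be reported exactly.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(case_ids), n_items))


def k_fold_splits(items, k):
    """Yield (train, test) pairs; each item lands in exactly one test fold.

    With k == len(items) this reduces to leave-one-out.
    """
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


# Hypothetical identifiers standing in for database cases.
case_ids = [f"case_{i:04d}" for i in range(200)]
subset = select_subset(case_ids, n_items=100, seed=42)
splits = list(k_fold_splits(subset, k=5))
print(len(subset), "items in", len(splits), "folds")
```

Publishing the sorted identifier list together with the seed and fold count would let another group rebuild the same test set, which addresses the reproducibility gap the abstract describes.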

Articles published on Database Subsets

Cold-formed steel framed shear wall test database

Ensemble classification combining ResNet and handcrafted features with three-steps training

Novel lncRNAs with diagnostic or prognostic value screened out from breast cancer via bioinformatics analyses.

Research on the Characteristic Model of Learners in Modern Distance Music Classroom Based on Big Data

The Mitochondrial DNA Landscape of Modern Mexico.

Feature selection on database optimization for Wi-Fi fingerprint indoor positioning

AntiMalarial Mode of Action (AMMA) Database: Data Selection, Verification and Chemical Space Analysis

Emergency Risk Communication: Lessons Learned from a Rapid Review of Recent Gray Literature on Ebola, Zika, and Yellow Fever

Needs assessment for next generation computer-aided mammography reference image databases and evaluation studies

Investigation of long-term field experiments on response of breeding lines to common scab in a potato breeding program

Database Assessment of CPT-Based Design Methods for Axial Capacity of Driven Piles in Siliceous Sands

Précis: from unstructured keywords as queries to structured databases as answers

The dimensions of cited reference enhanced database subsets

How big is a database versus how is a database big

MinSet: a general approach to derive maximally representative database subsets by using fragment dictionaries and its application to the SCOP database

Prosodic and accentual information for automatic speech recognition

The CH⋯π interaction as an important factor in the crystal packing and in determining the structure of clathrates. A comprehensive literature list for the CH⋯π interaction is available on the following website: http://www.tim.hi-ho.ne.jp/dionisio

Selection of reagents for combinatorial synthesis using clique detection

Quantitation of cotton fibre-quality variations arising from boll and plant growth environments

LITTLE KEVIN: a program for the estimation of protein homology by analysing the amino acid compositions and sequences.
