Huge Amount Of Genomic Data Research Articles

Keyword search plays a critical role in helping bioinformatics researchers retrieve structured, semi-structured, and unstructured data. In addition, data mining has drawn increasing attention from researchers seeking to fully exploit the rich repository of biological databases. An interesting issue is to examine the possible relationship between database keyword search (DB KWS) and in-depth database exploration (or data mining) in the context of bioinformatics, and in particular, the potential contribution of DB KWS to data mining. However, so far there is no known systematic investigation of this relationship. In this paper, we provide a preliminary discussion of how DB KWS can be used for in-depth exploration of biological databases, and describe a case study on the association between genetic variants and diseases. The case study is motivated by the fact that the advent of high-throughput sequencing technologies has led to the generation of a huge amount of genomic data. A wealth of genomic information in publicly available databases is underutilized as a potential resource for uncovering functionally relevant markers underlying complex human traits. The discovery of genetic associations is important for understanding human illness, for deriving disease pathways, and for obtaining a plethora of other information, such as disease-gene associations and the variants associated with diseases. A database of genome-wide association studies was curated, and keyword search over the database was implemented in Java using an algorithm inspired by DBXplorer. The case study further proposes using association rule mining, a data mining technique useful for discovering interesting relationships hidden in large data sets, to further investigate the results of keyword searches performed with different yet sensible combinations of diseases and genes.
We believe that such an integrated study, exploring how a single bioinformatics application can take advantage of both techniques, would address an interesting issue of both theoretical and practical importance.
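
The keyword-search-plus-rule-mining pipeline described above can be sketched roughly as follows. This is a minimal Java sketch under stated assumptions, not the authors' implementation: it assumes a single hypothetical association table with columns (study, disease, gene, variant), builds a symbol table mapping each cell value to the rows that contain it (loosely following DBXplorer's idea of indexing where keywords occur; DBXplorer also enumerates joins across multiple tables, which is omitted here), answers AND queries by intersecting row sets, and then treats each study as a transaction to compute the standard support and confidence of a single rule such as {disease} → {gene}. The class name `KeywordSearchSketch`, its methods, the schema, and the toy rows are all illustrative assumptions, not real data.

```java
import java.util.*;

// Hypothetical sketch: a DBXplorer-style keyword index over one in-memory table,
// plus support/confidence for a single association rule. Not the authors' code.
public class KeywordSearchSketch {

    // One row of a hypothetical GWAS association table (schema is an assumption).
    record Association(String study, String disease, String gene, String variant) {}

    private final List<Association> rows;
    // Symbol table: lower-cased cell value -> indices of the rows containing it.
    private final Map<String, Set<Integer>> symbolTable = new HashMap<>();

    KeywordSearchSketch(List<Association> rows) {
        this.rows = rows;
        for (int i = 0; i < rows.size(); i++) {
            Association a = rows.get(i);
            for (String value : List.of(a.study(), a.disease(), a.gene(), a.variant())) {
                symbolTable.computeIfAbsent(value.toLowerCase(Locale.ROOT), k -> new HashSet<>()).add(i);
            }
        }
    }

    // Rows matching every keyword (AND semantics, case-insensitive exact cell match).
    List<Association> search(String... keywords) {
        Set<Integer> hits = null;
        for (String kw : keywords) {
            Set<Integer> postings = symbolTable.getOrDefault(kw.toLowerCase(Locale.ROOT), Set.of());
            if (hits == null) {
                hits = new TreeSet<>(postings); // first keyword seeds the candidate set
            } else {
                hits.retainAll(postings);       // each later keyword narrows it down
            }
        }
        List<Association> result = new ArrayList<>();
        if (hits != null) {
            for (int i : hits) result.add(rows.get(i));
        }
        return result;
    }

    // Groups rows into one item set (diseases and genes) per study: the "transactions".
    private Map<String, Set<String>> transactions() {
        Map<String, Set<String>> tx = new HashMap<>();
        for (Association a : rows) {
            Set<String> items = tx.computeIfAbsent(a.study(), k -> new HashSet<>());
            items.add(a.disease());
            items.add(a.gene());
        }
        return tx;
    }

    // Support and confidence of the rule {lhs} -> {rhs} over the study transactions.
    double[] supportConfidence(String lhs, String rhs) {
        Map<String, Set<String>> tx = transactions();
        int withLhs = 0, withBoth = 0;
        for (Set<String> items : tx.values()) {
            if (items.contains(lhs)) {
                withLhs++;
                if (items.contains(rhs)) withBoth++;
            }
        }
        double support = tx.isEmpty() ? 0.0 : (double) withBoth / tx.size();
        double confidence = withLhs == 0 ? 0.0 : (double) withBoth / withLhs;
        return new double[] {support, confidence};
    }

    public static void main(String[] args) {
        // Toy rows with placeholder identifiers; these are not real GWAS results.
        List<Association> rows = List.of(
            new Association("S1", "DiseaseA", "GENE1", "var1"),
            new Association("S1", "DiseaseA", "GENE2", "var2"),
            new Association("S2", "DiseaseA", "GENE1", "var3"),
            new Association("S3", "DiseaseB", "GENE2", "var4"));
        KeywordSearchSketch db = new KeywordSearchSketch(rows);
        System.out.println(db.search("diseasea", "gene1").size()); // prints 2
        double[] sc = db.supportConfidence("DiseaseA", "GENE2");
        System.out.printf(Locale.ROOT, "support=%.2f confidence=%.2f%n", sc[0], sc[1]);
        // prints support=0.33 confidence=0.50
    }
}
```

Here support is the fraction of studies containing both items, and confidence is the fraction of studies containing the disease that also report the gene. A full association rule miner (for example Apriori) would enumerate all candidate itemsets above a minimum support threshold rather than scoring one hand-picked rule.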

The huge amount of genomic data now becoming available offers both opportunities and challenges for epidemiologists. In this “preview” of likely developments as the field of ecological genomics evolves and merges with epidemiology, we discuss how epidemiology can use new information about genetic sequences and gene expression to form predictions about epidemic features and outcomes and to understand host resistance and pathogen evolution. DNA sequencing is now complete for some hosts and several pathogens. Microarrays make it possible to measure the expression of thousands of genes simultaneously. These tools will contribute to plant disease epidemiology by providing information about which resistance or pathogenicity genes are present in individuals and populations, which genes other than those directly involved in resistance and virulence are important in epidemics, the role of the phenotypic status of hosts and pathogens, and the role of the status of the environmental metagenome. Conversely, models of group dynamics supplied by population biology and ecology may be used to interpret gene expression within individual organisms and in populations of organisms. Genomic tools have great potential for improving our understanding of resistance gene evolution and the durability of resistance. For example, DNA sequence analysis can be used to evaluate whether an arms race model of co-evolution is supported. Finally, new genomic tools will make it possible to consider the landscape ecology of epidemics in terms of host resistance, both as determined by genotype and as expressed in host phenotypes in response to the biotic and abiotic environment. Host phenotype mixtures can be modeled and evaluated, with epidemiological predictions based on phenotypic characteristics such as physiological age and status in terms of induced systemic resistance or systemic acquired resistance.

Articles published on Huge Amount Of Genomic Data

Genome evolution in intracellular parasites: Microsporidia and Apicomplexa.

Application of Bioinformatics resources for mining of simple sequence repeats (SSRs) marker in plant genomes: An Overview

Actionable data for precision oncology: Framing trustworthy evidence for exploratory research and clinical diagnostics.

WBTC: a new approach for efficient storage of genomic data

Predatory protists.

A computational framework to analyze human genomes

A random walk-based method to identify driver genes by integrating the subcellular localization and variation frequency into bipartite graph

Smart Data for Genomic Information Systems: the SILE Method

GTRAC: fast retrieval from compressed collections of genomic variants.

SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data.

GDC 2: Compression of large collections of genomes.

RNASurface: fast and accurate detection of locally optimal potentially structured RNA segments

Exploring Database Keyword Search for Association Studies between Genetic Variants and Diseases

Modelling knowledge strategy for solving the DNA sequence annotation problem through CommonKADS methodology

Niche specific amino acid features within the core genes of the genus Shewanella.

Whole-genome and whole-exome sequencing in neurological diseases

Coping with genetic diversity: the contribution of pathogen and human genomics to modern vaccinology

Ecological Genomics and Epidemiology

The German cDNA network: cDNAs, functional genomics and proteomics.

A Keyword Recommendation System for GenBank
