Htsint: a Python library for sequencing pipelines that combines data through gene set generation.

Adam J Richards, Anthony Herrel, Camille Bonneaud

doi:10.1186/s12859-015-0729-3

Abstract

Background: Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses.

Results: We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for ‘enrichment’ or conditional differences using one of a number of commonly available packages.

Conclusion: The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.
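The idea of turning annotations into functional distances and then partitioning genes spectrally can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not htsint's API or its actual distance measure: the gene names and annotation terms are invented, similarity is a simple Jaccard index over each gene's term set (which, unlike the approach described above, ignores relationships between ontology terms), and the partition is the simplest form of spectral clustering, a two-way split on the sign of the Fiedler vector of the unnormalized graph Laplacian.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of annotation terms two genes have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_matrix(annotations):
    """Pairwise functional similarity; a functional distance would be 1 - similarity."""
    genes = sorted(annotations)
    n = len(genes)
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = jaccard(annotations[genes[i]], annotations[genes[j]])
    return genes, w


def fiedler_vector(w, iterations=500):
    """Approximate the Laplacian eigenvector with the second-smallest eigenvalue.

    Uses power iteration on (shift * I - L), where L = D - W, while projecting
    out the constant vector. Assumes at least two genes and a connected graph.
    """
    n = len(w)
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)  # Gershgorin bound on the largest eigenvalue of L
    if shift == 0.0:
        raise ValueError("no gene shares an annotation term with another gene")
    v = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the trivial constant eigenvector
        v = [(shift - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


def functional_modules(annotations):
    """Split genes into two modules by the sign of the Fiedler vector."""
    genes, w = affinity_matrix(annotations)
    return {gene: (0 if x < 0 else 1) for gene, x in zip(genes, fiedler_vector(w))}


# Hypothetical annotations: two functional groups joined by one shared generic term.
toy_annotations = {
    "geneA": {"glycolysis", "glucose_metabolism"},
    "geneB": {"glycolysis", "glucose_metabolism", "carbohydrate_metabolism"},
    "geneC": {"glycolysis", "carbohydrate_metabolism", "metabolic_process"},
    "geneD": {"dna_repair", "dna_damage_response", "metabolic_process"},
    "geneE": {"dna_repair", "dna_damage_response", "dna_replication"},
    "geneF": {"dna_repair", "dna_replication"},
}

modules = functional_modules(toy_annotations)
for gene in sorted(modules):
    print(gene, "-> module", modules[gene])
```

The module labels 0 and 1 are arbitrary; only the grouping matters. A realistic pipeline would use more than two clusters and a similarity that accounts for the ontology structure.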

Highlights

Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features.
Subramanian and colleagues introduced the method of Gene Set Enrichment Analysis (GSEA), which both emulates the modular nature of biological systems and provides a generalizable framework to integrate multiple sources of data into transcriptomic analysis pipelines [3].
The gene sets derived from this process provide a faithful description of the annotated portion of the transcriptome, but there will remain a percentage of the genes that are not included in significance testing.
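To make the notion of testing a gene set for enrichment concrete, here is a minimal sketch of a one-sided hypergeometric over-representation test. This is a simpler relative of GSEA, not GSEA itself, and it is not part of htsint; the function name, gene identifiers, and gene lists are all hypothetical. Genes outside the annotated universe are dropped before testing, which mirrors the point that unannotated genes do not take part in significance testing.

```python
from math import comb


def enrichment_pvalue(gene_set, selected, universe):
    """P(overlap >= observed) if `selected` were drawn at random from `universe`."""
    universe = set(universe)
    gene_set = set(gene_set) & universe   # genes outside the universe are ignored
    selected = set(selected) & universe
    overlap = len(gene_set & selected)
    n, k, m = len(universe), len(gene_set), len(selected)
    # Upper tail of the hypergeometric distribution, computed with exact integers.
    tail = sum(comb(k, x) * comb(n - k, m - x)
               for x in range(overlap, min(k, m) + 1))
    return tail / comb(n, m)


# Hypothetical example: 20 annotated genes, a 5-gene module, and 4 genes that
# changed between conditions, all of which fall inside the module.
universe = [f"gene{i:02d}" for i in range(1, 21)]
module = universe[:5]
changed = universe[:4]

p_value = enrichment_pvalue(module, changed, universe)
print(f"p = {p_value:.6f}")
```

A small p-value suggests the module contains more changed genes than chance alone would explain. When many gene sets are tested, the p-values would also need a multiple-testing correction.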

Journal: BMC Bioinformatics
Publication Date: Sep 24, 2015
Citations: 30
License type: CC BY
