fcGENE: a versatile tool for processing and transforming SNP datasets

Nab Raj Roshyara, Markus Scholz

doi:10.1371/journal.pone.0097589

Abstract

Background: Modern analysis of high-dimensional SNP data requires a number of biometrical and statistical methods, such as pre-processing, analysis of population structure, association analysis and genotype imputation. The software used for these purposes often relies on specific and mutually incompatible input and output data formats. Extensive data management, including multiple format conversions, is therefore necessary during analyses.

Methods: To support fast and efficient management and bio-statistical quality control of high-dimensional SNP data, we developed the publicly available software fcGENE in the object-oriented programming language C++. This software simplifies and automates the use of different existing analysis packages, especially during the workflow of genotype imputation and the corresponding analyses.

Results: fcGENE transforms SNP data and imputation results into the different formats required by a large variety of analysis packages, such as PLINK, SNPTEST, HAPLOVIEW, EIGENSOFT and GenABEL, and by tools used for genotype imputation, such as MaCH, IMPUTE, BEAGLE and others. Data management tasks such as merging, splitting, and extracting SNP and pedigree information can be performed. fcGENE also supports a number of bio-statistical quality control processes and quality-based filtering at the SNP-wise and sample-wise level. The tool also generates templates of the commands required to run specific software packages, especially those required for genotype imputation. We demonstrate the functionality of fcGENE with example workflows of SNP data analyses and provide a comprehensive manual of commands, options and applications.

Conclusions: We have developed fcGENE, a user-friendly open-source software tool that comprehensively supports SNP data management, quality control and analysis workflows. Download statistics and the corresponding feedback indicate that the software is widely recognised and extensively applied by the scientific community.
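The SNP-wise and sample-wise quality-based filtering mentioned in the abstract can be illustrated with a minimal Python sketch. It is not fcGENE's implementation: the genotype coding (number of copies of allele B per sample, `None` for a missing call), the function names and the default thresholds are hypothetical choices for illustration. Samples are filtered on call rate first; SNPs are then filtered on call rate and minor allele frequency (MAF) among the retained samples.

```python
from typing import List, Optional, Tuple

# Hypothetical coding: copies of allele B (0, 1, 2); None marks a missing call.
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing genotype calls."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF estimated from non-missing calls (two alleles per diploid genotype)."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    freq_b = sum(observed) / (2 * len(observed))
    return min(freq_b, 1.0 - freq_b)


def qc_filter(
    genotypes: List[List[Genotype]],  # rows = samples, columns = SNPs
    min_sample_call_rate: float = 0.95,  # hypothetical default
    min_snp_call_rate: float = 0.95,  # hypothetical default
    min_maf: float = 0.01,  # hypothetical default
) -> Tuple[List[int], List[int]]:
    """Return indices of samples and SNPs passing QC (samples filtered first)."""
    n_snps = len(genotypes[0]) if genotypes else 0
    kept_samples = [i for i, row in enumerate(genotypes)
                    if call_rate(row) >= min_sample_call_rate]
    kept_snps = []
    for j in range(n_snps):
        # Evaluate each SNP only on the samples that survived sample QC.
        column = [genotypes[i][j] for i in kept_samples]
        if (call_rate(column) >= min_snp_call_rate
                and minor_allele_frequency(column) >= min_maf):
            kept_snps.append(j)
    return kept_samples, kept_snps


if __name__ == "__main__":
    genotypes = [
        [0, 1, 2, 0],
        [1, None, 2, 0],
        [0, 1, 1, 0],
        [None, None, None, 0],
    ]
    print(qc_filter(genotypes, min_sample_call_rate=0.75,
                    min_snp_call_rate=0.9, min_maf=0.05))
    # prints: ([0, 1, 2], [0, 2])
```

In this toy matrix the fourth sample (one call out of four) is removed, the second SNP fails the SNP call-rate threshold and the monomorphic fourth SNP fails the MAF threshold. Filtering samples before SNPs is one common ordering; the order is a design choice that can change which SNPs pass.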

Highlights

Modern developments in micro-array techniques enable large-scale genome-wide association (GWA) studies comprising thousands or millions of SNPs in thousands of individuals.
Statistical methods for analysing GWA data have been further developed over the last decade to handle several aspects of GWA analysis, such as principal component analysis (PCA), genotype imputation, haplotype-based analyses and different types of association models.
Our aim was to construct fcGENE as a complementary tool to PLINK, providing options for transforming SNP data into the formats required by different tools for GWA analysis.
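Transforming SNP data between formats often amounts to re-encoding the same genotype. The following is a minimal Python sketch, not fcGENE's code, assuming hard-called genotypes given as PLINK PED-style allele pairs (with `"0"` as the missing allele) and output in the IMPUTE/SNPTEST GEN layout: five SNP columns (SNP ID, rsID, position, allele A, allele B) followed by three probabilities P(AA), P(AB), P(BB) per sample. Writing a missing call as an all-zero triple is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

MISSING_ALLELE = "0"  # PLINK's default missing-allele code in PED files


def call_to_probabilities(call: Tuple[str, str], allele_a: str,
                          allele_b: str) -> Tuple[int, int, int]:
    """Map one hard-called allele pair to (P(AA), P(AB), P(BB))."""
    if MISSING_ALLELE in call:
        return (0, 0, 0)  # assumption: an all-zero triple marks a missing call
    n_b = 0
    for allele in call:
        if allele == allele_b:
            n_b += 1
        elif allele != allele_a:
            raise ValueError(
                f"allele {allele!r} is neither {allele_a!r} nor {allele_b!r}")
    probs = [0, 0, 0]
    probs[n_b] = 1  # index = number of copies of allele B
    return (probs[0], probs[1], probs[2])


def to_gen_line(snp_id: str, rsid: str, position: int, allele_a: str,
                allele_b: str, calls: List[Tuple[str, str]]) -> str:
    """Build one GEN-style line: 5 SNP columns, then 3 probabilities per sample."""
    fields = [snp_id, rsid, str(position), allele_a, allele_b]
    for call in calls:
        fields.extend(
            str(p) for p in call_to_probabilities(call, allele_a, allele_b))
    return " ".join(fields)


if __name__ == "__main__":
    calls = [("A", "A"), ("A", "G"), ("G", "G"), ("0", "0")]
    print(to_gen_line("SNP1", "rs123", 1000, "A", "G", calls))
    # prints: SNP1 rs123 1000 A G 1 0 0 0 1 0 0 0 1 0 0 0
```

An allele that matches neither A nor B raises an error instead of being silently recoded, since such mismatches (for example, strand differences between datasets) usually need explicit handling before conversion.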

Summary

Introduction

Modern developments in micro-array techniques enable large-scale genome-wide association (GWA) studies comprising thousands or millions of SNPs in thousands of individuals. A variety of software packages and environments have been developed to perform the corresponding computations even for high-dimensional data. Modern analysis of high-dimensional SNP data requires a number of biometrical and statistical methods, such as pre-processing, analysis of population structure, association analysis and genotype imputation. The software packages used for these purposes usually rely on their own specific, mutually incompatible input and output data formats. Extensive data management, including multiple format conversions, is therefore necessary during analyses.
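One frequent conversion after genotype imputation turns genotype probabilities into allele dosages or hard calls. The following minimal Python sketch assumes the GEN-style layout (five SNP columns followed by P(AA), P(AB), P(BB) per sample). The dosage is the expected number of B alleles, P(AB) + 2 * P(BB), and a hard call is kept only when its probability reaches a threshold. The function names, the all-zero-means-missing convention and the 0.9 threshold are hypothetical illustrations, not fcGENE options.

```python
from typing import List, Optional


def gen_line_to_dosages(line: str) -> List[Optional[float]]:
    """Expected count of allele B per sample: P(AB) + 2 * P(BB).

    Assumes a GEN-style line: 5 SNP columns, then 3 probabilities per sample.
    An all-zero triple is treated as missing and yields None.
    """
    probs = [float(x) for x in line.split()[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three genotype probabilities per sample")
    dosages: List[Optional[float]] = []
    for k in range(0, len(probs), 3):
        p_aa, p_ab, p_bb = probs[k:k + 3]
        if p_aa + p_ab + p_bb == 0.0:
            dosages.append(None)  # missing genotype
        else:
            dosages.append(p_ab + 2.0 * p_bb)
    return dosages


def hard_call(p_aa: float, p_ab: float, p_bb: float,
              threshold: float = 0.9) -> Optional[int]:
    """Most likely genotype (0, 1 or 2 copies of B), or None if below threshold."""
    probs = (p_aa, p_ab, p_bb)
    best = max(range(3), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


if __name__ == "__main__":
    line = "SNP1 rs123 1000 A G 1 0 0 0 0.5 0.5 0 0 1 0 0 0"
    print(gen_line_to_dosages(line))  # prints: [0.0, 1.5, 2.0, None]
    print(hard_call(0.05, 0.9, 0.05))  # prints: 1
```

Probabilities are used as given; rescaling triples that do not sum to one, or choosing a different calling threshold, would be further design decisions for a real workflow.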

Journal: PLoS ONE
Publication Date: Jul 22, 2014
Citations: 68
License type: CC BY 4.0
