Pre-processing Steps for Genome-wide High-density NARAC Dataset Facilitates its Haplotype Block Partitioning

Fatma Ibrahim, Hesham Hamed, Ashraf Said, Mohamed Saad

doi:10.21608/jaet.2020.40032.1035

Abstract

Pre-processing is a crucial step in preparing any dataset for in-depth analysis. Genome-wide data qualify as big data, and handling them remains a significant challenge. Genome-wide association studies (GWAS) rely on enormous, high-density, high-throughput datasets. This paper illustrates the main pre-processing steps applied to data from the North American Rheumatoid Arthritis Consortium (NARAC) to prepare them for haplotype block partitioning using different methods on different platforms. Its main objective is to summarize the steps for pre-processing the raw genotyped dataset ahead of haplotype block partitioning and further analyses. We present each practical step in clear tables for better visualization, elucidation, and workflow interpretation. We also aim to handle missing data and normalize the output into a standardized format. This should improve the understanding of such data formats and lay a foundation for genome-wide experiments and studies, so the work could serve as a guide for other researchers who use similar data. The pre-processed data will be used for imputation, for BigLD block partitioning in R, and for Haploview. Our sequence of pre-processing steps first puts the genotype characters into a form suitable for imputation. The next step recodes the data in 0, 1, 2 format, as required by BigLD. Finally, the data are prepared for Haploview to provide clear haplotype block partitioning, association analysis, and further analyses.
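As an illustration of the 0, 1, 2 recoding step mentioned above, the following is a minimal Python sketch and not the authors' actual procedure. It assumes, hypothetically, that each genotype is a string of two allele characters joined by an underscore (for example "A_G"), that "?", "0", "N", and "-" mark missing alleles, and that the code for each individual is the number of copies of the non-major allele at that SNP. The function name `recode_snp` and the `MISSING` set are illustrative choices and do not come from the paper.

```python
from collections import Counter

# Assumed missing-allele symbols; the real dataset may use different ones.
MISSING = {"?", "0", "N", "-"}


def recode_snp(genotypes, sep="_"):
    """Recode one SNP's genotypes to 0/1/2 (copies of the non-major allele).

    Missing or malformed genotypes are returned as None so that a later
    imputation step can fill them in.
    """
    counts = Counter()
    parsed = []
    for g in genotypes:
        alleles = g.split(sep)
        if len(alleles) != 2 or any(a in MISSING for a in alleles):
            parsed.append(None)  # keep the position, mark as missing
            continue
        parsed.append(alleles)
        counts.update(alleles)

    if not counts:
        # Every genotype at this SNP is missing.
        return [None] * len(genotypes)

    # Major allele = most frequent; ties are broken alphabetically so the
    # result is deterministic.
    major = max(counts, key=lambda a: (counts[a], a))

    # Count alleles that differ from the major allele: 0, 1, or 2.
    return [
        None if p is None else sum(a != major for a in p)
        for p in parsed
    ]


if __name__ == "__main__":
    example = ["A_A", "A_G", "G_G", "A_A", "?_?"]
    print(recode_snp(example))
```

Counting the non-major allele means a monomorphic SNP is coded as all zeros. Whether the actual pipeline counts the minor allele or a fixed reference allele is not stated in the abstract, so that choice is an assumption of this sketch.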
