Abstract
Background: Exact sample annotation in expression microarray datasets is essential for any type of pharmacogenomics research.
Results: Candidate markers were explored by applying Hartigans' dip test statistic to a publicly available human whole-genome microarray dataset. Marker performance was tested on 188 serial samples of variable tissue origin from 53 donors, drawn from five public microarray datasets. A qualified transcript marker panel consisting of three probe sets for the human leukocyte antigens HLA-DQA1 (2 probe sets) and HLA-DRB4 identified sample donor identifier inconsistencies in six of the 188 test samples. About 3% of the test samples require root-cause analysis due to unresolvable inaccuracies.
Conclusions: The transcript marker panel consisting of HLA-DQA1 and HLA-DRB4 represents a robust, tissue-independent composite marker to assist in controlling donor annotation concordance at the transcript level. The allele selectivity of HLA genes makes them good candidates for "fingerprinting" based on donor-specific expression patterns.
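The concordance idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the on/off threshold, the sample and donor identifiers, and the expression values are all hypothetical, and the function name `flag_inconsistent_samples` is invented for illustration. Only the two probe set IDs named in the highlights are used; the second HLA-DQA1 probe set is not identified in this excerpt and is left out. The assumption is that each probe set is either clearly expressed or not in a given donor, so all samples annotated with the same donor should share one on/off pattern.

```python
from collections import Counter, defaultdict

def flag_inconsistent_samples(samples, threshold):
    """Return sample IDs whose on/off fingerprint differs from the most
    common fingerprint among samples annotated with the same donor.

    samples: list of (sample_id, donor_id, {probe_set: expression}).
    threshold: hypothetical cut-off separating "expressed" from "not expressed".
    """
    by_donor = defaultdict(list)
    for sample_id, donor_id, expr in samples:
        # Binarize each probe set; sort keys so fingerprints are comparable.
        fingerprint = tuple(expr[p] >= threshold for p in sorted(expr))
        by_donor[donor_id].append((sample_id, fingerprint))

    flagged = []
    for entries in by_donor.values():
        counts = Counter(fp for _, fp in entries)
        # On a tie, the fingerprint seen first is treated as the reference.
        majority, _ = counts.most_common(1)[0]
        flagged.extend(sid for sid, fp in entries if fp != majority)
    return flagged

# Hypothetical log2-scale expression values for illustration only.
samples = [
    ("S1", "D1", {"203290_at": 11.2, "209728_at": 10.5}),
    ("S2", "D1", {"203290_at": 10.9, "209728_at": 10.1}),
    ("S3", "D1", {"203290_at": 11.0, "209728_at": 4.2}),   # disagrees with S1/S2
    ("S4", "D2", {"203290_at": 4.8, "209728_at": 9.7}),
    ("S5", "D2", {"203290_at": 5.1, "209728_at": 9.9}),
]
print(flag_inconsistent_samples(samples, threshold=8.0))
```

A flagged sample only indicates a disagreement within a donor's series; deciding which annotation is actually wrong would still need the root-cause analysis the abstract mentions.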
Highlights
Exact sample annotation in expression microarray datasets is essential for any type of pharmacogenomics research
To compute empirical p-values assessing the significance of an individual dip test statistic value, the dip test statistic was computed for a simulated dataset of 47 samples, the same sample number as the GSE7753 dataset, and 1 × 10^6 permutations per probe set
(the genes of the REDKX gender quality control (QC) marker [6]), and probe sets for genes located on autosomes, such as human leukocyte antigen (HLA)-genes HLA-DRB4 (209728_at), and HLA-DQA1 (203290_at)
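The empirical p-value computation mentioned in the highlights can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: the null datasets are drawn from a uniform distribution (an assumption, since the excerpt does not state the null model), the statistic is a stand-in "largest gap" score rather than Hartigans' dip statistic, and the number of simulations is kept far below the 1 × 10^6 reported so the example runs quickly. The function names and the example expression values are hypothetical. In practice, the `statistic` argument would be replaced by a real dip statistic implementation.

```python
import random

def largest_gap_statistic(values):
    """Stand-in bimodality score (NOT Hartigans' dip): the largest gap
    between sorted neighbours, as a fraction of the total range."""
    xs = sorted(values)
    span = xs[-1] - xs[0]
    if span == 0:
        return 0.0
    return max(b - a for a, b in zip(xs, xs[1:])) / span

def empirical_p_value(observed, statistic, n_samples, n_sim, rng):
    """Share of simulated null datasets whose statistic is >= observed.
    The add-one correction keeps the estimate strictly above zero."""
    exceed = 0
    for _ in range(n_sim):
        # Assumed null model: n_samples independent uniform values.
        null_data = [rng.random() for _ in range(n_samples)]
        if statistic(null_data) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)

rng = random.Random(0)
# Hypothetical bimodal expression values for one probe set across 47 samples.
expr = ([rng.gauss(4.0, 0.3) for _ in range(24)]
        + [rng.gauss(10.0, 0.3) for _ in range(23)])
obs = largest_gap_statistic(expr)
p = empirical_p_value(obs, largest_gap_statistic, n_samples=47, n_sim=2000, rng=rng)
print(f"statistic={obs:.3f}, empirical p={p:.4f}")
```

With the add-one estimator, the smallest attainable p-value is 1 / (n_sim + 1), which is one reason a large number of simulations is needed when many probe sets are tested.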
Summary
Exact sample annotation in expression microarray datasets is essential for any type of pharmacogenomics research. Clinical molecular research and biomarker development rely on a high level of data quality. Ensuring data quality extends beyond establishing reproducible technical processes for measuring variables. Obtaining accurate clinical metadata is of utmost importance for meaningful clinical research, as they are necessary for finding clinical disease-treatment or disease-biomarker relationships [1]. Drawing conclusions based on incorrect metadata can have detrimental consequences for short-term or long-term patient care. Typical sample annotation errors may be due to sample mix-ups, database entry errors, or subjectivity, e.g. in the grading of a biopsy. In pharmacogenomics analyses, unrecognized annotation errors or sample mix-ups affect any supervised statistical analysis, such as certain steps during biomarker discovery and qualification, and patient stratification.