Abstract
Background
Despite extensive research on data mining algorithms, there is still no standard protocol for evaluating their performance. This study therefore aims to provide a novel procedure that combines data mining algorithms with simplified preprocessing to establish reference intervals (RIs), and to assess the performance of five algorithms objectively.

Methods
Two data sets, the Test data set and the Reference data set, were derived from a population undergoing physical examination. Thyroid-related hormones, including thyroid-stimulating hormone (TSH), free triiodothyronine (FT3), total triiodothyronine (TT3), free thyroxine (FT4), and total thyroxine (TT4), were measured on an ADVIA Centaur XP chemiluminescence immunoassay analyzer. Five algorithms (Hoffmann, Bhattacharya, expectation-maximization (EM), kosmic, and refineR), each combined with two-step data preprocessing, were applied to the Test data set to establish RIs for the thyroid-related hormones. The first preprocessing step was random sampling to balance the sex and age ratios; the second was identification of outliers in each subgroup by the Tukey method. The algorithm-calculated RIs were compared with standard RIs calculated by the transformed parametric method from the Reference data set, in which reference individuals were selected according to strict inclusion and exclusion criteria. RI partitioning was determined by multiple linear regression and variance component analysis. The methods were assessed objectively with a bias ratio (BR) matrix, with the BR threshold set to 0.375.

Results
The levels of all five thyroid-related hormones differed significantly by sex, with males having lower TSH and higher FT3, FT4, TT3, and TT4 than females. Further analysis indicated that sex-specific RIs should be established for FT3 and FT4. The standard RIs derived from the Reference data set by the transformed parametric method were 0.801–4.221 μIU/mL for TSH, 2.58–3.82 pg/mL for FT3, 0.98–1.53 ng/dL for FT4, 0.80–1.38 ng/mL for TT3, and 5.46–10.05 μg/dL for TT4. The TSH RIs established by the EM algorithm were highly consistent with the standard TSH RIs (BR = 0.063), although the EM algorithm performed poorly on the other hormones, with BRs above 0.375. The RIs calculated by the Hoffmann, Bhattacharya, and refineR methods for FT3, TT3, FT4, and TT4 were close to one another and matched the standard RIs.

Conclusion
An effective approach for objectively evaluating algorithm performance based on the BR matrix is established. The EM algorithm combined with simplified preprocessing can handle markedly skewed data, but its performance is limited in other scenarios. The other four algorithms perform well for data with a Gaussian or near-Gaussian distribution. Choosing the algorithm according to the distribution characteristics of the data is recommended.
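
To make the Tukey outlier step and the BR comparison concrete, the following is a minimal Python sketch and not the study's own code. It rests on assumptions the abstract does not state: the Tukey fences use the conventional multiplier of 1.5 on the interquartile range, and the BR of an RI limit is taken as the absolute difference between the algorithm-derived limit and the standard limit, divided by the standard RI width over 3.92 (the SD implied by a Gaussian 95% interval). The function names `tukey_filter` and `bias_ratio` and the example limit of 0.85 are hypothetical.

```python
import numpy as np


def tukey_filter(values, k=1.5):
    """Keep only values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional choice and is an assumption here.
    """
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    keep = (x >= q1 - k * iqr) & (x <= q3 + k * iqr)
    return x[keep]


def bias_ratio(limit_algo, limit_std, ll_std, ul_std):
    """Bias of one RI limit relative to the SD implied by the standard RI.

    Assumed form: |limit_algo - limit_std| / ((ul_std - ll_std) / 3.92).
    """
    sd_ri = (ul_std - ll_std) / 3.92
    return abs(limit_algo - limit_std) / sd_ri


BR_THRESHOLD = 0.375  # threshold quoted in the abstract

# Standard TSH RI from the abstract; 0.85 is a made-up algorithm lower limit.
br_lower = bias_ratio(0.85, 0.801, 0.801, 4.221)
acceptable = br_lower < BR_THRESHOLD
```

Under these assumptions, each algorithm and hormone pair would yield one BR for the lower limit and one for the upper limit, and a limit would count as matching the standard RI when its BR stays below 0.375.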