Abstract

Deep learning models usually need large amounts of labeled data, which becomes a concern in real-world applications because labeling a dataset is costly in time, money, and other resources. Semi-supervised learning models (SSLMs) address this by training on both labeled and unlabeled datasets, a practice that can improve overall model performance. The unlabeled datasets may contain out-of-distribution as well as in-distribution data points, which can affect the model's accuracy and future predictions. This investigation proposes a metric for estimating how strongly an unlabeled dataset is likely to affect the accuracy of an SSLM. It also aims to show that data quality metrics need further research, especially since deep learning models increasingly target real-world applications such as healthcare. Data quality metrics have normally been applied to structured data, but they can also be applied to unstructured data such as the datasets used to train deep learning models. The method employed in this research takes the Mahalanobis distance as a base to generate a trend and then a metric. The approach follows what is demonstrated and proposed in [1], but uses covariance matrices to compare the labeled and unlabeled datasets. The experiments show that the Mahalanobis distance yields results consistent with the method proposed in [1] while reducing processing time by 99%. Using Pearson's correlation coefficient, the metric showed a strong negative correlation with the MixMatch results reported in [1].
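
The idea of comparing a labeled and an unlabeled dataset through the Mahalanobis distance can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names `mean_mahalanobis` and `pearson_r` are hypothetical, the score is taken to be the average distance of unlabeled feature vectors to the mean and covariance of the labeled set, and the data are synthetic Gaussian features rather than the datasets used in the study.

```python
import numpy as np


def mean_mahalanobis(labeled, unlabeled, eps=1e-6):
    """Average Mahalanobis distance of unlabeled rows to the labeled distribution.

    Assumption: rows are feature vectors, and the labeled set is summarized
    by its sample mean and covariance matrix.
    """
    mu = labeled.mean(axis=0)
    cov = np.atleast_2d(np.cov(labeled, rowvar=False))
    # A small ridge term keeps the covariance matrix invertible.
    cov = cov + eps * np.eye(labeled.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = unlabeled - mu
    # Squared distance per row: diff_i^T * cov_inv * diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return float(np.sqrt(np.maximum(d2, 0.0)).mean())


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic stand-ins for feature vectors (not data from the study).
rng = np.random.default_rng(0)
labeled = rng.normal(0.0, 1.0, size=(500, 8))
in_dist = rng.normal(0.0, 1.0, size=(500, 8))   # same distribution as labeled
out_dist = rng.normal(3.0, 1.0, size=(500, 8))  # shifted distribution

d_in = mean_mahalanobis(labeled, in_dist)
d_out = mean_mahalanobis(labeled, out_dist)
print(f"in-distribution score:     {d_in:.3f}")
print(f"out-of-distribution score: {d_out:.3f}")
```

Under these assumptions the shifted unlabeled set receives the larger score. Given such scores for several unlabeled datasets, together with the SSLM accuracy obtained on each, `pearson_r(scores, accuracies)` would measure how the two move together; a value near -1 would correspond to the kind of negative correlation the abstract reports.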
