Abstract

This paper considers a statistical testing technique for comparing the metric values of machine learning models on a test set. Since metric values depend not only on the models but also on the data, different models may turn out to be the best on different test sets. For this reason, the traditional approach of directly comparing metric values on a single test set is often insufficient. Sometimes a statistical comparison of results obtained through cross-validation is used, but in that case the independence of the measurements cannot be guaranteed, which rules out the Student's t-test. There are tests that do not require independent measurements, but they have less power. For additive metrics, this paper proposes a technique in which the test sample is divided into N parts and the metric values are calculated on each part. Since the value on each part is obtained as a sum of independent random variables, by the central limit theorem the metric values on the N parts are realizations of an approximately normally distributed random variable. To estimate the required sample size, it is proposed to use normality tests and quantile-quantile plots. A modification of the Student's t-test can then be used to compare the mean values of the metrics. A simplified approach is also considered, in which confidence intervals are built for the base model; a model whose metric values fall outside this interval is considered to perform differently from the base model. This approach reduces the amount of computation needed; however, an experimental analysis of the binary cross-entropy metric for CTR (Click-Through Rate) prediction models showed that it is coarser than the first one.
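The procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: the abstract does not name the modification of the Student's t-test, so Welch's unequal-variance test (`scipy.stats.ttest_ind` with `equal_var=False`) stands in for it; the Shapiro-Wilk test stands in for the unspecified normality test; and the function names `per_part_log_loss` and `compare_models`, the default `n_parts=30`, and the random partitioning are hypothetical choices made for illustration.

```python
import numpy as np
from scipy import stats


def per_part_log_loss(y_true, p_pred, n_parts, seed=0):
    """Split the test set into n_parts random parts and return the
    mean binary cross-entropy on each part (hypothetical helper)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), 1e-12, 1 - 1e-12)
    losses = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    # The same seed yields the same partition for every model.
    idx = np.random.default_rng(seed).permutation(len(losses))
    return np.array([losses[part].mean() for part in np.array_split(idx, n_parts)])


def compare_models(y_true, p_base, p_new, n_parts=30, alpha=0.05, seed=0):
    """Compare two models on per-part metric values (illustrative sketch)."""
    parts_base = per_part_log_loss(y_true, p_base, n_parts, seed)
    parts_new = per_part_log_loss(y_true, p_new, n_parts, seed)

    # Normality check on the per-part values (assumed: Shapiro-Wilk).
    normality_pvalues = (
        stats.shapiro(parts_base).pvalue,
        stats.shapiro(parts_new).pvalue,
    )

    # Assumed stand-in for the "modification of the Student's t-test":
    # Welch's test, which does not assume equal variances.
    t_stat, t_pvalue = stats.ttest_ind(parts_base, parts_new, equal_var=False)

    # Simplified approach: confidence interval for the base model's mean,
    # then check whether the new model's mean falls outside it.
    ci_low, ci_high = stats.t.interval(
        1 - alpha,
        n_parts - 1,
        loc=parts_base.mean(),
        scale=stats.sem(parts_base),
    )
    outside = bool(not (ci_low <= parts_new.mean() <= ci_high))

    return {
        "normality_pvalues": normality_pvalues,
        "t_statistic": float(t_stat),
        "t_pvalue": float(t_pvalue),
        "base_ci": (float(ci_low), float(ci_high)),
        "outside_base_ci": outside,
    }
```

Calling `compare_models(y_true, p_base, p_new)` with arrays of binary labels and predicted click probabilities returns both the t-test result and the simplified confidence-interval check, so the two approaches can be compared on the same partition. A quantile-quantile plot of the per-part values could be drawn with `scipy.stats.probplot` to complement the normality test.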
