Over the past decade, machine learning has gained significant traction and is now deployed across diverse domains, including information systems, finance, healthcare, cybersecurity, and autonomous driving. As machine learning is applied in increasingly sensitive scenarios, the demand for models that are both accurate and robust during the operational phase has grown rapidly. The quality of a machine learning model depends crucially on its data: the training data it learns from and the input data it encounters at the operational phase. The development of data-aware algorithms is therefore of paramount importance for obtaining high-quality machine learning models. This thesis contributes to this objective by developing data-aware algorithms that account for data quality during both the training and operational phases of machine learning models. The research focuses on two primary domains.

The first domain is information retrieval, with particular emphasis on improving both the efficiency of learning-to-rank algorithms and the effectiveness of the learned models on ranking tasks. The thesis includes three works in this domain. Marcuzzi et al. [2022] propose a novel algorithm to detect and remove consistent-outlier documents from the training data. In Marcuzzi et al. [2023], we design a new learning algorithm that handles the gradient incoherencies affecting LambdaRank-based algorithms. Finally, in Lucchese et al. [2023], we design a new sampling function for the Selective Gradient Boosting algorithm that exploits the most informative low-ranked non-relevant documents.

The second domain is adversarial machine learning, which focuses on increasing the robustness of binary classifiers against adversarial inputs encountered at the operational phase. The research in this domain also aims at certifiable models whose robustness against adversarial machine learning attacks can be assessed efficiently. In Calzavara et al. [2021], we design a novel learning algorithm that trains ensembles of decision trees robust to evasion attacks, together with a polynomial-time certification algorithm that computes a lower bound on their robustness. Finally, in Calzavara et al. [2022], we introduce a new evaluation metric, named Resilience, to better assess the security of machine learning models.

Awarded by: Università Ca' Foscari di Venezia, Venice, Italy, on 19 April 2024.
Supervised by: Claudio Lucchese.
Available at: https://federicomarcuzzi.github.io/resources/thesis_phd.pdf
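
For context, the following is a minimal, illustrative sketch of how conventional LambdaRank-style lambda gradients are computed for a single query. It shows the standard formulation (RankNet pairwise gradients scaled by the NDCG change of swapping two documents) that the gradient incoherencies mentioned above concern; it is not the algorithm proposed in Marcuzzi et al. [2023], and all names are chosen for illustration only.

import numpy as np

def dcg_gain(label):
    # standard exponential DCG gain
    return 2.0 ** label - 1.0

def lambdarank_lambdas(scores, labels, sigma=1.0):
    # Illustrative LambdaRank-style lambdas for one query.
    # For every pair (i, j) with labels[i] > labels[j], the pairwise
    # RankNet gradient is scaled by |Delta NDCG|, the NDCG change
    # obtained by swapping the two documents in the current ranking.
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = len(scores)
    order = np.argsort(-scores)               # current ranking by score
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)                # rank position of each document
    discount = 1.0 / np.log2(rank + 2.0)      # positional discounts

    ideal_gains = np.sort(dcg_gain(labels))[::-1]
    idcg = max(np.sum(ideal_gains / np.log2(np.arange(n) + 2.0)), 1e-12)

    lambdas = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue                      # only pairs where i beats j
            # |NDCG change| if documents i and j swapped rank positions
            delta = abs((dcg_gain(labels[i]) - dcg_gain(labels[j]))
                        * (discount[i] - discount[j])) / idcg
            # RankNet pairwise gradient, scaled by the metric change
            rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
            lambdas[i] += sigma * rho * delta
            lambdas[j] -= sigma * rho * delta
    return lambdas

# toy usage: three documents with graded relevance
print(lambdarank_lambdas(scores=[0.2, 1.5, 0.3], labels=[2, 0, 1]))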
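
Likewise, a minimal sketch of the question that robustness certification answers: given an instance x and a perturbation budget eps, can any L-infinity-bounded perturbation reach a prediction different from the correct label? The sketch below performs the textbook exact check for a single decision tree by enumerating the leaves reachable from the eps-ball around x; it is not the polynomial certification algorithm of Calzavara et al. [2021], and the class and function names are hypothetical.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # internal node: feature/threshold set and both children present;
    # leaf node: prediction set, children left as None
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[int] = None

def reachable_predictions(node, x, eps):
    # Predictions of all leaves reachable when every feature of x may be
    # perturbed independently by at most eps (L-infinity attacker).
    if node.prediction is not None:
        return {node.prediction}
    preds = set()
    if x[node.feature] - eps <= node.threshold:   # some perturbation goes left
        preds |= reachable_predictions(node.left, x, eps)
    if x[node.feature] + eps > node.threshold:    # some perturbation goes right
        preds |= reachable_predictions(node.right, x, eps)
    return preds

def is_robust(tree, x, y, eps):
    # True iff no L-infinity perturbation of size <= eps changes the prediction.
    return reachable_predictions(tree, x, eps) == {y}

# toy stump: predict 1 iff feature 0 > 0.5
stump = Node(feature=0, threshold=0.5,
             left=Node(prediction=0), right=Node(prediction=1))
print(is_robust(stump, x=[0.70], y=1, eps=0.1))  # True: ball stays right
print(is_robust(stump, x=[0.55], y=1, eps=0.1))  # False: can cross threshold

This single-tree check runs in time linear in the number of nodes, but the analogous exact check for tree ensembles is NP-hard in general, which is what motivates polynomial-time procedures that certify a lower bound on robustness rather than computing it exactly.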