What do banks tell us about financial stability? Predicting systemic crises using text-based machine learning

Abstract

This paper extends the literature on predicting financial crises in two ways: (i) it develops a new text-based indicator of banks' sentiment tailored to the financial stability context, and (ii) it applies machine learning (ML) techniques to predict systemic crises in the euro area as defined by the European Systemic Risk Board (ESRB). In-sample analysis indicates that banks' financial stability sentiment (BFSS) is a highly statistically significant predictor of systemic crises: a negative one-standard-deviation shock to the BFSS indicator raises the probability of a systemic crisis by 7 and 3 percentage points one quarter and four quarters ahead, respectively, controlling for the credit cycle. Out-of-sample results show that, while the BFSS tends to improve the predictive performance of baseline logistic regression models, ML models grounded in financial stability dictionaries deliver substantially higher predictive accuracy in forecasting systemic crises. By improving the accuracy and timeliness of systemic crisis prediction, this novel application can complement conventional approaches to calibrating macroprudential policy tools and strengthen crisis prevention frameworks.
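To make the mechanics concrete, the sketch below shows the general kind of pipeline the abstract describes: a dictionary-based sentiment score computed from bank text, combined with a credit-cycle control in a logistic crisis model. The word lists, data, and variable names are invented for illustration; the paper's actual BFSS dictionary, sample, and ESRB crisis definitions are not reproduced here.

```python
# Minimal sketch, NOT the paper's code: hypothetical dictionary and toy data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

NEGATIVE_WORDS = {"impairment", "deterioration", "losses", "default", "stress", "contagion"}
POSITIVE_WORDS = {"resilient", "recovery", "sound", "robust", "improvement", "stabilisation"}

def bfss_score(text: str) -> float:
    """Net tone: (positive hits - negative hits) / total tokens, a simple dictionary proxy."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    return (pos - neg) / len(tokens)

# Toy quarterly observations: [BFSS score, credit-to-GDP gap]; y = crisis flag h quarters ahead.
X = np.array([
    [ 0.02, 1.0], [ 0.01, 2.5], [-0.01, 4.0], [-0.03, 6.5],
    [-0.04, 8.0], [ 0.03, 0.5], [ 0.00, 3.0], [-0.02, 5.5],
])
y = np.array([0, 0, 0, 1, 1, 0, 0, 1])

model = LogisticRegression().fit(X, y)
print("coefficients:", model.coef_)  # the sentiment coefficient should come out negative on this toy data

new_score = bfss_score("losses and funding stress amid rising default and contagion risk")
print("example BFSS score:", new_score)
print("crisis probability:", model.predict_proba([[new_score, 7.0]])[0, 1])
```

A ratio-of-counts score and a plain logistic regression are deliberately simple stand-ins; the paper's dictionary construction and ML models are more involved, and the magnitudes above carry no meaning.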

Similar Papers
  • Book Chapter
  • Cited by 3
  • 10.1007/978-3-319-32174-5_3
Form and Function of the ESRB: A Critical Analysis
  • Jan 1, 2016
  • Trude Myklebust

The establishment of the European Systemic Risk Board (ESRB) in 2010 must be seen against the backdrop of the preceding financial crisis. The crisis revealed the financial supervisory authorities' failure, in the run-up to the crisis, to anticipate adverse macroprudential developments and prevent the accumulation of excessive risks within the financial system. This triggered a heightened awareness among regulatory and supervisory authorities regarding stability at a systemic level and macroprudential developments. The creation of the ESRB resonates with other initiatives in the aftermath of the crisis, among them the establishment of the Financial Stability Board (based at the Bank for International Settlements in Basel) and, more recently, the conferral of macroprudential powers on the European Central Bank under the Single Supervisory Mechanism. The ESRB is part of the European System of Financial Supervision, the purpose of which is to ensure supervision of the European Union's financial system. This chapter begins with a description of the institutional aspects of the ESRB, building on a study of its founding documents and regulation. Special attention is given to the ESRB's mission and objectives, which are defined as contributing to the prevention or mitigation of systemic risks to financial stability within the European Union. An in-depth analysis of the concepts of systemic risk and financial stability is conducted, because these concepts are crucial for determining the reach of the ESRB's authority and responsibility. In view of the magnitude of the tasks assigned to the ESRB, this chapter considers whether the ESRB in its existing form is well suited for the large and complex undertaking it is expected to perform. This question is addressed with a critical analysis of the ESRB's applicable measures as well as its governance structure and decision-making processes. There is reason to believe that organisational improvements will be necessary to achieve the envisaged effectiveness. The chapter concludes with some remarks about the legitimacy and accountability of the ESRB.

  • Research Article
  • Cited by 2
  • 10.3390/info14010053
Tool Support for Improving Software Quality in Machine Learning Programs
  • Jan 16, 2023
  • Information
  • Kwok Sun Cheng + 3 more

Machine learning (ML) techniques discover knowledge from large amounts of data, and ML modeling is becoming essential to software systems in practice. ML research communities have focused on the accuracy and efficiency of ML models, while less attention has been paid to validating the quality of ML models. Validating ML applications is a challenging and time-consuming process for developers since prediction accuracy heavily relies on the generated models. ML applications are written in a relatively data-driven programming style on top of black-box ML frameworks, and all of the datasets and the ML application need to be investigated individually. Thus, ML validation tasks take a lot of time and effort. To address this limitation, we present MLVal, a novel quality validation technique that increases the reliability of ML models and applications. Our approach helps developers inspect the training data and the features generated for the ML model. A data validation technique is important and beneficial to software quality since the quality of the input data affects the speed and accuracy of training and inference. Inspired by software debugging/validation for reproducing potential reported bugs, MLVal takes as input an ML application and its training datasets to build the ML models, helping ML application developers easily reproduce and understand anomalies in the ML application. We have implemented an Eclipse plugin for MLVal that allows developers to validate the prediction behavior of their ML applications, the ML model, and the training data in the Eclipse IDE. In our evaluation, we used 23,500 documents in the bioengineering research domain. We assessed the ability of the MLVal validation technique to effectively help ML application developers (1) investigate the connection between the produced features and the labels in the training model, and (2) detect errors early to secure model quality through better data. Our approach reduces the engineering effort required to validate problems, improving data-centric workflows in ML application development.

  • Research Article
  • Cited by 22
  • 10.2139/ssrn.1676140
Can Soft Law Bodies be Effective? Soft Systemic Risk Oversight Bodies and the Special Case of the European Systemic Risk Board
  • Sep 13, 2010
  • SSRN Electronic Journal
  • Eilis Ferran + 1 more

The global response to the financial crisis has included the establishment of new, or significantly revamped, institutions specifically dedicated to the task of overseeing systemic risk. Internationally, the Financial Stability Forum has morphed into the Financial Stability Board (FSB) and has been given a broader mandate. In Europe, a new body, the European Systemic Risk Board (ESRB), has been assigned the role of monitoring and assessing systemic risks. National systemic risk oversight bodies are being set up as well. Strengthening and reinforcing are words that feature prominently in many policy statements relating to these institutional developments but many of these bodies, including the FSB and the ESRB, are designed to operate without legally-binding powers. This raises questions about how powerful they will actually prove to be. In this article we suggest that lack of formal power need not prevent systemic risk oversight bodies from acting in a credible and authoritative manner. We draw on existing experience of soft laws and institutions in international financial regulation to support this assessment. However, we also acknowledge that softer approaches have been shown to have weaknesses, particularly with respect to surveillance and enforcement. We suggest that the financial crisis has highlighted the limits of what can be achieved through informal methods and the importance of exploring harder alternatives. We consider what the ESRB in particular can learn from the wealth of accumulated experience at the international level with respect to both strengths and weaknesses of an informal approach. At the same time, we emphasise that there is much about the ESRB’s structure that is special because of its place within the EU constitutional and legal framework and in respect of which lessons drawn from international level experience do not pertain. We explore the implications of the ESRB’s special situation. Close connections to bodies with formal power may enhance the ESRB’s effectiveness. On the other hand, this capacity to have hard effect could also inhibit the ESRB. The net result could be the loss of some of the advantages, such as flexibility and willingness to experiment, that are associated with a softer approach. An edited version of this article, entitled ‘Can Soft Law Bodies be Effective? The Special Case of the European Systemic Risk Board’, which focuses mainly on the ESRB and European law, is forthcoming in the European Law Review (December 2010).

  • Research Article
  • Cited by 6
  • 10.1007/s43832-025-00207-z
A review on the applications of machine learning and deep learning to groundwater salinity modeling: present status, challenges, and future directions
  • Feb 27, 2025
  • Discover Water
  • Dilip Kumar Roy + 4 more

Coastal aquifers are vital for sustaining ecosystems and providing freshwater for agriculture and domestic use. However, rising groundwater salinity in these regions demands innovative approaches for accurate modeling and prediction. In recent decades, machine learning (ML) and deep learning (DL) techniques have been extensively used across various water resource management fields, often yielding promising outcomes. This review examines the growing usage of ML and DL techniques in modeling groundwater salinity in coastal aquifers. The analysis is based on 104 peer-reviewed journal articles including the review articles indexed in PubMed, ScienceDirect, Google Scholar, Scopus, and SpringerLink using keywords like "saltwater intrusion", "machine learning", “deep learning”, and "groundwater salinity". The review discusses recent advancements, challenges, and future directions for improving ML model accuracy and applicability. Studies reveal an increasing reliance on ML techniques to predict groundwater salinity and manage seawater intrusion across various coastal geological settings. Researchers have employed a range of ML models, from traditional methods like Artificial Neural Networks and Adaptive Neuro Fuzzy Inference Systems to advanced DL-based models like Convolutional Neural Networks and Long Short-Term Memory Networks, often using ensemble methods to enhance accuracy. Key findings include the top-tier output of ensemble and optimized models, including Random Forest and Grasshopper Optimization Algorithm-XGBoost. These studies highlight the importance of model selection, input variable ranking, and computational efficiency, demonstrating ML's global relevance in environmental management. The review highlights the prospects of ML, DL, and ensemble models to redefine groundwater management in coastal aquifers, offering more effective and sustainable solutions to the challenges posed by groundwater salinity and saltwater intrusion. Future advancements will focus on further integrating DL, explainable artificial intelligence, and ensemble approaches, along with other innovative techniques, to explore sparsely studied variables, model new or unique study areas, and apply ML methods for managing groundwater salinity.

  • Research Article
  • Cited by 1
  • 10.1227/neu.0000000000003531
Machine Learning-Based Rupture Risk Prediction for Intracranial Aneurysms: A Systematic Review and Meta-Analysis.
  • May 30, 2025
  • Neurosurgery
  • S Farzad Maroufi + 15 more

Aneurysm risk prediction remains an imprecise science that places patients at risk of either over- or undertreatment. Machine learning (ML) models may improve clinical practice by adding precision to risk assessment. This study aims to comprehensively assess the current landscape of ML applications in predicting the risk of aneurysm rupture and to compare their performance with the widely used PHASES score. A systematic review of PubMed, Scopus, and Web of Science was conducted. All studies using ML tools to predict the rupture risk of intracranial aneurysms were included. Meta-analysis was conducted with consideration of the ML algorithms and compared with the PHASES score. Thirty-six studies involving 22 462 patients were included in the final analysis. ML techniques, including 124 models using 25 algorithms, were employed. Although the various ML models had comparable diagnostic performance, deep learning exhibited a slightly better performance profile (sensitivity = 0.792, specificity = 0.788, and accuracy = 0.778 in external validation). Based on our analysis, ML, regardless of the algorithm, provides comparable sensitivity (0.743 vs 0.771, P = .60) and higher specificity (0.763 vs 0.507, P < .01) compared with the PHASES score. Consistently, pooling the area under the receiver operating characteristic curve (AUC) for 60 ML models and 5 PHASES score datasets, ML models exhibited a higher AUC (0.84 vs 0.64, P < .01). Using hemodynamic parameters as model inputs improved specificity (P < .01) in the test sets without any significant change in sensitivity; this improvement was not observed in the external validation sets. ML techniques have the potential to enhance the prediction of intracranial aneurysm rupture compared with traditional approaches such as the PHASES score. Incorporating hemodynamic parameters may further enhance the accuracy of ML models. Future prospective studies are required to validate the utility of ML models for clinical integration.

  • Preprint Article
  • 10.5194/egusphere-egu23-11636
State-of-the-Art Review of Machine Learning Models in Civil Engineering: Based on DAMIE Classification Tree
  • May 15, 2023
  • Jaehyun Kim + 1 more

In recent years, Machine Learning (ML) models have proven useful in solving problems in a wide variety of fields such as medicine, economics, manufacturing, transportation, energy, and education. With increased interest in ML models and advances in sensor technologies, ML models are being widely applied even in the civil engineering domain. ML models enable the analysis of large amounts of data, automation, and improved decision making, and provide more accurate prediction. While several state-of-the-art reviews have been conducted in individual sub-domains of civil engineering (e.g., geotechnical engineering, structural engineering) or on specific application problems (e.g., structural damage detection, water quality evaluation), little effort has been devoted to a comprehensive review of ML models applied in civil engineering that compares them across sub-domains. A systematic, yet domain-specific, literature review framework should be employed to effectively classify and compare the models. To that end, this study proposes a novel review approach based on the hierarchical classification tree "D-A-M-I-E" (Domain, Application problem, ML models, Input data, Example case). The "D-A-M-I-E" classification tree classifies ML studies in civil engineering based on (1) the domain of civil engineering, (2) the application problem, (3) the applied ML models, and (4) the data used in the problem. Moreover, the data used for the ML models in each application example are examined based on the specific characteristics of the domain and the application problem. For a comprehensive review, five domains (structural engineering, geotechnical engineering, water engineering, transportation engineering, and energy engineering) are considered, and the ML application problem is divided into five categories (prediction, classification, detection, generation, optimization). Based on the "D-A-M-I-E" classification tree, about 300 ML studies in civil engineering are reviewed. For each domain, analysis and comparison are conducted on the following questions: (1) which problems are mainly solved with ML models, (2) which ML models are mainly applied in each domain and problem, (3) how advanced the ML models are, and (4) what kinds of data are used and what data processing is performed for the application of ML models. The paper also assesses the expansion and applicability of the proposed methodology to other areas (e.g., Earth system modeling, climate science). Furthermore, based on the identification of research gaps for ML models in each domain, the paper provides future directions for ML in civil engineering based on approaches to dealing with data (e.g., collection, handling, storage, and transmission) and aims to support the application of ML models in other fields.

  • Research Article
  • Cited by 2
  • 10.1016/j.acags.2024.100201
Current progress in subseasonal-to-decadal prediction based on machine learning
  • Oct 22, 2024
  • Applied Computing and Geosciences
  • Zixiong Shen + 7 more

  • Research Article
  • Cited by 12
  • 10.1145/3524106
A Survey on Requirements of Future Intelligent Networks: Solutions and Future Research Directions
  • Nov 21, 2022
  • ACM Computing Surveys
  • Arif Husen + 2 more

This study examines the requirements of Future Intelligent Networks (FIN), existing solutions, and current research directions through a survey. The background of the study is the application of Machine Learning (ML) in the networking field. Through careful analysis of the literature and real-world reports, we noted that ML has significantly expedited decision-making processes, enhanced intelligent automation, and helped resolve complex problems economically across many fields. Various researchers have also envisioned future networks incorporating intelligent functions and operations with ML. Several efforts have been made to automate individual functions and operations in the networking domain; however, most of the existing ML models proposed in the literature lack several vital requirements. Hence, this study aims to present a comprehensive summary of the requirements of FIN and to propose a taxonomy of the different network functionalities that need to be equipped with ML techniques. The core objectives of this study are to provide a taxonomy of requirements envisioned for end-to-end FIN, relevant ML techniques, and their analysis to find research gaps, open issues, and future research directions. The real benefit of ML applications in any domain can only be ensured if intelligent capabilities cover all of its components. We observed that future generations of networks are heterogeneous, multi-vendor, and multidimensional, and ML can provide optimal results only if intelligent capabilities are used on a holistic scale. Realizing intelligence on a holistic scale is only possible if the ML algorithms can solve heterogeneous problems in a multi-vendor and multidimensional environment. ML models must be reliable and efficient and must be able to learn and share knowledge across network layers and administrative domains to solve issues. First, this study ascertains the requirements of FIN and proposes their taxonomy through a review of ideas envisioned by various researchers and of articles gathered from reputable conferences and standards-developing organizations using keyword queries. Second, we review existing studies on ML applications, focusing on coverage, heterogeneity, distributed architecture, and cross-domain knowledge learning and sharing. We observed that, in the past, ML applications focused mainly on an individual or isolated level, and that global, deeply holistic learning with cross-layer/cross-domain knowledge sharing and agile ML operations remains largely unexplored. We recommend that these issues be addressed with improved ML architecture and agile operations, and we propose an ML-pipeline-based architecture for FIN. The significant contribution of this study is the impetus for researchers to seek ML models suitable for a modular, distributed, multi-domain, and multi-layer environment that support decision making at a global or holistic level rather than at the level of individual functions.

  • PDF Download Icon
  • Supplementary Content
  • Cited by 30
  • 10.3390/cancers13102469
Machine Learning and Radiomics Applications in Esophageal Cancers Using Non-Invasive Imaging Methods—A Critical Review of Literature
  • May 19, 2021
  • Cancers
  • Chen-Yi Xie + 5 more

Simple Summary: Non-invasive imaging modalities are commonly used in clinical practice. Recently, the application of machine learning (ML) techniques has provided a new scope for more detailed imaging analysis in esophageal cancer (EC) patients. Our review aims to explore the recent advances and future perspectives of ML techniques in the disease management of EC patients. ML-based investigations can be used for diagnosis, treatment response evaluation, prognostication, and investigation of biological heterogeneity. The key results from the literature demonstrate the potential of ML techniques, such as radiomic techniques and deep learning networks, to improve the decision-making process for EC patients in clinical practice. Recommendations have been made to improve study design and future applicability.
Esophageal cancer (EC) is of public health significance as one of the leading causes of cancer death worldwide. Accurate staging, treatment planning and prognostication in EC patients are of vital importance. Recent advances in machine learning (ML) techniques demonstrate their potential to provide novel quantitative imaging markers in medical imaging. Radiomics approaches that can quantify medical images into high-dimensional data have been shown to improve the imaging-based classification system in characterizing the heterogeneity of primary tumors and lymph nodes in EC patients. In this review, we aim to provide a comprehensive summary of the evidence on the most recent developments in ML application in imaging pertinent to EC patient care. According to the published results, ML models evaluating treatment response and lymph node metastasis achieve reliable predictions, ranging from acceptable to outstanding in their validation groups. Patients stratified by ML models into different risk groups have a significant or borderline significant difference in survival outcomes. Prospective large multi-center studies are suggested to improve the generalizability of ML techniques with standardized imaging protocols and harmonization between different centers.

  • Research Article
  • Cited by 8
  • 10.1016/j.pnucene.2024.105535
Machine learning in critical heat flux studies in nuclear systems: A detailed review
  • Nov 28, 2024
  • Progress in Nuclear Energy
  • Siwei Qi + 6 more

  • Research Article
  • 10.1136/bmjopen-2024-093495
Machine learning methods, applications and economic analysis to predict heart failure hospitalisation risk: a scoping review.
  • Jun 1, 2025
  • BMJ Open
  • João Abreu + 2 more

Machine Learning (ML) has been transformative in healthcare, enabling more precise diagnostics, personalised treatment regimens and enhanced patient care. In cardiology, ML plays a crucial role in risk prediction and patient stratification, particularly for heart failure (HF), a condition affecting over 64 million people globally and imposing an economic burden of approximately $108 billion annually. ML applications in HF include predictive analytics for risk assessment, identifying patient subgroups with varying prognoses and optimising treatment pathways. By accurately predicting the likelihood of hospitalisation and rehospitalisation, ML tools help tailor interventions, reduce hospital visits, improve patient outcomes and lower healthcare costs. The objective was to conduct a comprehensive review of existing ML models designed to predict hospitalisation risk in individuals with HF. A database search including PubMed, Scopus and Web of Science was conducted on 31 March 2024. Studies were selected based on inclusion criteria focusing on ML models predicting hospitalisation risks in adults with HF. The data from 27 studies meeting the criteria were extracted and analysed, with a focus on the predictive performance of the ML models and the presence of economic analysis. Most studies focused on predicting readmission rather than first-time hospitalisation. All included studies employed supervised ML algorithms, with ensemble-based methods generally yielding the highest predictive performance. For 30-day hospitalisation or readmission risk, Extreme Gradient Boosting (XGBoost) achieved the highest mean area under the curve (AUC) (0.69), followed by Naïve Bayes (0.68) and Deep Unified Networks (0.66). For 90-day risk, the best-performing models were Least Absolute Shrinkage and Selection Operator and Gradient Boosting, both with a mean AUC of 0.75, followed by Random Forest (0.67). When the prediction timeframe was unspecified, Categorical Boosting achieved the highest performance with a mean AUC of 0.88, followed by Generalised Linear Model Net and XGBoost (both 0.79). Electronic health records were the primary data source across studies; however, few models included patient-reported outcomes or socioeconomic variables. None of the studies conducted an economic evaluation to assess the cost-effectiveness of these models. ML holds substantial potential for improving HF care. However, further efforts are needed to enhance the generalisation of models, integrate diverse data sources and evaluate the cost-effectiveness of these technologies.

  • Research Article
  • Cited by 14
  • 10.3389/fpubh.2022.1009164
Application of machine learning and natural language processing for predicting stroke-associated pneumonia.
  • Sep 29, 2022
  • Frontiers in Public Health
  • Hui-Chu Tsai + 2 more

Background: Identifying patients at high risk of stroke-associated pneumonia (SAP) may permit targeting potential interventions to reduce its incidence. We aimed to explore the functionality of machine learning (ML) and natural language processing techniques on structured data and unstructured clinical text to predict SAP by comparing it to conventional risk scores. Methods: Linked data between a hospital stroke registry and a deidentified research-based database including electronic health records and administrative claims data were used. Natural language processing was applied to extract textual features from clinical notes. The random forest algorithm was used to build ML models. The predictive performance of ML models was compared with the A2DS2, ISAN, PNA, and ACDD4 scores using the area under the receiver operating characteristic curve (AUC). Results: Among 5,913 acute stroke patients hospitalized between Oct 2010 and Sep 2021, 450 (7.6%) developed SAP within the first 7 days after stroke onset. The ML model based on both textual features and structured variables had the highest AUC [0.840, 95% confidence interval (CI) 0.806–0.875], significantly higher than those of the ML model based on structured variables alone (0.828, 95% CI 0.793–0.863, P = 0.040), ACDD4 (0.807, 95% CI 0.766–0.849, P = 0.041), A2DS2 (0.803, 95% CI 0.762–0.845, P = 0.013), ISAN (0.795, 95% CI 0.752–0.837, P = 0.009), and PNA (0.778, 95% CI 0.735–0.822, P < 0.001). All models demonstrated adequate calibration except for the A2DS2 score. Conclusions: The ML model based on both textual features and structured variables performed better than conventional risk scores in predicting SAP. The workflow used to generate ML prediction models can be disseminated for local adaptation by individual healthcare organizations.
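As a rough sketch of the workflow this abstract outlines (vectorising free-text clinical notes, concatenating them with structured variables, fitting a random forest, and scoring by AUC), the Python example below uses TF-IDF features and synthetic data; the notes, variables, and labels are illustrative assumptions, not the study's registry data.

```python
# Illustrative only: synthetic notes and labels, not the study's linked registry data.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

notes = [
    "dysphagia with reduced consciousness, aspiration suspected",
    "mild weakness, alert, swallowing intact",
    "nasogastric tube placed, productive cough and fever",
    "alert and oriented, no swallowing difficulty",
] * 25                                                   # repeat to get a workable toy sample
structured = np.tile([[78, 1], [62, 0], [81, 1], [55, 0]], (25, 1))  # e.g. age, severe-stroke flag
labels = np.array([1, 0, 1, 0] * 25)                     # pneumonia within 7 days (toy outcome)

text_features = TfidfVectorizer().fit_transform(notes)   # textual features from the notes
X = hstack([text_features, csr_matrix(structured)]).tocsr()
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=0, stratify=labels)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```

With only four distinct synthetic notes the resulting AUC is meaningless; the point is the shape of the pipeline (text vectorisation, feature concatenation, tree ensemble, AUC comparison), not the numbers.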

  • Research Article
  • Cited by 8
  • 10.1007/s00607-025-01485-0
Sustainable energy management in the AI era: a comprehensive analysis of ML and DL approaches
  • May 22, 2025
  • Computing
  • Haseeb Javed + 3 more

This study comprehensively analyzes the application of innovative deep learning (DL) and machine learning (ML) techniques in smart energy management systems (EMSs), with an emphasis on load forecasting, demand response, and the development of smart energy sectors. The application of various ML and DL models in electrical-network EMSs was examined across over 200 studies from 2014 to 2024 to highlight the key benefits and advances contributed by each technology to sustainable management systems in the energy sector. The findings emphasize DL and ML models' enhanced precision and predictive capabilities in load forecasting, their efficacy in enabling efficient demand response mechanisms, and their significance in supporting the development of smart energy sectors. Furthermore, recommendations are made based on the survey results to assist in incorporating these techniques into EMS frameworks, such as investment in data infrastructure, model training and validation, and collaboration between researchers, industry experts, and policymakers. The study also discusses the limitations identified in the literature, such as limited real-world implementations, challenges regarding data quality and availability, and the need for enhanced ML and DL model interpretability. Addressing these limitations can help increase the application and efficacy of ML and DL techniques in EMSs, enabling a more efficient and sustainable energy landscape. Finally, this study facilitates researchers' exploration of ML and DL in energy management, highlighting relevant limitations, strengths, and alternative approaches associated with sustainable energy management, and indicates potential future research directions for further investigation.

  • Research Article
  • 10.1145/3476415.3476435
Increasing trust in complex machine learning systems
  • Jun 1, 2021
  • ACM SIGIR Forum
  • Jaehun Kim

Machine learning (ML) has become a core technology for many real-world applications. Modern ML models are applied to unprecedentedly complex and difficult challenges, including very large and subjective problems. For instance, applications towards multimedia understanding have been advanced substantially. Here, it is already prevalent that cultural/artistic objects such as music and videos are analyzed and served to users according to their preference, enabled through ML techniques. One of the most recent breakthroughs in ML is Deep Learning (DL), which has been immensely adopted to tackle such complex problems. DL allows for higher learning capacity, making end-to-end learning possible, which reduces the need for substantial engineering effort, while achieving high effectiveness. At the same time, this also makes DL models more complex than conventional ML models. Reports in several domains indicate that such more complex ML models may have potentially critical hidden problems: various biases embedded in the training data can emerge in the prediction, extremely sensitive models can make unaccountable mistakes. Furthermore, the black-box nature of the DL models hinders the interpretation of the mechanisms behind them. Such unexpected drawbacks result in a significant impact on the trustworthiness of the systems in which the ML models are equipped as the core apparatus. In this thesis, a series of studies investigates aspects of trustworthiness for complex ML applications, namely the reliability and explainability. Specifically, we focus on music as the primary domain of interest, considering its complexity and subjectivity. Due to this nature of music, ML models for music are necessarily complex for achieving meaningful effectiveness. As such, the reliability and explainability of music ML models are crucial in the field. The first main chapter of the thesis investigates the transferability of the neural network in the Music Information Retrieval (MIR) context. Transfer learning, where the pre-trained ML models are used as off-the-shelf modules for the task at hand, has become one of the major ML practices. It is helpful since a substantial amount of the information is already encoded in the pre-trained models, which allows the model to achieve high effectiveness even when the amount of the dataset for the current task is scarce. However, this may not always be true if the "source" task which pre-trained the model shares little commonality with the "target" task at hand. An experiment including multiple "source" tasks and "target" tasks was conducted to examine the conditions which have a positive effect on the transferability. The result of the experiment suggests that the number of source tasks is a major factor of transferability. Simultaneously, it is less evident that there is a single source task that is universally effective on multiple target tasks. Overall, we conclude that considering multiple pre-trained models or pre-training a model employing heterogeneous source tasks can increase the chance for successful transfer learning. The second major work investigates the robustness of the DL models in the transfer learning context. The hypothesis is that the DL models can be susceptible to imperceptible noise on the input. This may drastically shift the analysis of similarity among inputs, which is undesirable for tasks such as information retrieval. Several DL models pre-trained in MIR tasks are examined for a set of plausible perturbations in a real-world setup. 
Based on a proposed sensitivity measure, the experimental results indicate that all the DL models were substantially vulnerable to perturbations compared to a traditional feature encoder. They also suggest that the experimental framework can be used to test pre-trained DL models for robustness. In the final main chapter, the explainability of black-box ML models is discussed. In particular, the chapter focuses on the evaluation of explanations derived from model-agnostic explanation methods. With black-box ML models having become common practice, model-agnostic explanation methods have been developed to explain individual predictions. However, the evaluation of such explanations is still an open problem. The work introduces an evaluation framework that measures the quality of explanations in terms of fidelity and complexity. Fidelity refers to the explained mechanism's coherence with the black-box model, while complexity is the length of the explanation. Throughout the thesis, we gave special attention to the experimental design so that robust conclusions can be reached. Furthermore, we focused on delivering machine learning and evaluation frameworks. This is crucial, as we intend the experimental design and results to be reusable in general ML practice. Accordingly, we also aim for our findings to be applicable beyond music, to applications such as computer vision or natural language processing. Trustworthiness in ML is not a domain-specific problem; thus, it is vital for both researchers and practitioners from diverse problem spaces to increase awareness of the trustworthiness of complex ML systems. We believe the research reported in this thesis provides meaningful stepping stones towards the trustworthiness of ML.

  • Research Article
  • Cited by 3
  • 10.1200/jco.2022.40.16_suppl.e17570
Evaluating the use of machine learning in ovarian cancer: A systematic review.
  • Jun 1, 2022
  • Journal of Clinical Oncology
  • Sabrina Piedimonte + 6 more

e17570 Background: Ovarian cancer (OC) is the leading cause of death from gynecologic malignancy. Current challenges include a lack of diagnostic tools and predictive biomarkers, and identifying appropriate surgical candidates. Machine learning (ML) is an emerging field that can make accurate projections by drawing inferences from data and may play a crucial role in OC. The objective of the current study was to review the literature on the application of ML in OC and report the most commonly used algorithms and their performance in comparison to existing prediction tools and traditional regression models. Methods: This is a systematic review of published literature from January 1985 to March 2021 on the use of ML in OC. An extensive search of electronic library databases was conducted. Four independent reviewers screened the articles, initially by title and then by full text. Quality was assessed using the MINORS criteria. P-values were generated using Pearson's chi-squared (χ2) test to compare the performance of ML models with traditional statistics. No p-values were reported if only one study was available. Results: Among 4,295 articles screened, 88 studies on ML in OC were included. The mean age of OC patients was 54.7 years (11–90) and the most common stages at diagnosis were Stage III (39.9%) and IV (34%). Applications of ML were in clinical datasets (33%, n = 29), preoperative diagnostics (30.7%, n = 27), serum biomarkers (21.6%, n = 19), genomics (12.5%, n = 11), and prediction of cytoreductive outcomes (2.3%, n = 2). The most commonly applied algorithms were Support Vector Machines [SVM] (28%, n = 33) and Neural Networks [NN] (25.28%). Over the past decades, the number of publications on ML in OC increased three-fold, from 20 (1994–2010) to 67 (2011–2021). Only 9 (10%) studies compared ML techniques with existing prediction tools or traditional regression models. Among 29 clinical dataset studies, 4 compared ML with traditional logistic regression (LR). Two studies reported better performance with ML compared to LR, but the difference was not significant (accuracy: 0.88 vs 0.84, p = 0.15); one study performed comparably (accuracy: 0.1 vs 0.1), while one study performed worse (accuracy: 0.1 vs 0.97). Only one preoperative diagnostic study compared ML techniques with LR; SVM classifiers outperformed LR in classifying ovarian masses as benign or malignant (sensitivity: 0.88 vs. 0.70). One serum biomarker study compared LR with ML algorithms; LR performed better using two biomarkers for predicting OC (accuracy: 0.97 vs. 0.94). Among five studies reporting overall survival outcomes, only one compared ML survival techniques (NN) with LR and showed that NN classifiers outperformed LR in predicting overall survival (AUC: 0.72 vs. 0.62). Conclusions: This is the first systematic review exploring the literature on ML algorithms in OC. Most ML models outperformed traditional models; however, larger datasets would be required to validate these findings.
