Twenty Questions with Random Error
Twenty Questions originated as a parlor game between two players. The game starts with one player, called the oracle, who privately thinks of a secret. The other player, called the questioner, tries to guess the secret by asking the oracle at most twenty questions with Yes/No answers. Early versions of the game can be traced to ancient Greece and ancient Rome. Motivated by the Hungarian version of this game, Rényi formulated it in the middle of the twentieth century as the mathematical problem of guessing an integer from a finite set, where the oracle may either lie randomly on each question or lie on a finite number of questions. The mathematical study of Twenty Questions is motivated by current applications in many domains, including communications, machine learning, and computer vision. The game in which the oracle is allowed a fixed number of lies was also studied by Ulam and Berlekamp and is known as the Rényi-Ulam-Berlekamp game. In contrast, the setting in which the oracle lies randomly is less well understood. In this monograph, we summarize recent advances in the information-theoretic analysis of Twenty Questions with random error. In particular, focusing on the practical application of target localization in sensor networks, we study a query-dependent channel that models the oracle's noisy responses, such as giving a wrong answer or declining to answer a question. We concentrate on non-adaptive query procedures, in which all questions are designed before any question is posed. We cover settings for estimating a single target, a single moving target, and multiple targets over a finite-dimensional unit cube. We also consider adaptive querying for a single target to illustrate the benefit of adaptivity. In adaptive querying, each question is designed sequentially using the responses to all previous questions. All of our theoretical results are illustrated with numerical examples. Finally, we discuss future research directions, including geometry constraints on query sets, low-complexity query procedures, connections to group testing, and practical applications in machine learning and communications.
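The difference between non-adaptive and adaptive querying can be sketched in a few lines. This is an illustration under simplifying assumptions, not the monograph's procedures. The assumptions are:

- The secret is an integer in [0, 2^m).
- The oracle flips each Yes/No answer independently with probability eps. This is a binary symmetric channel, simpler than the query-dependent channel studied in the monograph.
- The non-adaptive scheme fixes the questions "is bit j of the secret 1?" in advance, asks each one a fixed number of times, and decodes every bit by majority vote.
- The adaptive scheme is a probabilistic bisection. Each question "is the secret <= k?" splits the current posterior at its median, and the answer updates the posterior with Bayes' rule.

All function names are hypothetical.

```python
import random


def noisy_oracle(truth: bool, eps: float, rng: random.Random) -> bool:
    """Answer a Yes/No question, flipping the true answer with probability eps
    (a binary symmetric channel, assumed here for simplicity)."""
    return (not truth) if rng.random() < eps else truth


def non_adaptive_search(target: int, m: int, reps: int, eps: float,
                        rng: random.Random) -> int:
    """Non-adaptive: the m bit questions are fixed in advance, each is asked
    `reps` times (use an odd number), and each bit is decoded by majority vote."""
    estimate = 0
    for j in range(m):
        truth = bool((target >> j) & 1)
        yes_votes = sum(noisy_oracle(truth, eps, rng) for _ in range(reps))
        if 2 * yes_votes > reps:
            estimate |= 1 << j
    return estimate


def probabilistic_bisection(target: int, m: int, queries: int, eps: float,
                            rng: random.Random) -> int:
    """Adaptive: ask 'is target <= k?' with k the posterior median, then apply
    Bayes' rule; returns the maximum a posteriori estimate."""
    n = 1 << m
    posterior = [1.0 / n] * n              # uniform prior over the 2^m candidates
    for _ in range(queries):
        # Posterior median: smallest k whose cumulative mass reaches 1/2.
        cum, k = 0.0, 0
        for k, p in enumerate(posterior):
            cum += p
            if cum >= 0.5:
                break
        answer = noisy_oracle(target <= k, eps, rng)
        # Candidates consistent with the answer keep weight (1 - eps), others eps.
        for i in range(n):
            consistent = (i <= k) == answer
            posterior[i] *= (1 - eps) if consistent else eps
        total = sum(posterior)
        posterior = [p / total for p in posterior]
    return max(range(n), key=posterior.__getitem__)
```

For example, with m = 10 and eps = 0.1, `non_adaptive_search(t, 10, 5, 0.1, rng)` spends 50 queries. Running `probabilistic_bisection(t, 10, 50, 0.1, rng)` with the same budget and averaging the error rates of both over many random targets is a quick way to explore the benefit of adaptivity that the monograph analyzes rigorously.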
- Research Article
- 10.1109/access.2025.3534628
- Jan 1, 2025
- IEEE Access
Machine learning (ML) applications face many new and hardly predictable aspects in their production environments. Detecting such aspects and understanding their impact on the ML application is crucial if organizations are to ensure that their ML applications function correctly. A monitoring entity is essential for overseeing ML applications in production, both to continually minimize risks and to improve the applications' performance. However, existing monitoring approaches struggle to deal with the specifics of ML applications. We aim to derive monitoring practices and to provide a holistic view of the steps required to monitor ML applications successfully. Since there has been little research on this topic, we followed a qualitative research approach: we conducted an interview study combined with a multivocal literature review. As a result, we provide a theoretical framework of an ML-enabled agent in its production environment, five characteristics of ML applications' production environments, and 17 monitoring practices: 14 practices arranged sequentially along a typical quality management cycle and three cross-sectional practices. To outline the ML specifics that arise in monitoring ML applications, we investigate how the five production environment characteristics influence the ML monitoring practices.
- Book Chapter
- 10.1007/978-3-031-54827-7_20
- Jan 1, 2024
While machine learning (ML) applications have shown impressive achievements in tasks such as computer vision, NLP, and control problems, these achievements were obtained first and foremost in best-case settings. Unfortunately, settings in which ML applications fail unexpectedly abound, and malicious users or data contributors can trigger such failures. This problem has become known as adversarial example robustness. Although the field is developing rapidly, some fundamental results have been established, offering insight into how to make ML methods resilient to adversarial inputs and data poisoning. ML applications with this property are termed adversarially robust. While the current generation of LLMs is not adversarially robust, results obtained in other branches of ML can provide insight into how to make them so. Such insight would complement and augment ongoing empirical efforts in the same direction (red-teaming).
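To make the notion of an adversarial example concrete, the sketch below attacks a linear classifier with the fast gradient sign method (FGSM). FGSM is a standard attack from the literature that we swap in as an illustration; the chapter does not present it. For a score w·x + b, the gradient with respect to x is w. A perturbation of size eps in every coordinate therefore lowers the margin by exactly eps·||w||_1. The weights and input below are made up for illustration.

```python
import numpy as np


def predict(w: np.ndarray, b: float, x: np.ndarray) -> int:
    """Linear classifier: label +1 if w.x + b >= 0, else -1."""
    return 1 if float(w @ x + b) >= 0 else -1


def fgsm_linear(w: np.ndarray, x: np.ndarray, y: int, eps: float) -> np.ndarray:
    """FGSM step for a linear model: the margin y*(w.x + b) has gradient y*w,
    so moving each coordinate by eps against sign(y*w) lowers it fastest
    under an L-infinity budget of eps."""
    return x - eps * y * np.sign(w)


def min_flip_eps(w: np.ndarray, b: float, x: np.ndarray, y: int) -> float:
    """The FGSM step lowers the margin by exactly eps * ||w||_1, so any eps
    larger than margin / ||w||_1 flips a correctly classified point."""
    return float(y * (w @ x + b)) / float(np.abs(w).sum())


# Hypothetical weights and input, chosen only for illustration.
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.3, -0.1, 0.2])
y = predict(w, b, x)                       # clean prediction
threshold = min_flip_eps(w, b, x, y)       # smallest budget that can flip it
x_adv = fgsm_linear(w, x, y, 1.25 * threshold)
```

The required budget shrinks as ||w||_1 grows. This is why a model with many input dimensions can be fooled by a perturbation that is tiny in each coordinate.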
- Conference Article
- 10.1109/icidca56705.2023.10100252
- Mar 14, 2023
Machine learning in medical applications is currently a major focus of research. Machine learning, as an application of artificial intelligence, not only provides solutions to complex problems but has also revolutionised the medical field. The main goal of machine learning is to improve its learning process over time by using all relevant data and information in the form of inputs and observations. This study reviews medical disease prediction and detection techniques based on various deep learning and machine learning models. Problems related to medical conditions such as cancer and heart, lung, thyroid, and kidney diseases are discussed in this article. Detecting and analysing medical diseases is one of the prominent applications of machine and deep learning. Deep learning offers a large set of innovative tools relevant to the various issues faced in medical image processing. This study first discusses the applications of machine learning and then reviews advances in addressing diseases such as breast cancer, heart disease, skin disease, and kidney disease.
- Research Article
- 10.54254/2755-2721/51/20241165
- Mar 25, 2024
- Applied and Computational Engineering
With the rapid development of the Internet and the rise of e-commerce, commercial enterprises face large amounts of data and a complex market environment. In this situation, machine learning is widely used as a powerful tool in business analysis. In this study, we take Amazon and eBay as examples to examine the application of machine learning in companies' business analytics, focusing on its role in market prediction, customer behavior analysis, and operation optimization. By analyzing relevant cases, we find that machine learning plays an important role in helping companies make more accurate decisions and improve efficiency. Studying Amazon's use of machine learning in business analytics can encourage in-depth academic research on machine learning in business and promote the application and development of machine learning in other business scenarios. Overall, machine learning in business analytics can help companies understand customer behavior, optimize operations, and improve sales. However, challenges remain, such as data quality, algorithm selection, and privacy protection. Further research and innovation are therefore needed to advance machine learning applications in business analytics.
- Research Article
- 10.3390/info14010053
- Jan 16, 2023
- Information
Machine learning (ML) techniques discover knowledge from large amounts of data, and ML modeling is becoming essential to practical software systems. ML research communities have focused on the accuracy and efficiency of ML models, while less attention has been paid to validating their quality. Validating ML applications is a challenging and time-consuming process for developers because prediction accuracy depends heavily on the generated models. ML applications are written in a comparatively data-driven style on top of black-box ML frameworks, so every dataset and the ML application itself must be investigated individually. As a result, ML validation tasks take considerable time and effort. To address this limitation, we present MLVal, a novel quality validation technique that increases the reliability of ML models and applications. Our approach helps developers inspect the training data and the generated features for the ML model. Data validation is important for software quality because the quality of the input data affects the speed and accuracy of training and inference. Inspired by software debugging and validation techniques that reproduce reported bugs, MLVal takes as input an ML application and its training datasets to build the ML models, helping developers easily reproduce and understand anomalies in the application. We have implemented MLVal as an Eclipse plugin that allows developers to validate the prediction behavior of their ML applications, the ML model, and the training data within the Eclipse IDE. In our evaluation, we used 23,500 documents from the bioengineering research domain. We assessed how effectively MLVal helps ML application developers (1) investigate the connection between the produced features and the labels in the training model and (2) detect errors early, securing model quality through better data. Our approach reduces the engineering effort needed to validate problems, improving the data-centric workflows of ML application development.
- Research Article
- 10.1145/3729394
- Jun 19, 2025
- Proceedings of the ACM on Software Engineering
Machine learning (ML) applications have become an integral part of our lives. They rely extensively on floating-point computation and involve very large and very small numbers, so maintaining the numerical stability of these complex computations remains an important challenge. Numerical bugs can lead to system crashes, incorrect output, and wasted computing resources. In this paper, we introduce a novel idea, soft assertions (SA), to encode safety/error conditions at the places where numerical instability can occur. A soft assertion is an ML model trained automatically on the dataset obtained during unit testing of unstable functions. Given the values at an unstable function in an ML application, a soft assertion reports how to change these values in order to trigger the instability. We then use the output of soft assertions as signals to mutate inputs effectively and trigger numerical instability in ML applications. In the evaluation, we used the GRIST benchmark, a total of 79 programs, as well as 15 real-world ML applications from GitHub, and compared our tool with 5 state-of-the-art (SOTA) fuzzers. Our tool found all of the GRIST bugs and outperformed the baselines. We found 13 numerical bugs in real-world code, one of which has already been confirmed by the GitHub developers. While the baselines mostly found bugs that report NaN and INF, our tool found numerical bugs that produce incorrect output. We show one case where the Tumor Detection Model, trained on brain MRI images, should have predicted “tumor” but instead incorrectly predicted “no tumor” because of numerical bugs. Our replication package is located at https://figshare.com/s/6528d21ccd28bea94c32.
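The kind of numerical instability targeted here can be illustrated with a classic example that we add; it is not taken from the paper. A naive softmax returns NaN once an input exceeds about 709.78, the largest argument for which exp stays finite in float64. The standard rewrite that subtracts the maximum does not have this problem. The `softmax_would_overflow` check is a plain hand-written (hard) assertion of that condition, included only for contrast. A soft assertion, as described above, is a learned model and is not implemented here. All names are hypothetical.

```python
import numpy as np

# Largest argument for which exp() stays finite in float64 (about 709.78).
EXP_LIMIT = float(np.log(np.finfo(np.float64).max))


def naive_softmax(z: np.ndarray) -> np.ndarray:
    """Textbook formula: exp overflows to inf for large inputs, giving inf/inf = NaN."""
    e = np.exp(z)
    return e / e.sum()


def stable_softmax(z: np.ndarray) -> np.ndarray:
    """Subtracting max(z) leaves the result unchanged mathematically
    but keeps every exp() argument <= 0, so nothing overflows."""
    e = np.exp(z - z.max())
    return e / e.sum()


def softmax_would_overflow(z: np.ndarray) -> bool:
    """Hand-written (hard) check of the overflow condition for naive_softmax."""
    return bool(np.max(z) > EXP_LIMIT)
```

On an input such as `np.array([1000.0, 1000.0])`, the naive version yields NaNs (NumPy emits overflow and invalid-value RuntimeWarnings), while the stable version returns `[0.5, 0.5]`.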
- Conference Article
- 10.1109/issrew.2018.00024
- Oct 1, 2018
Machine learning (ML) applications have emerged as the killer applications for next-generation hardware and software platforms, and there is considerable interest in software frameworks for building them. TensorFlow is a high-level dataflow framework for building ML applications and has recently become the most popular one. ML applications are also increasingly used in safety-critical systems such as self-driving cars and home robotics. There is therefore a compelling need to evaluate the resilience of ML applications built with frameworks such as TensorFlow. In this paper, we build TensorFI, a high-level fault injection framework for TensorFlow, for evaluating the resilience of ML applications. TensorFI is flexible, easy to use, and portable. It also allows ML application programmers to explore the effects of different parameters and algorithms on error resilience.
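The idea behind fault injection can be sketched outside TensorFlow. The example below is not TensorFI's API. It assumes a single-bit-flip fault model on the float32 weights of one hypothetical linear layer and is implemented with NumPy. `fault_misprediction_rate`, a hypothetical name, estimates how often a single flipped bit changes the predicted class.

```python
import numpy as np


def flip_bit(value: float, bit: int) -> np.float32:
    """Return `value` as float32 with one bit (0 = LSB, 31 = sign) of its
    IEEE 754 encoding flipped."""
    raw = np.array([value], dtype=np.float32).view(np.uint32)
    raw ^= np.uint32(1 << bit)              # in-place XOR on the integer view
    return raw.view(np.float32)[0]


def inject_random_fault(weights: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a float32 copy of the weights with one random bit of one
    random element flipped."""
    faulty = weights.astype(np.float32)     # astype returns a copy by default
    idx = int(rng.integers(faulty.size))
    bit = int(rng.integers(32))
    faulty.flat[idx] = flip_bit(float(faulty.flat[idx]), bit)
    return faulty


def fault_misprediction_rate(weights: np.ndarray, x: np.ndarray,
                             trials: int, seed: int = 0) -> float:
    """Fraction of single-bit faults in `weights` that change argmax(weights @ x)."""
    rng = np.random.default_rng(seed)
    golden = int(np.argmax(weights.astype(np.float32) @ x))   # fault-free output
    changed = 0
    for _ in range(trials):
        faulty = inject_random_fault(weights, rng)
        with np.errstate(all="ignore"):     # flips can create inf/NaN outputs
            out = faulty @ x
        changed += int(np.argmax(out)) != golden
    return changed / trials
```

The bit position matters greatly. Flipping bit 30 (the top exponent bit) of 1.0f yields infinity, whereas flipping bit 0 changes the value by only about 1.2e-7.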
- Front Matter
- 10.1002/aps3.11371
- Jun 1, 2020
- Applications in Plant Sciences
Plants meet machines: Prospects in machine learning for plant biology
- Supplementary Content
- 10.3390/ijerph18042121
- Feb 1, 2021
- International Journal of Environmental Research and Public Health
Objective: To provide a human–Artificial Intelligence (AI) interaction review for Machine Learning (ML) applications to inform how to best combine both human domain expertise and computational power of ML methods. The review focuses on the medical field, as the medical ML application literature highlights a special necessity of medical experts collaborating with ML approaches. Methods: A scoping literature review is performed on Scopus and Google Scholar using the terms “human in the loop”, “human in the loop machine learning”, and “interactive machine learning”. Peer-reviewed papers published from 2015 to 2020 are included in our review. Results: We design four questions to investigate and describe human–AI interaction in ML applications. These questions are “Why should humans be in the loop?”, “Where does human–AI interaction occur in the ML processes?”, “Who are the humans in the loop?”, and “How do humans interact with ML in Human-In-the-Loop ML (HILML)?”. To answer the first question, we describe three main reasons regarding the importance of human involvement in ML applications. To address the second question, human–AI interaction is investigated in three main algorithmic stages: 1. data producing and pre-processing; 2. ML modelling; and 3. ML evaluation and refinement. The importance of the expertise level of the humans in human–AI interaction is described to answer the third question. The number of human interactions in HILML is grouped into three categories to address the fourth question. We conclude the paper by offering a discussion on open opportunities for future research in HILML.
- Research Article
- 10.1002/cben.70012
- Jun 2, 2025
- ChemBioEng Reviews
This paper reviews machine learning (ML) applications in chemical engineering (ChemE) and provides perspectives for the future. First, the evolution of ML, data structures, and ML applications in ChemE are reviewed; then, the current state of the art in ML and its ChemE applications is summarized. Finally, a perspective on future developments is provided, covering recently popularized tools such as generative artificial intelligence (AI) and large language models (LLMs), as well as major challenges and limitations. Although early applications focused mainly on fault detection, signal processing, and process modeling, the focus has since extended to other fields, including material development, property estimation, and performance analysis, using more complex models and datasets. In the future, new developments such as LLMs will likely become more widespread. Other new approaches, such as automated ML, physics-informed ML, and transfer learning, as well as field-specific databases, will also receive more attention. ML applications in ChemE-related fields, such as new energy technologies, environmental issues, and new material discovery, are expected to grow further.
- Research Article
- 10.14778/3352063.3352110
- Aug 1, 2019
- Proceedings of the VLDB Endowment
Developing machine learning (ML) applications is similar to developing traditional software: it is often an iterative process in which developers navigate a rich space of requirements, design decisions, implementations, empirical quality, and performance. In traditional software development, software engineering is the field of study that provides principled guidelines for this iterative process. However, its counterpart, "software engineering for ML," is still largely missing. Developers of ML applications have powerful tools (e.g., TensorFlow and PyTorch) but little guidance on the development lifecycle itself. In this paper, we view the management of ML development lifecycles from a data management perspective. We demonstrate two closely related systems, ease.ml/ci and ease.ml/meter, that provide some "principled guidelines" for ML application development: ci is a continuous integration engine for ML models, and meter is a "profiler" for controlling the overfitting of ML models. Both systems focus on managing the "statistical generalization power" of the datasets used to assess the quality of ML applications, namely the validation set and the test set. By demonstrating these two systems, we hope to spark further discussion within our community on building this new type of data management system for statistical generalization.
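The idea that a test set has limited "statistical generalization power" can be made concrete with a standard concentration bound. The sketch below is our own simplification, not the estimator used by ease.ml/ci or ease.ml/meter, which may rely on tighter bounds. It combines Hoeffding's inequality with a union bound. If k models are fixed in advance, n >= ln(2k/delta) / (2*eps^2) i.i.d. test examples suffice for all k accuracy estimates to lie within +/- eps of the true accuracies simultaneously, with probability at least 1 - delta. If each model may depend on earlier test results, more examples are needed. The function name is hypothetical.

```python
import math


def hoeffding_test_size(eps: float, delta: float, uses: int = 1) -> int:
    """Number of i.i.d. test examples so that `uses` accuracy estimates (for
    models fixed in advance) are each within +/- eps of the true accuracy,
    simultaneously with probability >= 1 - delta.

    Hoeffding: P(|estimate - truth| > eps) <= 2 * exp(-2 * n * eps^2).
    A union bound over `uses` estimates gives n >= ln(2 * uses / delta) / (2 * eps^2).
    """
    if not (0 < eps < 1 and 0 < delta < 1 and uses >= 1):
        raise ValueError("need 0 < eps < 1, 0 < delta < 1 and uses >= 1")
    return math.ceil(math.log(2 * uses / delta) / (2 * eps ** 2))
```

With eps = 0.01 and delta = 0.05, a single use needs 18,445 examples. Supporting 32 uses fixed in advance raises this to 35,774. The growth in the number of uses is only logarithmic, which is what makes reusing a test set affordable in this non-adaptive setting.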
- Research Article
- 10.2979/esj.2022.a886946
- Dec 1, 2022
- e-Service Journal
Developing efficient processes for building machine learning (ML) applications is an emerging research topic. One well-known framework for organizing, developing, and deploying predictive ML models is the Cross-Industry Standard Process for Data Mining (CRISP-DM). However, the framework provides no guidelines for detecting and mitigating the different types of fairness-related biases that arise in the development of ML applications. The study of these biases is a relatively recent stream of research. To address this significant theoretical and practical gap, we propose a new framework, Fair CRISP-DM, which groups these biases and maps them to each phase of ML application development. Through this study, we contribute to the literature on ML development and fairness. We present recommendations for ML researchers on including fairness in the ML evaluation process. Further, ML practitioners can use our framework to identify and mitigate fairness-related biases in each phase of an ML project. Finally, we discuss emerging technologies that can help developers detect and mitigate biases at different stages of ML application development.
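Detecting a fairness-related bias in the evaluation phase requires a concrete metric, but the abstract does not prescribe one. The sketch below therefore assumes demographic parity difference, one common choice among several. It is the largest gap in positive-prediction rates between groups. The function names and the 0.1 tolerance are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def selection_rates(y_pred: Sequence[int],
                    groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of positive predictions (label 1) within each group."""
    counts = defaultdict(lambda: [0, 0])   # group -> [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def demographic_parity_difference(y_pred: Sequence[int],
                                  groups: Sequence[Hashable]) -> float:
    """Largest minus smallest selection rate across groups (0 means parity)."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


def flag_bias(y_pred: Sequence[int], groups: Sequence[Hashable],
              tolerance: float = 0.1) -> bool:
    """Hypothetical evaluation gate: flag the model if the gap exceeds `tolerance`."""
    return demographic_parity_difference(y_pred, groups) > tolerance
```

A gate like this gives the evaluation phase a pass/fail signal. Other fairness definitions, such as equalized odds, would need the true labels as well as the predictions.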
- Book Chapter
- 10.1007/978-3-030-96993-6_65
- Jan 1, 2022
Industry is now actively introducing technologies based on machine learning, such as predictive analytics, computer vision, and industrial robots. In this article, the authors discuss possible applications of machine learning to improve the operation of nuclear power plant (NPP) power units: diagnostics of equipment condition (both the technological equipment of normal operation systems and the equipment of safety systems); identification of irrelevant alarms; determination of the state of the reactor plant; and the use of machine learning in equipment control algorithms. The report also examines existing difficulties in introducing machine learning into NPP operation: the stability of control systems based on machine learning, the interpretability of decisions produced by machine-learning-based systems, and the small size of datasets for training machine learning models.
Keywords: Nuclear power plant; Machine learning; Industry 4.0
- Research Article
- 10.1080/21650373.2025.2462183
- Feb 1, 2025
- Journal of Sustainable Cement-Based Materials
High-Performance Fiber-Reinforced Cementitious Composites (HPFRCC) are a family of advanced composite materials with remarkable mechanical properties and durability, but their design and characterization involve unique challenges. Recent advances in machine learning techniques have offered new opportunities. This paper reviews the application of machine learning techniques to the design and characterization of HPFRCC. The application of machine learning to HPFRCC design is reviewed within a prediction-optimization framework, and the steps for property prediction and material design, considering fresh properties, cracks, and microstructures, are elaborated. Recent developments in knowledge-guided machine learning approaches are discussed. The application of machine learning to HPFRCC characterization is then reviewed, with an elaboration of computer vision and deep learning techniques for characterizing HPFRCC. Finally, the challenges and opportunities for applying machine learning methods are discussed, with the aim of facilitating applications of machine learning techniques to HPFRCC.
- Research Article
- 10.1016/j.ijbiomac.2025.142374
- May 1, 2025
- International journal of biological macromolecules
Application of explainable machine learning in the production of pullulan by Aureobasidium pullulans CGMCC NO.7055.