Machine Learning for Chemical Reactivity: The Importance of Failed Experiments.

Abstract

Assessing the outcomes of chemical reactions in a quantitative fashion has been a cornerstone of all synthetic disciplines. Classically approached through empirical optimization, this process could be streamlined enormously by data-driven modelling. However, such predictive models require significant quantities of high-quality data, whose availability is limited, chiefly because of experimental errors and, importantly, human biases in experiment selection and result reporting. In a series of case studies, we investigate the impact of these biases on drawing general conclusions from chemical reaction data, revealing the utmost importance of "negative" examples. Finally, case studies on data expansion approaches showcase directions to circumvent these limitations and demonstrate perspectives towards a long-term enhancement of data quality in chemistry.
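
The central claim, that missing "negative" results skew any model trained on the literature, can be made concrete with a small sketch (hypothetical descriptors and labels, scikit-learn assumed): the same classifier is trained once on complete data and once on data where most failed reactions were never reported, and both are compared on an unbiased test set.

```python
# Minimal sketch (synthetic data): how omitting failed reactions biases a model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                     # stand-in reaction descriptors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # 1 = "works", 0 = failed

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Full data: positives and negatives both available.
clf_full = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Reporting-biased data: keep all positives but only ~10% of the negatives,
# mimicking a literature in which failed experiments are rarely published.
keep = (y_tr == 1) | (rng.random(len(y_tr)) < 0.1)
clf_biased = RandomForestClassifier(random_state=0).fit(X_tr[keep], y_tr[keep])

for name, clf in [("full", clf_full), ("biased", clf_biased)]:
    print(name, balanced_accuracy_score(y_te, clf.predict(X_te)))
```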

Similar Papers
  • Book Chapter
  • Cited by 5
  • 10.1007/978-3-319-95831-6_8
The Biases of Thinking Fast and Thinking Slow
  • Jan 1, 2018
  • Dirk Streeb + 2 more

Visualization is a human-centric process, which is inevitably associated with potential biases in humans' judgment and decision-making. While discussions of human biases have been heavily influenced by the work of Daniel Kahneman as summarized in his book "Thinking, Fast and Slow", there have also been viewpoints in psychology in favor of heuristics, such as Gigerenzer's. In this chapter, we present a balanced discourse on human heuristics and biases as two sides of the same coin. In particular, we examine these two aspects from a probabilistic perspective and relate them to the notions of global and local sampling. We use three case studies from Kahneman's book to illustrate the potential biases of human- and machine-centric decision processes. Our discourse leads to a concrete conclusion: visual analytics, where interactive visualization is integrated with statistics and algorithms, offers an effective and efficient means of overcoming biases in data intelligence.

  • Research Article
  • Cited by 4
  • 10.1109/tits.2014.2377552
Model-Based Methodology for Validation of Traffic Flow Detectors by Minimizing Human Bias in Video Data Processing
  • Aug 1, 2015
  • IEEE Transactions on Intelligent Transportation Systems
  • Pushkin Kachroo + 4 more

This paper provides a model-based method for analysis and hypothesis testing on paired data, where one data source must be validated against another that contains subjective and dynamic errors. The study deals with human-observed flow counts collected from traffic videos of freeway cameras. The available videos are mainly intended for manual observation by transportation personnel in case of emergency, which leads to inconsistent video quality and presents an additional challenge when analyzing the data. Automated video processing cannot be performed because of these quality issues, so the processing has to be done manually by humans, who unfortunately have an inherent bias. If the video data are to be used for validating flow detector sensors, a technique is needed that performs validation on data whose errors are subjective and dynamic as a result of this human bias. This paper presents such a methodology, based on statistical testing with heteroscedasticity, and demonstrates it through a case study using data from traffic flow detectors and traffic cameras installed on highways in the Southern Nevada region. A model for the relationship between the video ratings and the distribution of the human errors is developed that takes the human bias into consideration. A method for identifying faulty detectors is also demonstrated based on the developed technique.
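
As a rough illustration of heteroscedasticity-aware validation of paired data (an assumed setup, not the paper's exact model), the sketch below regresses synthetic detector counts on human-observed counts with robust standard errors and jointly tests intercept 0 and slope 1; statsmodels is assumed, and all numbers are placeholders.

```python
# Illustrative sketch: validating detector counts against human counts when the
# error variance grows with the flow level (heteroscedasticity).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
human = rng.uniform(50, 500, size=200)          # human-observed flow counts
noise = rng.normal(0, 0.05 * human)             # error spread grows with flow
detector = 1.02 * human + noise                 # detector slightly over-counts

X = sm.add_constant(human)
fit = sm.OLS(detector, X).fit(cov_type="HC3")   # heteroscedasticity-robust errors
print(fit.params)

# Joint test of "detector agrees with human": intercept = 0 and slope = 1.
print(fit.f_test("const = 0, x1 = 1"))
```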

  • Research Article
  • Cited by 78
  • 10.1167/jov.21.3.16
Five points to check when comparing visual perception in humans and machines.
  • Mar 16, 2021
  • Journal of vision
  • Christina M Funke + 5 more

With the rise of machines to human-level performance in complex recognition tasks, a growing amount of work is directed toward comparing information processing in humans and machines. These studies are an exciting chance to learn about one system by studying the other. Here, we propose ideas on how to design, conduct, and interpret experiments such that they adequately support the investigation of mechanisms when comparing human and machine perception. We demonstrate and apply these ideas through three case studies. The first case study shows how human bias can affect the interpretation of results and that several analytic tools can help to overcome this human reference point. In the second case study, we highlight the difference between necessary and sufficient mechanisms in visual reasoning tasks. Thereby, we show that contrary to previous suggestions, feedback mechanisms might not be necessary for the tasks in question. The third case study highlights the importance of aligning experimental conditions. We find that a previously observed difference in object recognition does not hold when adapting the experiment to make conditions more equitable between humans and machines. In presenting a checklist for comparative studies of visual reasoning in humans and machines, we hope to highlight how to overcome potential pitfalls in design and inference.

  • Research Article
  • Cited by 54
  • 10.1016/j.isci.2020.101961
Pushing the limits of solubility prediction via quality-oriented data selection.
  • Dec 17, 2020
  • iScience
  • Murat Cihan Sorkun + 2 more

Accurate prediction of the solubility of chemical substances in solvents remains a challenge. The sparsity of high-quality solubility data is recognized as the biggest hurdle in the development of robust data-driven methods for practical use. Nonetheless, the effects of the quality and quantity of data on aqueous solubility predictions have not yet been scrutinized. In this study, the roles of data set size and quality in the performance of solubility prediction models are unraveled, and the concepts of actual and observed performance are introduced. In an effort to close the gap between actual and observed performance, a quality-oriented data selection method is designed that evaluates the quality of the data and extracts its most accurate part through statistical validation. Applying this method to the largest publicly available solubility database and using a consensus machine learning approach, a top-performing solubility prediction model is achieved.
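
The selection idea can be caricatured as follows (illustrative only, not the authors' code): keep compounds whose replicate measurements agree within a tolerance, then average the predictions of two different regressors as a small consensus. The data, the 0.5 log-unit tolerance, and the features are all placeholders.

```python
# Conceptual sketch of quality-oriented data selection plus a consensus model.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Hypothetical table: one row per (compound, source) measurement of logS.
df = pd.DataFrame({
    "compound": np.repeat(np.arange(200), 3),
    "logS": np.random.default_rng(2).normal(-3, 1, 600),
})

# Quality filter: keep compounds whose replicates agree within the tolerance.
spread = df.groupby("compound")["logS"].std()
reliable = spread[spread < 0.5].index
clean = df[df["compound"].isin(reliable)].groupby("compound")["logS"].mean()

# Consensus prediction = mean of two regressors (features are stand-ins).
X = np.random.default_rng(3).normal(size=(len(clean), 10))
models = [RandomForestRegressor(random_state=0),
          GradientBoostingRegressor(random_state=0)]
preds = np.mean([m.fit(X, clean.values).predict(X) for m in models], axis=0)
```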

  • Research Article
  • Cited by 9
  • 10.1021/acsengineeringau.2c00002
Discovering Circular Process Solutions through Automated Reaction Network Optimization
  • Apr 25, 2022
  • ACS Engineering Au
  • Jana M Weber + 2 more

The transition toward a circular and biobased chemical industry is needed to cut global CO2 emissions and limit the chemical industry's overall impact on the environment. However, the development of circular chemical reaction systems is challenging, as it requires symbiotic sets of novel chemical reaction pathways and involves unconventional processing steps. We present a methodological pipeline for automated reaction network optimization. The tools can guide the development of circular processes at the reaction pathway level. Chemical big data, combined with energetic assessment metrics and state-of-the-art decision-making, has the potential to efficiently identify the most promising reaction systems. We mine large-scale chemical reaction data from the Reaxys database and automate the screening of pathways based on chemical rules. We then approximate thermodynamic properties for exergy calculations of the prescreened pathways and formulate the optimization problem as linear programming and mixed-integer linear programming problems. The methodological workflow is illustrated in a case study on the conversion of β-pinene to citral. Our results show that the tools are well suited to model circular process interactions within different environment scenarios.
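
A toy version of the pathway-selection step might look like the linear program below (an assumed formulation, not the paper's actual model): choose reaction extents that deliver one unit of target product at minimum exergy cost, subject to species mass balance. The network, costs, and scipy-based solver are illustrative choices.

```python
# Toy pathway selection over a three-reaction network via linear programming.
import numpy as np
from scipy.optimize import linprog

# Stoichiometric matrix S (species x reactions); columns are candidate reactions.
S = np.array([
    [-1,  0, -1],   # feedstock: consumed by r1 and r3
    [ 1, -1,  0],   # intermediate: made by r1, consumed by r2
    [ 0,  1,  1],   # target product: made by r2 and r3
])
exergy_cost = np.array([1.0, 0.5, 2.0])   # assumed per-unit costs per reaction

# Mass balance: intermediate nets to zero, product nets to one unit.
A_eq = S[1:, :]
b_eq = np.array([0.0, 1.0])

res = linprog(c=exergy_cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.x, res.fun)   # optimal extents (r1=r2=1, r3=0) and total cost 1.5
```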

  • Conference Article
  • Cited by 65
  • 10.2118/184822-ms
Shale Analytics: Making Production and Operational Decisions Based on Facts: A Case Study in Marcellus Shale
  • Jan 24, 2017
  • S D Mohaghegh + 2 more

Managers, geologists, and reservoir and completion engineers are faced with important challenges and questions when it comes to producing from and operating shale assets. Some of the important questions that need to be answered are: What should be the distance between wells (well spacing)? How many clusters need to be included in each stage? What is the optimum stage length? At what point do we need to stop adding stages to our wells (what is the point of diminishing returns)? At what rate and pressure do we need to pump the fluid and the proppant? What is the best proppant concentration? Should our completion strategy be modified when the quality of the shale (reservoir characteristics) and the producing hydrocarbon (dry gas vs. condensate-rich vs. oil) change in different parts of the field? What is the impact of soak time (starting production right after the completion versus delaying it) on production? Shale Analytics is the collection of state-of-the-art data-driven techniques, including artificial intelligence, machine learning, and data mining, that addresses the above questions based on facts (field measurements) rather than human biases. Shale Analytics is the fusion of domain expertise (years of geology, reservoir, and production engineering knowledge) with data-driven analytics: the application of big data analytics, pattern recognition, machine learning, and artificial intelligence to any and all shale-related issues. Lessons learned from the application of Shale Analytics to more than 3,000 wells in the Marcellus, Utica, Niobrara, and Eagle Ford are presented in this paper, along with a detailed case study in the Marcellus Shale. The case study details the application of Shale Analytics to understand the impact of different reservoir and completion parameters on production, and the quality of predictions made by artificial intelligence technologies regarding the production of blind wells. Furthermore, generating type curves, performing "look-back" analysis, and identifying best completion practices are presented in this paper. The use of Shale Analytics for re-frac candidate selection and design was presented in a previous paper [1].

  • Conference Article
  • Cited by 12
  • 10.1109/iros.2015.7354228
Towards an imperfect robot for long-term companionship: case studies using cognitive biases
  • Sep 1, 2015
  • M Biswas + 1 more

The research presented in this paper aims to find out what effect cognitive biases have on a robot's interactive behaviour, with the goal of developing long-term human-robot companionship. It is expected that utilising cognitive biases in a robot's interactive behaviours, making the robot cognitively imperfect, will affect how people relate to the robot, thereby changing the process of long-term companionship. Previous research in this area on human-like cognitive characteristics in robots for creating and maintaining long-term relationships between robots and humans has yet to focus on developing human-like cognitive biases, making this application new to robotics. As a starting point, the cognitive biases 'misattribution' and 'empathic gap' were selected, as both have been shown to be very common in humans and to play a role in human-human interactions and long-term relationships.

  • Research Article
  • Cited by 4
  • 10.1145/3611313
When Biased Humans Meet Debiased AI: A Case Study in College Major Recommendation
  • Sep 11, 2023
  • ACM Transactions on Interactive Intelligent Systems
  • Clarice Wang + 6 more

Currently, there is a surge of interest in fair Artificial Intelligence (AI) and Machine Learning (ML) research which aims to mitigate discriminatory bias in AI algorithms, e.g., along lines of gender, age, and race. While most research in this domain focuses on developing fair AI algorithms, in this work, we examine the challenges which arise when humans and fair AI interact. Our results show that due to an apparent conflict between human preferences and fairness, a fair AI algorithm on its own may be insufficient to achieve its intended results in the real world. Using college major recommendation as a case study, we build a fair AI recommender by employing gender debiasing machine learning techniques. Our offline evaluation showed that the debiased recommender makes fairer career recommendations without sacrificing its accuracy in prediction. Nevertheless, an online user study of more than 200 college students revealed that participants on average prefer the original biased system over the debiased system. Specifically, we found that perceived gender disparity is a determining factor for the acceptance of a recommendation. In other words, we cannot fully address the gender bias issue in AI recommendations without addressing the gender bias in humans. We conducted a follow-up survey to gain additional insights into the effectiveness of various design options that can help participants to overcome their own biases. Our results suggest that making fair AI explainable is crucial for increasing its adoption in the real world.
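
The abstract does not specify its debiasing technique, so the sketch below shows one common option, reweighing in the style of Kamiran and Calders, purely for illustration: training examples are weighted so that the protected attribute and the label look statistically independent. All data are synthetic placeholders.

```python
# Illustrative reweighing sketch: weight each (group, label) cell by
# P(group) * P(label) / P(group, label) before fitting the classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
gender = rng.integers(0, 2, 2000)                  # protected attribute
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.8 * gender + rng.normal(0, 1, 2000) > 0.4).astype(int)

w = np.empty(len(y))
for g in (0, 1):
    for lab in (0, 1):
        mask = (gender == g) & (y == lab)
        expected = (gender == g).mean() * (y == lab).mean()
        w[mask] = expected / max(mask.mean(), 1e-12)

clf = LogisticRegression().fit(X, y, sample_weight=w)
```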

  • Conference Article
  • 10.1117/12.2641107
Modeling and optimization of chemical reaction based on XGBoost-PSO
  • May 17, 2022
  • Zhendong Li + 3 more

This paper applies machine learning and intelligent optimization algorithms to chemical data modeling, building a chemical reaction prediction and optimization model based on XGBoost and particle swarm optimization (PSO). Chemical reaction data, especially for organic reactions, suffer from low experimental efficiency and low prediction accuracy, so we use an XGBoost model for regression modeling of small-batch chemical reaction data, mining the high-dimensional, small-sample characteristics of the data. On the basis of the constructed regression model, the particle swarm optimization algorithm is used to optimize the reaction conditions and find the balance point of the chemical reaction, overcoming the problems of low product yield and difficult raw-material ratios. On this basis, we designed a general-purpose algorithm for chemical reaction optimization that performs data modeling with XGBoost and quickly finds the optimal reaction conditions with PSO, and applied it to the preparation of C4 olefins by ethanol coupling. In our experimental analysis, the MAE, MSE, and R² scores of the XGBoost model in regression analysis are 29.82, 4.01, and 0.93, all better than those of other machine learning models, which is statistically meaningful. Furthermore, in comparisons with the literature and with experiments, the optimal solution found by the PSO search conforms to the principles and practice of chemical preparation and has industrial value. The modeling algorithm can be further extended to fields such as biopharmaceuticals and molecular preparation, providing a basis for decision-making and suggesting new experimental ideas and methods to researchers. The algorithm is broadly adaptable and generalizable.
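
A stripped-down version of this pipeline might look like the following sketch: fit an XGBoost surrogate to a small batch of (conditions, yield) data, then run a plain particle swarm over the condition bounds to maximize the predicted yield. The data, bounds, and hyperparameters are invented placeholders, not the paper's.

```python
# Minimal XGBoost + PSO sketch: surrogate regression, then swarm search.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(4)
X = rng.uniform(250, 450, size=(80, 2))        # two stand-in condition variables
y = -(X[:, 0] - 380) ** 2 / 100 + 5 * X[:, 1] / 450 + rng.normal(0, 0.5, 80)

model = XGBRegressor(n_estimators=200, max_depth=3).fit(X, y)

# Plain particle swarm: maximize the surrogate's prediction within the bounds.
lo, hi = np.array([250.0, 250.0]), np.array([450.0, 450.0])
pos = rng.uniform(lo, hi, size=(30, 2))
vel = np.zeros_like(pos)
pbest, pval = pos.copy(), model.predict(pos)
for _ in range(50):
    gbest = pbest[np.argmax(pval)]
    vel = (0.7 * vel
           + 1.5 * rng.random((30, 2)) * (pbest - pos)
           + 1.5 * rng.random((30, 2)) * (gbest - pos))
    pos = np.clip(pos + vel, lo, hi)
    val = model.predict(pos)
    better = val > pval
    pbest[better], pval[better] = pos[better], val[better]

print("best predicted conditions:", pbest[np.argmax(pval)])
```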

  • Research Article
  • Cited by 13
  • 10.1177/0256090920050301
Marketing is Marketing—Everywhere!
  • Jul 1, 2005
  • Vikalpa: The Journal for Decision Makers
  • Michael J Baker

The theme of this paper is that in seeking to develop strategies for the future, we should not neglect or overlook hard-won lessons from the past. Learning through direct experience is almost invariably a process of experimentation or trial and error. It is uncertain, time-consuming, inefficient, and often risky. Accordingly, if we encounter a problem new to ourselves, our first reaction should be: "Has anyone encountered this problem before?" If so, then "What did they do, with what results?" Answers to these questions are to be found in the so-called secondary sources that record the knowledge gained by previous generations. Knowledge is distilled experience which has accumulated over time. It represents our current understanding of how the world works and, because it has been recorded, it is usually easily available and often free. Common sense dictates that we should start any problem-solving activity by establishing what we know already. To support this argument, this article reviews the processes of knowledge creation and 'cumulativity'. Unless and until we have confirmed what is already known about a subject, any effort to solve a new problem can only be a hit-or-miss affair, a case of managerial myopia. Therefore, while addressing an important question such as the role of marketing in emerging economies, we should first define what we mean by 'emerging economies' and 'marketing'. Marketing is a synthetic discipline that integrates findings from other disciplines like economics, psychology, and sociology into a holistic explanation of commercial exchange behaviour. As for emerging economies, all the advanced economies were emerging economies once, and it is quite evident that as the Industrial Revolution that started in Great Britain in the 18th century spread through Europe and North America, each newly industrialized country, in turn, achieved take-off more quickly by learning from the experience of its predecessors. In conclusion, this paper cites three examples of robust ideas that have stood the test of time and offer important insights into marketing today:
  • Ricardo's 'Theory of Comparative Advantage', which argues that countries should specialize in doing what they do best and exchange their surpluses with other countries
  • Darwin's theory of evolution and its marketing derivative, the product life cycle
  • Copeland's 'classification of goods', which first identified the importance of defining goods and services in terms of needs and benefits
The message is that our knowledge of marketing is universal. Marketing is marketing—everywhere.

  • Research Article
  • Cited by 7
  • 10.3389/fnbeh.2021.812939
Identification, Analysis and Characterization of Base Units of Bird Vocal Communication: The White Spectacled Bulbul (Pycnonotus xanthopygos) as a Case Study.
  • Feb 14, 2022
  • Frontiers in Behavioral Neuroscience
  • Aya Marck + 3 more

Animal vocal communication is a broad and multi-disciplinary field of research. Studying various aspects of communication can provide key elements for understanding animal behavior, evolution, and cognition. Given the large amount of acoustic data accumulated from automated recorders, for which manual annotation and analysis are impractical, there is a growing need to develop algorithms and automatic methods for analyzing and identifying animal sounds. In this study, we developed an automatic detection and analysis system based on audio signal processing algorithms and deep learning that is capable of processing and analyzing large volumes of data without human bias. We selected the White Spectacled Bulbul (Pycnonotus xanthopygos) as our bird model because it has a complex vocal communication system with a large repertoire, used by both sexes year-round. It is a common, widespread passerine in Israel, which is relatively easy to locate and record in a broad range of habitats. Like many passerines, the Bulbul's vocal communication consists of two primary hierarchies of utterances: syllables and words. To extract each unit's characteristics, the fundamental frequency contour was modeled using a low-degree Legendre polynomial, capturing the different patterns of variation across vocalizations so that each pattern could be effectively expressed using very few coefficients. In addition, a mel-spectrogram was computed for each unit, and several features were extracted both in the time domain (e.g., zero-crossing rate and energy) and the frequency domain (e.g., spectral centroid and spectral flatness). We applied both linear and non-linear dimensionality reduction algorithms to the feature vectors and validated the findings that were obtained manually, namely by listening and examining the spectrograms visually. Using these algorithms, we show that the Bulbul has a complex vocabulary of more than 30 words, that multiple syllables are combined in different words, and that a particular syllable can appear in several words. Using our system, researchers will be able to analyze hundreds of hours of audio recordings, obtain objective evaluations of repertoires, and identify and distinguish different vocal units, thus gaining a broad perspective on bird vocal communication.
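
The per-unit characterization might be sketched as follows, assuming a pre-extracted F0 contour and a mono clip for one vocal unit; librosa is assumed for the spectral features, and a synthetic chirp stands in for a real recording.

```python
# Sketch: Legendre fit of an F0 contour plus spectral features for one unit.
import numpy as np
import librosa

sr = 22050
y = librosa.chirp(fmin=1000, fmax=2000, sr=sr, duration=0.2)  # 200 ms "syllable"
f0 = np.linspace(1000, 2000, 50)                              # its F0 contour

# Low-degree Legendre fit: a handful of coefficients summarize contour shape.
t = np.linspace(-1, 1, len(f0))               # Legendre domain is [-1, 1]
coeffs = np.polynomial.legendre.legfit(t, f0, deg=4)

# Time- and frequency-domain features, as in the pipeline described above.
zcr = librosa.feature.zero_crossing_rate(y).mean()
centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
flatness = librosa.feature.spectral_flatness(y=y).mean()
mel = librosa.feature.melspectrogram(y=y, sr=sr)  # kept for downstream analysis

feature_vector = np.concatenate([coeffs, [zcr, centroid, flatness]])
```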

  • Research Article
  • Cited by 20
  • 10.1108/ijlss-03-2021-0045
A relationship between bias, lean tools, and waste
  • Dec 16, 2021
  • International Journal of Lean Six Sigma
  • Mahesh Babu Purushothaman + 2 more

Purpose: This study aims to highlight the system-wide potential relationships between forms of human bias, selected Lean tools, and types of waste in a manufacturing process. Design/methodology/approach: A longitudinal single-site ethnographic case study using digital processing to make a material-receiving process Lean was adopted. An inherent knowledge process with internal stakeholders in a stimulated situation alongside process requirements was performed to achieve quality data collection. The results of the narrative analysis and process observation, combined with a literature review that identified widely used Lean tools, wastes, and biases, produced a model for the relationships. Findings: The study established the relationships between bias, Lean tools, and wastes, which enabled 97.6% error reduction, improved on-time accounting, and eliminated three working hours per day. These savings resulted in seven employees being redeployed to new areas, with delivery time for products reduced by seven days. Research limitations/implications: The single-site case study, with a supporting literature survey underpinning the model, would benefit from testing the model in different industries and locations. Practical implications: Application of the model can identify potential relationships between a group of human biases, 25 Lean tools, and 10 types of wastes in Lean manufacturing processes, supporting decision makers and line managers in productivity improvement, and suitable remedial actions can then be taken. The influence of biases and the model could be used as a basis to counter implementation barriers and reduce system-wide wastes. Originality/value: To the best of the authors' knowledge, this is the first study that connects the cognitive perspectives of Lean business processes with waste production and human biases. As part of the process, a relationship model is derived.

  • Conference Article
  • 10.1109/iri.2017.90
Estimating the Prevalence of Religious Content in Intelligent Design Social Media
  • Aug 1, 2017
  • George D Montanez

Can machine learning prove useful in deciding sociological questions that are difficult for humans to judge impartially? We propose that it can, and even simple methods can be useful for evaluating evidence with reduced influence from human bias. Our case study is intelligent design (ID) social media, particularly the detection of religious content therein. Being a polarizing topic, critics of intelligent design claim that all intelligent design output consists of religious content, whereas defenders argue that ID is primarily motivated by scientific, not religious, concerns. To help determine where the truth lies, we use classifiers trained on the topically categorized 20 newsgroups dataset, applying the trained learners to automatically classify ID blog documents. As a control, we perform the same analysis on documents drawn from prominent mainstream evolutionary science blogs. Our classification results demonstrate a significant portion of religious and political content in the intelligent design dataset as judged by a non-human classifier, and a similarity in the proportion of documents assigned to religious and political categories in the evolutionary science blog dataset, likely indicating a dependence of discussion topics within the two communities.
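
The setup can be sketched as below: train a topic classifier on the categorized 20 Newsgroups corpus, then apply it to out-of-domain documents and read off the predicted topics. The pipeline choices (TF-IDF plus logistic regression) and the placeholder documents are assumptions, not the paper's exact configuration.

```python
# Sketch: train on 20 Newsgroups, classify external blog documents by topic.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Train a topic classifier on the topically categorized corpus.
train = fetch_20newsgroups(subset="train",
                           remove=("headers", "footers", "quotes"))
clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression(max_iter=1000))
clf.fit(train.data, train.target)

# Apply to out-of-domain documents and tally the predicted topic labels.
blog_docs = ["placeholder blog post one", "placeholder blog post two"]
for label in clf.predict(blog_docs):
    print(train.target_names[label])
```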

  • Research Article
  • Cited by 4
  • 10.54660/.ijmrge.2022.3.4.675-689
Applying Predictive Analytics in Project Planning to Improve Task Estimation, Resource Allocation, and Delivery Accuracy
  • Jan 1, 2022
  • International Journal of Multidisciplinary Research and Growth Evaluation
  • Ioluwatobi Akinboboye + 7 more

In complex, large-scale, and remote project environments, accurate task estimation, efficient resource allocation, and precise delivery timelines are critical yet often compromised due to dynamic variables and human biases. This study explores the application of predictive analytics in project planning to enhance the accuracy and reliability of these essential functions. By leveraging historical project data, machine learning models, and statistical forecasting techniques, predictive analytics enables project managers to anticipate potential delays, resource constraints, and scope deviations before they occur. This proactive approach not only refines task duration estimates but also ensures that resources are optimally aligned with project requirements, enhancing both productivity and stakeholder satisfaction. The research highlights key predictive models such as linear regression, decision trees, and time series analysis (ARIMA, exponential smoothing) that support project planning decisions. These models are trained on multidimensional datasets comprising task histories, resource performance metrics, risk profiles, and external project conditions, offering real-time, data-backed insights. The integration of predictive analytics tools with project management platforms (e.g., Microsoft Project, Primavera, Jira) allows seamless scenario modeling and adjustment of plans based on forecasted outcomes. Case studies from enterprise software deployments and infrastructure development projects illustrate how organizations achieved up to 40% improvement in delivery accuracy and a 30% reduction in project overruns by implementing predictive analytics in the planning phase. The study also emphasizes the strategic role of scope clarity achieved through pattern recognition and anomaly detection in historical data, enabling early identification of ambiguous or risky work packages. This paper contributes to the evolving field of data-driven project management by proposing a framework for embedding predictive analytics into traditional and agile project methodologies. It outlines best practices for data collection, model selection, and organizational adoption, particularly for geographically dispersed teams. The findings underscore that predictive analytics is not merely a reactive tool but a transformative enabler of foresight, precision, and planning agility.
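
One of the named components, ARIMA forecasting, might be applied as in the sketch below (statsmodels assumed; the task-duration history is a synthetic placeholder).

```python
# Minimal ARIMA sketch for projecting task durations from historical data.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
# Hypothetical history: weekly task durations (hours) with a mild upward trend.
durations = pd.Series(40 + 0.3 * np.arange(104) + rng.normal(0, 3, 104))

fit = ARIMA(durations, order=(1, 1, 1)).fit()
forecast = fit.forecast(steps=8)   # projected durations for the next 8 weeks
print(forecast)
```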

  • Research Article
  • Cited by 318
  • 10.1016/j.addma.2018.09.034
A multi-scale convolutional neural network for autonomous anomaly detection and classification in a laser powder bed fusion additive manufacturing process
  • Oct 6, 2018
  • Additive Manufacturing
  • Luke Scime + 1 more
