Abstract

New techniques and technologies, such as the use of large-scale computing, influence research approaches, methods, and scales and are rapidly changing the scientific landscape. Research projects in eScience thus start with many assumptions and many unknowns and are often complex. While the scientific process is sometimes viewed, at least in hindsight, as a linear progression from one good idea to the next, it is in fact fraught with false starts, wrong assumptions, and dead ends. The increasing reliance on computation adds to the scope of problems that occur. Researchers invest a significant amount of time and effort in their research. Funding agencies similarly make large investments to support such research, on the assumption that most of it will be successful. When the research assumptions and hypotheses turn out to be false, yielding results that are "negative" or "null", the natural bias is to judge that the research project "failed." The history of science, however, shows that negative results may be an opportunity to revolutionize a field of study. For example, Fleming noticed that one of his bacterial culture plates had been contaminated by mold and that the bacteria surrounding the mold had been destroyed, leading to his discovery of penicillin. Similarly, a project today may fail because of the misuse or failure of computational support. Such "failures" actually indicate that there is an opportunity for the cyberinfrastructure research community to improve computing resources and tools. The interaction of these modes of failure is multi-faceted.

Negative results have been difficult to find in published papers in all scientific domains. We identify three reasons for this. First, negative results may not be identified as such but simply considered mistakes; such cases may never be investigated further. Second, paper referees may demand a higher standard from such results, because they are more difficult to understand or because they challenge the conventional narrative. Third, researchers may self-select against publishing such results in light of the previous point. This paper contributes to the discussion about null or negative results in eScience. It also attempts to organize concepts about negative or null results in eScience in the form of a taxonomy.

Falsifiability is the concept that a given statement can be refuted by a real-world measurement or observation [4]. The empirical sciences are dominated by the construction of such statements and by efforts to confirm or refute them. In eScience, such statements are rarely formally presented in a refutable manner. eScience projects typically merge goals from the physical sciences with computer science and engineering aspects. A failure in eScience may often be attributed to a computer engineering failure (software defects or unresolved performance shortcomings) or a collaboration misfit (the groups never came together). However, many important statements are never answered definitively, such as whether a given computational approach is effective for the physical science investigation. The formalization and confirmation or refutation of such statements could prevent effort from being lost to engineering issues. Post-mortem analysis of failed experiments provides "clues suggesting deeper lying forces," as Galison notes in How Experiments End: "Any historical reconstruction that ignores what seems in retrospect to be erroneous will be an inadequate account" [5].
This means that the study of errors is not only relevant to students of history: these "forces" can guide future investigations, suggest fundamental problems in experimental approaches, or even challenge prevailing theories. For example, relational database systems have long been a well-accepted solution for information structuring, storage, and retrieval. This model is now being challenged by other database concepts, largely motivated by the need to cope with increasing data volumes. While the boundary between failure and success is not sharp in the transition from relational to NoSQL databases, the transition demonstrates a need to adapt and improve. This is often the normal path of research in computer science, and in cyberinfrastructure in particular, which could learn much from the various "failures" in eScience projects. Similarly, "short-term" examples of such "failures" abound, such as the NFS limitation that a user can belong to at most 16 Unix groups (a minimal check for this limit is sketched below). It is likely that someone architecting an eScience collaboration system will run into this limitation rather quickly.

Are negative results as valuable as positive results in general? As Ayer points out, "What justifies scientific procedure ... is the success of the predictions to which it gives rise" [6]. Following this line of thought, negative results are subordinate to the positive results that validate useful predictions. A negative result invalidates a previously held prediction, challenging or demolishing a theory or model. It is, however, incomplete. A negative result is an opportunity to pick up the pieces and fix the theory. Negative results are thus an important reminder of the limitations of science at any given point in time. "We forget about unpredictability when it is our turn to predict," Taleb says in The Black Swan [7], a book that attempts to analyze tumultuous events, including several scientific cases. Taleb makes the case that studying such cognitive upheavals is worthwhile in its own right. Professionals who act with the history of failed ideas and efforts in mind will be more resilient against similar upheavals in the future. Taleb describes the social and mental impact of experiencing (repeated) failure, indicating that without support, researchers can easily become demoralized and shy away from challenging, long-term problems. However, he notes that "Your finding nothing is very valuable... —hey, you know where not to look" [7].

Venues such as the ERROR workshop are intended to encourage discussion of specific negative results. By co-locating with the eScience conference in Munich, the workshop attracted significant attention from this scientific community, with about 20 participants. The workshop accepted four of the six submitted papers after a peer review process. Each paper was reviewed by two to three members of the program committee, who evaluated the works based on originality, scientific rigor, significance, and presentation. The accepted papers were presented orally at the workshop, followed by a panel discussion on the topic "theory versus practice in eScience: gaps and gaping holes." The first presentation, by Gomes et al. [9], considered problems of interoperability between scientific workflows, specifically the problem of reusing workflows developed and implemented in one scientific workflow management framework within another.
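As referenced above, the following is a minimal, hedged sketch of how one might detect the classic NFS group limit on a Unix system; the threshold constant and warning text are illustrative assumptions of ours, not code from any of the workshop papers.

```python
# Hedged sketch: warn when the current user belongs to more supplementary groups
# than the classic AUTH_SYS credential used by NFS can carry (16), a limit that
# can silently deny file access in multi-group eScience collaborations.
import grp
import os

AUTH_SYS_GROUP_LIMIT = 16  # groups carried in an AUTH_SYS (AUTH_UNIX) RPC credential

def group_name(gid: int) -> str:
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)  # fall back to the numeric GID if it has no local name

def check_group_limit() -> None:
    gids = os.getgroups()
    print(f"member of {len(gids)} groups: {', '.join(sorted(map(group_name, gids)))}")
    if len(gids) > AUTH_SYS_GROUP_LIMIT:
        print(f"warning: only {AUTH_SYS_GROUP_LIMIT} groups fit in an AUTH_SYS "
              "credential; NFS access granted via the remaining groups may fail")

if __name__ == "__main__":
    check_group_limit()
```

Running such a check early in the life of a collaboration platform can surface the limit before users hit unexplained "permission denied" errors on shared NFS storage.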
To address this interoperability problem, Gomes et al. developed an "intermediate" workflow language, with the idea that this intermediate language would preserve the workflow's semantic information across frameworks. However, they observed a loss of information about workflow semantics during the translation from the source workflow language to the "neutral" language and from the "neutral" language to the target workflow language. This happens because there are no ideal or standardized semantics for workflow languages, which is the key negative result of this research. The proposed solution is to adopt workflow patterns to describe richer workflow semantics.

The second presentation, by Groen and Portegies Zwart [10], provided a high-level overview of the authors' experience in constructing a distributed supercomputing system: CosmoGrid. The authors discussed how ambitious ideas can often be stymied by site-local resource allocation decisions. One of the negative results is the conclusion that harnessing multiple large machines is not feasible and that one should instead focus on harnessing a larger number of smaller machines. Additionally, the authors pointed out that a task as simple as getting software installed can be surprisingly difficult at major computational sites, exposing the often overlooked realities of working with large-scale computational infrastructures.

The third presentation, by Cebrián et al. [11], presented an experience in designing two separate cache stores, one for private and one for shared data, for multicore system architectures. The premise of the work was to improve efficiency by excluding private and shared read-only cache contents from coherency management. From the experiments and analysis of the results obtained with this approach, the authors concluded that systems are less efficient with this kind of design, which is a negative result. This is because the overhead of the classification mechanism and the increased concentration of accesses to shared data create a bandwidth bottleneck in a particular portion of the cache, resulting in higher latencies.

The fourth presentation, by Jackson et al. [12], described an experiment, conducted in the context of a larger body of work, on measuring latency between nodes within a single cluster as well as across different clusters (a minimal sketch of such a measurement appears below). Among the main takeaways described by the authors were the technical and administrative obstacles faced when an experiment depends on several independently managed computational systems across administrative boundaries. Despite these obstacles, the authors were able to produce a significant body of latency data. One finding from this dataset was that an exhaustive study of latencies among systems was not necessarily, by itself, a good predictor of actual application performance.

The topic of discussion for the panel session was "theory versus practice in eScience: gaps and gaping holes." This theme emerged from a recurring observation in the submitted papers, in which negative or null results are attributed to a mismatch between theory-based expectations and what is found in practice. The panelists were Daniel S. Katz, Simon Portegies Zwart, Kyle Chard, Juan M. Cebrián, and Gary Jackson. Each panelist spoke for 2–3 minutes, and then there was an open discussion between the panelists and the audience. The rest of this subsection presents the highlights of the panelists' talks and the discussion that followed.
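Before turning to the panel highlights, here is a minimal, hedged sketch of the kind of point-to-point latency measurement discussed by Jackson et al.; the host name and port are placeholders, and TCP connection set-up time is used only as a rough proxy for network round-trip latency, not as a reconstruction of the authors' methodology.

```python
# Hedged sketch: estimate node-to-node latency by timing TCP connection set-up.
# The endpoint below is a placeholder; real measurements would target an agreed
# test service on each cluster node and, as Jackson et al. caution, would still
# not necessarily predict full application performance.
import socket
import statistics
import time

def tcp_rtt_ms(host: str, port: int, samples: int = 10) -> list[float]:
    """Return round-trip estimates (ms) based on TCP connection establishment."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass  # connection established; close immediately
        rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts

if __name__ == "__main__":
    measurements = tcp_rtt_ms("cluster-a-login.example.org", 22)  # placeholder endpoint
    print(f"median RTT: {statistics.median(measurements):.2f} ms "
          f"(min {min(measurements):.2f}, max {max(measurements):.2f})")
```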
In the panel, Jackson spoke about the importance of not losing research focus because of infrastructure complexities and problems (the proverbial "missing the forest for the trees"). Chard said that there are no good definitions of eScience, although we provide one, taken from the eScience conference series website, in Section 2. He also raised questions about how the scientific process, which had been largely unchanged for hundreds of years and has long review cycles, has recently been changing with online data publication and open-access journals.

Katz spoke of the phenomenon of failures among research projects and endeavors by quoting the opening of Tolstoy's Anna Karenina: "Happy families are all alike; every unhappy family is unhappy in its own way." He noted that a research endeavor must get all of its critical factors right to succeed, and that failing at even one of them can jeopardize the entire project. This is popularly known as the Anna Karenina principle [13]. Katz also emphasized the importance and value of scientific results in general, and negative results in particular, with the question: "How do we decide if there is value in a result?"

Portegies Zwart spoke about the importance of understanding the difference between core computer science and the other sciences, as well as the scientists associated with each. He argued that computer science is currently undergoing a crisis because it is hard to find interesting problems, owing to competition with industry. In particular, computer science is challenged by reproducibility. One solution he suggested is the establishment of a software museum to prevent the loss of software.

The open discussion that followed focused on diverse topics such as software preservation, publication and credit, training, and the definition of negative or null results. It began with participants expressing concerns about issues related to software in particular. Scenarios were discussed that expose the "gaps between eScience theory and practice" in connecting technologies, ideas, and people. One gap is that digital products in general, and software in particular, including methods and knowledge (algorithms), can be lost over time, a phenomenon sometimes known as bit rot [14]. Software hosting services such as GitHub can address this problem to a certain extent by preserving the files, but considerable human effort is still required to preserve, in a meaningful way, the function delivered by the software. A curation service for algorithms could be another solution, again requiring additional effort. Could software and algorithm hosting services be linked, both conceptually and in implementation?

Preservation and publication of negative results are also a challenge. While there are no technical barriers, from a publishing-culture point of view there are few or no incentives for publishing negative results. With such incentives in place, a culture could develop of explaining not only what was done but also why it was not done in some other way. And if negative results are actually published, other researchers and groups are less likely to repeat the same approach. Another identified gap is the lack of a comprehensive understanding of negative results, owing to the absence of a conceptual framework such as a taxonomy. The discussion also raised the gap introduced by the lack of a credit model for discovering, identifying, and reporting negative results. For example, can negative results, and/or the methods used to obtain them, be patented?
For instance, who receives credit if a succession of graduate students working on a problem arrive at a negative result followed by a positive result? And what happens to positive results obtained before further investigation leads to their negation or nullification? Another gap arises from the lack of proper eScience training of domain scientists: can domain scientists be trained to become eScientists? This also applies to principal investigators, many of whom were trained in an era when science was carried out differently than it is today. Training that imparts knowledge of modern computational methods and capabilities could play a key role in closing this gap.

The difference between incomplete results (such as those obtained from samples of insufficient size) and negative results can also be unclear, which often leaves negative results open to interpretation. For instance, in a molecular dynamics (MD) simulation [14], it cannot be shown whether sampling was sufficient. In the same vein, should incremental competitive results be considered negative? In general, there is no standard for how many simulated timesteps are needed to obtain the correct answer, although one can sometimes validate against laboratory experiments.

Another topic raised during the discussion compared research in academia with research in the commercial sector. One prominent sentiment expressed in the discussion was that, in some areas, research done in the commercial domain is "ahead" of research in academia, particularly where industry has larger-scale problems and data than academia does. One possible reason for this could be that there are more negative and null results in academia than in industry. However, academic research can be transferred to the commercial domain and vice versa; for instance, the patent system is in place to enable commercial contribution to public research. One question that arises here is whether there should be a distinction between commercial and academic research, and how such a distinction would manifest itself. It was also noted that there is an asymmetry between positive and negative results: with negative results, it is more likely that some error was made, and it may be harder to truly prove a negative result. For instance, an "existence proof" is sufficient for a positive result, whereas a proof of impossibility is needed for a negative one.

The workshop led to concrete outcomes before, during, and after its realization. After the announcement of the workshop, the Mozilla Science Foundation hosted a guest post about the workshop by the organizers [15]. The post discussed the importance of the theme of "negative" results and the goals for the workshop. One of the outcomes of the panel discussion was the call for a taxonomy of negative and null results in eScience. We respond to this call in this paper by proposing a taxonomy in Section 5.

In Figure 1, we present a taxonomy of eScience results. The three kinds of results in eScience are positive, null, and negative; in the taxonomy, we focus on negative and null results. Negative results may be caused by one or more of the following kinds of causes: technological, technical, human, and domain. For instance, an erroneous result obtained because of insufficient precision resulting from a limitation of a system library is an example of a negative result caused by technical and technological limitations. Similarly, a simulation algorithm resulting from a flawed understanding of a natural phenomenon could be considered a negative result caused by domain and human factors (a minimal encoding of this classification is sketched below).
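To make the structure of the taxonomy concrete, the following is a small, hedged sketch of how it could be encoded as a data structure; the class and field names are our own illustrative choices, not artifacts from the paper or from Figure 1.

```python
# Hedged sketch: one possible encoding of the taxonomy of eScience results.
# A result has a kind (positive, null, negative); negative and null results
# additionally carry one or more contributing causes.
from dataclasses import dataclass
from enum import Enum, auto

class ResultKind(Enum):
    POSITIVE = auto()
    NULL = auto()
    NEGATIVE = auto()

class Cause(Enum):
    TECHNOLOGICAL = auto()  # e.g., problem/data size mismatched to the platform
    TECHNICAL = auto()      # e.g., software bugs, cyberinfrastructure faults
    HUMAN = auto()          # e.g., measurement mistakes, insufficient samples
    DOMAIN = auto()         # e.g., flawed understanding of the phenomenon

@dataclass
class EScienceResult:
    description: str
    kind: ResultKind
    causes: frozenset = frozenset()  # empty for positive results

# The library-precision example from the text, expressed in this encoding.
precision_example = EScienceResult(
    description="erroneous output caused by limited precision in a system library",
    kind=ResultKind.NEGATIVE,
    causes=frozenset({Cause.TECHNICAL, Cause.TECHNOLOGICAL}),
)
```

The point of such an encoding is only that a single result can carry multiple causes at once, which mirrors the combined technical/technological and domain/human examples given above.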
Among the causes of negative results, a mismatch between the problem/data size and the technology/methodology used is an example of a technological cause. Examples of technical causes include software bugs and cyberinfrastructure faults. Human causes include both incidental issues, such as mistakes in measurements, and systemic issues, such as a false hypothesis or an insufficient sample size. Null results are obtained because of the lack of discriminating conditions to confirm or refute a hypothesis. Such a situation may arise from causes similar to those of negative results. An example of a technological/technical cause is a statistical test that performs poorly on the data because the implementation uses limited precision (a numerical illustration is sketched below). Null results can also have a human cause, for example when insufficient samples are used in the experiments or when some bias in the data goes unnoticed.

We are aware of two workshops with similar themes in related fields (information and communication technologies). The first is NoISE (Workshop on Negative or Inconclusive Results in Semantic Web) [19]. The second is NOPE (Workshop on Negative Outcomes, Post-mortems, and Experiences) [20]. Both workshops were organized for the first time in 2015. Similar to the current special issue, there have been two special issues in prominent venues focused on negative or null results: the special issue on negative results in Empirical Software Engineering [21] and the PLOS ONE Collection [22]. The PLOS ONE Collection treats inconclusive results as a distinct type of negative result, in addition to null results. Like the workshops, both special issues were launched for the first time in 2015.

In this section, we discuss some of the key implications drawn from the previous sections. These are the issues directly affected by the occurrence of negative and null results in research as conducted by the scientific community. We classify these implications into four categories: technological, technical, cultural, and domain-specific. Publication, credit, and citation of work that has yielded negative results are an important consideration from the research community's point of view. Citations and credit are important measures of success for a research publication. Given the current trend of publishing positive results, it is a difficult decision for a researcher to invest effort in publishing a negative result.

Technical issues such as hardware faults and software bugs often go undetected until late in the research work. In these cases, the negative results are not necessarily of the same nature as those of the science domain, unless the domain is computer science itself. It then becomes difficult for a domain scientist to draw value from the publication and dissemination of such results; as a consequence, these issues are often ignored, or fixed only after the results have been obtained and disseminated. For example, a bug in a third-party library up the toolchain of an application, one that limited the scale or precision of the actual science results, can be considered a negative result. In some cases, the problems are mismatched to the available computational infrastructure. Sometimes the problems are too small for a given environment, leading to inefficient use. In other cases, the problems are too large for the infrastructure, leading to the generation of incomplete results or no results at all. Publishing details about such cases can benefit the community by allowing it to better understand how to match problems and solutions more optimally.
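As a hedged numerical illustration of the limited-precision cause mentioned above (not an example drawn from any of the workshop papers), the sketch below shows how a textbook one-pass variance computation becomes meaningless in single precision, while a higher-precision or two-pass computation does not.

```python
# Hedged sketch: limited precision turning a statistical quantity into noise.
# The one-pass formula Var(x) = E[x^2] - E[x]^2 suffers catastrophic cancellation
# when the mean is large relative to the spread, especially in float32.
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.normal(loc=1.0e4, scale=0.1, size=100_000)  # large mean, tiny spread

def one_pass_variance(x: np.ndarray) -> float:
    """Naive one-pass variance: E[x^2] - E[x]^2 (numerically fragile)."""
    return float((x * x).mean() - x.mean() ** 2)

print("float32 one-pass variance:", one_pass_variance(data.astype(np.float32)))
print("float64 one-pass variance:", one_pass_variance(data.astype(np.float64)))
print("two-pass reference (np.var):", float(np.var(data)))  # approximately 0.01
```

Any test statistic built on such a variance estimate would be unreliable in the float32 case, even though the code "runs", which is exactly the kind of silent technological/technical cause of a null or negative result discussed above.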
Finally, a cumulative effect from more than one cause is possible. One of the biggest concerns about such issues is that they often go unnoticed by the larger community, and hence appropriate corrective measures are not adopted.

We are grateful to the program committee members, the panelists, and the authors who supported the realization of this first workshop. We also thank the reviewers of this special issue. The work by Katz was supported in part by the National Science Foundation while he was working at the Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
