We Should Evaluate Real-World Impact
Abstract: The ACL community has very little interest in evaluating the real-world impact of NLP systems. A structured survey of the ACL Anthology shows that perhaps 0.1% of its papers contain such evaluations; furthermore, most papers that include impact evaluations present them very sketchily and instead focus on metric evaluations. NLP technology would be more useful and more quickly adopted if we seriously tried to understand and evaluate its real-world impact.
- Supplementary Content
1
- 10.2196/68720
- Mar 5, 2025
- Journal of Medical Internet Research
Background: Natural language processing (NLP) has the potential to promote public health. However, applying these technologies in African health systems faces challenges, including limited digital and computational resources to support the continent’s diverse languages and needs. Objective: This scoping review maps the evidence on NLP technologies for public health in Africa, addressing the following research questions: (1) What public health needs are being addressed by NLP technologies in Africa, and what unmet needs remain? (2) What factors influence the availability of public health NLP technologies across African countries and languages? (3) What stages of deployment have these technologies reached, and to what extent have they been integrated into health systems? (4) What measurable impact have these technologies had on public health outcomes, where such data are available? (5) What recommendations have been proposed to enhance the quality, cost, and accessibility of health-related NLP technologies in Africa? Methods: This scoping review includes academic studies published between January 1, 2013, and October 3, 2024. A systematic search was conducted across databases, including MEDLINE via PubMed, ACL Anthology, Scopus, IEEE Xplore, and ACM Digital Library, supplemented by gray literature searches. Data were extracted, and the NLP technology functions were mapped to the World Health Organization’s list of essential public health functions and the United Nations’ sustainable development goals (SDGs). The extracted data were analyzed to identify trends, gaps, and areas for future research. This scoping review follows the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) reporting guidelines, and its protocol is publicly available. Results: Of 2186 citations screened, 54 studies were included. While existing NLP technologies support a subset of essential public health functions and SDGs, language coverage remains uneven, with limited support for widely spoken African languages, such as Kiswahili, Yoruba, Igbo, and Zulu, and no support for most of Africa’s >2000 languages. Most technologies are in prototyping phases, with only one fully deployed chatbot addressing vaccine hesitancy. Evidence of measurable impact is limited, with 15% (8/54) of studies attempting health-related evaluations and 4% (2/54) demonstrating positive public health outcomes, including improved participant mood and increased vaccine intentions. Recommendations include expanding language coverage, targeting local health needs, enhancing trust, integrating solutions into health systems, and adopting participatory design approaches. The gray literature reveals industry- and nongovernmental organization–led projects focused on deployable NLP applications. However, these projects tend to support only a few major languages and specific use cases, indicating a narrower scope than academic research. Conclusions: Despite growth in NLP research for public health, major gaps remain in deployment, linguistic inclusivity, and health outcome evaluation. Future research should prioritize cross-sectoral and needs-based approaches that engage local communities, align with African health systems, and incorporate rigorous evaluations to enhance public health outcomes. International Registered Report Identifier (IRRID): RR2-doi:10.1101/2024.07.02.24309815
- Conference Article
234
- 10.3115/1557769.1557781
- Jan 1, 2007
The statistical modelling of language, together with advances in wide-coverage grammar development, has led to high levels of robustness and efficiency in NLP systems and made linguistically motivated large-scale language processing a possibility (Matsuzaki et al., 2007; Kaplan et al., 2004). This paper describes an NLP system which is based on syntactic and semantic formalisms from theoretical linguistics, and which we have used to analyse the entire Gigaword corpus (1 billion words) in less than 5 days using only 18 processors. This combination of detail and speed of analysis represents a breakthrough in NLP technology.
- Conference Article
65
- 10.1109/iicspi.2018.8690387
- Dec 1, 2018
Natural language processing technology is widely used in artificial intelligence fields such as machine translation, human-computer interaction, and speech recognition. Natural language processing is a daunting task due to the variability, ambiguity, and context-dependent interpretation of human language. Current deep learning techniques have made great progress on NLP tasks. However, many NLP systems still have practical problems, such as high training complexity, computational difficulties in large-scale content scenarios, high retrieval complexity, and a lack of probabilistic significance. This paper proposes an improved NLP method based on a long short-term memory (LSTM) structure, in which parameters in the recurrent projection layer are randomly discarded during the backward pass. Compared with the baseline and other LSTMs, the improved method achieves better F1 scores on the Wall Street Journal dataset, with both word2vec and one-hot word vectors, indicating that our method is better suited to NLP under limited computing resources and large amounts of data.
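The core idea in this abstract, randomly discarding projection-layer parameters, can be pictured as a DropConnect-style mask on the recurrent projection matrix of a projected LSTM. A minimal NumPy sketch of a single step under that assumption; the function name `lstm_projected_step`, the stacked-gate layout, and the masking rule are illustrative, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(x_dim, n_cell, n_proj, rng):
    # W: input weights, U: recurrent weights (from the projected state),
    # b: biases, W_p: recurrent projection matrix (n_cell -> n_proj).
    return {
        "W": rng.standard_normal((4 * n_cell, x_dim)) * 0.1,
        "U": rng.standard_normal((4 * n_cell, n_proj)) * 0.1,
        "b": np.zeros(4 * n_cell),
        "W_p": rng.standard_normal((n_proj, n_cell)) * 0.1,
    }

def lstm_projected_step(x, h_prev, c_prev, params, rng=None, drop_p=0.0):
    """One step of an LSTM with a recurrent projection layer.

    When drop_p > 0, entries of the projection matrix are randomly
    discarded for this step (DropConnect-style), a stand-in for the
    paper's random discarding of projection-layer parameters.
    """
    n = c_prev.shape[0]
    z = params["W"] @ x + params["U"] @ h_prev + params["b"]
    i, f, o = (sigmoid(z[k * n:(k + 1) * n]) for k in range(3))
    g = np.tanh(z[3 * n:])
    c = f * c_prev + i * g                  # cell state update
    W_p = params["W_p"]
    if rng is not None and drop_p > 0.0:
        W_p = W_p * (rng.random(W_p.shape) >= drop_p)
    h = W_p @ (o * np.tanh(c))              # project n_cell -> n_proj
    return h, c
```

The projection keeps the recurrent state small (cheaper matrix products), which is consistent with the abstract's emphasis on limited computing resources; the random discarding acts as regularization.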
- Video Transcripts
- 10.48448/ed2c-ap93
- Aug 1, 2021
This is an incredible moment for NLP. We all routinely work with models whose capabilities would have seemed like science fiction just two decades ago, powerful organizations eagerly await our latest results, and NLP technologies are playing an increasingly large role in shaping our society. As a result, all of us in the NLP community are likely to participate in research that will contribute (to varying degrees and perhaps only indirectly) to technologies that will impact many people’s lives, with both positive and negative consequences: for example, technologies that broaden accessibility, enhance creative self-expression, heighten surveillance, and create propaganda. What can we do to fulfill the social responsibility that this brings? As a (very) partial answer to this question, I will review a number of important recent developments, spanning many research groups, concerning dataset creation, model introspection, and system assessment. Taken together, these ideas can help us more reliably characterize how NLP systems will behave, and more reliably communicate this information to a wider range of potential users. In this way, they can help us meet our obligations to the people whose lives are impacted by the results of our research.
- Conference Article
3
- 10.1109/indiscon54605.2022.9862877
- Jul 15, 2022
Natural disasters such as cyclones and floods recur frequently in certain parts of the world. In this work, we provide a framework to build an easily deployable disaster management application, over five stages. First, we interview four categories of people to understand current problems and approaches in disaster management. Next, we analyze responses and establish that identified disaster management efforts are hitherto unable to effectively harness existing technology. We accordingly build a guided recommendation toolbox of existing Machine Learning (ML), Internet of Things (IoT), and NLP technologies that satisfies critical system requirements for the identified efforts; this is qualitatively evaluated by senior data scientists and disaster management researchers, and found to reduce development time as well as increase reliability and user-friendliness. Finally, we provide a model to decentralize disaster management. Our work promotes the development of NLP systems tailored for disaster management, bridging the gap between research and real-world applications.
- Research Article
2
- 10.1111/gcb.15770
- Jul 16, 2021
- Global Change Biology
Addressing climate change risks requires collaboration and engagement across all sectors of society. In particular, effective partnerships are needed between research scientists producing new knowledge, policy-makers and practitioners who apply conservation actions on the ground. We describe the implementation of a model for increasing the application and usability of biodiversity research in climate adaptation policy and practice. The focus of the program was to increase the ability of a state government agency and natural resource practitioners in Australia to manage and protect biodiversity in a changing climate. The model comprised a five-stage process for enhancing impact: (i) initiation of research projects that addressed priority conservation policy and management issues; (ii) co-design of the research using a collaborative approach involving multiple stakeholders; (iii) implementation of the research and design of decision tools and web-based resources; (iv) collaborative dissemination of the tools and resources via government and community working groups; and (v) evaluation of research impact. We report on the model development and implementation, and critically reflect on the model's impact. We share the lessons learnt from the challenges of operating within a stakeholder group with diverse objectives and criteria for success, and provide a template for creating an environmental research program with real-world impact.
- Research Article
- 10.3389/fbloc.2025.1564083
- Jun 2, 2025
- Frontiers in Blockchain
The Regenerative Finance (ReFi) movement is gaining traction in the Web3 space, with numerous blockchain-based initiatives claiming alignment with regenerative outcomes. However, many of these claims remain vague or structurally unsubstantiated. This study evaluates 40 self-identified ReFi initiatives to determine the extent to which their design, governance, capital structures, and impact logic align with foundational regenerative principles. Drawing from regenerative economics, living systems theory, and regenerative organizational design, a structured evaluation framework was developed covering six dimensions across three domains: regenerative finance, real-world impact, and regenerative organizational design. The framework informed two scoring-based questionnaires, enabling systematic assessment of regenerative and impact claims. Results revealed significant variation in alignment: 50% of initiatives were categorized as Regenerative Finance (ReFi), 45% as Sustainable DeFi, and 5% as Structurally Misaligned, reflecting limited coherence between regenerative claims and actual practice. The findings showed that team diversity and initiative maturity were positively correlated with regenerative performance, and that a lack of holistic impact evaluation—across thematic dimensions and throughout operational, direct, and indirect value chains—remains a key limitation across the sector. A typology of regenerative alignment and a replicable self-evaluation tool were developed to help funders, practitioners, and protocol developers assess which ReFi initiatives are structurally aligned with regenerative principles and which remain aspirational. This research advances conceptual and practical clarity around the term “regenerative” in Web3, supporting the evolution of more accountable, transparent, and transformation-oriented financial systems in service to the Global Commons.
- Single Report
- 10.18235/0013075
- Sep 6, 2024
MEiRA is a novel method for evaluating learning effectiveness, emphasizing practical knowledge application and learner achievement. It transcends traditional metrics by valuing the learning journey and its outcomes equally. Applicable in organizational training and broader learning contexts, it's designed for scenarios where learners may not be part of a known organization. MEiRA follows the learning journey through five pillars: Engagement, Perception and Appreciation, Cognition and Knowledge Acquisition, Intention and Commitment, and Transfer and Impact Stories. It uses open badges as an intrinsic part of the impact evaluation process, making the impact of applied learning visible. MEiRA overcomes conventional evaluation model limitations by measuring acquired knowledge and recognizing its real-world application and impact. It provides a flexible, inclusive, and comprehensive approach to learning evaluation, championing the recognition of both formal and informal learning achievements, and advocating for diverse learning pathways and their professional impacts.
- Research Article
- 10.1128/spectrum.02239-24
- Feb 25, 2025
- Microbiology spectrum
Limitations of culture-based diagnostic approaches in pathogen detection in joint infections (JI) can be overcome by amplification-based, molecular assays. Recently, a syndromic panel PCR (spPCR) assay (Biofire JI panel; BJA) was approved for pathogen identification from synovial fluid (SF). Here, the performance and the clinical impact of the BJA were assessed in comparison to standard of care diagnostics in a prospective cohort of patients presenting with symptoms consistent with JI. One hundred sixty-five synovial fluids underwent analysis using the BJA. The results were compared with culture-based diagnostics. Discrepant results were re-analyzed using species-specific PCRs or 16S-rDNA sequencing. Clinical data from patients were collected to evaluate the impact on patient management. Twenty-seven of 165 (16.3%) synovial fluid cultures grew bacterial pathogens. In 24/27 cases, the BJA results were concordant. In one case, the cultured pathogen was missed, but three additional pathogens were identified. In 11 culture-negative cases, BJA identified a pathogen. Mean turnaround time in culture-positive samples was 14:11 h and 35:17 h in BJA and culture, respectively. In 11 cases, antibiotic therapy was optimized, based on BJA results. This study demonstrates high sensitivity and specificity (96.3% and 97.8%, respectively) of BJA, as well as a shorter turnaround time than culture-based techniques (21 h faster). Based on analysis of clinical data, antibiotic therapy was optimized due to BJA results in 11 cases. Care must be taken, as important pathogens in prosthetic JI are not included in the panel, restricting its value here. IMPORTANCE: Pathogen detection is critical for targeted management of joint infections; however, cultural detection of pathogens can be challenging. The Biofire Joint Infection Assay (BJA) is a syndromic panel PCR test that allows culture-independent detection of 31 pathogens.
The diagnostic performance and clinical impact were evaluated in a cohort of 160 patients with native and prosthetic joint infections. BJA detected concordant pathogens in 24 of 27 culture-positive cases and enabled the detection of additional pathogens in 11 patients. The time to result was significantly shorter than with standard culture-based diagnostics (14 vs 35 h), and BJA allowed optimization of therapy in 11 patients. The data show that BJA is a relevant addition to the diagnostic options for joint infections. Limitations result from incomplete detection of relevant pathogens, especially in prosthetic joint infections. The use of BJA in daily practice must therefore be accompanied by diagnostic stewardship measures.
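The reported accuracy figures follow from the standard 2x2 confusion-matrix definitions. A small Python sketch; the abstract does not give the study's full 2x2 table, so the example counts are illustrative values consistent with the reported 96.3% sensitivity and 97.8% specificity, not the paper's actual tallies:

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    # Fraction of truly infected samples the assay detects.
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    # Fraction of uninfected samples the assay correctly calls negative.
    return true_neg / (true_neg + false_pos)

# Illustrative counts: detecting 26 of 27 infections gives ~96.3%
# sensitivity; 135 of 138 correct negatives gives ~97.8% specificity.
sens = sensitivity(26, 1)
spec = specificity(135, 3)
```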
- Research Article
21
- 10.3390/nu13103534
- Oct 9, 2021
- Nutrients
Suboptimal dietary intake is a critical cause of poor maternal nutrition, with several adverse consequences both for mothers and for their children. This study aimed to (1) assess maternal dietary patterns in India; (2) examine enablers and barriers in adopting recommended diets; (3) review current policy and program strategies to improve dietary intakes. We used mixed methods, including empirical analysis, compiling data from available national and subnational surveys, and reviewing literature, policy, and program strategies. Diets among pregnant women are characterized by low energy, macronutrient imbalance, and inadequate micronutrient intake. Supply- and demand-side constraints to healthy diets include food unavailability, poor economic situation, low exposure to nutrition counselling, food restrictions and taboos, adverse family influence and gender norms, and gaps in knowledge. Intervention strategies with potential to improve maternal diets include food-based programs, behavior change communication, and nutrition-sensitive agriculture interventions. However, strategies face implementation bottlenecks and limited effectiveness in real-world at-scale impact evaluations. In conclusion, investments in systems approaches spanning health, nutrition, and agriculture sectors, with evaluation frameworks at subnational levels, are needed to promote healthy diets for women.
- Conference Article
- 10.1115/imece2011-64419
- Jan 1, 2011
While some debate has existed in the literature regarding the relationship between roof crush and occupant injury, the United States (U.S.) National Highway Traffic Safety Administration (NHTSA) has identified an increased safety benefit in improving roof strength and has mandated new higher roof crush resistance requirements. Frequently, roof impacts occur in rollover crashes when a vehicle travels off the lanes of the roadway and impacts various types of narrow objects along the roadway edge such as light poles, utility poles and/or trees. A previously reported tilt-test device and methodology is presented along with a new pendulum-test device and methodology, both of which allow for dynamic, repeatable impact evaluation of vehicle roof structures with narrow objects. The data collected include not only residual crush, but also dynamic vehicle instrumentation and high speed video analysis. Two series of full vehicle tests are reported which represent each of the methodologies. The testing conditions for each series were determined based upon analysis of a real-world narrow object rollover impact. Each testing series allows for analysis of the damage resulting from the narrow object impact to the roof structure for a production vehicle as well as one that has been structurally reinforced. Results demonstrate that the reinforced roof structure significantly reduced the roof deformation compared to that of the production roof structure. The input energy of each test and resulting damage patterns can be used as both a reconstruction tool and structural assessment test.
- Video Transcripts
- 10.48448/f2cz-g450
- Aug 1, 2021
Despite inextricable ties between race and language, little work has considered race in NLP research and development. In this work, we survey 79 papers from the ACL Anthology that mention race. These papers reveal various types of race-related bias in all stages of NLP model development, highlighting the need for proactive consideration of how NLP systems can uphold racial hierarchies. However, persistent gaps in research on race and NLP remain: race has been siloed as a niche topic and remains ignored in many NLP tasks; most work operationalizes race as a fixed single-dimensional variable with a ground-truth label, which risks reinforcing differences produced by historical racism; and the voices of historically marginalized people are nearly absent in NLP literature. By identifying where and how NLP literature has and has not considered race, especially in comparison to related fields, our work calls for inclusion and racial justice in NLP research practices.
- Research Article
2
- 10.2139/ssrn.3387328
- Jan 1, 2019
- SSRN Electronic Journal
This working paper analyses real-world impact evaluations in development sectors in low- and middle-income countries. Using the example of grants for impact evaluations given by the International Initiative for Impact Evaluation (3ie), it explores key drivers of costs and causes of delays. The analyses provide insights into managing impact evaluations for real-world programmes, where the programme team and the impact evaluation team are typically different, start with different objectives, and have different timelines. It concludes with emerging lessons and directions for researchers, implementers, and donors that are keen to build learning and impact evaluations into their programmes.
- Abstract
1
- 10.1182/blood-2018-99-113956
- Nov 29, 2018
- Blood
Real World Use of Extended Half-Life Products and the Impact on Bleeding Events and Joint Health in the United States
- Conference Article
54
- 10.1145/3548606.3559341
- Nov 7, 2022
A transaction fee mechanism (TFM) is an essential component of a blockchain protocol. However, a systematic evaluation of the real-world impact of TFMs is still absent. Using rich data from the Ethereum blockchain, the mempool, and exchanges, we study the effect of EIP-1559, one of the earliest-deployed TFMs that depart from the traditional first-price auction paradigm. We conduct a rigorous and comprehensive empirical study to examine its causal effect on blockchain transaction fee dynamics, transaction waiting times, and consensus security. Our results show that EIP-1559 improves the user experience by mitigating intrablock differences in the gas price paid and reducing users' waiting times. However, EIP-1559 has only a small effect on gas fee levels and consensus security. In addition, we find that when Ether's price is more volatile, the waiting time is significantly higher. We also verify that a larger block size increases the presence of siblings. These findings suggest new directions for improving TFMs.
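EIP-1559's departure from first-price auctions centers on a protocol-set base fee that adjusts toward a target block fullness, which is what drives the gas-price and waiting-time dynamics the study measures. A minimal Python sketch of the base-fee update rule from the EIP-1559 specification (the gas and fee values in the usage notes are illustrative):

```python
def next_base_fee(base_fee: int, gas_used: int, gas_target: int,
                  max_change_denominator: int = 8) -> int:
    """EIP-1559 base-fee update, in integer wei arithmetic.

    A block at exactly target fullness leaves the base fee unchanged;
    a completely full block (2x target) raises it by 1/8 (12.5%); an
    empty block lowers it by 1/8.
    """
    if gas_used == gas_target:
        return base_fee
    # Magnitude of the adjustment, proportional to the deviation
    # from target, capped at base_fee / max_change_denominator.
    delta = base_fee * abs(gas_used - gas_target) // gas_target \
        // max_change_denominator
    if gas_used > gas_target:
        return base_fee + max(delta, 1)   # increase by at least 1 wei
    return base_fee - delta
```

For example, with a 100 gwei base fee and a 15M-gas target, a full 30M-gas block pushes the next base fee to 112.5 gwei, while an empty block drops it to 87.5 gwei; the bounded per-block change is what smooths the intrablock price differences the paper observes.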