CarD-T: An Automated Pipeline for the Nomination and Analysis of Potential Human Carcinogens.
The identification and classification of carcinogens is critical in cancer epidemiology, necessitating updated methodologies to manage the burgeoning biomedical literature. We introduce the Carcinogen Detection via Transformers (CarD-T) framework, combining transformer-based machine learning with probabilistic analysis to efficiently nominate potential carcinogens from scientific texts. Trained on 60% of established carcinogens, CarD-T correctly identifies all remaining known carcinogens and nominates ∼1,600 potential new carcinogens. Comparative assessment against GPT-4 reveals CarD-T's comparable precision (0.896 vs 0.903) and superior recall (0.853 vs 0.757), implying an improved ability to nominate potential carcinogens for further evaluation. CarD-T associates each nominated entity with relevant scientific literature, allowing for additional analysis of conflicting implications over time through a Bayesian Probabilistic Carcinogen Denomination (PCarD) analysis. The framework also provides rich insights into carcinogenesis-associated research, revealing significant shifts in research focus on carcinogenic agents over time, from chemical carcinogens to broader categories including biological agents, environmental factors, and lifestyle choices. We establish the CarD-T framework as a locally deployable, computationally inexpensive, and robust tool for identifying and nominating potential carcinogens from vast biomedical literature. This framework enhances the agility of public health responses to carcinogen identification, setting a new benchmark for automated, scalable toxicological investigations.
- Research Article
- 10.1158/1538-7445.am2025-lb108
- Apr 25, 2025
- Cancer Research
The identification and classification of carcinogens is critical in cancer epidemiology. We introduce the Carcinogen Detection via Transformers (CarD-T) framework, combining transformer-based machine learning with probabilistic analysis to efficiently nominate potential carcinogens from scientific texts. Trained on 60% of established carcinogens, CarD-T correctly identifies all remaining known carcinogens and nominates ∼1,600 potential new carcinogens. Comparative assessment against GPT-4 reveals CarD-T's comparable precision (0.894 vs 0.903), and superior recall (0.857 vs 0.705), implying improved ability to classify carcinogens not in major databases. Additionally, CarD-T highlights 554 entities with disputing evidence, analyzed using Bayesian Probabilistic Carcinogenic Denomination (PCarD). The framework reveals significant shifts in research focus from chemical carcinogens to broader categories including environmental factors (18%), biological agents (10%), and emerging threats like COVID-19, supported by 577 publications since 2020. This framework enhances the agility of public health responses to carcinogen identification, setting a new benchmark for automated, scalable toxicological investigations.
Citation Format: James (Jamey) ONeill, Parag A. Katira. CarD-T: Interpreting Carcinomic Lexicon via Transformers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_2):Abstract nr LB108.
- Research Article
- 10.1101/2024.08.13.24311948
- Aug 31, 2024
- medRxiv : the preprint server for health sciences
The identification and classification of carcinogens is critical in cancer epidemiology, necessitating updated methodologies to manage the burgeoning biomedical literature. Current systems, like those run by the International Agency for Research on Cancer (IARC) and the National Toxicology Program (NTP), face challenges due to manual vetting and disparities in carcinogen classification spurred by the volume of emerging data. To address these issues, we introduced the Carcinogen Detection via Transformers (CarD-T) framework, a text analytics approach that combines transformer-based machine learning with probabilistic statistical analysis to efficiently nominate carcinogens from scientific texts. CarD-T uses Named Entity Recognition (NER) trained on PubMed abstracts featuring known carcinogens from IARC groups and includes a context classifier to enhance accuracy and manage computational demands. Using this method, journal publication data from the last 25 years indexed with carcinogenicity and carcinogenesis Medical Subject Headings (MeSH) terms were analyzed to identify potential carcinogens. Trained on 60% of established carcinogens (Group 1 and 2A carcinogens, IARC designation), CarD-T correctly identifies all of the remaining Group 1 and 2A designated carcinogens from the analyzed text. In addition, CarD-T nominates roughly 1500 more entities as potential carcinogens that have at least two publications citing evidence of carcinogenicity. Comparative assessment of CarD-T against the GPT-4 model reveals higher recall (0.857 vs 0.705) and F1 score (0.875 vs 0.792), and comparable precision (0.894 vs 0.903). Additionally, CarD-T highlights 554 entities that show disputing evidence for carcinogenicity. These are further analyzed using Bayesian temporal Probabilistic Carcinogenic Denomination (PCarD) to provide probabilistic evaluations of their carcinogenic status based on evolving evidence.
Our findings underscore that the CarD-T framework is not only robust and effective in identifying and nominating potential carcinogens within vast biomedical literature but also efficient on consumer GPUs. This integration of advanced NLP capabilities with vital epidemiological analysis significantly enhances the agility of public health responses to carcinogen identification, thereby setting a new benchmark for automated, scalable toxicological investigations.
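The F1 scores quoted in this abstract can be checked against the quoted precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch for that arithmetic only; it is not part of the CarD-T pipeline, and the function name `f1_score` and the dictionary layout are illustrative assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as quoted in the medRxiv abstract.
reported = {
    "CarD-T": {"precision": 0.894, "recall": 0.857},
    "GPT-4": {"precision": 0.903, "recall": 0.705},
}

# Recompute F1 for each model, rounded to three decimals like the abstract.
f1_by_model = {
    name: round(f1_score(m["precision"], m["recall"]), 3)
    for name, m in reported.items()
}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1}")
```

Rounded to three decimals, the recomputed values agree with the F1 scores stated in the abstract (0.875 and 0.792), which indicates that the three reported metrics are mutually consistent.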
- Research Article
286
- 10.1158/0008-5472.can-08-2852
- Aug 28, 2008
- Cancer research
The American Association for Cancer Research has been the citadel for communicating research on chemical carcinogens for over a century. It therefore seems appropriate that a review of chemical carcinogenesis inaugurates a series of articles highlighting advances in understanding, treating, and
- Research Article
1
- 10.1093/toxsci/10.4.571
- Jan 1, 1988
- Toxicological Sciences
Genotoxicity of Complex Mixtures: Strategies for the Identification and Comparative Assessment of Airborne Mutagens and Carcinogens from Combustion Sources
- Research Article
56
- 10.1093/jnci/64.1.169
- Jan 1, 1980
- JNCI: Journal of the National Cancer Institute
A two-stage process is proposed for a uniform framework for Federal agency decisions regarding the identification, characterization, and control of potential human carcinogens. Stage I would include the identification, through epidemiologic and/or laboratory studies, of chemicals that represent a potential carcinogenic risk and the characterization of that risk. Stage II would encompass the actual regulatory decision-making process regarding control of potential carcinogens. Stage I relies predominantly on scientific activity and judgment. Centralized management could enhance the efficiency and effectiveness of this process. The new National Toxicology Program may be able to perform this function. Stage II judgments are social and political. Centralization of stage II decision-making is not possible under current law.
- Front Matter
51
- 10.1093/annonc/mdn561
- Oct 1, 2008
- Annals of Oncology
Diet, nutrition and cancer: public, media and scientific confusion
- Research Article
- 10.22037/sdh.v3i3.17682
- Jul 26, 2017
Background: The global burden of cancer is increasing due to population growth, aging, and various environmental factors. Skin cancer is the most common cancer among Iranians and is more common among men. There is strong evidence from industrialized and less developed countries that cancer incidence and survival are related to socioeconomic factors. The aim of this study was to investigate the relationship between socioeconomic variables, including the Human Development Index (HDI), unemployment rate, and urbanization ratio, and the incidence of skin cancer in Iran. Method: Panel data covered 30 provinces over 6 years (2007 to 2012). Data on socioeconomic factors were collected from the Statistical Center of Iran, and data on cancer incidence were collected from the cancer registry reports of the Health and Medical Education Ministry. Stata version 11 was used for data analysis. Result: There is no relationship between unemployment and the incidence of skin cancer. There is a negative relationship between urbanization and the incidence of skin cancer in both sexes. There is a negative relationship between HDI and the incidence of skin cancer in both sexes. Conclusion: Of the three variables selected in this study, the Human Development Index and urbanization were associated with skin cancer incidence. Therefore, policymakers appear to need to pay attention to these factors in order to prevent skin cancer. Keywords: Socioeconomic Factors, Skin Neoplasm, Iran
- Research Article
11
- 10.1093/annonc/mds543
- Apr 1, 2013
- Annals of Oncology
The contribution of molecular epidemiology to the identification of human carcinogens: current status and future perspectives
- Research Article
18
- 10.1080/10408444.2020.1727843
- Jan 2, 2020
- Critical Reviews in Toxicology
The European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC) organized a workshop “Hazard Identification, Classification and Risk Assessment of Carcinogens: Too Much or Too Little?” to explore the scientific limitations of the current binary carcinogenicity classification scheme that classifies substances as either carcinogenic or not. Classification is often based upon the rodent 2-year bioassay, which has scientific limitations and is not necessary to predict whether substances are likely human carcinogens. By contrast, tiered testing strategies founded on new approach methodologies (NAMs) followed by subchronic toxicity testing, as necessary, are useful to determine if a substance is likely carcinogenic, by which mode-of-action effects would occur and, for non-genotoxic carcinogens, the dose levels below which the key events leading to carcinogenicity are not affected. Importantly, the objective is not for NAMs to mimic high-dose effects recorded in vivo, as these are not relevant to human risk assessment. Carcinogenicity testing at the “maximum tolerated dose” does not reflect human exposure conditions, but causes major disturbances of homeostasis, which are very unlikely to occur at relevant human exposure levels. The evaluation of findings should consider biological relevance and not just statistical significance. Using this approach, safe exposures to non-genotoxic substances can be established.
- Research Article
208
- 10.1016/0165-1110(90)90033-8
- Sep 1, 1990
- Mutation Research/Reviews in Genetic Toxicology
Consideration of both genotoxic and nongenotoxic mechanisms in predicting carcinogenic potential
- Research Article
102
- 10.1016/0272-0590(88)90184-4
- May 1, 1988
- Fundamental and Applied Toxicology
Genotoxicity of complex mixtures: Strategies for the identification and comparative assessment of airborne mutagens and carcinogens from combustion sources
- Book Chapter
- 10.4018/979-8-3693-9730-5.ch008
- Apr 25, 2025
Aging is a complex process influenced by biological mechanisms, lifestyle choices, environmental factors, and genetic predispositions. Key biological hallmarks include genomic instability, mitochondrial dysfunction, and OS, which contribute to age-related diseases such as CVD, NDD, and sarcopenia. Lifestyle factors like diet, physical activity, and sleep quality significantly impact health. Genetic variations and epigenetic modifications further modulate individual aging trajectories. OS, driven by reactive oxygen species, plays a central role in aging, with antioxidants from natural sources offering protective benefits. This chapter gives an overview of the aging process with special emphasis on OS and methods for mitigating its effects; it further highlights the necessity for individualized techniques like nutrigenomics and examines treatment approaches. Future studies should prioritize large-scale investigations and include genetic, environmental, and lifestyle factors to promote healthy aging and enhance quality of life.
- Abstract
- 10.1016/s0923-7534(20)30094-6
- Jun 1, 2012
- Annals of Oncology
P-0170 Epidemiology of Digestive Tract Cancers in Western India: Recent Trends and Lesson Learnt
- Research Article
5
- 10.1002/clc.21956
- Jan 25, 2012
- Clinical Cardiology
Has the Genomic Revolution Failed?
- Research Article
57
- 10.1016/s0378-4274(02)00495-2
- Feb 8, 2003
- Toxicology Letters
Genotoxicity—threshold or not? Introduction of cases of industrial chemicals