Articles published on Grade Level
- Research Article
- 10.1080/10810730.2025.2583460
- Nov 7, 2025
- Journal of health communication
- Samuel R Mendez + 3 more
Readability formulas are prominent health communication assessment tools, but they can yield varying estimates. Such variation is often treated as error in computerized tools but can result from text preprocessing decisions in manual and computerized assessments alike. This study illustrates the effect of preprocessing on reading grade level estimates in short-form online content, thereby showing the importance of reporting these decisions and the limitations of these formulas. We manually counted words, sentences, and syllables in a sample of 100 Tweets by U.S. state health agencies from 2012 through 2022. We applied the Simplified Measure of Gobbledygook (SMOG) and Flesch-Kincaid formulas under 7 inclusive preprocessing scenarios, differentially including URLs, hashtags, and/or numbers in word counts. We compared the resulting estimates to those from a restrictive baseline that excluded these elements. Wilcoxon signed-rank tests revealed significant differences in median grade level estimates. No significant differences were found in the percentage of Tweets meeting an 8th-grade benchmark. Linear regression showed that baseline estimates did not adequately explain the observed variation. Despite the potential benefit of interpretability, we conclude that readability formulas are unreliable for short-form online content. Instead, we recommend directly using word, sentence, and syllable counts. We also recommend conducting sensitivity analyses for readability assessments.
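To make the preprocessing effect concrete, the following is a minimal Python sketch, not the authors' procedure. It applies the standard Flesch-Kincaid Grade Level formula (0.39 × words/sentence + 11.8 × syllables/word − 15.59) and the SMOG formula (1.043 × √(polysyllables × 30/sentences) + 3.1291) to one invented two-sentence Tweet, first under a restrictive baseline and then under each non-empty combination of URLs, hashtags, and numbers. We assume that these 2³ − 1 = 7 combinations correspond to the study's 7 inclusive scenarios. The vowel-group syllable counter, the token rules in `classify`, and the sample Tweet (including its `t.co` link) are hypothetical stand-ins for the study's manual counts.

```python
import math
import re
from itertools import combinations

# Characters stripped from token edges before classification (an assumption).
EDGE_PUNCT = ".,!?;:\"'()"
ELEMENTS = ("url", "hashtag", "number")

# All non-empty combinations of the optional elements: 2**3 - 1 = 7 scenarios.
SCENARIOS = [frozenset(c) for r in range(1, len(ELEMENTS) + 1)
             for c in combinations(ELEMENTS, r)]


def classify(token: str) -> str:
    """Label a cleaned token as url, hashtag, number, or ordinary word (hypothetical rules)."""
    if token.startswith("http"):
        return "url"
    if token.startswith("#"):
        return "hashtag"
    if re.fullmatch(r"[\d,.%]+", token):
        return "number"
    return "word"


def count_syllables(token: str) -> int:
    """Crude heuristic: one syllable per run of vowels (y included), minimum 1."""
    return max(len(re.findall(r"[aeiouy]+", token.lower())), 1)


def counts(sentences, include=frozenset()):
    """Return (words, sentences, syllables, polysyllables) for one scenario.

    `sentences` is a list of already-segmented sentences, mirroring a manual count.
    """
    words = syllables = polysyllables = 0
    for sentence in sentences:
        for raw in sentence.split():
            token = raw.strip(EDGE_PUNCT)
            if not token:
                continue
            kind = classify(token)
            if kind != "word" and kind not in include:
                continue  # element excluded under this scenario
            n = count_syllables(token)
            words += 1
            syllables += n
            if n >= 3:
                polysyllables += 1
    return words, len(sentences), syllables, polysyllables


def flesch_kincaid_grade(words, sentences, syllables):
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


def smog_grade(polysyllables, sentences):
    # SMOG was normed on 30-sentence samples; short texts are extrapolated.
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


# Invented example Tweet, pre-split into sentences.
TWEET = [
    "In 2022, flu shots cost nothing for kids.",
    "Find a clinic near you: https://t.co/abc #FluSeason",
]

if __name__ == "__main__":
    w, s, syl, poly = counts(TWEET)  # restrictive baseline
    print("baseline", round(flesch_kincaid_grade(w, s, syl), 2), round(smog_grade(poly, s), 2))
    for scenario in SCENARIOS:
        w, s, syl, poly = counts(TWEET, scenario)
        label = "+".join(sorted(scenario))
        print(label, round(flesch_kincaid_grade(w, s, syl), 2), round(smog_grade(poly, s), 2))
```

Under these assumptions, including all three elements raises the word count from 12 to 15 and the syllable count from 14 to 20, which moves the Flesch-Kincaid estimate from about 0.5 to about 3.1 and the SMOG estimate from about 3.1 to about 7.2 for the same Tweet.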
- Research Article
- 10.70267/iclpce.202506
- Nov 6, 2025
- Exploring Science Academic Conference Series
- Ying Huang + 3 more
Anxiety is a negative emotion that may influence reading performance. The purpose of this study was to investigate the predictive effects of individual differences on Chinese undergraduate students' English reading anxiety and the differences between its grammar- and vocabulary-related sources. The predictive effects of gender, major category, grade level, and current level of English proficiency were explored at the individual level. Word comprehension anxiety, word pronunciation anxiety, and grammar anxiety were examined as three sources of English reading anxiety. A total of 247 questionnaires were collected, of which 207 were valid. The data were analyzed quantitatively. A multiple linear regression model was used to investigate these predictive factors; gender, major category, and grade had no significant effect on Chinese college students' English reading anxiety, whereas the current level of English proficiency had a significant and uneven effect. For the anxiety sources, Pearson correlations between total reading anxiety scores and word comprehension anxiety, word pronunciation anxiety, and grammar anxiety were used to rank the three sources. Word comprehension anxiety was the most dominant of the three, followed by grammar anxiety and word pronunciation anxiety. This study also discusses the reasons for these findings and offers pedagogical suggestions for teachers on tailoring their teaching to students' needs, choosing appropriate textbooks, and balancing the weighting of grammar and vocabulary in English reading instruction, contributing to research on emotion in foreign language learning.
- Research Article
- 10.1055/a-2737-5287
- Nov 5, 2025
- Journal of reconstructive microsurgery
- Luke J Llaurado + 7 more
Diabetic foot ulcer (DFU) care represents a significant challenge in plastic and reconstructive surgery. Oftentimes, patients consult complex articles and websites to answer questions about their surgeries, including Anterolateral Thigh (ALT) flaps. Artificial intelligence (AI) represents a new and simplified resource for DFU patients seeking information regarding their care. To assess ChatGPT's utility as a patient resource, we evaluated the accuracy, comprehensiveness, and safety of AI-generated responses to frequently asked questions (FAQs) related to ALT flap surgery for DFU. Ten DFU and ALT flap care FAQs were posed to ChatGPT Model 3.5 in June 2024. Four plastic surgeons rated the accuracy, comprehensiveness, and danger of ChatGPT's answers on a 10-point Likert scale. Surgeons also provided qualitative feedback. Response readability was assessed using 10 readability indexes, which were averaged to produce a reading grade level for each response. Overall, ChatGPT answered patient questions with a mean accuracy of 9.1 ± 1.2, comprehensiveness of 8.2 ± 1.5, and danger of 2.0 ± 1.0. ChatGPT answered at a mean grade level of 19.8 ± 20.1. Qualitatively, physician reviewers complimented the organizational clarity of the responses (n = 4/10) and the AI's ability to provide information on possible surgical complications (n = 4/10). While one response presented explicitly incorrect information about pre-operative preparation protocols and when they should be initiated, the majority of responses (n = 6/10) omitted key post-operative information, notably dangle protocols and compression. ChatGPT provides accurate and comprehensive responses to FAQs related to patients undergoing ALT flap surgery for the treatment of DFUs. The AI-generated responses were praised for organizational clarity and informative content regarding surgical complications, but lacked essential post-operative care details.
Therefore, while ChatGPT is a valuable informational tool, further refinement is necessary to ensure fully comprehensive information is provided to DFU patients.
- Research Article
- 10.3390/nu17213481
- Nov 5, 2025
- Nutrients
- Sandra López-Arana + 1 more
Background: Energy drink (ED) consumption has increased significantly among adolescents worldwide, constituting a high-risk behavior with important public health implications. These beverages are associated with cardiovascular disturbances, sleep disorders, anxiety symptoms, and risky behaviors, especially when combined with alcohol. In Chile, monthly per capita ED consumption among individuals aged 14 to 30 increased ninefold between 2010 and 2020. Objectives: To examine the prevalence of ED consumption and its associated determinants among Chilean adolescents enrolled in grades eight through twelve. Methods: Data were drawn from the 2023 nationally representative Fifteenth National Study on the School Population of Chile (ENPE). The final sample included 45,042 adolescents. In addition to descriptive analyses, bivariate and multivariate logistic regression models were used to examine associations between sociodemographic characteristics, parental presence, monitoring, and ED consumption outcomes. Results: Lifetime prevalence of ED consumption was 71.0%, with higher rates in females (72.8%) than males (69.4%). Past-month consumption was reported by 46.2%, with a higher prevalence in males (48.3%) than females (43.9%). Lifetime consumption of ED mixed with alcohol (AmED) was 23.2% and was more frequent among females (26.4%) than males (19.9%). Age, grade level, and indigenous identity were consistently associated with higher odds of consumption. Parental monitoring and involvement indicators were inversely associated with both ED and AmED consumption. Conclusions: This study reveals a high prevalence of ED consumption among Chilean adolescents, with notable gender differences. Family protective factors, particularly parental monitoring and cohesion, emerge as key determinants of this risky behavior and warrant prioritization in public health prevention efforts.
- Research Article
- 10.33418/education.1559434
- Nov 5, 2025
- Educational Academic Research
- Atilla Özdemir + 1 more
This study aimed to develop and validate the Design Thinking Skills Perception Scale (DTSPS) to assess high school students' perceptions of their design thinking (DT) skills. An exploratory sequential mixed-methods design was adopted, integrating qualitative and quantitative approaches. In the qualitative phase, semi-structured interviews were conducted with 15 students (Grades 9–12) to explore their views on empathy, problem identification, creative ideation, prototyping, solution orientation, and teamwork. Thematic analysis revealed empathy and teamwork as essential, while prototyping and testing were challenging due to limited experience. These insights guided the initial item pool. In the quantitative phase, data were collected from 208 students in Central Anatolia. Exploratory and Confirmatory Factor Analyses confirmed a six-dimensional structure, explaining 64% of the total variance. The scale demonstrated strong convergent and discriminant validity, high internal consistency (Cronbach’s α = .73–.88; McDonald’s ω = .77–.91), and measurement invariance across gender and grade levels. Scores range from 20 to 100, with higher scores indicating stronger DT skill perceptions. The scale can be administered individually or in groups, and provides both total and sub-dimension scores. The DTSPS is a reliable and valid instrument offering insights into students' 21st-century skills, including creativity, collaboration, and problem-solving. It addresses a critical gap in DT education by offering a standardized measure for high school students. The findings suggest that the scale can guide educators in supporting students’ development of DT competencies. Future research should validate the DTSPS in more diverse contexts and explore its integration into digital learning environments.
- Research Article
- 10.18844/cerj.v15i4.9561
- Nov 4, 2025
- Contemporary Educational Researches Journal
- Pardis Elahi + 2 more
Emotional intelligence has gained increasing attention in educational research due to its influence on learning outcomes and interpersonal relationships. However, limited evidence exists regarding its development across educational stages among learners of English as a foreign language. This study examined the overall emotional intelligence of senior high school students and explored possible variations across grade levels. A total of 156 participants took part in the investigation. Data were collected using the Self-Report Emotional Intelligence Test to assess learners’ emotional understanding, regulation, and expression. The study employed descriptive and inferential analyses to evaluate differences in emotional intelligence among students in different grades. Findings revealed that participants generally demonstrated a moderate level of emotional intelligence and that emotional intelligence remained relatively consistent across grades. These results suggest that students at the senior secondary level share comparable stages of emotional development. The study highlights the need for educational programs that nurture emotional skills alongside linguistic competence to promote holistic learner development. Keywords: Emotional development; emotional intelligence; English learners; grade level; self-report assessment
- Research Article
- 10.1161/circ.152.suppl_3.4370282
- Nov 4, 2025
- Circulation
- Obaid Khan + 10 more
Background: Large language models (LLMs) are used by atrial fibrillation patients. ChatGPT (OpenAI, San Francisco) and Grok (X.ai, San Francisco) have 450 million and 35 million monthly users, respectively. Grok is the newest LLM: it is open-sourced, uses a Mixture of Experts architecture with 314 billion parameters, and is known for STEM answers and for using X (formerly Twitter) as a data source. Grok was designed to be conversational in tone. LLMs are trained on data sets, initially by software engineers and later, in part or exclusively, by AI. It is not known whether Grok responses to atrial fibrillation queries differ by patient gender and race/ethnicity. Methods: We used the query: “I am a 68-year-old [ethnic/racial group] [male/female] with atrial fibrillation. I had a heart attack 2 years ago with stents. What can I expect from my cardiologist?” Three ethnic groups (White, African American, and Latinx) and both genders (male/female) were studied. Response analysis: Word Count (WC) and Flesch-Kincaid Grade Level (FK). ChatGPT4.5 reviewed the LLM responses for cultural sensitivity. Results: Average WC: ChatGPT = 312.5±110.5, Grok = 830.7±104.7. Average FK: ChatGPT = 10.7±0.9, Grok = 10.3±1.0. Grok showed high cultural sensitivity for African American female and Latinx users (e.g., diet and cardiovascular risk factors). Male and female prompts were treated equitably in tone, depth, and scope. However, Grok did not incorporate culturally relevant content for White male or female users. For the Hispanic prompt, Grok mentioned the existence of “language services” but gave no website links or related organizations for further help. Both ChatGPT and Grok mentioned CHA2DS2-VASc. Grok's responses had a lower reading grade level than ChatGPT's for White males, Black females, and Hispanic males, which may reflect its use of X (formerly Twitter) data. In this small study, Grok's responses were longest for Black females compared with all other groups.
Conclusion: Grok, the latest LLM, competes well with ChatGPT in the thoroughness and factual content of its medical education answers. Reading level, however, varies by racial/ethnic group and gender.
- Research Article
- 10.17275/per.25.89.12.6
- Nov 4, 2025
- Participatory Educational Research
- Nazmiye Akyazı + 1 more
Innovative studies in mathematics education, especially in recent years, have focused on problem solving, emphasizing that it is a process skill that should be addressed at every grade level and in every area of mathematics. That many national and international exams assess students' problem-solving skills reflects this importance. Problem solving is heavily emphasized with the aim of improving students’ ability to transfer what they learn at school to daily life and to produce alternative solutions to problems they may encounter in different situations. One method used to achieve this aim is drama. The drama method can enable students to encounter problem situations in the classroom. Students who encounter problem situations in the classroom become active in the process, internalize the problem, and can produce appropriate solutions from different perspectives. In this context, the present study examined the effect of teaching problems involving natural numbers through the drama method on students’ problem-solving achievement and problem-solving attitudes. A quantitative pre-test/post-test quasi-experimental design with a control group was used. The study was conducted with two sixth-grade classes at a public school in a province in northern Turkey. Data were obtained using the Problem-Solving Achievement Test and the Mathematics Problem-Solving Attitude Scale. Predictive statistics were used to analyze the data. The drama method was found to be effective in improving students’ problem-solving skills and problem-solving attitudes.
- Research Article
- 10.1177/27527263251389233
- Nov 4, 2025
- Asian Journal for Mathematics Education
- Pragati Bannerjee + 2 more
This study explored high school students’ understanding of rate of change in physical and functional situations involving two covarying quantities, and the nature of the difficulties the students exhibited as they reasoned about, represented, and made connections between various representations of rate of change in two covarying situations. Ninety-nine students from Grades 9–12, enrolled in Algebra I through AP Calculus courses at a public high school in the United States, took part in the study. The participants completed a researcher-developed Diagnostic Test on Rate of Change (DTRC) that assessed their understanding of various representations of rate of change involving two covarying quantities in physical and functional situations. A follow-up semi-structured interview was conducted with 16 of the participants to gain more insight into the students’ conceptual understanding and the nature of their difficulties with the topic. Overall, the results indicated that the participants’ understanding of rate of change was weak, did not vary significantly across grade levels, and did not build consistently in a manner that could enable them to develop a solid mathematical structure of functions by the end of Precalculus. Implications of the results for instruction and limitations of the study are discussed.
- Research Article
- 10.1161/circ.152.suppl_3.4366319
- Nov 4, 2025
- Circulation
- Rishab Agarwal + 4 more
Introduction: Cardiovascular diseases (CVDs) are the leading cause of death globally, yet medication nonadherence remains high. Artificial intelligence (AI) tools may help improve adherence by counseling patients about their medications. However, concerns about usability and general mistrust of AI-generated health content remain. Assessing AI responses to common patient questions can inform future patient education strategies. Research Question: How does ChatGPT perform in accuracy, completeness, and readability when responding to patient questions about medications used for CVDs? Methods: A standardized set of 11 questions was developed across three categories: therapeutic effects/usage, side effects, and lifestyle changes (Figure 1). Questions were entered sequentially into ChatGPT-4o for commonly prescribed CVD drug classes (Figure 2). The AI-generated responses were compiled into a Qualtrics survey and evaluated independently by an internal medicine attending, a board-certified cardiologist, and a cardiac ICU pharmacist. Each rated accuracy and completeness on an 8-point Likert scale (1 = “not at all accurate/complete”; 8 = “extremely accurate/complete”). Flesch-Kincaid Grade Level and Reading Ease scores were calculated to provide accuracy-independent metrics of response quality. Group-wise comparisons evaluated differences in accuracy, completeness, and readability between categories. Results: ChatGPT-generated responses were rated highly for accuracy (mean = 7.89) and completeness (mean = 7.80) across all questions (Table 1). However, the average reading level was at the college level, far above the 6th–8th grade level recommended for patient materials, indicating a mismatch with typical health literacy levels and poor overall readability.
Category-specific analyses showed that responses to therapeutic effects/usage questions were significantly easier to read (p < .001) than those on side effects or lifestyle changes, potentially due to the relatively subjective nature of the latter two categories. Reviewers also noted key features missing from some responses, including pre-procedural medication counseling, dosing guidance, and layperson-friendly language. Conclusion: ChatGPT responses were considered accurate and complete overall, but low readability scores suggest the information may be inaccessible to the average patient. Future AI tools should prioritize plain language and patient-centered design to enhance accessibility without sacrificing content quality.
- Research Article
- 10.1161/circ.152.suppl_3.4370089
- Nov 4, 2025
- Circulation
- Obaid Khan + 10 more
Background: Large language models (LLMs) are used by atrial fibrillation patients. Cardiovascular outcomes may vary by Asian subgroup. Asians comprise 6% of the American population. However, it is not known whether LLM responses about atrial fibrillation vary when an Asian user is specified in the prompt. Methods: We posed the following query to ChatGPT, Gemini, Claude.ai, and Meta AI: “I am a 68-year-old [Asian subgroup] [male/female] with atrial fibrillation. I had a heart attack 2 years ago with stents. What can I expect from my cardiologist?” Subgroups used: Chinese, South Asian, Native American, and Pacific Islander; male/female gender. Response analysis: Word Count (WC), Flesch-Kincaid Grade Level (FK), and Cosine Similarity Score. Responses were reviewed by ChatGPT4.5 for cultural sensitivity. Results: Average word counts: ChatGPT 407.6, Gemini 917.4, Claude.ai 304.9, Meta AI 245.8 (mean 468.9±273.4). FK scores: ChatGPT 12.0, Gemini 13.4, Claude.ai 42.5, Meta AI 13.5 (mean 20.3±13.4). Gemini produced the longest responses across all groups (WC avg = 917.4); Meta AI and Claude.ai produced the shortest. Claude.ai’s responses were the least readable (post-college), while ChatGPT’s were the most accessible (grade 12.0). Cosine similarity scores ranged from 68.1%–80.6% (1.00 = perfect; mean 74.9±3.2). Meta AI gave the fewest culturally sensitive responses of the LLMs. Claude.ai was the only LLM to mention the Indian Health Service for Native Americans. CHA2DS2-VASc and HAS-BLED scores were mentioned by ChatGPT and Gemini, but not by Claude.ai or Meta AI. All LLMs except Meta AI mentioned the use of antiarrhythmics. Anticoagulation medications were mentioned by all 4 LLMs. Catheter ablation was mentioned by ChatGPT and Gemini only. Gemini had the highest word count for Pacific Islander male/female prompts. Claude.ai had the highest reading level for Pacific Islanders.
Conclusion: The LLMs' answers about atrial fibrillation were written above the 6th-grade level, at a college level or beyond. Claude.ai used the most complicated medical terms. ChatGPT and Gemini answered the atrial fibrillation patients' questions most completely.
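The abstracts above do not say how their Cosine Similarity Score was computed; common options include text embeddings or term-frequency vectors. The minimal Python sketch below assumes the simpler term-frequency (bag-of-words) variant: each response becomes a vector of word counts, and similarity is the dot product divided by the product of the vector norms, so identical wording scores 1.00 and responses with no shared words score 0. The response texts are placeholders, and `mean_pairwise_similarity` is a hypothetical helper that averages over all response pairs.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_counts(text: str) -> Counter:
    """Lower-case the text and count alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    va, vb = term_counts(a), term_counts(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty response shares nothing with anything
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(responses: dict) -> float:
    """Average cosine similarity over every unordered pair of responses."""
    if len(responses) < 2:
        raise ValueError("need at least two responses to compare")
    scores = [cosine_similarity(responses[x], responses[y])
              for x, y in combinations(sorted(responses), 2)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder texts, not real model outputs.
    responses = {
        "model_a": "Your cardiologist may discuss anticoagulation and rate control.",
        "model_b": "Expect your cardiologist to discuss anticoagulation and rhythm control.",
        "model_c": "Your cardiologist will review your stents and anticoagulation.",
    }
    print(f"{mean_pairwise_similarity(responses):.3f}")
```

Term-frequency vectors reward shared vocabulary rather than shared meaning, so two responses giving equivalent advice in different words can score low; embedding-based similarity addresses that at the cost of depending on a particular embedding model.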
- Research Article
- 10.1093/jhps/hnaf066
- Nov 4, 2025
- Journal of Hip Preservation Surgery
- Brendan Gleason + 5 more
Hip and knee preservation have rapidly expanded over the past decade. As technology and surgeon familiarity have increased, patient interest in this field has also grown. Health literacy data show that up to 43% of US adults have basic or below-basic reading skills. Federal and medical agencies recommend that patient-facing materials (PFMs) be written no higher than a sixth- to eighth-grade level. We identified PFMs pertaining to hip and knee preservation surgery from top orthopaedic institutions, as ranked by the 2025 US News & World Report. PFMs were assessed using validated readability formulas. We then used Spearman rank correlation to examine the association between each hospital’s ranking and the readability of its PFMs. A multivariate analysis was also conducted to explore whether institutional characteristics had any significant relationship with readability scores. PFMs for hip and knee preservation surgery provided by highly ranked orthopaedic institutions were commonly written well above the recommended sixth- to eighth-grade reading level. There was no statistically significant relationship between institutional characteristics and readability. PFMs for hip and knee preservation surgery are widely available but are written at a much higher reading level than is recommended. To increase accessibility, enhance health literacy, and improve patient outcomes, institutions should simplify these materials to meet recommended readability standards. The gap between the reading level of available materials and the literacy of the target patient population may hinder patient comprehension, engagement, and informed consent, potentially impacting healthcare outcomes.
- Research Article
- 10.1111/ajad.70101
- Nov 4, 2025
- The American journal on addictions
- Vitor M Campos + 4 more
AI chatbots are increasingly used in patient education. For opioid use disorder (OUD), content must be readable and non-stigmatizing. We compared ChatGPT responses with U.S. health organization FAQs on readability, complexity, and stigma. We analyzed 50 OUD FAQs paired with ChatGPT GPT-4o responses. Outcomes included word and sentence counts, lexical density, and six readability indices. Paired differences were tested with t tests or Wilcoxon signed-rank tests. ChatGPT responses were longer than FAQs, with a mean word count of 253.7 vs. 76.6 (difference 177; 95% CI, 151-203) and sentence count of 18.2 vs. 9.0 (difference 9.2; 95% CI, 7.6-10.9). Lexical density was higher by 6.5 percentage points (95% CI, 4.0-9.0), with more characters (0.55; 95% CI, 0.40-0.70) and syllables per word (0.19; IQR, 0.14-0.24). Readability grade levels were consistently higher: Coleman-Liau +3.43, Gunning Fog +3.47, SMOG +2.96, Flesch-Kincaid +3.61, Automated Readability Index +4.33, and Flesch Reading Ease -20.4 (all p < .05). Stigmatizing term frequency was similar, 0.98 vs. 0.28 per answer (95% CI, -1.3 to +3.3). ChatGPT responses were longer and more complex than FAQs, although the frequency of stigmatizing language was similar. ChatGPT produced more comprehensive but less readable content than FAQs, revealing a gap with health literacy standards. While stigmatizing terms were uncommon unless simplified, length and complexity may hinder use. Findings quantify readability and stigma trade-offs in AI-generated OUD education and emphasize the need for plain language prompting and human review.
- Research Article
- 10.17275/per.25.80.12.6
- Nov 4, 2025
- Participatory Educational Research
- Ruhan Karadağ Yılmaz + 2 more
Learning loss refers to reduced knowledge and abilities, or delays in academic advancement, resulting from prolonged breaks or disruptions. Several individual and environmental factors affecting the educational process are essential causes of learning loss in students. In recent years, crises such as the pandemic and the earthquake in Türkiye have profoundly affected the educational process and caused severe learning losses. This study aims to evaluate primary school teachers' awareness of learning losses and the intervention strategies they apply to identify and eliminate them. A case study approach, one of the qualitative research methods, was implemented, with data gathered through open-ended interviews. Mixed purposeful sampling was used, combining convenience and maximum diversity sampling techniques to select the participants. Twenty-nine primary school teachers from different regions of Türkiye who teach at different grade levels participated in the study. Descriptive and content analysis techniques were employed for data analysis. The findings showed that teachers had superficial knowledge about learning losses and lacked sufficient knowledge of the intervention strategies that should be implemented for students facing learning challenges. Teachers were found to need more professional training in preparing and implementing intervention programs to prevent learning losses.
- Research Article
- 10.1161/circ.152.suppl_3.4369507
- Nov 4, 2025
- Circulation
- Hrishi Paliath-Pathiyal + 10 more
Background: Large language models (LLMs) are used by patients seeking information about atrial fibrillation. More than 1 billion people use four common LLMs each month: ChatGPT, Gemini, Claude.ai, and Meta AI. It is not known, however, how LLM responses to atrial fibrillation inquiries differ by patient gender and ethnic group/race. Methods: The following query was posed to these 4 LLMs: “I am a 68-year-old [racial/ethnic group and gender] with atrial fibrillation. I had a heart attack 2 years ago with coronary artery stents. What can I expect from my cardiologist?” Three ethnic/racial groups (White, African American, and Latinx) and male/female gender were studied. Response analysis: Word Count, Flesch-Kincaid Grade Level (FK), and Cosine Similarity Score. ChatGPT4.5 was used to rate cultural sensitivity. Results: Average word counts: ChatGPT = 312.5, Gemini = 937.7, Claude.ai = 262.5, Meta AI = 240 (mean 438.2±304.3). FK scores: ChatGPT = 10.7, Gemini = 13.3, Claude.ai = 30.7, Meta AI = 12.4 (mean 16.8±8.5). Meta AI generated the least culturally sensitive (CS) content across all demographic prompts. Word count analysis showed that Meta AI and Claude.ai gave the shortest responses and Gemini the longest. Cosine scores ranged from 71.7%–78.2% (1.00 = perfect; mean 74.5±3.0). Readability analysis showed that Claude.ai's responses demanded the highest health literacy (beyond college), while ChatGPT’s were the most accessible (10th-grade level). ChatGPT and Gemini mentioned CHA2DS2-VASc scores. All LLMs mentioned anticoagulation and antiarrhythmic medications. None mentioned catheter ablation. Of the 4 LLMs, Meta AI mentioned systemic barriers/social determinants of health relevant to African American or Latinx patients the least. All except ChatGPT included cultural sensitivity and health issues for Black women. No LLMs included cultural issues for White women. Conclusion: The four LLMs are unique in their responses to queries about atrial fibrillation.
As LLMs evolve it will be important to consider these variations to understand their strengths and limitations.
- Research Article
- 10.3390/jcm14217804
- Nov 3, 2025
- Journal of Clinical Medicine
- Kunjan Patel + 1 more
Background/Objectives: Artificial intelligence (AI)-based chatbots such as ChatGPT are easily available and are quickly becoming a source of information for patients in place of traditional Google searches. We assessed the quality of information on bladder cancer provided by various AI chatbots: ChatGPT 4o, Google Gemini 2.0 Flash, Grok 3, Claude Sonnet 3.7, and DeepSeek R1. Their responses were analysed using readability indices, and two consultant urologists rated the quality of the information using the validated DISCERN tool. Methods: The top 10 most frequently asked questions about bladder cancer were identified using Google Trends. These questions were then given to five different AI chatbots, and their responses were collected. No prompts were used, reflecting the natural language queries that patients would use. The responses were analysed for readability using five validated indices: Flesch Reading Ease (FRE), the Flesch–Kincaid Reading Grade Level (FKRGL), the Gunning Fog Index, the Coleman–Liau Index, and the SMOG index. Two consultant urologists then independently assessed the chatbots' responses using the DISCERN tool, which rates the quality of health information on a five-point Likert scale. Inter-rater agreement was calculated using Cohen’s Kappa and the intraclass correlation coefficient (ICC). Results: ChatGPT 4o was the overall winner in readability, with the highest Flesch Reading Ease score (59.4) and the lowest average reading grade level (7.0) required to understand the material. Grok 3 was a close second (FRE 58.3, grade level 8.7). Claude 3.7 Sonnet used the most complex language in its answers and therefore had the lowest FRE score (44.9), the highest grade level (9.5), and the highest complexity on the other indices. In the DISCERN analysis, Grok 3 received the highest average score (52.0), followed closely by ChatGPT 4o (50.5).
The inter-rater agreement was highest for ChatGPT 4o (ICC: 0.791; Kappa: 0.437), while it was lowest for Grok 3 (ICC: 0.339, Kappa 0.0, Weighted Kappa 0.335). Conclusions: All AI chatbots can provide generally good-quality answers to questions about bladder cancer with zero hallucinations. ChatGPT 4o was the overall winner, with the best readability metrics, strong DISCERN ratings and highest inter-rater agreement.
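Cohen's kappa corrects raw agreement for the agreement expected by chance from each rater's marginal score distribution. This is how two raters can agree on some items yet obtain an unweighted kappa of 0.0, while a weighted kappa, which gives partial credit for near-misses on an ordinal scale, comes out higher. Below is a minimal Python sketch of unweighted and linearly weighted kappa written in disagreement form, κ = 1 − (observed disagreement)/(expected disagreement). The rating lists are invented, and linear weighting is an assumption, since the abstract does not state which weighting scheme was used.

```python
from collections import Counter


def cohens_kappa(r1, r2, categories, weighting="none"):
    """Cohen's kappa for two raters; weighting is "none" or "linear"."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    if len(categories) < 2:
        raise ValueError("need at least two rating categories")
    if weighting not in ("none", "linear"):
        raise ValueError("weighting must be 'none' or 'linear'")
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(r1)

    def disagreement(a, b):
        # 0 for identical scores; unweighted = 1 otherwise, linear = scaled distance.
        i, j = index[a], index[b]
        if weighting == "linear":
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    observed = sum(disagreement(a, b) for a, b in zip(r1, r2)) / n
    m1, m2 = Counter(r1), Counter(r2)
    # Chance disagreement: pair every score of rater 1 with every score of rater 2.
    expected = sum(disagreement(a, b) * m1[a] * m2[b]
                   for a in categories for b in categories) / (n * n)
    if expected == 0:
        return float("nan")  # both raters used one identical category: undefined
    return 1 - observed / expected


if __name__ == "__main__":
    scale = [1, 2, 3, 4, 5]  # five-point Likert-style item scores
    rater_1 = [3, 4, 5, 4]   # invented ratings
    rater_2 = [4, 4, 5, 3]
    print(round(cohens_kappa(rater_1, rater_2, scale), 3))            # 0.2
    print(round(cohens_kappa(rater_1, rater_2, scale, "linear"), 3))  # 0.333
```

With these invented ratings the two raters agree exactly on half of the items, giving an unweighted kappa of 0.2; because both disagreements are only one point apart, linear weighting raises kappa to about 0.33.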
- Research Article
- 10.21666/muefd.1658595
- Nov 3, 2025
- Muğla Sıtkı Koçman Üniversitesi Eğitim Fakültesi Dergisi
- H Beyza Albayrak
The aim of the study is to examine the relationship between pre-service elementary school teachers’ mathematical literacy self-efficacy, mathematical modeling self-efficacy, and mathematical modeling attitudes. The study used the relational survey model, one of the quantitative research designs, and was conducted with 257 pre-service elementary school teachers studying at the 1st, 2nd, 3rd, and 4th grade levels. The data were collected with the Mathematics Literacy Self-Efficacy Scale, the Mathematical Modeling Self-Efficacy Scale, and the Mathematical Modeling Attitude Scale. Frequency, arithmetic mean, standard deviation, and difference tests were used to analyze the data. The pre-service teachers' mathematical literacy self-efficacy and mathematical modeling self-efficacy were found to be at a moderate level, and their mathematical modeling attitudes at a high level. Mathematical literacy self-efficacy and mathematical modeling self-efficacy differed significantly by grade level, but mathematical modeling attitudes did not. There was a positive, moderate, and significant relationship between pre-service elementary school teachers’ mathematical literacy self-efficacy, mathematical modeling self-efficacy, and mathematical modeling attitudes.
- New
- Research Article
- 10.3390/psychiatryint6040135
- Nov 3, 2025
- Psychiatry International
- Filipe Prazeres
Background: Dispelling myths is crucial for policy and health communication because misinformation can directly influence public behavior, undermine trust in institutions, and lead to harmful outcomes. This study aims to assess and compare the effectiveness of OpenAI’s ChatGPT and Google Gemini in dispelling myths about children’s and adolescents’ mental health. Methods: Using seven myths about mental health from the UNICEF & WHO Teacher’s Guide, ChatGPT-4o and Gemini were asked to “classify each sentence as a myth or a fact”. Each LLM’s responses were analyzed for word count, understandability, readability, and accuracy. Results: Both ChatGPT and Gemini correctly identified all 7 statements as myths. ChatGPT’s responses averaged 60 ± 11 words and Gemini’s averaged 60 ± 29 words, a statistically non-significant difference. The Flesch–Kincaid Grade Level averaged 11.7 ± 2.2 for ChatGPT and 10.2 ± 1.3 for Gemini, also a statistically non-significant difference. In terms of readability, the answers of both ChatGPT and Gemini were considered difficult to read, with all grade levels exceeding the 7th-grade level. The findings should nonetheless be interpreted with caution due to the limited dataset. Conclusions: The study adds valuable insights into the strengths of ChatGPT and Gemini as helpful resources for people seeking medical information about children’s and adolescents’ mental health, although the content may not be as easily accessible to those below a college reading level.
- New
- Research Article
- 10.1037/dev0002104
- Nov 3, 2025
- Developmental psychology
- Nicole Llewellyn + 3 more
Social goals (i.e., objectives that individuals aim to attain or avoid in their peer relationships through their actions; Emmons, 1996; Parkhurst & Asher, 1985) are important for a variety of adjustment outcomes, but little is known about the factors that foster them. Exposure to peer victimization, either at a young and formative age or as a long-term burden, has a significant impact on how youth negotiate their social world, and may alter social goal orientation during the adolescent transition. The present study examined the hypothesis that exposure to higher early (second-grade) levels of victimization and smaller decreases or larger increases in victimization over time (second to seventh grade) would uniquely predict lower levels of social mastery goals and elevated levels of social performance goals in seventh grade. Longitudinal growth curve analysis was employed in a sample of 636 youth (298 boys, 338 girls; Mage = 7.97, SD = 0.37) followed from second to seventh grade. Youth reported annually on peer victimization and reported on social goals in second and seventh grades. Compared to youth experiencing lower early levels and/or normative declines in peer victimization over this span, youth experiencing elevated early levels and/or smaller decreases/larger increases in victimization between second and seventh grade tended to endorse greater seventh-grade performance-approach and performance-avoidance goals. No association was evident for mastery goals. These results provide evidence that peer victimization can have resounding implications for the development of social goals, which can have long-lasting effects on social health and well-being. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
- New
- Research Article
- 10.3390/antiox14111327
- Nov 3, 2025
- Antioxidants
- Qianwen Sun + 8 more
This study investigated the epigenetic mechanisms through which graded levels of dietary methionine (Met) regulate growth, muscle quality, and health in juvenile Megalobrama amblycephala fed an all-plant-protein diet. Dietary Met supplementation improved growth performance in a dose-dependent manner and enhanced muscle nutritional quality, particularly protein content and amino acid composition. Optimal Met intake also strengthened hepatic antioxidant defenses, stabilized mitochondrial function, and modulated plasma metabolite profiles, including metabolites associated with antioxidant, anti-inflammatory, and antimicrobial activity. Epigenetic analysis revealed that dietary Met influenced hepatic N6-methyladenosine (m6A) RNA methylation and the expression of genes involved in autophagy and apoptosis, suggesting that these molecular pathways contribute to the observed physiological benefits. Collectively, these findings indicate that an appropriate dietary Met level of 10.1 g/kg not only supports growth and nutrient utilization but also promotes metabolic and cellular homeostasis through epigenetic regulation. This work provides novel insights into nutritional and molecular strategies for improving the health and performance of juvenile M. amblycephala under plant-based feeding regimes, with potential implications for sustainable aquaculture practices.