Story writing is a valuable skill for EFL learners, as it allows them to express their creativity and practice their language proficiency. However, assessing story writing can be challenging and time-consuming for teachers, especially with large classes and multiple criteria. Some researchers have therefore explored the use of artificial intelligence (AI) tools to automate the assessment of story writing and provide feedback to learners, but the reliability of these tools remains questionable. This study compared the intra- and inter-rater reliability of three AI tools for assessing EFL learners' story writing: Poe.com, Bing, and Google Bard. The study used quantitative methods to answer the research questions, namely calculating Fleiss' Kappa coefficients with the Datatab software program (available at datatab.com). The sample consisted of 14 pieces written by adult Libyan EFL learners; each piece was a story built around a prompt provided by the teacher. The assessment was carried out under two conditions: one rubric that included a measure of students' creativity, and one that focused only on the linguistic aspects of the students' writing. With the creativity criterion, Poe's intra-rater reliability was 0.01 (slight), while Bing's was 0.2 (fair) and Bard's was 0.2 (fair), making Poe the least reliable assessment tool of the three. For inter-rater reliability, the three tools' assessments of the same 14 pieces were compared across three rounds to check the consistency of the results: in the first round the inter-rater reliability was 0.04 (slight), in the second 0.01 (slight), and in the third -0.03 (no agreement), indicating a decrease in the consistency and reliability of the scores over time. Without the creativity criterion, Poe's intra-rater reliability was 0.05 (slight), Bing's was -0.02 (no agreement), and Bard's was 0.01 (slight), making Bing the least reliable. For inter-rater reliability, the three tools' assessments of the same 14 pieces were again compared across three rounds: in the first round it was 0 (slight), in the second -0.1 (no agreement), and in the third -0.13 (no agreement), again showing a decrease in consistency and reliability over time. The three applications performed somewhat reliably when the creativity criterion was included, which runs counter to the common belief that AI software cannot assess creativity. Nevertheless, the reliability measurements with the creativity criterion were not statistically significant, and there is a high probability that the observed agreement was due to random chance. Limitations of this study include the small sample size, the limited number of criteria, and the absence of human raters for comparison. Future research could involve more participants, more criteria, more AI tools, and human raters to provide a more comprehensive and reliable evaluation of AI tools for assessing EFL story writing.
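For readers less familiar with the statistic, the standard formulation of Fleiss' kappa is given below for reference; this is the textbook definition, not an excerpt from the study or from Datatab's documentation.

\[
\kappa \;=\; \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e},
\qquad
\bar{P} \;=\; \frac{1}{N}\sum_{i=1}^{N} \frac{1}{n(n-1)}\Bigl(\sum_{j=1}^{k} n_{ij}^{2} - n\Bigr),
\qquad
\bar{P}_e \;=\; \sum_{j=1}^{k} \Bigl(\frac{1}{Nn}\sum_{i=1}^{N} n_{ij}\Bigr)^{2},
\]

where \(N\) is the number of rated pieces (here 14), \(n\) the number of ratings per piece, \(k\) the number of score categories, and \(n_{ij}\) the number of ratings placing piece \(i\) in category \(j\). Kappa values at or near zero indicate agreement no better than chance, which corresponds to the "slight" and "no agreement" labels reported above.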