Human Expertise Outperforms Artificial Intelligence in Medical Education Assessments: MCQ Creation Highlights the Irreplaceable Role of Teachers

Abstract

Introduction: Multiple-choice questions (MCQs) are vital assessment tools in education because they allow direct measurement of knowledge, skills, and competencies across a wide range of disciplines. While artificial intelligence (AI) holds promise as a supplementary tool in medical education, particularly for generating large volumes of practice questions, it cannot yet replace the nuanced, expert-driven process of question creation that human educators provide. This study addresses that gap by comparing AI-generated and expert-generated MCQs on difficulty index, discrimination index, and distractor efficiency.

Materials and Methods: A total of 50 medical students received a set of 50 randomized, blinded MCQs validated by human physiology experts. Of these, 25 were generated by AI and the remaining 25 were written by qualified, experienced professors. Using the item response theory (IRT) framework, we calculated key metrics: item reliability, difficulty index, discrimination index, and distractor functionality.

Results: The difficulty index of AI-generated MCQs (mean = 0.62, SD = 0.14) was comparable to that of expert-generated questions, with no statistically significant difference (p = 0.45). However, significant differences emerged on other key quality metrics. The discrimination index, which reflects a question's ability to distinguish between high- and low-performing students, was notably higher for expert-created MCQs (mean = 0.48, SD = 0.12) than for AI-generated ones (mean = 0.32, SD = 0.10), a moderate-to-large effect (p = 0.0082, chi-square = 11.7, df = 3). Similarly, distractor efficiency (DE), which evaluates the effectiveness of incorrect answer options, was significantly greater in expert-authored questions (mean = 0.24, SD = 7.2) than in AI-generated items (mean = 0.4, SD = 8.1), with a moderate effect size (p = 0.0001, chi-square = 26.2, df = 2). These findings suggest that while AI can replicate human-level difficulty, expert involvement remains crucial for ensuring high-quality discrimination and distractor performance in MCQ design.

Conclusion: The findings suggest that AI holds promise, particularly for generating questions of appropriate difficulty, but human expertise remains essential for crafting high-quality assessments that effectively differentiate levels of student performance and challenge students' critical thinking. As AI technology continues to evolve, ongoing research and careful implementation will be essential to ensure that AI contributes positively to medical education.
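
As context for these metrics, here is a minimal Python sketch of how difficulty index, discrimination index, and distractor efficiency are conventionally computed from a scored response matrix. The 27% upper/lower group split and the 5% distractor cutoff are common conventions assumed here, not details confirmed by this abstract; all names are illustrative.

```python
import numpy as np

def difficulty_index(scores: np.ndarray) -> np.ndarray:
    """Proportion of examinees answering each item correctly (0 = hard, 1 = easy)."""
    return scores.mean(axis=0)

def discrimination_index(scores: np.ndarray, frac: float = 0.27) -> np.ndarray:
    """p(upper group) - p(lower group), groups taken from the total-score extremes."""
    k = max(1, int(round(frac * scores.shape[0])))
    order = np.argsort(scores.sum(axis=1))          # students sorted by total score
    lower, upper = scores[order[:k]], scores[order[-k:]]
    return upper.mean(axis=0) - lower.mean(axis=0)

def distractor_efficiency(choices: np.ndarray, key: np.ndarray,
                          n_options: int = 4, cutoff: float = 0.05) -> np.ndarray:
    """Fraction of each item's distractors chosen by at least `cutoff` of examinees."""
    de = np.empty(choices.shape[1])
    for j in range(choices.shape[1]):
        distractors = [o for o in range(n_options) if o != key[j]]
        functional = sum((choices[:, j] == o).mean() >= cutoff for o in distractors)
        de[j] = functional / len(distractors)
    return de
```

A comparison like the one in this study would then contrast these per-item arrays between the AI-generated and expert-generated subsets.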

Similar Papers
  • Research Article
  • Cited by 4
  • 10.18203/issn.2454-5929.ijohns20190779
Development of multiple choice question bank in otorhinolaryngology by item analysis: a cross-sectional study
  • Feb 23, 2019
  • International Journal of Otorhinolaryngology and Head and Neck Surgery
  • Ajeet Kumar Khilnani + 2 more

Background: Multiple choice questions (MCQs) are routinely used for formative and summative assessment in medical education. Item analysis is a process of post-validation of MCQ tests, whereby items are analyzed for difficulty index, discrimination index, and distractor efficiency to obtain a range of items of varying difficulty and discrimination indices. This study was done to understand the process of item analysis and to analyze an MCQ test so that a valid and reliable MCQ bank in otorhinolaryngology could be developed. Methods: 158 students of the 7th semester were given an 8-item MCQ test. Based on the marks achieved, the high achievers (top 33%, 52 students) and low achievers (bottom 33%, 52 students) were included in the study. The responses were tabulated in a Microsoft Excel sheet and analyzed for difficulty index, discrimination index, and distractor efficiency. Results: The mean (SD) difficulty index (Diff-I) of the 8-item test was 61.41% (11.81%). Five items had a very good difficulty index (41% to 60%), while 3 items were easy (Diff-I > 60%). There was no item with Diff-I < 30%, i.e., a difficult item, in this test. The mean (SD) discrimination index (DI) of the test was 0.48 (0.15), and all items had very good discrimination indices of more than 0.25. Out of 24 distractors, 6 (25%) were non-functional distractors (NFDs). The mean (SD) distractor efficiency (DE) of the test was 74.62% (23.79%). Conclusions: Item analysis should be an integral and regular activity in each department so that a valid and reliable MCQ question bank is developed.

  • Research Article
  • 10.53555/jaz.v45i3.4138
Analysis of Difficulty Index, Discrimination Index and Distractor Efficiency in Multiple Choice Questions
  • Mar 8, 2024
  • Journal of Advanced Zoology
  • Sahar M Yakout + 7 more

Background: The item analysis of multiple-choice questions (MCQs) is an essential tool that can provide input on the validity and reliability of items. It helps to identify items that can be revised or discarded, thus building a quality MCQ bank. Aim: The aim of the study was to evaluate the quality of MCQs by analyzing DIF, DI, and DE, and to find out the association of MCQs having good difficulty and discrimination indices with DE. Method: The study was conducted in the Department of Maternity and Child Health Nursing at the College of Nursing during the second semester of the 2023 academic year. The sample comprised 152 third-year Bachelor of Nursing students (level six) who took the final exam of one course in that semester. The item analysis explored the difficulty index (DIF I) and discrimination index (DI) together with distractor effectiveness (DE). Data were kept confidential, and descriptive and inferential statistics were computed using SPSS 22.0. Results: A total of 50 MCQs were analyzed; the majority, 74 (82%), had a good/acceptable level of difficulty, with a mean DIF I of 94.94 ± 6.36 (mean ± SD). This indicates that, on average, the MCQs were relatively easy for the students, as the difficulty index is close to 100. The discrimination index, which assesses the ability of the MCQs to differentiate between high- and low-performing students, had a mean of 0.11 and a standard deviation of 0.47. This relatively low mean value suggests that the MCQs had limited discriminatory power, meaning they did not effectively distinguish between students with different levels of performance. Of the distractors, 150 (75%) were categorized as functional distractors (FDs); these effectively served their purpose by distracting students from the correct answer, challenging their knowledge and understanding. The remaining 50 distractors (25%) were classified as non-functional distractors (NFDs). Conclusion: The categorization of the MCQs based on distractor efficiency highlights the importance of well-constructed and effective distractors. Functional distractors challenge students' decision-making, while non-functional distractors need to be revised or replaced to improve the overall quality of the MCQs. This finding indicates a potential area for improvement in the construction and selection of MCQs to enhance their ability to differentiate student abilities.

  • Research Article
  • Cited by 1
  • 10.31489/2023ped2/114-118
The analysis of multiple-choice tests in a Professional English course at Astana IT University: difficulty, discrimination and distractor efficiency indexes
  • Jun 30, 2023
  • Bulletin of the Karaganda university Pedagogy series
  • Zh.K Tleshova

The objective of this study is to analyse the quality of multiple-choice questions by applying such evaluative tools as the difficulty index (DIF), discrimination index (DI), and distractor efficiency (DE). Our analysis was based on the results of quizzes conducted in Trimester III of 2020-2021 for the Professional English course at Astana IT University. The DIF, DI, and DE of midterm quiz results of first-year students with different language levels were analysed using Microsoft Excel and Moodle LMS. The study examines multiple-choice questions (MCQs) in terms of their effectiveness, with recommendations for improvement. The three main research questions are: What is the difficulty index of the midterm multiple-choice questions? What is the discrimination index of the midterm MCQ quizzes? What is the distractor efficiency of the midterm MCQ distractors? As the study showed, the difficulty index of the MCQs was below average, as only 43% (n=13) reached an acceptable level; therefore, the 57% (n=17) of questions that fall into the "too easy" category should be revised. The discrimination index evaluation revealed 100% efficiency. As for distractor efficiency, the result was 88%, a more than satisfactory level. By applying the DIF, DI, and DE tools on a regular basis, MCQ quizzes and distractors might be improved, leading to an annual accumulation of a better-quality pool of questions for more effective student assessment.
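
The revise-or-keep decision described here reduces to banding items by difficulty index. A minimal sketch follows; the 30-70% acceptable band is a widely used convention assumed for illustration, and the exact cutoffs in this study may differ.

```python
def dif_category(dif_percent: float) -> str:
    """Band an item by difficulty index (%); cutoffs are conventional, not exact."""
    if dif_percent < 30:
        return "too difficult: revise or discard"
    if dif_percent <= 70:
        return "acceptable"
    return "too easy: revise"

# Illustrative items only; not data from the study.
for item, dif in [("Q1", 82.0), ("Q2", 55.0), ("Q3", 24.0)]:
    print(item, dif_category(dif))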

  • Research Article
  • Cited by 1
  • 10.37185/lns.1.1.53
Item Analysis of Multiple-Choice Questions at the Department of Community Medicine, Wah Medical College, Pakistan
  • Apr 14, 2020
  • Life and Science
  • Musarat Ramzan + 4 more

Objective: The objective of the study was to assess the quality of multiple-choice questions (MCQs) from three different assessments in the subject of Community Medicine by computing the difficulty index, discrimination index, and reliability, and to estimate the relationship between the difficulty and discrimination indices. Study Design: Retrospective observational study. Place and Duration of Study: Department of Community Medicine at Wah Medical College from August to December 2018. Materials and Methods: Three sets of MCQs were included in the study. The mean and standard deviation of the difficulty and discrimination indices were calculated, and one-way analysis of variance and the Kruskal-Wallis test were applied to the difficulty and discrimination indices. The association was determined by Pearson correlation and considered significant at a p-value of < 0.05. Results: The mean difficulty indices of the first term, second term, and send-up examinations were 41.5, 48.8, and 51.9 respectively. Mean discrimination indices were 0.28, 0.27, and 0.26, and reliabilities were 0.83, 0.81, and 0.79. In the study, 72% of MCQs of the first term, 61.5% of the second term, and 63% of the send-up examinations were in the 30-70% difficulty range. There was a significant difference in the difficulty indices of the three question papers. The correlation between the discrimination and difficulty indices was curvilinear and positive. Conclusion: It is concluded that all three question papers have acceptable reliability, more than 65% of MCQs have an acceptable difficulty index, and about 69% have good discriminatory power.
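
The correlation step this abstract reports can be reproduced in outline with SciPy; the arrays below are hypothetical per-item stand-ins, not the study's data.

```python
from scipy.stats import pearsonr

# Hypothetical per-item indices (illustrative values only).
dif = [41.5, 48.8, 51.9, 62.0, 35.4, 58.2]   # difficulty indices (%)
dis = [0.28, 0.27, 0.26, 0.31, 0.22, 0.30]   # discrimination indices

r, p = pearsonr(dif, dis)
print(f"Pearson r = {r:.3f}, p = {p:.4f}")   # significant if p < 0.05
```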

  • Research Article
  • Cited by 2
  • 10.1016/j.jmir.2025.101896
AI in radiography education: Evaluating multiple-choice questions difficulty and discrimination.
  • Jul 1, 2025
  • Journal of medical imaging and radiation sciences
  • Emre Emekli + 1 more


  • Research Article
  • Cited by 1
  • 10.18231/j.ijcbr.2019.072
Item analysis of biochemistry multiple choice questions - A retrospective Study & scope of improvement in method of assessment of MBBS students
  • Oct 15, 2019
  • International Journal of Clinical Biochemistry and Research
  • Janki Desai + 1 more


  • Research Article
  • Cited by 19
  • 10.1186/s12909-024-05433-y
Item analysis: the impact of distractor efficiency on the difficulty index and discrimination power of multiple-choice items
  • Apr 24, 2024
  • BMC Medical Education
  • Assad Ali Rezigalla + 8 more

Background: Distractor efficiency (DE) of multiple-choice question (MCQ) responses is a component of the psychometric analysis examiners use to evaluate the credibility and functionality of distractors. This study was conducted to evaluate the impact of DE on the difficulty and discrimination indices. Methods: This cross-sectional study was conducted from April to June 2023. It utilizes the final exam of the Principles of Diseases course, taken by 45 second-year students. The exam consisted of 60 type A MCQs. Item analysis (IA) was generated to evaluate KR-20, the difficulty index (DIF), discrimination index (DIS), and distractor efficiency (DE). DIF was calculated as the percentage of examinees who scored the item correctly. DIS is an item's ability to discriminate between the higher and lower 27% of examinees. For DE, any distractor selected by less than 5% of examinees is considered nonfunctional, and items were classified according to their non-functional distractors (NFDs). The correlation and significance of variance between DIF, DIS, and DE were evaluated. Results: The total number of examinees was 45. The KR-20 of the exam was 0.91. The mean (SD) DIF of the exam was 37.5 (19.1), and the majority of items (69.5%) were of acceptable difficulty. The mean (SD) DIS was 0.46 (0.22), which is excellent. Most items were excellent in discrimination (69.5%), only two were not discriminating (13.6%), and the rest were of acceptable power (16.9%). Items with excellent and good efficiency represent 37.3% each, while only 3.4% were of poor efficiency. The correlation between DE and DIF (p = 0.000, r = -0.548) indicates that items with efficient distractors (a low number of NFDs) are associated with a low difficulty index (difficult items), and vice versa. The correlation between DE and DIS is significantly negative (p = 0.0476, r = -0.259); in such a correlation, items with efficient distractors are associated with low-discriminating items. Conclusions: There is a significant moderate negative correlation between DE and DIF (p = 0.00, r = -0.548) and a significant weak negative correlation between DE and DIS (p = 0.0476, r = -0.259). DIF has a non-significant negative correlation with DIS (p = 0.7124, r = -0.0492). DE impacts both DIF and DIS. Items with efficient distractors (a low number of NFDs) are associated with a low difficulty index (difficult items) and with discriminating items. Improving the quality of DE will decrease the number of NFDs and result in items with acceptable levels of difficulty index and discrimination power.
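
The <5% nonfunctionality rule and the NFD-based item classification described here map directly onto code. A minimal sketch, assuming four-option items (three distractors), a per-item tally dictionary, and conventional DE category labels; none of these details are quoted from the paper.

```python
def count_nfds(option_counts: dict[str, int], key: str, n_examinees: int,
               cutoff: float = 0.05) -> int:
    """Distractors chosen by fewer than `cutoff` of examinees are nonfunctional."""
    return sum(1 for option, n in option_counts.items()
               if option != key and n / n_examinees < cutoff)

def de_category(nfds: int) -> str:
    """DE bands for a 4-option item (3 distractors); labels are conventional."""
    return {0: "excellent (DE 100%)", 1: "good (DE 66.6%)",
            2: "poor (DE 33.3%)", 3: "nonfunctional (DE 0%)"}[nfds]

# Illustrative tally for one item with key "B" and 45 examinees:
counts = {"A": 12, "B": 20, "C": 11, "D": 2}
print(de_category(count_nfds(counts, key="B", n_examinees=45)))  # D < 5% -> "good"
```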

  • Research Article
  • 10.1152/advan.00197.2025
Faculty versus artificial intelligence chatbot: a comparative analysis of multiple-choice question quality in physiology.
  • Dec 1, 2025
  • Advances in physiology education
  • Anupkumar D Dhanvijay + 8 more

Multiple-choice questions (MCQs) are widely used for assessment in medical education. While human-generated MCQs benefit from pedagogical insight, creating high-quality items is time intensive. With the advent of artificial intelligence (AI), tools like DeepSeek R1 offer potential for automated MCQ generation, though their educational validity remains uncertain. Against this background, this study compared the psychometric quality of Physiology MCQs generated by faculty and by an AI chatbot. A total of 200 MCQs were developed following the standard syllabus and question design guidelines: 100 by the Physiology faculty and 100 by the AI chatbot DeepSeek R1. Fifty questions from each group were randomly selected and administered to undergraduate medical students in a 2-hour assessment. Item analysis was conducted post-assessment using the difficulty index (DIFI), discrimination index (DI), and nonfunctional distractors (NFDs). Statistical comparisons were made using t tests or nonparametric equivalents, with significance at P < 0.05. Chatbot-generated MCQs had a significantly higher DIFI (0.64 ± 0.22) than faculty MCQs (0.47 ± 0.19; P < 0.0001). No significant difference in DI was found between the groups (P = 0.17). Faculty MCQs had significantly fewer NFDs (median 0) compared to chatbot MCQs (median 1; P = 0.0063). AI-generated MCQs demonstrated comparable discrimination ability but were generally easier and contained more ineffective distractors. While chatbots show promise in MCQ generation, further refinement is needed to improve distractor quality and item difficulty. AI can complement but not yet replace human expertise in assessment design. NEW & NOTEWORTHY: This study contributes to the growing research on artificial intelligence (AI)- versus faculty-generated multiple-choice questions in Physiology. Psychometric analysis showed that AI-generated items were generally easier but had comparable discrimination ability to faculty-authored questions, while containing more nonfunctional distractors. By focusing on Physiology, this work offers discipline-specific insights and underscores both the potential and current limitations of AI in assessment development.
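
The headline group comparison can be sketched with SciPy's unpaired t test; the arrays below are simulated from the means and SDs the abstract reports, purely for illustration, and are not the study's data.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(seed=0)
# Simulated per-item difficulty indices (illustrative only).
difi_faculty = rng.normal(0.47, 0.19, size=50).clip(0, 1)
difi_chatbot = rng.normal(0.64, 0.22, size=50).clip(0, 1)

t, p = ttest_ind(difi_chatbot, difi_faculty)
print(f"t = {t:.2f}, p = {p:.4g}")   # the abstract reports P < 0.0001 for DIFI
```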

  • Research Article
  • 10.64455/xmma0012
Item Analysis of Single Best Answer Questions in Medical Genetic Module End Test for Medical Year One Students
  • Dec 22, 2025
  • Myanmar Medical Journal
  • The Mon La + 2 more

Background: The Module End Assessment is an important component of learning and assessment in the modular system of our newly implemented outcome-based integrated curriculum, where single best answer (SBA) questions are used as a key assessment tool. Method: This cross-sectional study was conducted to evaluate the quality of 60 single best answer (SBA) questions from the Genetics Module End Test administered to 353 Medical Year One students (Batch 5/2023). The analysis focused on key parameters including the discrimination index (DI), difficulty index (DFI), distractor efficiency (DE), and their interrelations, to assess the overall effectiveness of the test items. Results: Among 60 items, 18 (30%) had DI < 0.20 and 42 (70%) had DI ≥ 0.20, indicating that most items effectively discriminated between high and low performers. By DFI, the proportions of good (DFI 0.30-0.70), difficult (DFI < 0.30), and easy (DFI > 0.70) items were 27 (45%), 1 (1.67%), and 32 (53.33%) respectively. A strong negative correlation (r = -0.6198, p < 0.00001) was found between DFI and DI. In the distractor analysis, 49 items had no non-functional distractors (NFDs). All 11 items with NFDs had DFI > 0.70, and 10 of them had DI < 0.20. Conclusion: This study highlights that item analysis is an essential step in improving the quality of SBA questions used in the assessment of medical education. Keywords: single best answer questions, difficulty index, discrimination index, distractor efficiency

  • Research Article
  • Cited by 75
  • 10.56026/imu.3.1.2
The Levels Of Difficulty And Discrimination Indices In Type A Multiple Choice Questions Of Pre-clinical Semester 1 Multidisciplinary Summative Tests
  • Jan 1, 2009
  • International e-Journal of Science, Medicine & Education
  • Mitra N K

Item analysis is the process of collecting, summarizing and using information from students' responses to assess the quality of test items. Difficulty index (P) and discrimination index (D) are two parameters which help evaluate the standard of MCQ questions used in an examination, with abnormal values indicating poor quality. In this study, 120 test items of 12 Type A MCQ tests of the Foundation 1 multi-disciplinary summative assessment from the M2/2003 to M2/2006 cohorts of International Medical University were selected, and their P-scores (in percent) and D-scores were estimated using Microsoft Office Excel. The relationship between the item difficulty index and discrimination index for each test item was determined by Pearson correlation analysis using SPSS 11.5. Mean difficulty index scores of the individual summative tests were in the range of 64% to 89%. One-third of total test items crossed the difficulty index of 80%, indicating that those items were easy for the students. Sixty-seven percent of the test items showed an acceptable (> 0.2) discrimination index. Forty-five out of 120 test items showed an excellent discrimination index. Discrimination index correlated poorly with difficulty index (r = -0.325). In conclusion, a consistent level of test difficulty and discrimination indices was maintained from 2003 to 2006 in all twelve summative Type A MCQ tests.

  • Research Article
  • Cited by 2
  • 10.18203/2319-2003.ijbcp20183004
Item analysis to identify quality multiple choice questions/items in an assessment in Pharmacology of II MBBS students in Guntur Medical College of Andhra Pradesh, India
  • Jul 23, 2018
  • International Journal of Basic & Clinical Pharmacology
  • Vijaya K Suryadevara + 1 more

Background: In medical education, multiple choice questions (MCQs/items) are among the most frequently used assessment tools for assessing the knowledge, abilities and skills of medical students, owing to their objectivity and wide coverage in less time. However, only quality items give a valid and reliable assessment. The quality of an item is determined by its difficulty index (DIF I), discrimination index (DI) and distractor efficiency (DE). The aim of the study was to assess the quality of items in pharmacology by item analysis and to develop an MCQ bank with quality items. Methods: The present study was conducted on 150 II MBBS students of Guntur Medical College, AP, India. A class test containing 50 items with 150 distractors from the topic of chemotherapy was conducted. An item with the correct choice/response was awarded one mark and with the wrong choice zero marks, with no negative marking. Each test item was analysed for DIF I, DI and DE, and the results were tabulated and tested statistically with the unpaired t-test. Results: Mean DIF I, DI and DE values with standard deviations in the present study were 44.72±17.63%, 0.30±0.12 and 84.48±24.65% respectively. The DIF I of 32 (64%) items was in the good to excellent range (31%-60%); 9 (18%) items were easy (≥61%) and 9 (18%) items were difficult (≤30%). The DI of 10 (20%) items was good (0.15 to 0.24), 29 (58%) items were excellent with DI ≥ 0.25, and 11 (22%) items were poor with DI < 0.15. Among 150 distractors, 127 (85%) were functional distractors (FDs) and 23 (15%) were non-functional distractors (NFDs). The DE of 33 (66%) items with no NFDs was 100%; for 12 (24%) items with one NFD it was 66.6%; for 4 (8%) items with 2 NFDs it was 33.3%; and for 1 (2%) item with 3 NFDs the DE was 0%. When the unpaired t-test was applied to the means of the "difficult" and "easy" items, 96.22±11.33% and 51.44±29.31% respectively, the p-value obtained was 0.00058, which was highly significant. Conclusions: The study showed that item analysis is a valid tool to identify quality items, which assess students' knowledge and abilities and effectively discriminate between different levels of student performance.

  • Research Article
  • Cited by 1
  • 10.33193/jeahs.20.2023.325
Analysis of MCQs in Summative Exam in English: Difficulty Index, Discrimination Index and Relationship between them
  • Feb 22, 2023
  • Journal of Educational and Human Sciences
  • Rabab Mohammad Alareifi

This paper aims to assess the quality of a summative test's items to improve its ability to measure students' knowledge acquisition. The test was used in the English subject for 11th-grade students, and the study was administered at a Western district secondary school in Saudi Arabia. The test consisted of 22 multiple-choice questions used to collect data from 94 randomly selected students. The Kuder-Richardson Formula 20 (KR-20) was applied to the test items to determine internal consistency, which reached an acceptable reliability of α = 0.70. Difficulty and discrimination indices were also used to evaluate the quality of the test, and the relationship between the two indices was measured. The difficulty index analysis showed that 50% of the items are at the average level, while the rest of the items fluctuate among too difficult, moderately difficult, and too easy levels. Moreover, the discrimination index analysis showed that 45.0% of the items are at the good level, while the other items ranged among poor, acceptable, and excellent levels. The Pearson correlation coefficient between the difficulty index and the discrimination index was -0.936, which indicates a statistically significant relationship (α ≤ 0.05) between the difficulty and discrimination indices of this multiple-choice summative test. To enhance the quality of the test, so that it better assesses students' knowledge acquisition, this study recommends that items with too difficult and too easy difficulty indices, and items with a poor discrimination index, be reviewed and modified by English experts. Moreover, re-evaluation of the content validity by an English teacher could further improve its quality.
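
The KR-20 statistic used here for internal consistency has a simple closed form. A minimal sketch, assuming a 0/1 scored matrix (rows = students, columns = items) rather than the study's own scripts:

```python
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """Kuder-Richardson Formula 20 for dichotomously scored items."""
    k = scores.shape[1]                          # number of items
    p = scores.mean(axis=0)                      # per-item proportion correct
    # Sample variance of total scores; some texts use the population form (ddof=0).
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - (p * (1 - p)).sum() / total_var)
```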

  • Research Article
  • Cited by 4
  • 10.17496/kmer.2012.14.2.102
Development and Validation of a Clinical Critical Thinking Skills Scale
  • Oct 31, 2012
  • Korean Medical Education Review
  • Su-Jin Shin + 3 more

This study developed a test entitled Clinical Critical Thinking Skills (CCTS) by using item response theory (IRT) and examined its validity. In the development stage, data obtained from a convenience sample of 627 undergraduate nursing students were analyzed using a discrimination and difficulty index with the TestAn 1.0 program. In the evaluation stage, data from a convenience sample of 284 nursing and non-nursing students were analyzed using a discrimination and difficulty index with the BILOG-MG program. Criterion validity was verified by the group comparison method. Five items received low discrimination index scores according to the IRT results. In terms of criterion validity, CCTS scores differed according to major (t = 2.21, p = 0.028), location of high school (F = 4.35, p = 0.014), clinical experience (t = 5.66, p = 0.004), and grade point average (F = 7.17, p < 0.001). The CCTS can be used to measure critical thinking skills in the clinical setting.
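
For context on the IRT models that programs like BILOG-MG fit, here is a minimal sketch of the two-parameter logistic (2PL) item characteristic curve; the parameter values are illustrative only, not estimates from this study.

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """2PL model: probability of a correct response given ability theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An average examinee (theta = 0) on a moderately hard item:
print(f"{icc_2pl(0.0, a=1.2, b=0.5):.3f}")   # ~0.354
```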

  • Research Article
  • Cited by 1
  • 10.53350/pjmhs221621120
Item Analysis of Multiple Choice Questions of Anatomy at Aziz Fatimah Medical and Dental College, Faisalabad
  • Feb 26, 2022
  • Pakistan Journal of Medical and Health Sciences
  • Quddus Ur Rehman + 5 more

Objective: The aim of our study was to evaluate MCQs in the send-up exam of 2nd year MBBS, and to discard or revise poor items: those with a low discrimination index, very easy or very difficult items, and items with more than one non-functional distractor. Study design: Cross-sectional study. Place: Department of Anatomy, Aziz Fatimah Medical and Dental College, Faisalabad. Duration of study: One month (December 2021). Methodology: A total of 45 MCQs for the subject of Anatomy were constructed for 100 students of the 2nd year MBBS send-up examination 2021. Each MCQ consisted of one statement with four distractors and one correct option. Data on the MCQs attempted by the students were analyzed using Microsoft Excel 2010. The difficulty index, discrimination index and distractor efficiency were the parameters used for analysis. Results: A total of 180 distractors were evaluated. The average difficulty index was 55.57, while the average discrimination index was 0.44. Out of 180 distractors, 41 non-functional distractors were seen in 26 items. Conclusion: Most MCQs (30 out of 45) fell within an average-to-easy difficulty index level. Most MCQs (27 out of 45) fulfilled the criteria of a very good discrimination index. Most MCQs (43 out of 45) had either no, one, or two non-functional distractors. Key Words: MCQs, Difficulty index, Nonfunctional distractors, Discrimination index, Distractor efficiency.

  • Research Article
  • 10.5455/njppp.2023.13.09458202304102023
Item analysis of multiple-choice questions in pharmacology among medical undergraduates
  • Jan 1, 2023
  • National Journal of Physiology, Pharmacy and Pharmacology
  • Mangala Srinivas + 3 more

Background: Assessment plays an essential role in the evaluation of learning, and multiple-choice questions (MCQs) are one component of examinations. Item analysis enables identifying good MCQs based on the difficulty index (DIF I), discrimination index (DI), and distractor effectiveness (DE). The assessment of learning becomes more meaningful with routine item analysis. Aims and Objectives: Item analysis helps us to assess the validity and reliability of MCQs and to generate a valid MCQ question bank for future use. Materials and Methods: A set of 30 single best response type MCQs (items) was used for the assessment of 155 phase II MBBS students at a medical college in Mangalore. Each item was pre-validated and later analyzed for its DIF I, DI, and DE. Results: In the study, the mean DIF I was 55.91 ± 18.48%, the mean DI was 0.35 ± 0.16, and the mean distractor efficiency was 72.19 ± 29.15%. Out of 30 items, 22 had a "good to acceptable" level of DIF I, and 18 items had "good to excellent" discrimination power. Among the 120 distractors, 81% were functional. A correlation of r = 0.1968 indicates a weak positive correlation between DIF I and DI. Conclusion: The majority of the MCQs in this study fell within the acceptable range for the three factors analyzed. This suggests that developing valid and reliable MCQs is possible only when regular item analysis is done for all MCQ-based tests.
