Abstract— Educational Data Mining (EDM) refers to research designed to classify, analyze, and predict students’ academic performance from data collected in educational settings. Data collection and data processing are important tasks in any research, including EDM. This article explains the data collection and data processing tasks in detail, with the aim of building a model that predicts students’ performance and provides recommendations. In the data collection step, we collected result ledgers in PDF form for the four-year Computer Science and Engineering (CSE) course from the university. The PDF ledgers for the two academic years 2014-15 and 2015-16, covering all four years (First Year, Second Year, Third Year, and Final Year), were downloaded from http://www.sus.ac.in/examination/Online-Result-(Ledger) or https://su.digitaluniversity.ac/Content.aspx?ID=29445 to prepare the dataset. This study explains in detail the syllabus structure of the four-year CSE course, the credit system pattern, the attributes required for preparing the dataset, and the types of assessment methods: Theory + Practical, Theory + Practical + Practical Oral Exam (POE), Practical + POE, Practical + OE, Practical, Term Work, and Theory. The original data downloaded from the university site for the two academic years, from Sem-I to Sem-VIII, was prepared with the help of Excel and contains data on approximately 10,616 students with 544 attributes. Microsoft Excel is used for data processing. The Excel features Text to Column (Delimited and Fixed width), Filter, and Conditional Formatting (Highlight Cells Rules, Text that Contains) were used to prepare the dataset.
Various functions and operators, such as SUM, IF, COUNTIF, MOD, and %, are also employed for processing the data. After the data processing step, the final dataset for the two academic years 2014-15 and 2015-16, from Sem-I to Sem-VIII, consists of data on 6906 students with 970 attributes. In addition to data collection and processing, research gaps related to dataset size and other factors are identified and reported in this article. The two steps discussed in detail here, data collection and data processing, will help researchers working in EDM to prepare datasets and build models, so that more work on students’ performance can be carried out to improve the teaching-learning process.
Keywords—Data Collection, Data Processing, Microsoft Excel, Educational Data Mining (EDM)