The Role of Machine Learning in Identifying Students At-Risk and Minimizing Failure


Pek R. Z., Ozyer S. T., Elhage T., Ozyer T., ALHAJJ R.

IEEE Access, vol.11, pp.1224-1243, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 11
  • Publication Date: 2023
  • Doi Number: 10.1109/access.2022.3232984
  • Journal Name: IEEE Access
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.1224-1243
  • Keywords: Predictive models, Data models, Machine learning, Stacking, Machine learning algorithms, Prediction algorithms, Data mining, At-risk students, classification, dropout prediction, hybrid model, machine learning techniques, stacking ensemble model, student performance prediction
  • Istanbul Medipol University Affiliated: Yes


Education is very important for students' future success. Instructors can support low-performing students with extra assignments and projects. However, a major problem is that at-risk students cannot be identified early. Various researchers are investigating this problem using machine learning techniques. Machine learning is used in a variety of areas and has also begun to be used to identify at-risk students early so that instructors can provide support. This paper discusses the performance results obtained using machine learning algorithms to identify at-risk students and minimize student failure. The main purpose of the study is to create a hybrid model using the stacking ensemble method and to predict at-risk students using this model. We used the machine learning algorithms Naïve Bayes, Random Forest, Decision Tree, K-Nearest Neighbors, Support Vector Machine, AdaBoost Classifier and Logistic Regression. The performance of each algorithm was measured with various metrics, and the hybrid model presented in this study was built by combining the algorithms that gave the best prediction results. A data set containing the demographic and academic information of the students was used to train and test the model. In addition, the paper presents a web application developed for using the hybrid model effectively and obtaining prediction results. In the proposed method, stratified k-fold cross validation and hyperparameter optimization were found to increase the performance of the models. The hybrid ensemble model was tested with two different combinations of data to understand the importance of the data features. In the first combination, using both demographic and academic data, the accuracy of the hybrid model was 94.8%. In the second combination, when only academic data was used, the accuracy of the hybrid model increased to 98.4%. This study focuses on predicting the performance of at-risk students early, so that teachers will be able to provide extra assistance to students with low performance.
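
The combination of a stacking ensemble, stratified k-fold cross validation and hyperparameter optimization described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the data is a synthetic stand-in generated with `make_classification`, only three of the listed algorithms are used as base learners, and the choice of Logistic Regression as meta-learner, the parameter grid and all variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for student records (assumption: the
# paper's dataset is not reproduced here).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stratified folds keep the class ratio roughly constant in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Base learners: an illustrative subset of the algorithms named in the abstract.
base_learners = [
    ("nb", GaussianNB()),
    ("rf", RandomForestClassifier(n_estimators=30, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=cv,
)

# Small illustrative grid; the values are assumptions, not the paper's settings.
param_grid = {
    "rf__max_depth": [3, None],
    "final_estimator__C": [0.1, 1.0],
}
search = GridSearchCV(stack, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(f"best params: {search.best_params_}")
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Comparing feature combinations, as the study does with demographic plus academic data versus academic data only, would amount to repeating the same search on different column subsets of `X` and comparing the held-out accuracies.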