IMPACT OF STACKING ENSEMBLE DEPTH ON GENERALIZATION ABILITY OF ACADEMIC PERFORMANCE PREDICTION MODELS
DOI:
https://doi.org/10.30857/2786-5371.2026.1.7

Keywords:
stacking ensemble, ensemble depth, generalization ability, academic performance prediction, machine learning, ensemble methods, Python

Abstract
Purpose. The research provides a comprehensive analysis of the impact of stacking ensemble depth on the generalization ability of academic performance prediction models and determines the optimal stacking depth for maximum prediction performance and reliability. The work develops a methodology for assessing the relationship between stacking ensemble depth and model generalization metrics, and formulates recommendations for selecting the optimal ensemble architecture for academic performance prediction tasks.
Methodology. The research is based on an experimental analysis of the performance of stacking ensembles of different depths (from 1 to 5 levels) for predicting student academic performance. The base models include logistic regression, Random Forest, Gradient Boosting, Support Vector Machine, and neural networks. Generalization ability is assessed on independent test samples using the accuracy, F1-score, AUC-ROC, and coefficient of determination metrics. Stratified cross-validation is applied to assess the stability of the results and to analyze the impact of stacking depth on model variance and bias.
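A minimal sketch of this setup in Python with scikit-learn is given below; the synthetic placeholder dataset, the hyperparameters, and the choice of a logistic regression meta-model are illustrative assumptions rather than the authors' exact configuration:

# Sketch of a single-level (depth 1) stack over the five base-model
# families named in the methodology, scored with stratified CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder for a student-performance dataset (binary pass/fail labels).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

base_learners = [
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ("gb", GradientBoostingClassifier(random_state=42)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True))),
    ("mlp", make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=42))),
]

# The meta-model combines base predictions; StackingClassifier generates
# them with internal cross-validation to avoid target leakage.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)

# Stratified cross-validation of the F1-score, as in the evaluation protocol.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(stack, X, y, cv=skf, scoring="f1")
print(f"F1: {scores.mean():.3f} +/- {scores.std():.3f}")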
Findings. The experimental results demonstrate a non-trivial relationship between stacking ensemble depth and model generalization ability. The F1-score is 0.82 for single-level stacking (depth 1), 0.87 for two-level stacking (depth 2), 0.89 for three-level (depth 3), 0.88 for four-level (depth 4), and 0.86 for five-level (depth 5). The optimal stacking depth is three levels, where maximum generalization ability is achieved without a significant increase in model complexity. At depths greater than three levels, generalization ability declines due to error accumulation and meta-model overfitting. It is established that stacking depth affects the balance between model bias and variance, with the optimal depth ensuring the minimum generalization error.
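One way such a depth sweep could be implemented is sketched below, under the assumption that each extra level wraps the previous stack as the single base estimator of a new StackingClassifier; the reduced base-learner set keeps the example short and is not the full model list from the methodology:

# Hedged sketch: nest StackingClassifier to reach depths 1..5 and
# compare cross-validated F1 across depths.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
base_learners = [("rf", RandomForestClassifier(random_state=42)),
                 ("gb", GradientBoostingClassifier(random_state=42))]

def build_stack(depth, cv=5):
    """Wrap the base learners in `depth` stacking levels."""
    model = StackingClassifier(estimators=base_learners,
                               final_estimator=LogisticRegression(max_iter=1000),
                               cv=cv)
    for _ in range(depth - 1):
        # Each additional level re-stacks the previous model.
        model = StackingClassifier(estimators=[("inner", model)],
                                   final_estimator=LogisticRegression(max_iter=1000),
                                   cv=cv)
    return model

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for depth in range(1, 6):
    f1 = cross_val_score(build_stack(depth), X, y, cv=skf, scoring="f1").mean()
    print(f"depth={depth}: mean F1 = {f1:.3f}")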
Originality. A comprehensive methodology for assessing classification model stability depending on training sample size is developed, including a theoretical analysis of the relationship between sample size and the variance component of the generalization error, empirical methods for determining saturation points, and a comparative analysis of the effectiveness of different stability improvement methods. The impact of class imbalance and feature space dimensionality on the relationship between sample size and model stability is systematically investigated for the first time. A classification of models by their dependence on training sample size is developed, taking into account algorithm type, model complexity, and the nature of the data.
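A minimal sketch of the empirical saturation-point analysis, assuming scikit-learn's learning_curve and an illustrative one-standard-deviation stopping rule that is not necessarily the authors' criterion, is given below:

# Hedged sketch: locate the training-set size beyond which additional
# data no longer improves the test score by more than its CV spread.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

sizes, _, test_scores = learning_curve(
    RandomForestClassifier(random_state=42), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="f1")

means = test_scores.mean(axis=1)
stds = test_scores.std(axis=1)

# Heuristic: saturation is the first size where the gain over the
# previous size drops below one standard deviation of the CV score.
gains = np.diff(means)
saturation = next((int(sizes[i + 1]) for i, g in enumerate(gains)
                   if g < stds[i + 1]), int(sizes[-1]))
print(f"approximate saturation point: n = {saturation}")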
Practical value. The obtained results justify the choice of the optimal stacking ensemble depth for academic performance prediction tasks, ensuring high prediction accuracy with minimal model complexity. The developed recommendations can be applied in educational process management systems, early detection systems for at-risk students, and adaptive educational platforms. Determining the optimal stacking depth optimizes the use of computational resources and ensures high prediction reliability in practical applications.
License
Copyright (c) 2026 Владислав ПИЛИПЕНКО

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.