Woldekidan Gudelo Dike, Mohammed Abebe Yimer and Tekle Dergaso Degu
Adv. Know. Base. Syst. Data Sci. Cyber., 2 (2):276-290
Woldekidan Gudelo Dike : Department of Computer Science, Dilla University, Dilla, Ethiopia.
Mohammed Abebe Yimer : Faculty of Computing, Arba Minch University, Arba Minch, Ethiopia.
Tekle Dergaso Degu : Department of Computer Science, Wolaita Sodo University, Sodo, Ethiopia.
DOI: https://dx.doi.org/10.54364/cybersecurityjournal.2025.1114
Article History: Received on: 19-Jun-25, Accepted on: 26-Jul-25, Published on: 02-Aug-25
Corresponding Author: Woldekidan Gudelo Dike
Email: wolde0916@gmail.com
Citation: Woldekidan Gudelo Dike (2025). Boosting Agricultural Outcomes with a Hybrid Ensemble Model for Crop Prediction under Climatic and Soil Variability. Adv. Know. Base. Syst. Data Sci. Cyber., 2 (2):276-290
Agriculture remains the dominant sector of Ethiopia’s economy, yet food insecurity is still widespread because crop yields are consistently low. A major factor is the difficulty of selecting crops that match the heterogeneous soil and climatic conditions across regions. Although farmers hold valuable indigenous knowledge, their decisions often lack accuracy because scalable, data-driven decision support systems are not available. Manual crop-soil-climate mapping is labour-intensive and increasingly difficult, which highlights the importance of intelligent decision-support tools. We therefore propose a hybrid ensemble learning model that combines predictions from state-of-the-art machine learning algorithms to predict optimal crops from soil and meteorological data. The main architecture comprises stacking, blending, and voting classifiers, alongside an artificial neural network (ANN). The base learners are support vector machine (SVM), Gaussian Naive Bayes (GaussianNB), decision tree, extra trees, k-nearest neighbours (KNN), and random forest (RF) classifiers, with a logistic regression (LR) model serving as the blender. To prevent overfitting and to improve generalization and interpretability, the pipeline includes Principal Component Analysis (PCA), feature importance analysis, and hyperparameter optimization via grid search and random search; each model is validated through K-Fold cross-validation. Beyond predictive performance, this work focuses on model transparency and trust. To that end, SHAP (SHapley Additive exPlanations) and Permutation Importance are employed to measure feature contributions, revealing how soil pH, nitrogen content, phosphorus, rainfall, altitude, soil type, humidity, temperature, and other features influence crop predictions. These tools help to transform the model from a black box into a transparent system that stakeholders can trust and understand.
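The stacking arrangement named in the abstract can be illustrated with a minimal sketch, assuming scikit-learn is used. It is not the authors' implementation: the data are synthetic (generated with `make_classification`), and the hyperparameters, the 95% PCA variance threshold, and the fold counts are illustrative assumptions rather than values reported in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a soil/weather feature table (NOT the study's data):
# 300 samples, 8 numeric features, 3 hypothetical crop classes.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_redundant=1, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

# The six base learners listed in the abstract, with assumed settings.
base_learners = [
    ("svm", SVC(random_state=0)),
    ("gnb", GaussianNB()),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Logistic regression combines the base learners' out-of-fold predictions.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)

# Scale, keep enough principal components for 95% of the variance, then stack.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95, svd_solver="full"),
                      stack)

# K-Fold validation of the whole pipeline (stratified to keep class balance).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
print("fold accuracies:", scores.round(3))
```

Placing the scaler and PCA inside the pipeline means they are refitted on each training fold, which keeps information from the held-out fold out of the preprocessing step.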
Feature selection, parameter optimization, and model training and validation were carried out on datasets from the Arba Minch Agricultural Research Centre and the Gamo Zone Agricultural Directive Office.
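The idea behind Permutation Importance can be shown in a few lines of plain Python: shuffle one feature column at a time and record how much accuracy drops. The sketch below is only an illustration under stated assumptions. The rows, the two feature names (rainfall and soil pH), the crop labels, and the `toy_predict` rule are all made up for the example and are not drawn from the study's datasets or models.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(r) == t for r, t in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical rows: [rainfall_mm, soil_ph]. Values are invented.
rows = [[950, 6.1], [400, 6.8], [1100, 5.9], [350, 7.2],
        [870, 6.5], [500, 6.0], [1020, 7.0], [300, 5.8]]

def toy_predict(row):
    """Hypothetical stand-in for a trained model: looks at rainfall only."""
    return "maize" if row[0] >= 800 else "sorghum"

labels = [toy_predict(r) for r in rows]  # toy labels follow the same rule

importances = permutation_importance(toy_predict, rows, labels)
print(dict(zip(["rainfall_mm", "soil_ph"], importances)))
```

Because the toy rule ignores soil pH, shuffling that column never changes a prediction and its importance is zero, while shuffling rainfall lowers accuracy. A fitted model is scored in the same way, with its prediction function in place of `toy_predict`.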