Article

A Comparison of Model Averaging Techniques to Predict the Spatial Distribution of Soil Properties

1 Department of Geosciences, Soil Science and Geomorphology, University of Tübingen, 72070 Tübingen, Germany
2 CRC 1070 Resource Cultures, University of Tübingen, 72074 Tübingen, Germany
3 DFG Cluster of Excellence “Machine Learning”, University of Tübingen, 72070 Tübingen, Germany
4 Department of Soil Science, College of Agriculture, Isfahan University of Technology, Isfahan 8415683111, Iran
5 Henan Key Laboratory of Earth System Observation and Modeling, Henan University, Kaifeng 475004, China
6 College of Geography and Environmental Science, Henan University, Kaifeng 475004, China
7 Department of Plant, Food, and Environmental Sciences, Faculty of Agriculture, Dalhousie University, Truro, NS B2N 5E3, Canada
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(3), 472; https://doi.org/10.3390/rs14030472
Submission received: 9 November 2021 / Revised: 10 January 2022 / Accepted: 15 January 2022 / Published: 19 January 2022
(This article belongs to the Special Issue Remote Sensing of Soil Properties)

Abstract

This study tested and evaluated a suite of nine individual base learners and seven model averaging techniques for predicting the spatial distribution of soil properties in central Iran. Based on a nested cross-validation approach, the results showed that the artificial neural network and Random Forest base learners were the most effective in predicting soil organic matter and electrical conductivity, respectively. However, all seven model averaging techniques performed better than the base learners. For example, the Granger–Ramanathan averaging approach resulted in the highest prediction accuracy for soil organic matter, while the Bayesian model averaging approach was most effective in predicting sand content. These results indicate that model averaging approaches can improve the predictive accuracy for soil properties. The resulting maps, produced at a 30 m spatial resolution, can be used as valuable baseline information for managing environmental resources more effectively.


1. Introduction

In recent years, rapid population growth and the increasing demand for food have had undesirable consequences for the environment. These consequences include, but are not limited to, land degradation, desertification, water pollution, and soil pollution. Therefore, there is a need to identify and understand the factors related to sustainable agriculture and to soil and water resources management. Maps of soil properties are among the most basic sources of information for land resource management [1]. Soil properties vary both temporally and spatially, at scales ranging from small to large, and are affected by environmental characteristics, such as topography, and by soil management practices, such as fertilization and other agronomic practices [2].
In Iran, where 85% of the country is arid or semiarid [3], intrinsic soil properties, such as soil organic matter (SOM), calcium carbonate equivalent (CCE), gypsum content, soil texture, electrical conductivity (EC), soil pH, and soil reactivity, have been shown to be related to soil quality and are commonly considered the main factors in soil quality assessments [4]. However, these properties are highly variable in space and time [5], especially in agricultural systems, due to processes related to soil redistribution and agricultural practices. Moreover, parts of the region suffer from data scarcity, where soil information lacks detail or is not available.
Because soil information is essential, digital soil mapping (DSM) has been an active area of research over the past few decades [1]. Whereas traditional soil mapping methods are time-consuming and expensive to carry out [6], DSM techniques can overcome these limitations by integrating soil information with environmental variables obtained from remote sensing and other geospatial datasets [1,7,8]. DSM approaches establish relationships between a set of environmental covariates and the soil properties measured at georeferenced sample points in the study area; the resulting predictive models are then applied to unsampled locations. To date, most DSM studies in Iran have been carried out in easily accessible regions to predict a variety of soil properties, for instance, pH, EC, soil organic matter (SOM), phosphorus, particle size distribution, and calcium carbonate equivalent (CCE) [8,9,10]. However, few studies have investigated the spatial distribution of soil properties in regions with limited soil data or in difficult-to-access areas.
Machine learning (ML) techniques have increasingly been compared to identify the best-performing model for predicting soil variability [11]. The many ML algorithms used in DSM include multiple linear regression [12], logistic regression [13], Random Forests [14,15,16,17], classification trees [18], support vector machines [17,19], and artificial neural networks [20]. In addition, with increasing computational power, more sophisticated and complex algorithms, such as convolutional neural networks, which are data-hungry deep learning approaches, have been used to solve highly complex soil–landscape problems, improve prediction accuracy, and decrease the uncertainty of digital soil maps [21,22,23,24].
One approach to improving the predictive capability and decreasing the variance of ML models is model averaging [25]. In model averaging, multiple individual learners (i.e., base learners) are trained to solve the same problem, and their predictions are combined. The technique assumes that each base learner has its own strengths and weaknesses, and it compiles a final model that draws on the strengths of the individual models. As a result, model averaging techniques are expected to produce predictions with similar or better accuracy than their individual constituents. In addition to increased accuracy, model averaging has the potential to improve the reliability, stability, and robustness of models [26]. These techniques have recently gained attention in the environmental sciences, atmospheric sciences, and statistics literature for solving highly complex prediction problems [21].
A few DSM studies have demonstrated the effectiveness of model averaging for predicting various soil properties, such as available soil water, soil organic carbon, soil texture, and soil pH [7]; hydrologic properties [25]; and soil classes [27]. However, to the best of our knowledge, no DSM study has performed a comprehensive comparison of model averaging techniques; this gap provided the impetus for this study.
Given the need for detailed soil information for the arid, remote, and data-scarce regions of Iran, this study aimed to compare and evaluate methods for producing maps of soil properties using ML and model averaging techniques. The specific objectives were as follows: (1) to investigate and compare different single-model learners, such as support vector regression (SVR), k-nearest neighbor (kNN), artificial neural network (ANN), deep neural network (DNN), Random Forest (RF), adaptive network-based fuzzy inference system (ANFIS), and extreme gradient boosting (XGB); and (2) to compare the single-model learners with seven model averaging techniques, namely Bates–Granger averaging (BGA), equal weights averaging (EWA), Bayesian information criterion (BIC) averaging, Akaike’s information criterion (AIC) averaging, Bayesian model averaging (BMA), Granger–Ramanathan averaging (GRA), and Mallows model averaging (MMA).

2. Materials and Methods

2.1. Study Area and Soil Sampling

The study area covers 110,000 km² in the central Iranian province of Isfahan (Figure 1). Elevation ranges from 700 to 2600 m above mean sea level. The mean annual precipitation and temperature are 117 mm and 25 °C, respectively. According to the geological map, Quaternary sediments cover a considerable portion of Isfahan province. Sedimentary rocks, such as limestone, sandstone, conglomerate, and shale, are common in the southern and western parts of the study area [28].
A total of 251 topsoil samples were collected from the 0–20 cm depth using stratified random sampling with 20 × 20 km stratification blocks. Within each block, one sample was taken from the dominant parent material, which ensured that sedimentary, volcanic, and metamorphic rocks were represented. Each sample consisted of five subsamples randomly collected within a 20 × 20 m (400 m²) area. The geographical distribution of the sample locations is shown in Figure 1. Soil samples were air-dried and passed through a 2 mm sieve. Soil pH [29] and electrical conductivity (EC) [30] were measured in a 2:1 water-to-soil extract. In addition, the SOM content (wet combustion method) [31], CCE (titration method) [32], particle size distribution (hydrometer method) [33], and gypsum content (oven-drying method) [34] were measured.
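The block stratification can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' procedure: candidate locations are given in projected coordinates (metres) and are assumed to be already restricted to the dominant parent material of each block, so the sketch simply picks one of them at random per 20 × 20 km block. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict

BLOCK_SIZE_M = 20_000  # 20 x 20 km stratification blocks


def block_id(x, y, block_size=BLOCK_SIZE_M):
    """Return the (column, row) index of the block that contains point (x, y)."""
    return (math.floor(x / block_size), math.floor(y / block_size))


def one_sample_per_block(candidates, block_size=BLOCK_SIZE_M, seed=0):
    """Randomly choose one candidate location in every block that has candidates.

    candidates: iterable of (x, y) coordinates in metres, assumed to be already
    filtered to the dominant parent material of each block (hypothetical step).
    """
    rng = random.Random(seed)
    by_block = defaultdict(list)
    for x, y in candidates:
        by_block[block_id(x, y, block_size)].append((x, y))
    # Sort the blocks so the selection is reproducible for a given seed.
    return {b: rng.choice(pts) for b, pts in sorted(by_block.items())}
```

Blocks without any candidate location are simply skipped, which mirrors the fact that not every grid cell necessarily yields a sample.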

2.2. Environmental Covariates

A digital elevation model (DEM; Figure 2) with a 30 m spatial resolution was used to calculate terrain attributes [35], such as elevation, catchment aspect, catchment slope, catchment area, topographic openness, profile curvature, topographic wetness index, and planform curvature (Table 1). Additionally, Landsat 8 Operational Land Imager (OLI) data were acquired during the summer of 2012. After geometric and radiometric corrections were applied to the Landsat images, the per-band median values were used to derive a suite of remote sensing covariates, such as the brightness, salinity, gypsum, carbonate, and clay indices (Table 1). Lastly, the mean annual temperature and mean annual precipitation (Figure 2) were calculated from monthly temperature and precipitation values.
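As an illustration of one of these terrain attributes, a common definition of the topographic wetness index is TWI = ln(a / tan β), where a is the specific catchment area and β is the local slope. The sketch below assumes that definition for a single DEM cell; the terrain attributes in this study were derived with the tools cited in [35], which may use a particular flow-routing scheme or a modified form of the index. The slope floor for flat cells is our assumption.

```python
import math


def topographic_wetness_index(specific_catchment_area, slope_rad, min_tan_slope=1e-6):
    """Compute TWI = ln(a / tan(beta)) for one DEM cell.

    specific_catchment_area: upslope contributing area per unit contour width (m^2/m).
    slope_rad: local slope angle in radians.
    min_tan_slope: floor on tan(beta) so flat cells do not divide by zero (assumption).
    """
    if specific_catchment_area <= 0:
        raise ValueError("specific catchment area must be positive")
    tan_beta = max(math.tan(slope_rad), min_tan_slope)
    return math.log(specific_catchment_area / tan_beta)
```

Cells with a large upslope area and a gentle slope receive high values, which is why the index is often used as a proxy for the tendency of a location to accumulate water.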

2.3. Variable Importance Analysis Using a Genetic Algorithm

DSM studies often employ many environmental covariates, which makes it difficult to interpret the relationships between soils and the environment. Brungard et al. [44] suggested that using fewer covariates could improve the efficiency of the modeling process. To address this issue, genetic algorithms (GA) have been used to determine optimal subsets of covariates, creating simpler models while maintaining model performance [9,45]. Genetic algorithms are biologically inspired computational models based on evolutionary processes, such as selection, crossover, and mutation, and are designed to search for the functions that best fit the experimental dataset [45]. Here, a GA was applied to select the most important covariates for each soil property using the caret package (version 6.0–90) in R [46].
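The selection, crossover, and mutation loop described above can be sketched in plain Python for covariate selection. This is not caret's implementation (caret's `gafs` function wraps a GA with its own resampling scheme); here a candidate covariate subset is a binary mask, and `fitness` is a user-supplied score. The population size, rates, and function names are illustrative assumptions.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=40,
              crossover_rate=0.8, mutation_rate=0.05, seed=1):
    """Search binary masks (1 = keep covariate) that maximize fitness(mask)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament(scored):
        # Selection: the fitter of two randomly drawn individuals becomes a parent.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    best_score, best_mask = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        gen_score, gen_mask = max(scored, key=lambda s: s[0])
        if gen_score > best_score:
            best_score, best_mask = gen_score, list(gen_mask)
        children = [list(best_mask)]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            parent1, parent2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                # Crossover: single-point recombination of the two parents.
                cut = rng.randint(1, n_features - 1)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = list(parent1)
            # Mutation: flip each bit independently with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return best_mask, best_score
```

With real data, `fitness` would train a model on the covariates whose mask bit is 1 and return, for instance, the negative cross-validated RMSE, so that smaller subsets that predict equally well are favored when a size penalty is added.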

2.4. Base Learners

Nine base learners were tested for modeling the relationships between the environmental covariates and the target variables: k-nearest neighbor (kNN), genetic programming (GP), support vector regression (SVR), least absolute shrinkage and selection operator (LASSO), artificial neural network (ANN), deep neural network (DNN), Random Forest (RF), adaptive network-based fuzzy inference system (ANFIS), and extreme gradient boosting (XGB). Modeling was implemented using the caret package (version 6.0–90) [47] in R 3.2.5 [46] and RStudio (version 0.99.903) [48].

2.5. Model Averaging Techniques

Seven model averaging techniques were tested: Akaike’s information criterion, equal weights averaging, Bates–Granger averaging, Bayes’ information criterion, Mallows model averaging, Granger–Ramanathan averaging, and Bayesian model averaging. Here, we summarize each approach and refer readers to the references for full descriptions of each model averaging technique.
In the equal weights averaging (EWA) method, the same weight is assigned to each model; in effect, the final prediction is the mean of the predictions of all base learners.
The Bates–Granger averaging (BGA) technique was proposed by Bates and Granger [49]. In BGA, each model $i$ is weighted in proportion to $1/\sigma_i^2$, where $\sigma_i^2$ is the variance of its prediction errors.
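Both weighting rules can be written in a few lines. In the sketch below, `predictions[i][t]` is the prediction of base learner i at location t; estimating each $\sigma_i^2$ as the mean squared error on held-out data is our assumption, since the estimator is not specified here. The function names are hypothetical.

```python
def equal_weights(n_models):
    """EWA: every base learner receives the same weight."""
    return [1.0 / n_models] * n_models


def error_variance(observed, predicted):
    """Mean squared prediction error, used here as an estimate of sigma_i^2 (assumption)."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)


def bates_granger_weights(error_variances):
    """BGA: weights proportional to 1 / sigma_i^2, normalized to sum to one."""
    inverse = [1.0 / v for v in error_variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(predictions, weights):
    """Weighted average of the base-learner predictions at each location."""
    n_points = len(predictions[0])
    return [sum(w * p[t] for w, p in zip(weights, predictions)) for t in range(n_points)]
```

When all error variances are equal, the BGA weights collapse to the EWA weights, so EWA can be seen as the special case of BGA with no information about relative model skill.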
In the information criterion averaging techniques (AIC and BIC), weights are calculated using the following equations [50]:
$$\hat{\beta}_i = \frac{\exp(-I_i/2)}{\sum_{j=1}^{k} \exp(-I_j/2)} \tag{1}$$
where $I_i$ is the information criterion of model $i$ (a measure of model fit penalized for complexity):
$$I_i = -2\log(L_i) + q(p_i) \tag{2}$$
$L_i$ is the maximized likelihood of model $i$, and $q(p_i)$ is a penalty on the number of parameters, $p_i$, that need to be estimated for model $i$. In AIC averaging, the penalty is $q(p) = 2p$, whereas in BIC averaging it is $q(p) = p\log(n)$, where $n$ is the training sample size.
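The AIC and BIC weighting above can be sketched as follows. The Gaussian log-likelihood used for each fitted regression model is our assumption (maximized likelihood with the error variance estimated as RSS/n), and so is the parameter count passed in; for flexible learners such as RF or ANN, $p_i$ is not well defined and would have to be chosen by the analyst. Subtracting the smallest criterion before exponentiating leaves the weights unchanged but avoids numerical underflow.

```python
import math


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood with sigma^2 = RSS / n (assumption)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)


def information_criterion(log_likelihood, n_params, n_obs, kind="AIC"):
    """I_i = -2 log(L_i) + q(p_i), with q(p) = 2p (AIC) or p log(n) (BIC)."""
    if kind == "AIC":
        penalty = 2.0 * n_params
    elif kind == "BIC":
        penalty = n_params * math.log(n_obs)
    else:
        raise ValueError("kind must be 'AIC' or 'BIC'")
    return -2.0 * log_likelihood + penalty


def ic_weights(criteria):
    """beta_i = exp(-I_i / 2) / sum_j exp(-I_j / 2)."""
    i_min = min(criteria)  # shift for numerical stability; it cancels in the ratio
    raw = [math.exp(-(c - i_min) / 2.0) for c in criteria]
    total = sum(raw)
    return [r / total for r in raw]
```

A model whose criterion is lower by $2\log 3$ receives three times the weight of its competitor, which illustrates how quickly these weights concentrate on the best-scoring model.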
Hoeting et al. [51] first proposed the Bayesian model averaging (BMA) technique, which assigns a conditional probability density function to each model prediction. Raftery et al. [52] provide an excellent overview of the theoretical background behind the different BMA techniques.
Claeskens and Hjort [53] and Hjort and Claeskens [54] proposed the Mallows model averaging (MMA) technique and concluded that there is no single best model; instead, the appropriate model depends on the objective. In MMA, the weights $\beta$ are chosen to minimize the Mallows criterion:
$$C_n(\beta) = \sum_{t=1}^{n} \left(Y_t - \beta^{\top} X_t\right)^2 + 2\sum_{j=1}^{k} \beta_j p_j S^2 \tag{3}$$
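where $Y_t$ is the observation at location $t$, $X_t$ is the vector of the $k$ model predictions at $t$, $p_j$ is the number of parameters of model $j$, and $S^2$ is an estimate of the variance, $\sigma^2$, of the error term $\varepsilon_t$. In this study, $S^2$ was taken to be the smallest observed RMSE among the individual models.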
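Minimizing the Mallows criterion can be sketched with projected gradient descent. Two points are assumptions rather than details given in the text: the weights are constrained to the unit simplex (non-negative and summing to one), as is usual in MMA, and the optimizer is our choice; any quadratic-programming solver would serve. `X[t][j]` holds the prediction of model j at location t, and `s2` is the variance estimate $S^2$.

```python
def project_to_simplex(v):
    """Euclidean projection of vector v onto {w : w_j >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        t = (cumulative - 1.0) / i
        if u_i - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def mallows_criterion(beta, X, Y, p, s2):
    """C_n(beta) = sum_t (Y_t - beta . X_t)^2 + 2 sum_j beta_j p_j S^2."""
    rss = sum((y - sum(b * x for b, x in zip(beta, row))) ** 2 for row, y in zip(X, Y))
    return rss + 2.0 * s2 * sum(b * pj for b, pj in zip(beta, p))


def mma_weights(X, Y, p, s2, iterations=5000):
    """Minimize the Mallows criterion over the simplex by projected gradient descent."""
    k = len(X[0])
    beta = [1.0 / k] * k
    # The gradient is Lipschitz with constant 2 * lambda_max(X^T X) <= 2 * trace(X^T X),
    # so this step size keeps every iteration a descent step.
    step = 1.0 / (2.0 * sum(x * x for row in X for x in row))
    for _ in range(iterations):
        resid = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(X, Y)]
        grad = [
            -2.0 * sum(r * row[j] for r, row in zip(resid, X)) + 2.0 * p[j] * s2
            for j in range(k)
        ]
        beta = project_to_simplex([b - step * g for b, g in zip(beta, grad)])
    return beta
```

The penalty term pulls weight away from models with many parameters, so with equal fit MMA prefers the more parsimonious model.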
Granger and Ramanathan [55] first proposed the Granger–Ramanathan averaging (GRA) approach, in which the final prediction is a linear combination of the individual model predictions, with the weights estimated by ordinary least squares.
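The GRA weights can be sketched as an ordinary least squares fit of the observations on the base-learner predictions. We assume the unconstrained variant without an intercept (the weights need not be non-negative or sum to one); other GRA variants add an intercept or a sum-to-one constraint. The small Gaussian-elimination solver only keeps the example free of third-party dependencies; in practice, a numerical library's least-squares routine would be preferable.

```python
def solve_linear(A, b):
    """Solve A x = b for a small square system by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: base-learner predictions are collinear")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def granger_ramanathan_weights(X, Y):
    """OLS weights (no intercept) from regressing observations Y on predictions X[t][j]."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(k)]
    return solve_linear(XtX, XtY)
```

Because the weights are unconstrained, GRA can assign negative weights to strongly correlated base learners, which is one reason constrained variants exist.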

2.6. Accuracy Assessment and Uncertainty Analysis

The dataset was randomly split into 70% (n = 170) and 30% (n = 81) for model training and testing, respectively. Leave-one-out cross-validation was also used to tune the hyperparameters of models using the 70% training dataset. The coefficient of determination (R2), mean absolute error (MAE), the root mean squared error (RMSE), and the normalized root mean squared error (nRMSE) were used to assess model performance:
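$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{4}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{5}$$
$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} \left|\hat{y}_i - y_i\right|}{n} \tag{6}$$
$$\mathrm{nRMSE} = \frac{\mathrm{RMSE}}{\bar{X}} \tag{7}$$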
where, in Equations (4)–(7), $y$ is the measured value, $\hat{y}$ is the predicted value, $n$ is the number of observations, and $\bar{y}$ and $\bar{X}$ both denote the average of the observed values.
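The four accuracy measures defined above can be computed as follows. This is a minimal sketch in which nRMSE divides by the mean of the observed values, as in the text; the function name is hypothetical.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return R2, RMSE, MAE, and nRMSE (RMSE divided by the mean observed value)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    sst = sum((y - mean_obs) ** 2 for y in observed)
    rmse = math.sqrt(sse / n)
    return {
        "R2": 1.0 - sse / sst,
        "RMSE": rmse,
        "MAE": sum(abs(y_hat - y) for y, y_hat in zip(observed, predicted)) / n,
        "nRMSE": rmse / mean_obs,
    }
```

Defined this way, R² is one minus the ratio of residual to total sum of squares, so it can become negative for predictions worse than the observed mean; it is not the squared correlation between observed and predicted values.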
To assess the uncertainty of the models, a leave-one-out cross-validation method was used. This method resulted in 170 predicted soil property maps. Based on the predicted maps, the mean and standard deviation (SD) of soil properties for each pixel were calculated. Given a confidence level of 90%, the upper and lower boundary of the predictions (i.e., prediction interval) were calculated (mean ± 1.64 SD). Finally, the proportion of measured soil values that fell within the 90% prediction interval (i.e., prediction interval coverage probability; PICP) and mean prediction interval (MPI: upper prediction limit minus the lower prediction limit) were calculated as two measures of the quality of the uncertainty estimates.
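The interval and coverage calculations can be sketched as below. `loocv_predictions` stands for the predictions at one location from the leave-one-out models; using the sample standard deviation (`statistics.stdev`, with an n − 1 denominator) and reporting PICP as a percentage are our assumptions, and the function names are hypothetical.

```python
import statistics

Z_90 = 1.64  # multiplier used in the text for a 90% prediction interval


def prediction_interval(loocv_predictions, z=Z_90):
    """Return (lower, upper) = mean -/+ z * SD of the leave-one-out predictions at one pixel."""
    m = statistics.mean(loocv_predictions)
    sd = statistics.stdev(loocv_predictions)
    return m - z * sd, m + z * sd


def picp_and_mpi(observed, intervals):
    """PICP (%) = share of observations inside their interval; MPI = mean interval width."""
    inside = sum(lower <= y <= upper for y, (lower, upper) in zip(observed, intervals))
    picp = 100.0 * inside / len(observed)
    mpi = sum(upper - lower for lower, upper in intervals) / len(intervals)
    return picp, mpi
```

A well-calibrated model yields a PICP close to 90%; among models with similar PICP, the one with the smaller MPI gives the more informative intervals.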

3. Results and Discussion

3.1. Descriptive Statistics of Soil Properties

Descriptive statistics of the soil properties are presented in Table 2. SOM, CCE, gypsum, silt, sand, and EC values varied widely across the study area; for example, CCE ranged from 0.2% to 80.0%, with a mean of 27.8%. Because of the limestone-rich parent materials, most soils throughout the region are highly calcareous [28], and because of the low precipitation in arid and semiarid regions, calcium carbonates tend to accumulate in the surface soils [56]. SOM was low, with a mean of 0.4%, which was also attributed to the arid and semiarid climate of the study area. Gypsum, sand, and EC values varied substantially; however, mean gypsum and EC remained low. Areas with high gypsum, sand, and EC values were located in the arid parts of the study area, with low precipitation and high temperatures. SOM, gypsum, and EC values were positively skewed, whereas lime, clay, silt, sand, and pH values followed a normal distribution (Table 2).

3.2. Variable Importance Analysis

As illustrated in Figure 3, temperature, elevation, rainfall, and the NDVI and SAVI indices were the most important covariates for predicting SOM content. In a previous study in Iran, Zeraatpisheh et al. [8] found that RVI, elevation, and SAVI were the most important covariates for SOM prediction. In contrast, Wang et al. [57] and Ayoubi et al. [58] concluded that topographic attributes significantly influenced SOM through their effects on runoff, drainage, and soil erosion. Additionally, several studies in Iran demonstrated a strong relationship between vegetation cover and soil properties, with vegetation indices effectively capturing the variability in soil properties, especially SOM [59].
The most important predictors of silt content were the MRVBF, clay, and brightness indices. In comparison, sand predictions relied on rainfall, clay index, and elevation, whereas rainfall, temperature, and elevation were the most important covariates for clay (Figure 3). Thus, in the study area, climatic factors such as rainfall and temperature, along with topographic attributes, may reflect soil redistribution by water and wind [60]. For example, Brierley et al. [61] reported that in inter-rill erosion, selective removal led to the redistribution of silt and clay particles. Mosleh et al. [62] indicated that effective predictors of silt variability in Iran included diffuse radiation and the wetness index, while the most important predictors of clay content were aspect, duration of solar radiation, and the stream power index (SPI). They suggested that this was possibly because these covariates better represented the vertical and lateral movement of soil particles through erosion and deposition in their study area. The importance of topographic predictors for mapping particle size fractions in Iran has also been demonstrated by Zeraatpisheh et al. [8], who reported that curvature and profile curvature were important controls on water flow in the landscape and thereby explained most of the spatial distribution of clay content. Elsewhere, Adhikari et al. [63] showed that land use, soil, and landscape types were more important for predicting silt in Denmark, and that the distribution of fine and coarse sand fractions was effectively predicted by slope, elevation, and geology. Nath [64] reported that the stream power index and topographic wetness index were the key predictors of sand content in the Northwest Iowa plain. In Nigeria, Akpa et al. [65] demonstrated that topographic variables (e.g., SPI, elevation, and slope), vegetation indices, and climatic variables were the most important predictors of soil particle size.
Gypsum predictions were controlled mainly by rainfall, temperature, and elevation, while CCE predictions were controlled by the salinity index and Bands 5 and 7 of the Landsat 8 data (Figure 3). This may be due to the effect of climate and the different solubilities of gypsum and calcium carbonate: the lower solubility of calcium carbonate allows it to persist at the soil surface, making it more visible in satellite imagery [66]. For EC, the most influential predictors were temperature, Band 2, and the salinity index (Figure 3), in contrast to Mosleh et al. [62], who reported that elevation, curvature, planform curvature, and profile curvature were the key predictors of EC. For pH, Tasseled Cap Bands 1 and 2 and NDVI were the most important predictors, in contrast to Mosleh et al. [62] and Nath [64], who reported the importance of planform curvature.
Overall, climatic parameters (e.g., rainfall and temperature), elevation, and RS data were the most important covariates for predicting soil properties in the region. For CCE, EC, and pH, remote sensing data were particularly effective because salts accumulate at the soil surface, where they are readily detected by RS imagery. Meier et al. [67] selected 10 covariates for soil mapping, including four topographic covariates, three Landsat images, two climatic maps, and a map of Euclidean distance from the drainage network. This study showed that MRVBF, temperature, rainfall, and TWI were the most important covariates for soil mapping (Figure 3). Mosleh et al. [62] concluded that terrain attributes were the main predictors of soil properties, while other studies demonstrated the importance of remotely sensed vegetation parameters in the semiarid regions of Iran [9,17].

3.3. Comparison of Base Learners

Among the eight soil properties, the ANN model performed best in predicting SOM, CCE, and gypsum content, while the RF model performed best in predicting EC and pH. The best-performing model for particle size fractions varied: ANFIS, XGB, and DNN were the most effective in predicting the sand, silt, and clay fractions, respectively (Table 3). Khaledian and Miller [68] concluded that the ANN model would likely produce the best results for large datasets, although its computational time could increase drastically compared with other models, such as RF and SVR. In this study, however, computational efficiency was not a serious issue because of the limited size of our dataset. Mosleh et al. [62] and Were et al. [69] also found that the ANN model outperformed other models in predicting SOM. For sand content, similar results were reported by Taghizadeh-Mehrjardi et al. [70], who found that the ANFIS model performed better than multiple linear regression and ANN. Although the kNN learner was ineffective in predicting soil properties in this study (Table 3), other studies, such as Khaledian and Miller [68], have demonstrated its effectiveness. The reasons for these differences are difficult to explain; they could be related to differences in the extent of the study areas, topography, sampling density, or the quantity and quality of the environmental covariates used. Furthermore, this suggests that there is no single ‘best’ ML algorithm and that multiple models should be compared to identify the most appropriate one.
The results showed that among the best individual models to predict soil properties, the highest and the lowest prediction accuracies were obtained for pH (nRMSE = 0.03) and gypsum (nRMSE = 1.10) using RF and ANN models, respectively (Table 3). Several studies concluded that RF and ANN were also effective in predicting soil properties in the arid and semiarid regions of Iran [8,9,10].

3.4. Comparison of Model Averaging Techniques

This study compared seven model averaging approaches with the individual base learners (Table 3). Among these techniques, GRA showed the highest prediction accuracy for SOM, CCE, silt, and pH; BMA was most effective at predicting sand and clay contents; and BIC and BGA were most effective in predicting gypsum and EC, respectively (Table 3). The least and most accurate predictions were obtained by BGA for EC (nRMSE = 1.82) and by GRA for pH (nRMSE = 0.03), respectively (Table 3). Diks and Vrugt [25] found that the BGA method produced the highest accuracy for hydrologic systems compared with the other model averaging methods (e.g., EWA, BGA, BMA, and MMA).
The results of this study confirmed our original expectation that, compared with the individual base learners, all model averaging techniques would yield similar or more accurate predictions of soil properties [25]. Notably, the success of model averaging depends strongly on having a diverse set of base learners, which might explain why the model averaging techniques were consistently more effective than the base learners regardless of the soil property predicted. Similarly, Malone et al. [71] compared four model averaging techniques and recommended the GRA approach for DSM applications; their study also showed that model averaging could increase accuracy and robustness relative to the individual base learners. The effectiveness of model averaging in DSM has since been demonstrated in multiple other studies [72]. Although not tested here, stacked generalization using the SuperLearner algorithm has shown that combining the predictions of multiple base learners into an ensemble learner often results in similar or better predictions [73]. In Taghizadeh-Mehrjardi et al. [73], the SuperLearner and EWA techniques consistently outperformed 12 base learners when predicting 12 soil properties in the Urmia Lake region of Iran.

3.5. Uncertainty Analysis

To assess model uncertainty, the proportion of measured soil values in the validation data that fell within the 90% prediction interval (i.e., prediction interval coverage probability; PICP) and the mean prediction interval (MPI: upper prediction limit minus lower prediction limit) were calculated. Theoretically, with a 90% confidence level, 90% of the observations should fall within the prediction interval, and the MPI should be as narrow as possible. Among the eight soil properties and nine base learners, the ANN model achieved the highest PICP for SOM, CCE, and gypsum content, while the RF model achieved the highest PICP for EC and pH. The base learner with the lowest uncertainty for particle size fractions varied: ANFIS, XGB, and DNN were most effective for the sand, silt, and clay fractions, respectively (Table 4). Furthermore, for all soil properties, the model averaging techniques generally produced PICP values that were higher and closer to the nominal 90% than those of the base learners. For example, the PICP values of the GRA model were 91% and 86% for SOM and CCE, respectively. In terms of MPI (Table 5), for all eight soil properties, the MPI of the model averaging techniques was always narrower than that of the base learners. For example, the MPI for SOM ranged from 1.0% to 1.4% for the base learners but from 0.7% to 0.9% for the model averaging techniques. This further indicates that model averaging decreased the uncertainty of soil property predictions. Notably, some uncertainty in the predicted values may be related to the high variability in soil properties, low prediction precision, inherently weak relationships between soil properties and covariates, and modeling errors.

3.6. Spatial Prediction of Soil Properties

The spatial predictions of the target soil properties are illustrated in Figure 4. Based on a visual assessment, the soil maps were consistent with our expert knowledge of the soil patterns in the region and with our understanding of the relationships between soil properties, geology, and climate. As expected, the spatial patterns of the SOM predictions resembled the mean annual precipitation patterns: rainfall is highest in the western, northwestern, southwestern, and southern parts of Isfahan province and at higher elevations, where lower temperatures also facilitate SOM accumulation.
The spatial variability of lime in the soils of Isfahan province did not match the climatic patterns but instead followed the geological patterns of the study area. Lower amounts of lime were observed along a northwest–southwest corridor within the study area, where the soil parent material was derived mainly from volcanic rocks [28]. Similarly, low amounts of lime were predicted in the western part of Isfahan province, where the parent materials are derived from metamorphic rocks [28].
Soil salinity and gypsum levels increased toward the east of Isfahan province. Because of their higher elevation, the western and southern parts of the study area are more humid, so gypsum and other soluble minerals are readily leached from the soil profile. In contrast, the areas predicted to have the highest amounts of gypsum and other soluble salts (Figure 4) corresponded to areas with lower precipitation and higher temperatures, climatic conditions that are conducive to evaporation and, consequently, to the formation of gypsum and other soluble salts. Gypsum is often found in soils with calcite and other soluble salts [28]. Parent material type and evaporation are the main factors controlling the accumulation of gypsum, calcite, and other soluble salts in arid and semiarid regions [74]. However, the mechanism of salinization is complex and may be affected by other factors [75,76]; for example, salt accumulation at the soil surface and within the soil profile may be significantly affected by the spatiotemporal dynamics of soil water content [77,78].
The variability in soil pH was limited, with predicted values ranging between 7.35 and 8.33 and with soil pH being the lowest in the western and southern regions of the province. Similar to the soil salinity and gypsum predictions, we believe that the spatial pattern of soil pH was partially controlled by the climate, where the higher precipitation levels led to the leaching of soluble minerals, thereby decreasing the pH (Figure 4).
Clay and silt contents were highest in the western, southwestern, and southern parts of Isfahan province, whereas sand contents showed the opposite trend (Figure 4). Both parent material and climate appear to control the particle size fractions. In the central, southeastern, northern, and eastern parts of the province, sand dunes, Quaternary sediments, andesite, granite, and diorite are the dominant parent materials, and the soils formed from them have correspondingly high sand contents. Furthermore, wind erosion in the eastern part of the province removes finer soil particles, further increasing sand contents. In the western, southwestern, and southern parts of Isfahan province, the dominant parent materials are sedimentary rocks, such as marl, limestone, shale, and sandstone, resulting in soils with higher clay and silt contents. In addition, the higher moisture in these regions promotes faster weathering of the parent materials, contributing to the production of clay and silt particles.

4. Conclusions

This study evaluated multiple base learners and model averaging approaches for predicting the spatial distribution of soil properties in central Iran. Among the base learners, the ANN and RF models were the most consistent in predicting soil properties and had higher accuracy than the other base learners. Furthermore, model averaging consistently performed better than the best-performing base learner, regardless of the averaging technique and soil property. This might be because model averaging combines the strengths of the base learners to obtain better predictive performance, making the ensemble models more robust than their constituents. Specifically, the GRA and BMA approaches performed best for most soil properties. The uncertainty analysis showed a similar trend: the model averaging methods had higher PICP and lower MPI values than the base learners. The resulting maps, produced at a 30 m spatial resolution, can serve as valuable baseline information for the effective management of environmental resources. These maps will support the sustainable management of the region’s soil resources and facilitate land evaluation activities.

Author Contributions

Conceptualization, R.T.-M. and F.K.; methodology, investigation and formal analysis, R.T.-M., F.K. and M.Z.; software, R.T.-M.; validation, R.T.-M. and F.K.; resources, H.K. and F.K.; data curation, H.K. and F.K.; writing—original draft preparation, R.T.-M., H.K., F.K., M.Z., B.H. and T.S.; writing—review and editing, R.T.-M., H.K., F.K., M.Z., B.H. and T.S.; visualization, R.T.-M.; supervision, H.K. and T.S.; project administration, H.K. and T.S.; funding acquisition, H.K. and F.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Acknowledgments

R.T.-M. and T.S. thank the German Research Foundation (DFG) for supporting this research through the Collaborative Research Center (SFB 1070) ‘ResourceCultures’ (subprojects Z, S, and B02) and the DFG Cluster of Excellence “Machine Learning—New Perspectives for Science”, EXC 2064/1, project number 390727645. We acknowledge support from the Open Access Publishing Fund of the University of Tübingen.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. McBratney, A.B.; Mendonça Santos, M.L.; Minasny, B. On digital soil mapping. Geoderma 2003, 117, 3–52.
2. Quine, T.A.; Zhang, Y. An investigation of spatial variation in soil erosion, soil properties, and crop production within an agricultural field in Devon, United Kingdom. J. Soil Water Conserv. 2002, 57, 55–65.
3. Khosravi, H.; Zehtabian, G.R.; Ahmadi, H.; Azarnivand, H. Hazard assessment of desertification as a result of soil and water recourse degradation in Kashan Region, Iran. Desert 2014, 19, 45–55.
4. Zeraatpisheh, M.; Bakhshandeh, E.; Hosseini, M.; Alavi, S.M. Assessing the effects of deforestation and intensive agriculture on the soil quality through digital soil mapping. Geoderma 2020, 363, 114139.
5. Bogunovic, I.; Pereira, P.; Brevik, E.C. Spatial distribution of soil chemical properties in an organic farm in Croatia. Sci. Total Environ. 2017, 584–585, 535–545.
6. Forkuor, G.; Hounkpatin, O.K.L.; Welp, G.; Thiel, M. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models. PLoS ONE 2017, 12, e0170478.
7. Taghizadeh-Mehrjardi, R.; Schmidt, K.; Amirian-Chakan, A.; Rentschler, T.; Zeraatpisheh, M.; Sarmadian, F.; Valavi, R.; Davatgar, N.; Behrens, T.; Scholten, T. Improving the Spatial Prediction of Soil Organic Carbon Content in Two Contrasting Climatic Regions by Stacking Machine Learning Models and Rescanning Covariate Space. Remote Sens. 2020, 12, 1095.
8. Zeraatpisheh, M.; Ayoubi, S.; Jafari, A.; Tajik, S.; Finke, P. Digital mapping of soil properties using multiple machine learning in a semi-arid region, central Iran. Geoderma 2019, 338, 445–452.
9. Taghizadeh-Mehrjardi, R.; Nabiollahi, K.; Kerry, R. Digital mapping of soil organic carbon at multiple depths using different data mining techniques in Baneh region, Iran. Geoderma 2016, 266, 98–110.
10. Zeraatpisheh, M.; Jafari, A.; Bagheri Bodaghabadi, M.; Ayoubi, S.; Taghizadeh-Mehrjardi, R.; Toomanian, N.; Kerry, R.; Xu, M. Conventional and digital soil mapping in Iran: Past, present, and future. Catena 2020, 188, 104424.
11. Wadoux, A.M.J.C.; Minasny, B.; McBratney, A.B. Machine learning for digital soil mapping: Applications, challenges and suggested solutions. Earth-Sci. Rev. 2020, 210, 103359.
12. Besalatpour, A.A.; Ayoubi, S.; Hajabbasi, M.A.; Mosaddeghi, M.R.; Schulin, R. Estimating wet soil aggregate stability from easily available properties in a highly mountainous watershed. Catena 2013, 111, 72–79.
13. Zeraatpisheh, M.; Ayoubi, S.; Jafari, A.; Finke, P. Comparing the efficiency of digital and conventional soil mapping to predict soil types in a semi-arid region in Iran. Geomorphology 2017, 285, 186–204.
14. Pahlavan-Rad, M.R.; Khormali, F.; Toomanian, N.; Brungard, C.W.; Kiani, F.; Komaki, C.B.; Bogaert, P. Legacy soil maps as a covariate in digital soil mapping: A case study from Northern Iran. Geoderma 2016, 276, 141–148.
15. Mohammed, S.; Al-Ebraheem, A.; Holb, I.J.; Alsafadi, K.; Dikkeh, M.; Pham, Q.B.; Linh, N.T.T.; Szabo, S. Soil management effects on soil water erosion and runoff in central Syria—A comparative evaluation of general linear model and random forest regression. Water 2020, 12, 2529.
16. Zhang, X.; Zeraatpisheh, M.; Rahman, M.M.; Wang, S.; Xu, M. Texture Is Important in Improving the Accuracy of Mapping Photovoltaic Power Plants: A Case Study of Ningxia Autonomous Region, China. Remote Sens. 2021, 13, 3909.
17. Zeraatpisheh, M.; Garosi, Y.; Owliaie, H.R.; Ayoubi, S.; Taghizadeh-Mehrjardi, R.; Scholten, T.; Xu, M. Improving the spatial prediction of soil organic carbon using environmental covariates selection: A comparison of a group of environmental covariates. Catena 2022, 208, 105723.
18. Adhikari, K.; Minasny, B.; Greve, M.B.; Greve, M.H. Constructing a soil class map of Denmark based on the FAO legend using digital techniques. Geoderma 2014, 214–215, 101–113.
19. Kovačević, M.; Bajat, B.; Gajić, B. Soil type classification and estimation of soil properties using support vector machines. Geoderma 2010, 154, 340–347.
20. Jafari, A.; Ayoubi, S.; Khademi, H.; Finke, P.A.; Toomanian, N. Selection of a taxonomic level for soil mapping using diversity and map purity indices: A case study from an Iranian arid region. Geomorphology 2014, 201, 86–97.
21. Patel, H.; Upla, K.P. A shallow network for hyperspectral image classification using an autoencoder with convolutional neural network. Multimed. Tools Appl. 2021, 1–20.
22. Peterson, K.T.; Sagan, V.; Sidike, P.; Hasenmueller, E.A.; Sloan, J.J.; Knouft, J.H. Machine learning-based ensemble prediction of water-quality variables using feature-level and decision-level fusion with proximal remote sensing. Photogramm. Eng. Remote Sens. 2019, 85, 269–280.
23. Sellami, A.; Tabbone, S. Deep neural networks-based relevant latent representation learning for hyperspectral image classification. Pattern Recognit. 2022, 121, 108224.
24. Wang, J.; Shi, T.; Yu, D.; Teng, D.; Ge, X.; Zhang, Z.; Yang, X.; Wang, H.; Wu, G. Ensemble machine-learning-based framework for estimating total nitrogen concentration in water using drone-borne hyperspectral imagery of emergent plants: A case study in an arid oasis, NW China. Environ. Pollut. 2020, 266, 115412.
25. Diks, C.G.H.; Vrugt, J.A. Comparison of point forecast accuracy of model averaging methods in hydrologic applications. Stoch. Environ. Res. Risk Assess. 2010, 24, 809–820.
26. O’Rourke, S.M.; Stockmann, U.; Holden, N.M.; McBratney, A.B.; Minasny, B. An assessment of model averaging to improve predictive power of portable vis-NIR and XRF for the determination of agronomic soil properties. Geoderma 2016, 279, 31–44.
27. Taghizadeh-Mehrjardi, R.; Minasny, B.; Toomanian, N.; Zeraatpisheh, M.; Amirian-Chakan, A.; Triantafilis, J. Digital Mapping of Soil Classes Using Ensemble of Models in Isfahan Region, Iran. Soil Syst. 2019, 3, 37.
28. Khayamim, F.; Wetterlind, J.; Khademi, H.; Robertson, A.H.J.; Cano, A.F.; Stenberg, B.; Sveriges, l. Using Visible and near Infrared Spectroscopy to Estimate Carbonates and Gypsum in Soils in Arid and Subhumid Regions of Isfahan, Iran. J. Near Infrared Spectrosc. 2015, 23, 155–165.
29. McLean, E. Chemical and microbiological properties. In Methods of Soil Analysis Part 2; American Society of Agronomy, Soil Science Society of America: Madison, WI, USA, 1982; Volume 2, pp. 199–224.
30. Page, A.; Miller, R.; Keeney, D. Nitrogen—Inorganic Forms. In Methods of Soil Analysis Part 2; American Society of Agronomy, Soil Science Society of America: Madison, WI, USA, 1982; Volume 2, pp. 643–698.
31. Nelson, D.W.; Sommers, L.E. Total Carbon, Organic Carbon, and Organic Matter. In Methods of Soil Analysis; Page, A.L., Ed.; American Society of Agronomy, Soil Science Society of America: Madison, WI, USA, 1983; pp. 539–579.
32. Sumner, M.E.; Miller, W.P. Cation exchange capacity and exchange coefficients. In Methods Soil Anal: Part 3 Chemical Methods; American Society of Agronomy, Soil Science Society of America: Madison, WI, USA, 1996; Volume 5, pp. 1201–1229.
33. Gee, G.; Bauder, J. Particle size analysis. In Methods Soil Anal, Part 1; Klute, A., Ed.; Agron. Monogr. No. 9; American Society of Agronomy, Soil Science Society of America: Madison, WI, USA, 1986; pp. 383–411.
34. Richards, L.A. Diagnosis and Improvement of Saline and Alkali Soils; The United States Department of Agriculture: Washington, DC, USA, 1954.
35. Olaya, V. A Gentle Introduction to SAGA GIS; The SAGA User Group eV: Gottingen, Germany, 2004.
36. Gallant, J.C.; Dowling, T.I. A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour. Res. 2003, 39, 1347.
37. Wulder, M.A.; White, J.C.; Loveland, T.R.; Woodcock, C.E.; Belward, A.S.; Cohen, W.B.; Fosnight, E.A.; Shaw, J.; Masek, J.G.; Roy, D.P. The global Landsat archive: Status, consolidation, and direction. Remote Sens. Environ. 2016, 185, 271–283.
38. Peng, J.; Biswas, A.; Jiang, Q.; Zhao, R.; Hu, J.; Hu, B.; Shi, Z. Estimating soil salinity from remote sensing and terrain data in southern Xinjiang Province, China. Geoderma 2019, 337, 1309–1319.
39. Allbed, A.; Kumar, L.; Aldakheel, Y.Y. Assessing soil salinity using soil salinity and vegetation indices derived from IKONOS high-spatial resolution imageries: Applications in a date palm dominated region. Geoderma 2014, 230, 1–8.
40. Metternicht, G.I.; Zinck, J. Remote sensing of soil salinity: Potentials and constraints. Remote Sens. Environ. 2003, 85, 1–20.
41. Boettinger, J.; Ramsey, R.; Bodily, J.; Cole, N.; Kienast-Brown, S.; Nield, S.; Saunders, A.; Stum, A. Landsat spectral data for digital soil mapping. In Digital Soil Mapping with Limited Data; Springer: Berlin/Heidelberg, Germany, 2008; pp. 193–202.
42. Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring vegetation systems in the Great Plains with ERTS. NASA Spec. Publ. 1974, 351, 309.
43. Fick, S.E.; Hijmans, R.J. WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 2017, 37, 4302–4315.
44. Brungard, C.W.; Boettinger, J.L.; Duniway, M.C.; Wills, S.A.; Edwards, T.C. Machine learning for predicting soil classes in three semi-arid landscapes. Geoderma 2015, 239–240, 68–83.
45. Tajik, S.; Ayoubi, S.; Shirani, H.; Zeraatpisheh, M. Digital mapping of soil invertebrates using environmental attributes in a deciduous forest ecosystem. Geoderma 2019, 353, 252–263.
46. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2015.
47. Kuhn, M.; Weston, S.; Keefer, C.; Coulter, N.; Quinlan, R.K. Cubist: Rule- and Instance-Based Regression Modeling; CRAN; R package version 0.0, 13; 2013. Available online: https://cran.r-project.org/web/packages/Cubist/vignettes/cubist.html (accessed on 9 November 2021).
48. RStudio: Integrated Development for R; Computer Software v0.98.1074; RStudio, Inc.: Boston, MA, USA, 2015.
49. Bates, J.M.; Granger, C.W.J. The Combination of Forecasts. J. Oper. Res. Soc. 1969, 20, 451–468.
50. Buckland, S.T.; Burnham, K.P.; Augustin, N.H. Model Selection: An Integral Part of Inference. Biometrics 1997, 53, 603–618.
51. Hoeting, J.A.; Madigan, D.; Raftery, A.E.; Volinsky, C.T. Bayesian model averaging: A tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors). Stat. Sci. 1999, 14, 382–417.
52. Raftery, A.E.; Gneiting, T.; Balabdaoui, F.; Polakowski, M. Using Bayesian model averaging to calibrate forecast ensembles. Mon. Weather Rev. 2005, 133, 1155–1174.
53. Claeskens, G.; Hjort, N.L. The Focused Information Criterion. J. Am. Stat. Assoc. 2003, 98, 900–916.
54. Hjort, N.L.; Claeskens, G. Frequentist Model Average Estimators. J. Am. Stat. Assoc. 2003, 98, 879–899.
55. Granger, C.W.J.; Ramanathan, R. Improved methods of combining forecasts. J. Forecast. 1984, 3, 197–204.
56. Khormali, F.; Abtahi, A. Origin and distribution of clay minerals in calcareous arid and semi-arid soils of Fars Province, southern Iran. Clay Miner. 2003, 38, 511–527.
57. Wang, B.; Waters, C.; Orgill, S.; Gray, J.; Cowie, A.; Clark, A.; Liu, D.L. High resolution mapping of soil organic carbon stocks using remote sensing variables in the semi-arid rangelands of eastern Australia. Sci. Total Environ. 2018, 630, 367–378.
58. Ayoubi, S.; Mokhtari Karchegani, P.; Mosaddeghi, M.R.; Honarjoo, N. Soil aggregation and organic carbon as affected by topography and land use change in western Iran. Soil Tillage Res. 2012, 121, 18–26.
59. Mahmoudabadi, E.; Karimi, A.; Haghnia, G.H.; Sepehr, A. Digital soil mapping using remote sensing indices, terrain attributes, and vegetation features in the rangelands of northeastern Iran. Environ. Monit. Assess. 2017, 189, 500.
60. Rodrigo-Comino, J.; Keshavarzi, A.; Zeraatpisheh, M.; Gyasi-Agyei, Y.; Cerdà, A. Determining the best ISUM (Improved stock unearthing Method) sampling point number to model long-term soil transport and micro-topographical changes in vineyards. Comput. Electron. Agric. 2019, 159, 147–156.
61. Brierley, G.; Fryirs, K.; Jain, V. Landscape connectivity: The geographic basis of geomorphic applications. Area 2006, 38, 165–174.
62. Mosleh, Z.; Salehi, M.H.; Jafari, A.; Borujeni, I.E.; Mehnatkesh, A. The effectiveness of digital soil mapping to predict soil properties over low-relief areas. Environ. Monit. Assess. 2016, 188, 195.
63. Adhikari, K.; Kheir, R.B.; Greve, M.B.; Bøcher, P.K.; Malone, B.P.; Minasny, B.; McBratney, A.B.; Greve, M.H. High-Resolution 3-D Mapping of Soil Texture in Denmark. Soil Sci. Soc. Am. J. 2013, 77, 860–876.
64. Nath, D.A. Soil Landscape Modeling in the Northwest Iowa Plains Region of O’Brien County, Iowa; Iowa State University: Ames, IA, USA, 2016.
65. Akpa, S.I.C.; Odeh, I.O.A.; Bishop, T.F.A.; Hartemink, A.E. Digital Mapping of Soil Particle-Size Fractions for Nigeria. Soil Sci. Soc. Am. J. 2014, 78, 1953–1966.
66. Sarmast, M.; Farpoor, M.H.; Esfandiarpour Boroujeni, I. Comparing Soil Taxonomy (2014) and updated WRB (2015) for describing calcareous and gypsiferous soils, Central Iran. Catena 2016, 145, 83–91.
67. Meier, M.; Souza, E.D.; Francelino, M.R.; Fernandes Filho, E.I.; Schaefer, C.E.G.R. Digital Soil Mapping Using Machine Learning Algorithms in a Tropical Mountainous Area. Rev. Bras. Cienc. Solo 2018, 42, 1–22.
68. Khaledian, Y.; Miller, B.A. Selecting appropriate machine learning methods for digital soil mapping. Appl. Math. Model. 2020, 81, 401–418.
69. Were, K.; Bui, D.T.; Dick, Ø.B.; Singh, B.R. A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape. Ecol. Indic. 2015, 52, 394–403.
70. Taghizadeh-Mehrjardi, R.; Toomanian, N.; Khavaninzadeh, A.R.; Jafari, A.; Triantafilis, J. Predicting and mapping of soil particle-size fractions with adaptive neuro-fuzzy inference and ant colony optimization in central Iran. Eur. J. Soil Sci. 2016, 67, 707–725.
71. Malone, B.P.; Minasny, B.; Odgers, N.P.; McBratney, A.B. Using model averaging to combine soil property rasters from legacy soil maps and from point data. Geoderma 2014, 232–234, 34–44.
72. Nussbaum, M.; Spiess, K.; Baltensweiler, A.; Grob, U.; Keller, A.; Greiner, L.; Schaepman, M.E.; Papritz, A. Evaluation of digital soil mapping approaches with large sets of environmental covariates. Soil 2018, 4, 1–22.
73. Taghizadeh-Mehrjardi, R.; Hamzehpour, N.; Hassanzadeh, M.; Heung, B.; Goydaragh, M.G.; Schmidt, K.; Scholten, T. Enhancing the accuracy of machine learning models using the super learner technique in digital soil mapping. Geoderma 2021, 399, 115108.
74. Khademi, H.; Mermut, A.R. Micromorphology and classification of Argids and associated gypsiferous Aridisols from central Iran. Catena 2003, 54, 439–455.
75. Wang, J.; Hu, X.; Shi, T.; He, L.; Hu, W.; Wu, G. Assessing toxic metal chromium in the soil in coal mining areas via proximal sensing: Prerequisites for land rehabilitation and sustainable development. Geoderma 2022, 405, 115399.
76. Taghizadeh-Mehrjardi, R.; Minasny, B.; Sarmadian, F.; Malone, B. Digital mapping of soil salinity in Ardakan region, central Iran. Geoderma 2014, 213, 15–28.
77. Wang, J.; Ding, J.; Yu, D.; Teng, D.; He, B.; Chen, X.; Ge, X.; Zhang, Z.; Wang, Y.; Yang, X. Machine learning-based detection of soil salinity in an arid desert region, Northwest China: A comparison between Landsat-8 OLI and Sentinel-2 MSI. Sci. Total Environ. 2020, 707, 136092.
78. Wang, J.; Ding, J.; Yu, D.; Ma, X.; Zhang, Z.; Ge, X.; Teng, D.; Li, X.; Liang, J.; Lizaga, I. Capability of Sentinel-2 MSI data for monitoring and mapping of soil salinity in dry and wet seasons in the Ebinur Lake region, Xinjiang, China. Geoderma 2019, 353, 172–187.
Figure 1. The location of Iran and the Isfahan province (left) and the spatial distribution of sampling points (right).
Figure 2. Examples of covariates for the Isfahan province.
Figure 3. Importance of covariates for soil properties.
Figure 4. Spatial distribution of soil properties for the Isfahan province at a 30 m spatial resolution. The maps shown are produced using the best-performing model averaging technique.
Table 1. List of environmental covariates used (*L is a canopy background adjustment factor).
Covariate | Definition | Code | Source and Ref.
Elevation |  | X01_Elev | DEM SRTM
Catchment aspect |  | X02_Catch.Asp | DEM SRTM
Catchment slope |  | X03_Catch.Slop | DEM SRTM
Catchment area |  | X04_Catch.Area | DEM SRTM
Openness (PosOpen) |  | X05_Openness | DEM SRTM
Profile curvature |  | X06_Prf.Curv | DEM SRTM
Plan curvature |  | X07_Pl.Curv | DEM SRTM
Wetness index |  | X08_Wetness.In | DEM SRTM
Valley depth |  | X09_Valley.Dep | DEM SRTM
Slope length |  | X11_Slop.Leng | DEM SRTM
Total insolation |  | X12_Total.Inso | DEM SRTM
Multi-resolution valley bottom flatness index |  | X10_MrVBF | DEM SRTM, [36]
Blue | B2: 0.45–0.51 µm | X13_B1 | Landsat 8, [37]
Green | B3: 0.53–0.59 µm | X14_B2 | Landsat 8, [37]
Red | B4: 0.64–0.67 µm | X15_B3 | Landsat 8, [37]
Near-infrared | B5: 0.85–0.88 µm | X16_B4 | Landsat 8, [37]
Short-wave infrared-1 | B6: 1.57–1.65 µm | X17_B5 | Landsat 8, [37]
Short-wave infrared-2 | B7: 2.11–2.29 µm | X18_B7 | Landsat 8, [37]
Tasseled cap 1 | The overall brightness of the image | X19_TSC 1 | Landsat 8, [38]
Tasseled cap 2 | The overall greenness of the image | X20_TSC 2 | Landsat 8, [38]
Tasseled cap 3 | The overall wetness of the image | X21_TSC 3 | Landsat 8, [38]
Salinity index | (B1 − B3)/(B1 + B3) | X22_Salinity.In | Landsat 8, [39]
Brightness index | ((B3)² + (B4)²)^0.5 | X23_Bright.In | Landsat 8, [40]
Gypsum index | (B5 − B4)/(B5 + B4) | X24_Gypsum.In | Landsat 8, [41]
Clay index | B5/B7 | X25_Clay.In | Landsat 8, [41]
Carbonate index | B3/B2 | X26_Carbon.In | Landsat 8, [41]
Ratio vegetation index | (B4/B3)/(B2 + B3) | X27_RVI | Landsat 8, [42]
Enhanced vegetation index | (B4 − B3)/(B4 + C1 × B3 − C2 × B1 + L) | X28_EVI | Landsat 8, [42]
Infrared percentage vegetation index | B4/(B4 + B3) | X29_IPVI | Landsat 8, [42]
Normalized difference vegetation index | (B4 − B3)/(B4 + B3) | X30_NDVI | Landsat 8, [42]
Soil adjusted vegetation index | (1 + L) × (B4 − B3)/(B4 + B3 + L)* | X31_SAVI | Landsat 8, [42]
Annual mean temperature | Derived from the monthly temperature values | X32_Temp | WorldClim, [43]
Annual mean precipitation | Derived from the monthly rainfall values | X33_Rainfall | WorldClim, [43]
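The index formulas in Table 1 can be evaluated directly on reflectance arrays. The Python sketch below assumes that Bn in the formulas refers to the band codes in the table (X13_B1 = blue, X14_B2 = green, X15_B3 = red, X16_B4 = near-infrared, X17_B5 = SWIR-1, X18_B7 = SWIR-2), since only under that reading is NDVI = (B4 − B3)/(B4 + B3) the usual near-infrared versus red contrast. The SAVI factor L = 0.5 is a commonly used default and an assumption here; EVI is left out because the table does not give values for C1, C2, and L. The function name is hypothetical.

```python
import numpy as np


def spectral_indices(blue, green, red, nir, swir1, swir2, L=0.5):
    """Compute a subset of the Table 1 indices from surface-reflectance arrays.

    Band roles follow the covariate codes: B1 = blue, B2 = green, B3 = red,
    B4 = NIR, B5 = SWIR-1, B7 = SWIR-2. L is the SAVI canopy background
    adjustment factor (0.5 is assumed here, not taken from the study).
    """
    b1, b2, b3, b4, b5, b7 = (
        np.asarray(b, dtype=float) for b in (blue, green, red, nir, swir1, swir2)
    )
    return {
        "salinity": (b1 - b3) / (b1 + b3),
        "brightness": np.sqrt(b3 ** 2 + b4 ** 2),
        "gypsum": (b5 - b4) / (b5 + b4),
        "clay": b5 / b7,
        "carbonate": b3 / b2,
        "ipvi": b4 / (b4 + b3),
        "ndvi": (b4 - b3) / (b4 + b3),
        "savi": (1 + L) * (b4 - b3) / (b4 + b3 + L),
    }


# Example with a single hypothetical pixel (reflectance values are made up).
idx = spectral_indices(blue=0.10, green=0.12, red=0.10, nir=0.30, swir1=0.25, swir2=0.20)
print({k: round(float(v), 3) for k, v in idx.items()})
```

Because the function uses NumPy operations throughout, the same call works unchanged on whole raster arrays of matching shape.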
Table 2. Summary statistics of soil properties.
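Parameter | Number | Unit | Minimum | Maximum | Mean | SD | Skewness | Kurtosis
SOM | 251 | % | 0.0 | 2.5 | 0.4 | 0.5 | 1.9 | 3.6
CCE | 251 | % | 0.2 | 80.0 | 27.8 | 17.7 | 0.5 | −0.6
Gypsum | 251 | % | 0.0 | 61.7 | 5.4 | 7.7 | 3.4 | 16.0
Clay | 251 | % | 2.0 | 38.8 | 12.9 | 7.5 | 0.7 | −0.1
Silt | 251 | % | 2.0 | 85.0 | 31.4 | 16.9 | 0.7 | 0.3
Sand | 251 | % | 0.2 | 94.7 | 55.6 | 20.2 | −0.3 | −0.7
EC | 251 | dS/m | 0.1 | 78.7 | 3.3 | 9.0 | 4.8 | 28.1
pH | 251 | −log(H+) | 7.1 | 8.7 | 7.9 | 0.2 | 0.2 | 1.1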
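Table 2 does not say which estimators were used for skewness and kurtosis; the negative kurtosis values (for example, −0.6 for CCE) indicate that excess kurtosis is reported. A minimal Python sketch using simple moment-based estimators follows; the absence of a small-sample bias correction is an assumption, and statistical packages often default to other variants. The function name is hypothetical.

```python
import numpy as np


def summary_stats(values):
    """Summary statistics in the layout of Table 2: moment-based skewness and
    excess kurtosis, and the sample standard deviation (n - 1 denominator)."""
    x = np.asarray(values, dtype=float)
    dev = x - x.mean()
    m2, m3, m4 = (np.mean(dev ** k) for k in (2, 3, 4))
    return {
        "n": x.size,
        "min": x.min(),
        "max": x.max(),
        "mean": x.mean(),
        "sd": x.std(ddof=1),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis: 0 for a normal distribution
    }


stats = summary_stats([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```

With n = 251, as in Table 2, the difference between these estimators and their bias-corrected counterparts is small.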
Table 3. Summary of accuracy metrics for base learners and model averaging techniques.
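Base learners: kNN, SVR, GP, Lasso, ANN, DNN, ANFIS, RF, XGB; model averaging techniques: EWA, BGA, AIC, BIC, BMA, MMA, GRA.
Soil property | Metric | kNN | SVR | GP | Lasso | ANN | DNN | ANFIS | RF | XGB | EWA | BGA | AIC | BIC | BMA | MMA | GRA
SOM | RMSE | 0.32 | 0.29 | 0.28 | 0.27 | 0.24 | 0.26 | 0.26 | 0.26 | 0.26 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.23
SOM | R² | 0.69 | 0.74 | 0.73 | 0.76 | 0.77 | 0.75 | 0.77 | 0.77 | 0.75 | 0.77 | 0.76 | 0.77 | 0.82 | 0.75 | 0.81 | 0.84
SOM | MAE | 0.22 | 0.18 | 0.17 | 0.17 | 0.16 | 0.18 | 0.17 | 0.17 | 0.18 | 0.16 | 0.18 | 0.17 | 0.17 | 0.18 | 0.17 | 0.15
SOM | nRMSE | 0.68 | 0.62 | 0.59 | 0.58 | 0.52 | 0.56 | 0.56 | 0.55 | 0.56 | 0.54 | 0.54 | 0.54 | 0.54 | 0.54 | 0.53 | 0.51
CCE | RMSE | 12.19 | 12.14 | 14.77 | 11.82 | 11.48 | 12.14 | 11.91 | 11.84 | 11.98 | 11.80 | 12.00 | 11.49 | 11.79 | 11.45 | 11.46 | 11.42
CCE | R² | 0.55 | 0.55 | 0.34 | 0.56 | 0.59 | 0.56 | 0.55 | 0.57 | 0.58 | 0.57 | 0.58 | 0.63 | 0.59 | 0.62 | 0.63 | 0.65
CCE | MAE | 9.42 | 9.24 | 11.27 | 8.96 | 8.64 | 9.63 | 9.20 | 9.21 | 9.42 | 8.88 | 9.37 | 8.98 | 9.48 | 8.85 | 9.33 | 8.24
CCE | nRMSE | 0.44 | 0.44 | 0.53 | 0.42 | 0.41 | 0.44 | 0.43 | 0.43 | 0.43 | 0.42 | 0.43 | 0.41 | 0.42 | 0.41 | 0.41 | 0.41
Gypsum | RMSE | 6.14 | 6.27 | 7.14 | 6.48 | 6.10 | 6.16 | 6.33 | 6.47 | 5.95 | 5.68 | 5.67 | 5.73 | 5.42 | 5.63 | 5.44 | 5.63
Gypsum | R² | 0.37 | 0.34 | 0.19 | 0.37 | 0.43 | 0.40 | 0.37 | 0.39 | 0.44 | 0.44 | 0.47 | 0.46 | 0.54 | 0.50 | 0.52 | 0.45
Gypsum | MAE | 3.76 | 3.57 | 4.19 | 3.61 | 3.50 | 3.77 | 3.86 | 3.88 | 3.92 | 3.41 | 3.72 | 3.56 | 3.78 | 3.42 | 3.50 | 3.69
Gypsum | nRMSE | 1.11 | 1.13 | 1.29 | 1.17 | 1.10 | 1.11 | 1.14 | 1.17 | 1.08 | 1.03 | 1.02 | 1.03 | 0.98 | 1.02 | 0.98 | 1.02
Sand | RMSE | 15.24 | 15.09 | 16.34 | 15.50 | 14.86 | 14.98 | 14.66 | 14.97 | 15.00 | 14.83 | 15.00 | 15.18 | 14.84 | 14.38 | 14.73 | 14.87
Sand | R² | 0.46 | 0.45 | 0.39 | 0.44 | 0.48 | 0.46 | 0.47 | 0.39 | 0.48 | 0.46 | 0.48 | 0.47 | 0.51 | 0.56 | 0.55 | 0.54
Sand | MAE | 12.04 | 11.72 | 12.95 | 12.41 | 11.65 | 12.11 | 12.04 | 12.70 | 12.28 | 11.91 | 12.21 | 12.30 | 12.31 | 11.93 | 12.08 | 12.02
Sand | nRMSE | 0.27 | 0.27 | 0.29 | 0.28 | 0.27 | 0.27 | 0.27 | 0.27 | 0.27 | 0.27 | 0.27 | 0.27 | 0.27 | 0.26 | 0.27 | 0.27
Clay | RMSE | 6.08 | 6.17 | 6.18 | 5.99 | 5.98 | 5.94 | 5.98 | 5.96 | 5.92 | 5.87 | 5.92 | 5.80 | 5.93 | 5.68 | 5.83 | 5.96
Clay | R² | 0.38 | 0.34 | 0.39 | 0.40 | 0.42 | 0.40 | 0.46 | 0.48 | 0.43 | 0.46 | 0.46 | 0.47 | 0.49 | 0.54 | 0.51 | 0.50
Clay | MAE | 4.79 | 4.82 | 4.81 | 4.65 | 4.70 | 4.65 | 4.61 | 4.64 | 4.65 | 4.64 | 4.67 | 4.64 | 4.78 | 4.63 | 4.71 | 4.83
Clay | nRMSE | 0.47 | 0.48 | 0.48 | 0.46 | 0.46 | 0.46 | 0.46 | 0.46 | 0.46 | 0.45 | 0.46 | 0.45 | 0.46 | 0.44 | 0.45 | 0.46
Silt | RMSE | 14.68 | 14.32 | 16.19 | 14.03 | 13.95 | 13.66 | 13.88 | 13.95 | 13.62 | 14.04 | 13.36 | 13.84 | 13.63 | 13.67 | 13.65 | 13.59
Silt | R² | 0.30 | 0.34 | 0.16 | 0.36 | 0.36 | 0.39 | 0.36 | 0.39 | 0.42 | 0.37 | 0.45 | 0.42 | 0.41 | 0.41 | 0.44 | 0.50
Silt | MAE | 11.08 | 10.81 | 12.31 | 10.84 | 10.51 | 10.49 | 10.62 | 10.81 | 10.69 | 10.89 | 10.68 | 10.85 | 10.80 | 10.98 | 10.94 | 10.83
Silt | nRMSE | 0.47 | 0.45 | 0.51 | 0.44 | 0.44 | 0.43 | 0.44 | 0.44 | 0.43 | 0.44 | 0.42 | 0.44 | 0.43 | 0.43 | 0.43 | 0.43
EC | RMSE | 9.22 | 9.12 | 11.13 | 8.54 | 8.45 | 8.25 | 8.46 | 7.57 | 7.87 | 8.05 | 7.16 | 7.28 | 7.53 | 7.77 | 7.42 | 8.04
EC | R² | 0.39 | 0.41 | 0.17 | 0.40 | 0.42 | 0.34 | 0.48 | 0.47 | 0.46 | 0.53 | 0.64 | 0.50 | 0.54 | 0.59 | 0.59 | 0.47
EC | MAE | 3.72 | 4.49 | 4.98 | 4.26 | 3.87 | 4.44 | 3.75 | 4.16 | 4.47 | 3.90 | 4.08 | 3.66 | 4.32 | 4.01 | 4.43 | 4.55
EC | nRMSE | 2.34 | 2.31 | 2.82 | 2.17 | 2.16 | 2.09 | 2.15 | 1.92 | 2.00 | 2.04 | 1.82 | 1.85 | 1.91 | 1.97 | 1.88 | 2.04
pH | RMSE | 0.21 | 0.21 | 0.22 | 0.21 | 0.21 | 0.21 | 0.21 | 0.22 | 0.21 | 0.21 | 0.21 | 0.21 | 0.20 | 0.21 | 0.20 | 0.20
pH | R² | 0.12 | 0.11 | 0.06 | 0.10 | 0.18 | 0.13 | 0.13 | 0.20 | 0.18 | 0.20 | 0.21 | 0.23 | 0.30 | 0.25 | 0.31 | 0.38
pH | MAE | 0.16 | 0.15 | 0.17 | 0.16 | 0.16 | 0.16 | 0.16 | 0.17 | 0.16 | 0.16 | 0.16 | 0.16 | 0.16 | 0.16 | 0.16 | 0.14
pH | nRMSE | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03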
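The four metrics in Table 3 are not defined in this section. The Python sketch below uses common definitions and assumes that R² is the coefficient of determination (1 − SSres/SStot; some studies instead report the squared Pearson correlation) and that nRMSE is RMSE divided by the mean of the observations. The latter reading is consistent with most rows: for sand, the ANN RMSE of 14.86 divided by the Table 2 mean of 55.6 gives about 0.27, matching the tabulated nRMSE. The function name is hypothetical.

```python
import numpy as np


def accuracy_metrics(obs, pred):
    """RMSE, R², MAE and nRMSE for paired observed and predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return {
        "RMSE": rmse,
        "R2": 1.0 - ss_res / ss_tot,  # coefficient of determination (assumed definition)
        "MAE": np.mean(np.abs(err)),
        "nRMSE": rmse / obs.mean(),  # assumed normalization by the observed mean
    }


m = accuracy_metrics(obs=[1.0, 2.0, 3.0, 4.0], pred=[1.0, 2.0, 3.0, 5.0])
print(m)
```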
Table 4. Uncertainty of the models for predicting soil properties (prediction interval coverage probability; PICP).
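Base learners (%): kNN, SVR, GP, Lasso, ANN, DNN, ANFIS, RF, XGB; model averaging techniques (%): EWA, BGA, AIC, BIC, BMA, MMA, GRA.
Soil property | kNN | SVR | GP | Lasso | ANN | DNN | ANFIS | RF | XGB | EWA | BGA | AIC | BIC | BMA | MMA | GRA
SOM | 26 | 32 | 33 | 36 | 57 | 40 | 42 | 45 | 43 | 66 | 67 | 67 | 72 | 77 | 87 | 91
CCE | 38 | 38 | 35 | 61 | 63 | 40 | 44 | 51 | 57 | 75 | 71 | 77 | 81 | 78 | 81 | 86
Gypsum | 35 | 29 | 23 | 28 | 59 | 50 | 28 | 54 | 55 | 63 | 82 | 65 | 90 | 84 | 82 | 85
Sand | 50 | 43 | 35 | 42 | 56 | 55 | 52 | 67 | 62 | 73 | 70 | 68 | 86 | 80 | 87 | 83
Clay | 31 | 29 | 26 | 36 | 49 | 47 | 56 | 53 | 56 | 71 | 77 | 68 | 68 | 88 | 86 | 87
Silt | 26 | 39 | 23 | 45 | 52 | 66 | 68 | 62 | 69 | 74 | 90 | 84 | 84 | 87 | 85 | 90
EC | 24 | 26 | 20 | 28 | 29 | 35 | 32 | 42 | 39 | 46 | 65 | 62 | 62 | 51 | 63 | 64
pH | 50 | 52 | 45 | 53 | 53 | 55 | 55 | 71 | 70 | 89 | 89 | 84 | 82 | 81 | 82 | 91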
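PICP is the percentage of observations that fall inside their predicted intervals. How the intervals were derived, and at what nominal level, is not stated in this section, so the Python sketch below takes the lower and upper bounds as given. The function name is hypothetical.

```python
import numpy as np


def picp(obs, lower, upper):
    """Prediction interval coverage probability, in percent."""
    obs, lower, upper = (np.asarray(a, dtype=float) for a in (obs, lower, upper))
    inside = (obs >= lower) & (obs <= upper)
    return 100.0 * inside.mean()


coverage = picp(obs=[1.0, 2.0, 3.0, 4.0],
                lower=[0.0, 2.5, 2.0, 0.0],
                upper=[2.0, 3.0, 4.0, 3.0])
print(coverage)  # 50.0: two of the four observations fall inside their intervals
```

A PICP close to the nominal level of the intervals (for example, 90% for 5th to 95th percentile bounds) indicates well-calibrated uncertainty. Coverage can always be raised by widening the intervals, which is why PICP is read together with the mean interval width (MPI).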
Table 5. Uncertainty of the models for predicting soil properties (mean prediction interval; MPI).
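Base learners: kNN, SVR, GP, Lasso, ANN, DNN, ANFIS, RF, XGB; model averaging techniques: EWA, BGA, AIC, BIC, BMA, MMA, GRA.
Soil property | kNN | SVR | GP | Lasso | ANN | DNN | ANFIS | RF | XGB | EWA | BGA | AIC | BIC | BMA | MMA | GRA
SOM | 1.4 | 1.3 | 1.2 | 1.2 | 0.9 | 1.2 | 1.3 | 1.0 | 1.2 | 0.8 | 0.9 | 0.9 | 0.8 | 0.7 | 0.8 | 0.8
CCE | 87.4 | 77.1 | 80.5 | 80.9 | 65.2 | 67.0 | 84.9 | 68.1 | 72.7 | 55.8 | 50.1 | 57.9 | 53.7 | 57.7 | 45.7 | 46.5
Gypsum | 36.5 | 38.5 | 34.4 | 39.2 | 32.9 | 31.7 | 37.9 | 33.2 | 34.6 | 31.2 | 30.6 | 30.0 | 28.8 | 31.5 | 29.4 | 29.9
Sand | 92.9 | 76.1 | 78.4 | 83.0 | 87.3 | 79.4 | 88.9 | 76.2 | 87.8 | 61.9 | 69.9 | 71.1 | 74.4 | 69.8 | 61.7 | 72.4
Clay | 32.4 | 33.1 | 31.3 | 33.0 | 27.9 | 30.0 | 29.8 | 29.0 | 33.2 | 26.7 | 26.6 | 25.7 | 24.2 | 23.0 | 24.9 | 22.1
Silt | 73.8 | 66.2 | 70.7 | 78.5 | 74.8 | 77.7 | 67.3 | 70.6 | 66.8 | 59.3 | 55.1 | 53.0 | 63.8 | 53.0 | 56.5 | 52.7
EC | 37.5 | 25.0 | 34.3 | 31.2 | 37.0 | 31.4 | 29.2 | 22.2 | 25.7 | 21.3 | 15.3 | 18.3 | 14.0 | 16.9 | 16.0 | 18.4
pH | 0.8 | 0.8 | 0.7 | 0.8 | 0.7 | 0.7 | 0.7 | 0.7 | 0.8 | 0.7 | 0.5 | 0.6 | 0.7 | 0.7 | 0.5 | 0.5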
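MPI is the average width of the prediction intervals, expressed in the units of each soil property, so narrower intervals give lower values. The Python sketch below computes it and, purely as an illustrative assumption, forms each interval from the spread of the ensemble members' predictions at one location (5th to 95th percentile). That spread captures disagreement between models rather than residual error, and the study's own interval method is not described in this section. Both function names are hypothetical.

```python
import numpy as np


def ensemble_interval(member_preds, lo=5.0, hi=95.0):
    """Percentile interval across ensemble members
    (rows = locations, columns = members)."""
    preds = np.asarray(member_preds, dtype=float)
    lower, upper = np.percentile(preds, [lo, hi], axis=1)
    return lower, upper


def mpi(lower, upper):
    """Mean prediction interval: average of (upper - lower) over all locations."""
    width = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    return float(np.mean(width))


member_preds = np.array([[1.0, 2.0, 3.0],
                         [2.0, 4.0, 6.0]])
lower, upper = ensemble_interval(member_preds, lo=0.0, hi=100.0)
print(lower, upper, mpi(lower, upper))
```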
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Taghizadeh-Mehrjardi, R.; Khademi, H.; Khayamim, F.; Zeraatpisheh, M.; Heung, B.; Scholten, T. A Comparison of Model Averaging Techniques to Predict the Spatial Distribution of Soil Properties. Remote Sens. 2022, 14, 472. https://doi.org/10.3390/rs14030472

AMA Style

Taghizadeh-Mehrjardi R, Khademi H, Khayamim F, Zeraatpisheh M, Heung B, Scholten T. A Comparison of Model Averaging Techniques to Predict the Spatial Distribution of Soil Properties. Remote Sensing. 2022; 14(3):472. https://doi.org/10.3390/rs14030472

Chicago/Turabian Style

Taghizadeh-Mehrjardi, Ruhollah, Hossein Khademi, Fatemeh Khayamim, Mojtaba Zeraatpisheh, Brandon Heung, and Thomas Scholten. 2022. "A Comparison of Model Averaging Techniques to Predict the Spatial Distribution of Soil Properties" Remote Sensing 14, no. 3: 472. https://doi.org/10.3390/rs14030472

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
