We evaluated how the spatial resolution of environmental variables (n = 47) altered their ability to predict soil organic carbon (SOC) stocks (0–30 cm depth) using training data from the Gridded Soil Survey Geographic (gSSURGO) and SoilGrids databases. Training and validation subsamples (n = 1,629) were selected using a conditioned Latin hypercube sampling (cLHS) design based on environmental variables in Vermont, U.S. The predictive relationships between environmental variables and SOC stock (t C ha⁻¹) were developed using machine learning algorithms. The algorithms were trained on a random 70 % of the database subsamples and evaluated on the other 30 %, with an additional evaluation step using local, independent SOC reference data (n = 272). The Random Forest (RF) algorithm outperformed the other algorithms in estimating SOC stocks at all spatial resolutions. As spatial resolution increased, model performance with the gSSURGO database improved (R² = 0.33–0.62 and RMSE = 42.42–34.92), while no such trend was observed for the SoilGrids database. The best SOC stock prediction using the SoilGrids database was achieved at a 10 m resolution (R² = 0.54 and RMSE = 4.67). Evaluation against the external, independent reference data showed a marked decrease in prediction accuracy compared to the internal validation (R² = 0.11–0.14 for gSSURGO and R² = −0.19 for SoilGrids). For the gSSURGO database, soil maps (including suborders, drainage classes, temperature, and moisture) and geology/landform maps had a greater influence than other environmental variables at all spatial resolutions. In contrast, climatic and DEM-related variables were more important for the SoilGrids database. Our study suggests that the origin of the SOC stock database and the sampling scheme largely affect the importance that the machine learning algorithm assigns to environmental variables.
Our results confirmed that the variable and data sources, model type, and combination of environmental variables significantly influenced prediction accuracy. In conclusion, digital soil mapping (DSM) products should be re-evaluated with local reference data when used for spatial extents that differ from those for which they were initially designed.
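
The accuracy metrics reported above can be illustrated with a minimal sketch. It assumes the standard definitions R² = 1 − SS_res/SS_tot and RMSE = sqrt(mean squared error); the abstract does not state which software or exact formulas the authors used, so the function names (`r_squared`, `rmse`, `split_70_30`) and all SOC values below are hypothetical and purely illustrative. The sketch also shows why R² can be negative on independent data, as with the SoilGrids external evaluation: under this definition, R² falls below zero whenever the predictions fit the observations worse than the observed mean would.

```python
import math
import random


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def split_70_30(samples, seed=0):
    """Randomly split samples into 70 % training and 30 % evaluation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical SOC stocks (t C/ha); illustrative values only, not study data.
observed = [40.0, 55.0, 60.0, 75.0, 90.0]
good_pred = [42.0, 53.0, 63.0, 72.0, 88.0]  # close to the observations
poor_pred = [90.0, 40.0, 85.0, 45.0, 60.0]  # worse than predicting the mean

# Split 1,629 sample indices the way the abstract describes (70 % / 30 %).
train_ids, test_ids = split_70_30(range(1629))

print(len(train_ids), len(test_ids))
print(r_squared(observed, good_pred), rmse(observed, good_pred))
print(r_squared(observed, poor_pred), rmse(observed, poor_pred))
```

Because RMSE carries the units of the target variable, values are only comparable between models evaluated against the same reference data, whereas R² is unitless.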


Available for download on Sunday, May 11, 2025
