Software pipelines for processing soil water data and predicting plant available water using approaches from the SoilWaterNow project
Access status:
Open Access
Type
Dataset
Abstract
The SoilWaterNow project used both Python and R to create software pipelines that process soil moisture data and can be used to predict plant available water. There are four parts: processing CosmOz surveys, SMAP data assimilation, a water balance model, and a data-driven model for prediction in agricultural systems. Each part includes a working script and an example dataset, so users can repeat the analysis with the example data and learn the data inputs and formats required to run the analysis on their own study area.

CosmOz survey - Data processing: This pipeline transforms raw neutron count data from cosmic ray probe surveys into soil moisture measurements. The pipeline was used with the associated CosmOz survey data and soil data. The R code is available along with an example dataset.

SMAP Data Assimilation: This pipeline assimilates Soil Moisture Active Passive (SMAP) satellite estimates of soil moisture into an API model for soil moisture reanalysis. Sample data are provided for CosmOz sites, and the locations of these sites are listed in "cosmoz_site_info.csv". The Python code for the API model is available. The forcing data used were GPM rainfall and air temperature anomalies. The model parameters were calibrated and are listed in "API_parameters.csv".

Water balance model: Two models are available: a point-based model for small datasets (e.g. soil moisture probes), and a raster-based model, which is faster and can be used to produce maps of soil moisture. Both models take three inputs: daily evapotranspiration (ET), bucket size, and rainfall. Data for these inputs are publicly available: 8-day MODIS evapotranspiration data can be downloaded from the USGS website or directly from Google Earth Engine, rainfall data can be accessed from SILO through the Long Paddock website, and soil data can be accessed from the eSoil website. Five bucket sizes were used for both models: 0-5 cm, 5-15 cm, 15-30 cm, 30-60 cm, and 60-100 cm. The "ET&rain4WBmodel.r" file contains code that organises daily data for model execution. Alternatively, the provided example datasets can be used to run the model. R code is available for both models.

Data-driven approach: This pipeline uses a Gaussian Process regression workflow to predict soil moisture in space and time. The model uses a complex base function that can capture underlying trends in soil data. Each workflow consists of 4 steps:
1. data pre-processing
2. feature analysis and selection
3. model training, optimisation, evaluation, and selection
4. generating prediction and uncertainty maps
Python code is available for this model, along with an example dataset that is already pre-processed. The "Methods.pdf" file discusses feature selection and model details, and the "README.md" file contains in-depth information about how the model works and gives example outputs.

The software pipelines are stored in a public GitHub repository (https://github.com/thomasfabishop/soilwaternow) and on the USYD-RDS at \\shared.sydney.edu.au\research-data\PRJ-soilwaternowarchive. The pipelines are open access under a Creative Commons licence (CC-BY 4.0). Please contact Dr Patrick Filippi ([email protected]) for further information.
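To make the water balance inputs concrete, the following is a minimal Python sketch of a generic cascading ("tipping bucket") model driven by daily rainfall, daily ET, and bucket size. It is not the project's R code, and its equations are an assumption: rain fills the top layer first with overflow passing downward, and ET is removed from the top layer down. The function names, the storage fraction of 0.15, and the sample forcing values are all hypothetical; only the five depth intervals come from the abstract.

```python
from typing import List, Optional, Sequence, Tuple

# Thickness (mm) of the five depth intervals named in the abstract:
# 0-5, 5-15, 15-30, 30-60 and 60-100 cm.
LAYER_THICKNESS_MM = [50.0, 100.0, 150.0, 300.0, 400.0]

# Hypothetical fraction of each layer that can hold plant available water.
# In practice this would come from soil data; 0.15 is only a placeholder.
ASSUMED_STORAGE_FRACTION = 0.15


def step_buckets(
    storage: Sequence[float],
    capacity: Sequence[float],
    rain_mm: float,
    et_mm: float,
) -> Tuple[List[float], float]:
    """Advance all buckets by one day; return (new_storage, deep_drainage_mm)."""
    new_storage = list(storage)

    # Infiltration: water enters the top bucket and any excess over
    # capacity cascades to the bucket below.
    inflow = rain_mm
    for i, cap in enumerate(capacity):
        new_storage[i] += inflow
        inflow = max(new_storage[i] - cap, 0.0)
        new_storage[i] -= inflow
    deep_drainage = inflow  # water leaving the bottom bucket

    # Evapotranspiration: satisfy demand from the top bucket downward
    # (an assumed extraction rule, chosen for simplicity).
    demand = et_mm
    for i in range(len(new_storage)):
        taken = min(new_storage[i], demand)
        new_storage[i] -= taken
        demand -= taken

    return new_storage, deep_drainage


def run_water_balance(
    rain: Sequence[float],
    et: Sequence[float],
    capacity: Sequence[float],
    initial: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Run the daily model and return the storage of every bucket for each day."""
    storage = list(initial) if initial is not None else [0.0] * len(capacity)
    history = []
    for rain_mm, et_mm in zip(rain, et):
        storage, _ = step_buckets(storage, capacity, rain_mm, et_mm)
        history.append(storage)
    return history


if __name__ == "__main__":
    capacity = [t * ASSUMED_STORAGE_FRACTION for t in LAYER_THICKNESS_MM]
    daily_rain = [20.0, 0.0, 0.0, 5.0]  # made-up forcing, mm/day
    daily_et = [2.0, 3.0, 3.0, 2.5]     # made-up forcing, mm/day
    for day, layers in enumerate(run_water_balance(daily_rain, daily_et, capacity), 1):
        print(day, [round(v, 2) for v in layers])
```

A point-based run like this loops over days for one site. A raster-based variant would apply the same update to whole arrays at once, which is one plausible reason the raster model is described as faster.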
Date
2025-09-01
Source title
GRDC project number UOS2002-001RTX
Funding information
GRDC
Licence
Creative Commons Attribution 4.0
Rights statement
Data is publicly available to third parties under a Creative Commons licence with attribution
Faculty/School
Faculty of Science, Sydney Institute of Agriculture (SIA)
Faculty of Science, School of Life and Environmental Sciences