Floods are considered the most damaging of natural hazards, and both their frequency and the damage they cause are predicted to increase in the future. This research aims to develop an automated methodology, using statistical and machine learning approaches, that can produce probabilistic monthly flood forecasts. The methodology was tested for its ability to handle multiple variables as predictors. The significance of the spatial variability of the predictors was determined through model maps based on 222 hydrological reference stations in Australia. Variables for forecasting the upper 10th percentile of flow were screened using Random Forests (RF), retaining the ten best-scoring variables, and flexible forecast models were then developed using Generalized Additive Models (GAM).
Results showed that the methodology can be used to sort through many candidate predictors (i.e. past streamflow, rainfall, the Southern Oscillation Index (SOI), the El Niño/Southern Oscillation Modoki Index (EMI), and Pacific Sea Surface Temperatures (SST)). The methodology can be easily updated and can vary spatially. The basic conceptual model assumed that flow was a function of antecedent conditions (= lagged rainfall), flow memory (= seasonality + autocorrelation), climate effects (= SST indices) and random noise.
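As an illustration only, the conceptual model described above can be written in additive form. The symbols, the lag index k, and the choice of smooth functions are assumptions introduced here for clarity and are not taken from the study itself: Q_t is monthly flow, P is rainfall, m_t is the calendar month, C is a climate (SST) index, and the f terms are smooth functions of the kind a GAM would estimate.

```latex
% Hypothetical additive form of the conceptual model (notation assumed, not from the source)
Q_t = \underbrace{f_1(P_{t-k})}_{\text{antecedent conditions}}
    + \underbrace{f_2(m_t) + f_3(Q_{t-k})}_{\text{flow memory}}
    + \underbrace{f_4(C_{t-k})}_{\text{climate effects}}
    + \varepsilon_t
```

Here the random-noise term is \varepsilon_t, and flow memory is split into a seasonal component f_2 and an autocorrelation component f_3, mirroring the "seasonality + autocorrelation" description.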
Lagged flow, lagged rainfall, and lagged NIÑO 1+2 were the most important predictors under both a monthly one-out cross-validation (OOCV) process and a forward cross-validation (FCV) process. The Gilbert Skill Score indicated that models using transformed flow data performed better than those using non-transformed flow data or binary data. Model performance was affected by the unsupervised variable selection in the RF model and by the threshold (10%) used to define a flood event. Overall skill scores based on the OOCV process were in the range of 0.2-0.5, indicating reasonable forecast skill.
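For readers unfamiliar with the Gilbert Skill Score (also known as the Equitable Threat Score), the following minimal sketch shows how it is commonly computed from binary event forecasts. The function name and the example series are hypothetical, and the sketch assumes that probabilistic forecasts have already been converted to yes/no flood forecasts; it is not the verification code used in the study.

```python
def gilbert_skill_score(forecast, observed):
    """Gilbert Skill Score for paired binary forecasts and observations.

    Both inputs are sequences of truthy/falsy values, one per month
    (True = flood event forecast / observed). Illustrative sketch only.
    """
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    n = len(observed)
    if n == 0:
        raise ValueError("at least one forecast/observation pair is required")

    # Contingency-table counts
    hits = sum(1 for f, o in zip(forecast, observed) if f and o)
    false_alarms = sum(1 for f, o in zip(forecast, observed) if f and not o)
    misses = sum(1 for f, o in zip(forecast, observed) if not f and o)

    # Number of hits expected by chance alone
    hits_random = (hits + misses) * (hits + false_alarms) / n

    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return float("nan")  # score is undefined for this table
    return (hits - hits_random) / denominator


# Hypothetical ten-month example: 2 hits, 1 false alarm, 1 miss
example_forecast = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
example_observed = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
example_score = gilbert_skill_score(example_forecast, example_observed)
```

A score of 1 corresponds to a perfect forecast and a score of 0 to no skill beyond chance, which is the scale on which the reported range of 0.2-0.5 should be read.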