| Field | Value |
|---|---|
| Owners: | This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource. |
| Type: | Resource |
| Storage: | The size of this resource is 132.6 MB |
| Created: | May 15, 2026 at 11:24 p.m. (UTC) |
| Last updated: | May 21, 2026 at 5:04 p.m. (UTC) (Metadata update) |
| Published date: | May 21, 2026 at 5:04 p.m. (UTC) |
| DOI: | 10.4211/hs.c75777eb63ff49c48b90bb37e6c7b00d |
| Sharing Status: | Published |
Abstract
This HydroShare resource supports a study on seasonal-to-biennial streamflow forecasting in the Colorado River Basin. The resource contains processed forecast inputs and outputs, R functions, and example code associated with a 0–24-month lead forecasting framework for April–July naturalized flow at Lees Ferry, Arizona. The framework combines information from Ensemble Streamflow Prediction (ESP), North American Multi-Model Ensemble (NMME) forecasts, antecedent PRISM hydroclimate variables, and large-scale ocean–atmosphere climate indices.
The main experiment documented in this resource uses leave-P-year-out cross-validation with P = 1 for the 1983–2024 hindcast period. Machine-learning models, including Random Forest and Gradient Boosting Machine approaches, are evaluated using deterministic and probabilistic forecast verification metrics. The resource is intended to support reproducibility of the main forecast evaluation, including lead-dependent model performance and metric calculations. Raw external datasets are not redistributed here; users should refer to the original data providers for NMME, ESP, PRISM, and naturalized flow data.
Content
README.md
Processed Forecast Data and Evaluation Code for 0–24 Month Colorado River Streamflow Forecasts
Overview
This HydroShare resource supports a study on seasonal-to-biennial streamflow forecasting in the Colorado River Basin. The resource provides processed predictor data, machine-learning reforecast outputs, ESP baseline reforecasts, evaluation functions, and example R scripts for the main leave-P-year-out cross-validation experiment with P = 1.
The target variable is April–July naturalized flow at Lees Ferry, Arizona, evaluated for water years 1983–2024 across forecast lead times from 0 to 24 months. The machine-learning framework combines information from Ensemble Streamflow Prediction (ESP), North American Multi-Model Ensemble (NMME) forecasts, antecedent hydroclimate variables, and large-scale climate indices.
The term reforecast is used throughout this resource to refer to retrospective forecasts generated for past target years through cross-validation. Some functions and stored object names retain the term hindcast for compatibility with the original project workflow.
Repository structure
```text
Data_Inputs/
  Data_Sources.rds

Data_Outputs/
  ESP/
  Figures/
  Metrics/
  Reforecasts/

R_code/
  functions_data_toolkit.R
  functions_machine_learning.R
  functions_metrics.R
  functions_nonlinear_correlation.R
  run_reforecast_LPOCV1_example.R
  evaluate_reforecast_LPOCV1_example.R

README.md
data_dictionary.md
```
Contents
Data_Inputs/
Data_Sources.rds contains the processed predictor data used by the machine-learning reforecast workflow. This object was produced by the original project preprocessing workflow from ESP, NMME, PRISM hydroclimate variables, observed naturalized flow, and climate-index covariates.
The full raw-data preprocessing workflow used to create Data_Sources.rds is not included because several inputs are externally hosted and/or large. This resource instead provides the processed object needed to reproduce the main reforecast experiment and evaluation workflow.
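A quick way to see what the processed predictor object holds is to load it and print its top-level structure. The following is a minimal sketch, assuming the working directory is the root folder of the resource. `inspect_rds()` is a hypothetical helper written for this example, not a function from the resource, and it makes no assumption about the internal layout of the object.

```r
# Sketch: inspect an .rds object without assuming its layout.
# inspect_rds() is an illustrative helper, not part of the resource code.
inspect_rds <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(invisible(NULL))
  }
  obj <- readRDS(path)
  # Show only the top level so large nested objects stay readable.
  str(obj, max.level = 1)
  invisible(obj)
}

# Assumes the working directory is the root folder of the resource.
data_sources <- inspect_rds(file.path("Data_Inputs", "Data_Sources.rds"))
```

If the file is not found, the helper prints a message and returns `NULL`, which usually means the working directory is not the resource root.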
Data_Outputs/ESP/
Contains processed ESP reforecasts used as the baseline comparison for the main evaluation.
Expected file:
```text
ESP_reforecast_1983_2024_leads0_24.rds
```
Data_Outputs/Reforecasts/
Contains the machine-learning reforecast outputs for the LPOCV P = 1 experiment.
Expected combined file:
```text
Reforecasts_1983_2024_LPOCV1yr.rds
```
Lead-specific reforecast files may also be stored in this folder if the reforecast generation script is rerun.
Data_Outputs/Metrics/
Stores deterministic and probabilistic verification outputs generated by evaluate_reforecast_LPOCV1_example.R.
Typical outputs include:
```text
Deterministic_Metrics_LPOCV1yr.rds
Probabilistic_Metrics_Raw_LPOCV1yr.rds
Probabilistic_Metrics_LPOCV1yr.rds
Deterministic_Summary_SelectedLeads_LPOCV1yr.rds
Probabilistic_Summary_SelectedLeads_LPOCV1yr.rds
```
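To illustrate the kind of quantities such files typically hold, the sketch below computes one deterministic metric (RMSE of the ensemble mean) and one probabilistic metric (the empirical continuous ranked probability score, CRPS) on synthetic data, plus a skill score against a reference forecast. This is not the code in functions_metrics.R, and the README does not state which metrics that file implements. All names (`crps_ensemble`, `skill_score`, `fcst_ml`, `fcst_ref`) are assumptions made for the example. Base R is used so the block runs without extra packages; the scoringRules package listed under Software requirements provides comparable scores.

```r
# Sketch of common verification metrics on synthetic data.
# All names are illustrative, not taken from functions_metrics.R.

# Empirical CRPS for one observation and one ensemble:
# CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|
crps_ensemble <- function(ens, obs) {
  mean(abs(ens - obs)) - 0.5 * mean(abs(outer(ens, ens, "-")))
}

# Skill score relative to a reference forecast (1 = perfect, 0 = no gain).
skill_score <- function(score, score_ref) {
  1 - score / score_ref
}

set.seed(1)
n_years <- 10
obs <- rnorm(n_years, mean = 7, sd = 2)  # synthetic "observed" flow

# Rows are years, columns are ensemble members.
fcst_ml  <- matrix(rnorm(n_years * 200, mean = obs, sd = 1), nrow = n_years)
fcst_ref <- matrix(rnorm(n_years * 200, mean = 7,   sd = 2), nrow = n_years)

# Average CRPS over years for each forecast system.
crps_ml <- mean(sapply(seq_len(n_years), function(i) {
  crps_ensemble(fcst_ml[i, ], obs[i])
}))
crps_ref <- mean(sapply(seq_len(n_years), function(i) {
  crps_ensemble(fcst_ref[i, ], obs[i])
}))

# Deterministic check on the ensemble mean.
rmse_ml <- sqrt(mean((rowMeans(fcst_ml) - obs)^2))

# Probabilistic skill of the synthetic forecast relative to the reference.
crpss <- skill_score(crps_ml, crps_ref)
```

In this resource the natural reference would be the ESP baseline reforecasts, with the skill score computed separately for each lead time.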
Data_Outputs/Figures/
Stores lightweight diagnostic figures generated by the example evaluation script. These figures are intended for reproducibility checks and are not necessarily identical to final manuscript figures.
R_code/
- functions_data_toolkit.R: general data-formatting and utility functions.
- functions_machine_learning.R: functions for predictor preparation, machine-learning model training, reforecast generation, and ESP processing.
- functions_metrics.R: deterministic and probabilistic forecast verification functions.
- functions_nonlinear_correlation.R: nonlinear-correlation functions used in the original predictor-screening workflow.
- run_reforecast_LPOCV1_example.R: example script to rerun the main LPOCV P = 1 machine-learning reforecast experiment from the processed predictor object.
- evaluate_reforecast_LPOCV1_example.R: example script to evaluate the processed LPOCV P = 1 reforecasts and generate metrics and diagnostic figures.
How to run
Option 1: Evaluate the provided reforecasts
This is the recommended starting point. It uses the processed reforecast outputs already included in the resource and does not retrain the machine-learning models.
From the root folder of the resource, run:
```r
source("R_code/evaluate_reforecast_LPOCV1_example.R")
```
This script reads the processed ML and ESP reforecasts, constructs the equal-weight GBM + RF ensemble, computes deterministic and probabilistic metrics, and saves outputs in:
```text
Data_Outputs/Metrics/
Data_Outputs/Figures/
```
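Because the script relies on paths relative to the root folder, it can help to confirm that the expected files are visible from the current working directory before sourcing it. The following is a minimal sketch that uses the file names listed in this README; the variable names are illustrative.

```r
# Sketch: confirm the expected inputs exist before sourcing the evaluation script.
# Assumes the working directory is the root folder of the resource.
expected_files <- c(
  file.path("Data_Outputs", "ESP", "ESP_reforecast_1983_2024_leads0_24.rds"),
  file.path("Data_Outputs", "Reforecasts", "Reforecasts_1983_2024_LPOCV1yr.rds"),
  file.path("R_code", "evaluate_reforecast_LPOCV1_example.R")
)

missing_files <- expected_files[!file.exists(expected_files)]

if (length(missing_files) > 0) {
  # Most often this means the working directory is not the resource root.
  message("Missing from ", getwd(), ":\n  ",
          paste(missing_files, collapse = "\n  "))
} else {
  source(file.path("R_code", "evaluate_reforecast_LPOCV1_example.R"))
}
```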
Option 2: Rerun the LPOCV P = 1 machine-learning reforecasts
To rerun the machine-learning reforecast experiment from the processed predictor object, run:
```r
source("R_code/run_reforecast_LPOCV1_example.R")
```
This script uses Data_Inputs/Data_Sources.rds and calls hindcast.flow() from functions_machine_learning.R. Runtime can be substantial because models are retrained for each target year and lead time.
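The retraining pattern behind this runtime can be sketched as follows. This is an illustration of leave-one-year-out cross-validation (P = 1) on synthetic data, with a linear model standing in for the machine-learning models. It does not reproduce `hindcast.flow()`, whose arguments are not documented in this README, and the data frame and column names are assumptions.

```r
# Sketch of leave-one-year-out cross-validation (P = 1) on synthetic data.
# A linear model stands in for the ML models; this is not hindcast.flow().
set.seed(42)
years <- 1983:2024
dat <- data.frame(
  year      = years,
  predictor = rnorm(length(years))
)
dat$flow <- 7 + 1.5 * dat$predictor + rnorm(length(years), sd = 0.5)

cv_pred <- vapply(years, function(target_year) {
  train <- dat[dat$year != target_year, ]  # all years except the target
  test  <- dat[dat$year == target_year, ]  # the single held-out year
  fit   <- lm(flow ~ predictor, data = train)
  unname(predict(fit, newdata = test))
}, numeric(1))

# Out-of-sample correlation between held-out predictions and "observations".
cv_cor <- cor(cv_pred, dat$flow)
```

With 42 target years and 25 lead times (0 to 24 months), a loop of this shape implies on the order of a thousand model fits per model configuration, which is why rerunning the full experiment takes much longer than evaluating the stored outputs.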
Main experiment settings
- Target variable: April–July naturalized flow at Lees Ferry, Arizona
- Evaluation period: 1983–2024
- Lead times: 0–24 months
- Cross-validation: leave-P-year-out cross-validation, P = 1
- Ensemble size: 2,000 simulations per model configuration, unless otherwise noted in the R scripts
- Main ML model groups:
  - RF: combined Random Forest-type models
  - GBM: combined Gradient Boosting-type models
  - GBM + RF: equal-weight ensemble combining RF and GBM members
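One plausible reading of the equal-weight GBM + RF ensemble is a pooled set of members in which each model group contributes the same number of simulations. The sketch below illustrates that idea on synthetic values. It is an assumption about the construction, not the code used by the evaluation script, and `pool_equal()` is a hypothetical helper.

```r
# Sketch: pool two model groups into one equal-weight ensemble.
# Synthetic members; names and values are illustrative only.
set.seed(7)
n_members <- 2000
rf_members  <- rnorm(n_members, mean = 6.8, sd = 1.0)  # stand-in for RF simulations
gbm_members <- rnorm(n_members, mean = 7.2, sd = 1.2)  # stand-in for GBM simulations

# With equal group sizes, simple concatenation weights each group by 1/2.
combined <- c(rf_members, gbm_members)

# If group sizes differ, subsample each group to a common size first.
pool_equal <- function(a, b, n = min(length(a), length(b))) {
  c(sample(a, n), sample(b, n))
}

combined_mean <- mean(combined)
combined_q    <- quantile(combined, probs = c(0.1, 0.5, 0.9))
```

When both groups have the same size, the pooled mean equals the average of the two group means, so neither group dominates the combined distribution.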
Software requirements
The scripts were developed in R. Required packages include:
```text
dplyr
tidyr
purrr
forcats
scoringRules
randomForest
ranger
randomForestSRC
gbm
xgboost
foreach
doParallel
parallel
data.tree
DiagrammeR
ggplot2
ggh4x
scales
lubridate
```
Package versions may affect exact numerical reproducibility for stochastic model training. The evaluation script should be more stable than the full reforecast-generation script because it reads precomputed outputs.
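Before running either script, it may be useful to check which of the listed packages are installed and to record the session details next to any results. The following is a minimal sketch using base R only; the variable names are illustrative, and the install line is left commented out so nothing is installed without the user choosing to.

```r
# Sketch: report which of the listed packages are not installed.
required_pkgs <- c(
  "dplyr", "tidyr", "purrr", "forcats", "scoringRules", "randomForest",
  "ranger", "randomForestSRC", "gbm", "xgboost", "foreach", "doParallel",
  "parallel", "data.tree", "DiagrammeR", "ggplot2", "ggh4x", "scales",
  "lubridate"
)

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install
}

# Record R and package versions alongside any results you save.
print(sessionInfo())
```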
Notes on reproducibility
This resource is designed to reproduce the main LPOCV P = 1 reforecast evaluation from processed inputs and outputs. It does not redistribute all raw external datasets used in the original preprocessing workflow. Users interested in reconstructing the full workflow from raw data should obtain the original data from the relevant providers and follow the preprocessing decisions described in the associated manuscript.
Acknowledgments
This research was supported by the National Oceanic and Atmospheric Administration (NOAA) Climate Program Office (CPO) through the Modeling, Analysis, Predictions, and Projections (MAPP) program under Competition ID 3076730 — MAPP-NIDIS: Science for the 21st Century Western U.S. Hydroclimate (award number NOAA-OAR-CPO-2023-2007440). Funding was provided to the University of Colorado Boulder, the NOAA Climate Prediction Center (CPC), and project collaborators.
Recommended citation
Please cite this HydroShare resource and the associated manuscript when using these data or code.
Resource citation:
Jerez, C., Balaji, R., LaJoie, E., Rosencrans, M., Baker, S., Miller, P., Shanahan, S., & Zagona, E. (2026). Processed forecast data and evaluation code for 0–24 month Colorado River streamflow forecasts. HydroShare. https://doi.org/10.4211/hs.c75777eb63ff49c48b90bb37e6c7b00d
Associated manuscript:
Jerez, C., Balaji, R., LaJoie, E., Rosencrans, M., Baker, S., Miller, P., Shanahan, S., & Zagona, E. (in preparation). Improving interannual water supply forecasts in the Colorado River basin using multi-model ensembles and machine learning.
Contact
Catalina Jerez
University of Colorado Boulder
Email: catalina.jerez@colorado.edu
Credits
Funding Agencies
This resource was created using funding from the following sources:
| Agency Name | Award Title | Award Number |
|---|---|---|
| National Oceanic and Atmospheric Administration | Modeling, Analysis, Predictions, and Projections (MAPP) Program — MAPP-NIDIS: Science for the 21st Century Western U.S. Hydroclimate | NOAA-OAR-CPO-2023-2007440 |
How to Cite
This resource is shared under the Creative Commons Attribution CC BY.
http://creativecommons.org/licenses/by/4.0/