Processed Forecast Data and Evaluation Code for 0–24 Month Colorado River Streamflow Forecasts


Authors:
Owners: This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type: Resource
Storage: The size of this resource is 132.6 MB
Created: May 15, 2026 at 11:24 p.m. (UTC)
Last updated: May 21, 2026 at 5:04 p.m. (UTC) (Metadata update)
Published date: May 21, 2026 at 5:04 p.m. (UTC)
DOI: 10.4211/hs.c75777eb63ff49c48b90bb37e6c7b00d
Sharing Status: Published

Abstract

This HydroShare resource supports a study on seasonal-to-biennial streamflow forecasting in the Colorado River Basin. The resource contains processed forecast inputs and outputs, R functions, and example code associated with a 0–24-month lead forecasting framework for April–July naturalized flow at Lees Ferry, Arizona. The framework combines information from Ensemble Streamflow Prediction (ESP), North American Multi-Model Ensemble (NMME) forecasts, antecedent PRISM hydroclimate variables, and large-scale ocean–atmosphere climate indices.

The main experiment documented in this resource uses leave-P-year-out cross-validation with P = 1 for the 1983–2024 hindcast period. Machine-learning models, including Random Forest and Gradient Boosting Machine approaches, are evaluated using deterministic and probabilistic forecast verification metrics. The resource is intended to support reproducibility of the main forecast evaluation, including lead-dependent model performance and metric calculations. Raw external datasets are not redistributed here; users should refer to the original data providers for NMME, ESP, PRISM, and naturalized flow data.

Coverage

Spatial

Coordinate System/Geographic Projection:
WGS 84 EPSG:4326
Coordinate Units:
Decimal degrees
Place/Area Name:
Colorado River Basin, western United States
North Latitude
43.0000°
East Longitude
-105.0000°
South Latitude
31.0000°
West Longitude
-115.0000°

Content

README.md



Processed Forecast Data and Evaluation Code for 0–24 Month Colorado River Streamflow Forecasts

Overview

This HydroShare resource supports a study on seasonal-to-biennial streamflow forecasting in the Colorado River Basin. The resource provides processed predictor data, machine-learning reforecast outputs, ESP baseline reforecasts, evaluation functions, and example R scripts for the main leave-P-year-out cross-validation experiment with P = 1.

The target variable is April–July naturalized flow at Lees Ferry, Arizona, evaluated for water years 1983–2024 across forecast lead times from 0 to 24 months. The machine-learning framework combines information from Ensemble Streamflow Prediction (ESP), North American Multi-Model Ensemble (NMME) forecasts, antecedent hydroclimate variables, and large-scale climate indices.

The term reforecast is used throughout this resource to refer to retrospective forecasts generated for past target years through cross-validation. Some functions and stored object names retain the term hindcast for compatibility with the original project workflow.

Repository structure

```text
Data_Inputs/
  Data_Sources.rds

Data_Outputs/
  ESP/
  Figures/
  Metrics/
  Reforecasts/

R_code/
  functions_data_toolkit.R
  functions_machine_learning.R
  functions_metrics.R
  functions_nonlinear_correlation.R
  run_reforecast_LPOCV1_example.R
  evaluate_reforecast_LPOCV1_example.R

README.md
data_dictionary.md
```
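Before running any script, it can help to confirm that the download matches the layout above. The following is a minimal sketch in base R, assuming the working directory is the resource root; `check_layout()` and `expected_paths` are hypothetical names introduced for illustration and are not part of the resource.

```r
# Paths taken from the repository structure listed above,
# relative to the resource root.
expected_paths <- c(
  "Data_Inputs/Data_Sources.rds",
  "Data_Outputs/ESP",
  "Data_Outputs/Figures",
  "Data_Outputs/Metrics",
  "Data_Outputs/Reforecasts",
  "R_code/functions_data_toolkit.R",
  "R_code/functions_machine_learning.R",
  "R_code/functions_metrics.R",
  "R_code/functions_nonlinear_correlation.R",
  "R_code/run_reforecast_LPOCV1_example.R",
  "R_code/evaluate_reforecast_LPOCV1_example.R",
  "README.md",
  "data_dictionary.md"
)

# Hypothetical helper: reports which expected files and folders exist.
check_layout <- function(root = ".", paths = expected_paths) {
  found <- file.exists(file.path(root, paths))
  data.frame(path = paths, found = found, stringsAsFactors = FALSE)
}

layout <- check_layout(".")

# Print only the entries that are missing (an empty table means all were found).
print(layout[!layout$found, , drop = FALSE])
```

If entries are reported as missing, the most likely cause is that R was started from a folder other than the resource root.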

Contents

Data_Inputs/

  • Data_Sources.rds contains the processed predictor data used by the machine-learning reforecast workflow. This object was produced by the original project preprocessing workflow and includes ESP, NMME, PRISM hydroclimate variables, observed naturalized flow, and climate-index covariates.

The full raw-data preprocessing workflow used to create Data_Sources.rds is not included because several inputs are externally hosted and/or large. This resource instead provides the processed object needed to reproduce the main reforecast experiment and evaluation workflow.
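The internal structure of Data_Sources.rds is not described in this README, so a first look at the object is a reasonable starting point. The sketch below only assumes that the file is a standard R serialized object readable with `readRDS()`; the variable names `src_path` and `data_sources` are illustrative.

```r
# Path relative to the resource root.
src_path <- file.path("Data_Inputs", "Data_Sources.rds")

if (file.exists(src_path)) {
  data_sources <- readRDS(src_path)

  # Report the object type, then show only the top level,
  # because the object may be large.
  print(class(data_sources))
  str(data_sources, max.level = 1, list.len = 20)
} else {
  message("File not found: ", src_path, " (run from the resource root)")
}
```

The same pattern applies to the .rds files under Data_Outputs/.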

Data_Outputs/ESP/

Contains processed ESP reforecasts used as the baseline comparison for the main evaluation.

Expected file:

    ESP_reforecast_1983_2024_leads0_24.rds

Data_Outputs/Reforecasts/

Contains the machine-learning reforecast outputs for the leave-P-year-out cross-validation (LPOCV) experiment with P = 1.

Expected combined file:

    Reforecasts_1983_2024_LPOCV1yr.rds

Lead-specific reforecast files may also be stored in this folder if the reforecast generation script is rerun.

Data_Outputs/Metrics/

Stores deterministic and probabilistic verification outputs generated by evaluate_reforecast_LPOCV1_example.R.

Typical outputs include:

    Deterministic_Metrics_LPOCV1yr.rds
    Probabilistic_Metrics_Raw_LPOCV1yr.rds
    Probabilistic_Metrics_LPOCV1yr.rds
    Deterministic_Summary_SelectedLeads_LPOCV1yr.rds
    Probabilistic_Summary_SelectedLeads_LPOCV1yr.rds

Data_Outputs/Figures/

Stores lightweight diagnostic figures generated by the example evaluation script. These figures are intended for reproducibility checks and are not necessarily identical to final manuscript figures.

R_code/

  • functions_data_toolkit.R: general data-formatting and utility functions.
  • functions_machine_learning.R: functions for predictor preparation, machine-learning model training, reforecast generation, and ESP processing.
  • functions_metrics.R: deterministic and probabilistic forecast verification functions.
  • functions_nonlinear_correlation.R: nonlinear-correlation functions used in the original predictor-screening workflow.
  • run_reforecast_LPOCV1_example.R: example script to rerun the main LPOCV P = 1 machine-learning reforecast experiment from the processed predictor object.
  • evaluate_reforecast_LPOCV1_example.R: example script to evaluate the processed LPOCV P = 1 reforecasts and generate metrics and diagnostic figures.

How to run

Option 1: Evaluate the provided reforecasts

This is the recommended starting point. It uses the processed reforecast outputs already included in the resource and does not retrain the machine-learning models.

From the root folder of the resource, run:

    source("R_code/evaluate_reforecast_LPOCV1_example.R")

This script reads the processed ML and ESP reforecasts, constructs the equal-weight GBM + RF ensemble, computes deterministic and probabilistic metrics, and saves outputs in:

    Data_Outputs/Metrics/
    Data_Outputs/Figures/
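The kind of calculation performed in the metric step can be illustrated on synthetic data. The sketch below is not the code in functions_metrics.R. It assumes an ensemble matrix with one row per target year and one column per member, and it computes the RMSE and correlation of the ensemble mean together with a sample-based CRPS written in base R so that it runs without extra packages. The names `obs`, `ens`, and `crps_sample_base()` are hypothetical, and the ensemble is kept smaller than the 2,000 members used in the resource.

```r
set.seed(1)

n_years   <- 42    # one row per target year, 1983-2024
n_members <- 200   # kept small here for speed

# Synthetic observations and an ensemble scattered around them (illustrative only).
obs <- rnorm(n_years, mean = 7, sd = 2)
ens <- obs + matrix(rnorm(n_years * n_members, sd = 1.5), nrow = n_years)

# Deterministic metrics computed on the ensemble mean.
ens_mean <- rowMeans(ens)
rmse <- sqrt(mean((ens_mean - obs)^2))
r    <- cor(ens_mean, obs)

# Sample-based CRPS for a single forecast: E|X - y| - 0.5 * E|X - X'|.
crps_sample_base <- function(members, y) {
  mean(abs(members - y)) - 0.5 * mean(abs(outer(members, members, "-")))
}

# One CRPS value per target year.
crps_by_year <- vapply(
  seq_len(n_years),
  function(i) crps_sample_base(ens[i, ], obs[i]),
  numeric(1)
)

print(round(c(RMSE = rmse, r = r, mean_CRPS = mean(crps_by_year)), 3))
```

The scoringRules package listed under the software requirements provides `crps_sample()` for this purpose; whether the resource's functions call it with default settings is not stated here.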

Option 2: Rerun the LPOCV P = 1 machine-learning reforecasts

To rerun the machine-learning reforecast experiment from the processed predictor object, run:

    source("R_code/run_reforecast_LPOCV1_example.R")

This script uses Data_Inputs/Data_Sources.rds and calls hindcast.flow() from functions_machine_learning.R. Runtime can be substantial because models are retrained for each target year and lead time.
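The reason for the long runtime is easier to see in a stripped-down version of the cross-validation loop. The sketch below uses synthetic data and a linear model as a stand-in for the Random Forest and Gradient Boosting models. It does not call `hindcast.flow()`, whose arguments are not documented in this README, and the names `dat`, `x1`, `x2`, and `loyo_predict()` are hypothetical.

```r
set.seed(123)

# Synthetic predictors and target for the 1983-2024 period (illustrative only).
years <- 1983:2024
dat <- data.frame(
  year = years,
  x1   = rnorm(length(years)),
  x2   = rnorm(length(years))
)
dat$flow <- 7 + 1.5 * dat$x1 - 0.8 * dat$x2 + rnorm(nrow(dat), sd = 0.5)

# Leave one target year out, fit on the remaining years,
# then predict the held-out year.
loyo_predict <- function(target_year, data) {
  train <- data[data$year != target_year, ]
  test  <- data[data$year == target_year, ]
  fit   <- lm(flow ~ x1 + x2, data = train)  # stand-in for the ML models
  unname(predict(fit, newdata = test))
}

# One model fit per target year.
dat$pred <- vapply(dat$year, loyo_predict, numeric(1), data = dat)

print(cor(dat$pred, dat$flow))
```

With P = 1, each of the 42 target years requires its own model fit. The full experiment repeats this for every lead time from 0 to 24 months and for each model configuration, which is why retraining takes much longer than evaluating the stored outputs.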

Main experiment settings

  • Target variable: April–July naturalized flow at Lees Ferry, Arizona
  • Evaluation period: 1983–2024
  • Lead times: 0–24 months
  • Cross-validation: leave-P-year-out cross-validation, P = 1
  • Ensemble size: 2,000 simulations per model configuration, unless otherwise noted in the R scripts
  • Main ML model groups:
      • RF: combined Random Forest-type models
      • GBM: combined Gradient Boosting-type models
      • GBM + RF: equal-weight ensemble combining RF and GBM members
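One way to build an equal-weight combination of two ensembles is to pool the same number of members from each group. The sketch below illustrates that idea on synthetic members for a single target year and lead time. It is an assumption about how equal weighting could be implemented, not a description of the resource's code, and `pool_equal_weight()` is a hypothetical helper.

```r
set.seed(42)

n_members <- 2000  # ensemble size per model configuration, from the settings above

# Synthetic stand-ins for RF and GBM members (arbitrary units).
rf_members  <- rnorm(n_members, mean = 6.8, sd = 1.2)
gbm_members <- rnorm(n_members, mean = 7.4, sd = 1.0)

# Draw the same number of members from each group so both carry equal weight.
pool_equal_weight <- function(a, b) {
  n <- min(length(a), length(b))
  c(a[sample.int(length(a), n)], b[sample.int(length(b), n)])
}

combined <- pool_equal_weight(rf_members, gbm_members)

print(length(combined))
print(quantile(combined, probs = c(0.1, 0.5, 0.9)))
```

When both groups have the same size, every member is retained and the pooled mean equals the average of the two group means.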

Software requirements

The scripts were developed in R. Required packages include:

dplyr, tidyr, purrr, forcats, scoringRules, randomForest, ranger, randomForestSRC, gbm, xgboost, foreach, doParallel, parallel, data.tree, DiagrammeR, ggplot2, ggh4x, scales, lubridate

Package versions may affect exact numerical reproducibility for stochastic model training. The evaluation script should be more stable than the full reforecast-generation script because it reads precomputed outputs.
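A quick way to check the local setup is to test which of the listed packages are installed and to record their versions alongside any results. The sketch below uses only base R; `required_pkgs`, `installed`, and `versions` are illustrative names.

```r
# Package names copied from the list above.
required_pkgs <- c(
  "dplyr", "tidyr", "purrr", "forcats", "scoringRules",
  "randomForest", "ranger", "randomForestSRC", "gbm", "xgboost",
  "foreach", "doParallel", "parallel", "data.tree", "DiagrammeR",
  "ggplot2", "ggh4x", "scales", "lubridate"
)

# TRUE for each package whose namespace can be loaded.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required_pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}

# Record the versions of the installed packages for later comparison.
versions <- vapply(
  required_pkgs[installed],
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)
print(versions)
```

Saving this output, or the result of `sessionInfo()`, next to the generated metrics makes it easier to trace small numerical differences between machines.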

Notes on reproducibility

This resource is designed to reproduce the main LPOCV P = 1 reforecast evaluation from processed inputs and outputs. It does not redistribute all raw external datasets used in the original preprocessing workflow. Users interested in reconstructing the full workflow from raw data should obtain the original data from the relevant providers and follow the preprocessing decisions described in the associated manuscript.

Acknowledgments

This research was supported by the National Oceanic and Atmospheric Administration (NOAA) Climate Program Office (CPO) through the Modeling, Analysis, Predictions, and Projections (MAPP) program under Competition ID 3076730 — MAPP-NIDIS: Science for the 21st Century Western U.S. Hydroclimate (award number NOAA-OAR-CPO-2023-2007440). Funding was provided to the University of Colorado Boulder, the NOAA Climate Prediction Center (CPC), and project collaborators.

Recommended citation

Please cite this HydroShare resource and the associated manuscript when using these data or code.

Resource citation:

Jerez, C., Rajagopalan, B., LaJoie, E., Rosencrans, M., Baker, S., Miller, W. P., Shanahan, S., & Zagona, E. (2026). Processed forecast data and evaluation code for 0–24 month Colorado River streamflow forecasts. HydroShare. https://doi.org/10.4211/hs.c75777eb63ff49c48b90bb37e6c7b00d

Associated manuscript:

Jerez, C., Rajagopalan, B., LaJoie, E., Rosencrans, M., Baker, S., Miller, W. P., Shanahan, S., & Zagona, E. (in preparation). Improving interannual water supply forecasts in the Colorado River basin using multi-model ensembles and machine learning.

Contact

Catalina Jerez
University of Colorado Boulder
Email: catalina.jerez@colorado.edu

Credits

Funding Agencies

This resource was created using funding from the following sources:
Agency Name: National Oceanic and Atmospheric Administration
Award Title: Modeling, Analysis, Predictions, and Projections (MAPP) Program - MAPP-NIDIS: Science for the 21st Century Western U.S. Hydroclimate
Award Number: NOAA-OAR-CPO-2023-2007440

How to Cite

Jerez, C., B. Rajagopalan, E. LaJoie, M. Rosencrans, S. Baker, W. P. Miller, S. Shanahan, E. Zagona (2026). Processed Forecast Data and Evaluation Code for 0–24 Month Colorado River Streamflow Forecasts, HydroShare, https://doi.org/10.4211/hs.c75777eb63ff49c48b90bb37e6c7b00d

This resource is shared under the Creative Commons Attribution 4.0 International license (CC BY 4.0):
http://creativecommons.org/licenses/by/4.0/
