Evaluating the use of Soil Moisture, January Baseflow, and Snow Water Equivalent storage indicators to enhance Colorado Basin River Forecast Center water supply forecasts
| Authors: | |
|---|---|
| Owners: | This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource. |
| Type: | Resource |
| Storage: | The size of this resource is 32.1 MB |
| Created: | Jul 16, 2025 at 4:59 p.m. (UTC) |
| Last updated: | Nov 17, 2025 at 5:55 a.m. (UTC) |
| Citation: | See how to cite this resource |
| Content types: | Geographic Feature Content |
|---|---|
| Sharing Status: | Public |
| Views: | 139 |
| Downloads: | 106 |
Abstract
This resource provides the dataset and Python workflows used to evaluate improved water supply forecasting for the Upper Colorado River Basin and the Great Salt Lake Basin areas served by the Colorado Basin River Forecast Center (CBRFC). The study focuses on enhancing April–July runoff volume predictions by explicitly incorporating three key hydrologic storage indicators—January baseflow, soil moisture, and snow water equivalent (SWE)—alongside the official CBRFC Most Probable (MP) water supply forecast. These indicators represent antecedent conditions that help explain variability in spring snowmelt-driven streamflow across snow-dominated watersheds.
Data and Python code used to implement the multiple linear regression (MLR) models, station data processing, and spatial analysis are included here. The research found that combining multiple storage indicators with the CBRFC forecast leads to gains in predictive skill, particularly in headwater basins where natural hydrologic processes are less influenced by regulation. Among the variables evaluated, soil moisture contributed the largest improvements when added to the model.
This resource holds the data and code used to compute the results reported in the MS thesis: Morovati, R. (2025), "Evaluating Use Of Multiple Hydrologic Storage Indicators To Enhance Streamflow Forecasting," MS Thesis, Civil and Environmental Engineering, Utah State University.
Content
readme.md
Last Updated: 11.15.2025
Contact: reza.morovati@usu.edu
This resource contains data, spatial layers, and Python code (a Jupyter Notebook) used to build and evaluate Multiple Linear Regression (MLR) models that enhance NOAA Colorado Basin River Forecast Center (CBRFC) February–April water supply forecasts across watersheds in the Upper Colorado River Basin and Great Salt Lake Basin.
Overview
This project brings together watershed boundaries, predictor datasets (storage indicators), and model-evaluation tools to construct and analyze MLR-based water-supply forecasts. The notebook guides the user through:
- Preparing and unzipping the input datasets
- Loading spatial watershed files (boundary and buffer variants)
- Building and evaluating multiple linear regression models for each watershed
- Visualizing spatial and statistical model performance
- Comparing model accuracy between buffer-based and boundary-based watershed definitions
- Investigating predictor collinearity to support model refinement
Data Sources
This notebook uses prepared datasets contained in the ZIP files included with the repository. These include:
Watershed Spatial Data
- Boundary shapefiles for each forecasting point and related watershed
- 10 km buffer polygons generated around watershed boundaries
Predictor Data
- Snow Water Equivalent (SWE) data from SNOTEL sites (NRCS)
- Soil Moisture from both SCAN and SNOTEL sites (NRCS)
- January Baseflow data from USGS NWIS
Model Result Files
- MLR output summaries (NSE, KGE) for each watershed
- Side-by-side comparisons for buffer vs boundary watershed definitions
Code Structure
PythonCodes / Jupyter Scripts
MLR_CBRFC_Water_Supply_Forecast_FEBAPR.ipynb
Jupyter Notebook script performing the full workflow:
- Unzips input files to correct directories
- Loads watershed polygons, buffer polygons, and station metadata
- Imports predictor time series and joins them with station data
- Builds MLR models to predict Feb–Apr water supply
- Calculates performance metrics for each watershed
- Generates spatial model-performance maps
- Compares buffer vs boundary MLR results
- Conducts predictor-collinearity analysis
- Exports summary statistics and figures
Each code block in this notebook is described below.
Block-by-Block Notebook Guide
Block 1 — Unzipping Input Files
This block extracts all required data files from the provided ZIP archives. If this step is skipped, the rest of the notebook cannot run because the data directories will be empty.
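The extraction step can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function name `extract_archives` and the one-folder-per-archive layout are assumptions, since the readme does not state the archive names or target directories.

```python
import zipfile
from pathlib import Path


def extract_archives(zip_dir, out_dir):
    """Extract every *.zip found in zip_dir into out_dir/<archive name>/.

    Hypothetical helper: the folder layout is an assumption, not the
    layout used by the notebook.
    """
    out_dir = Path(out_dir)
    extracted = []
    for archive in sorted(Path(zip_dir).glob("*.zip")):
        target = out_dir / archive.stem  # one folder per archive
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Calling `extract_archives("inputs", "data")` (both paths are placeholders) would return the list of folders that were populated, which makes it easy to confirm that no data directory was left empty before running later blocks.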
Block 2 — Project Description
Load Watersheds, Station Metadata, and Predictor Inputs
The script imports both watershed versions (boundary and buffer) along with all station-specific datasets, such as SWE, soil moisture, baseflow, and CBRFC CMP data. It filters out stations that lack required data or fall outside the Upper Colorado River Basin and Great Salt Lake Basin study area.
Build Multiple Linear Regression (MLR) Models for Each Station
For every station, the code assembles available predictors and runs eight different MLR scenarios, each using different combinations of SWE, soil moisture, baseflow, and CMP. It applies a 70/30 train–test split based on years, trains each model, and generates monthly predictions.
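The sketch below illustrates one way such a workflow could be set up. Several details are assumptions rather than facts from the notebook: that the eight scenarios are CMP plus every subset of the three storage indicators (2^3 = 8), that the 70/30 split is chronological, and all function and predictor names. The regression is fit with plain normal equations to keep the example dependency-free; the notebook may well use a library instead.

```python
from itertools import combinations

# Hypothetical predictor labels; the readme names the variables, not the columns.
STORAGE = ("swe", "soil_moisture", "baseflow")


def build_scenarios(base="cmp", extras=STORAGE):
    """Return `base` combined with every subset of `extras` (assumed reading)."""
    return [(base,) + combo
            for k in range(len(extras) + 1)
            for combo in combinations(extras, k)]


def split_years(years, train_frac=0.7):
    """Split distinct years chronologically (assumption): earliest 70% train."""
    ordered = sorted(set(years))
    n_train = int(round(train_frac * len(ordered)))
    return ordered[:n_train], ordered[n_train:]


def fit_ols(rows, targets):
    """Return [intercept, b1, ...] from the normal equations (X'X) b = X'y.

    Solved by Gauss-Jordan elimination with partial pivoting; fails with
    ZeroDivisionError if the predictors are perfectly collinear.
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs, row):
    """Apply fitted coefficients to one row of predictor values."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))
```

Splitting on years rather than on individual records keeps all months of a given water year on the same side of the split, so the test set contains only years the model has never seen.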
Compute Performance Metrics (Train + Test)
For each model scenario, it calculates skill metrics, including NSE and KGE, both overall and month by month. Results are stored separately for the boundary and buffer watershed versions.
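For reference, the two metrics can be written in a few lines. The readme does not say which KGE variant the notebook uses; the sketch assumes the original Gupta et al. (2009) form, built from the correlation, the ratio of standard deviations, and the ratio of means. The function names are illustrative.

```python
from math import sqrt
from statistics import fmean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = fmean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta Efficiency, 2009 formulation (assumed variant)."""
    mean_obs, mean_sim = fmean(obs), fmean(sim)
    sd_obs, sd_sim = pstdev(obs), pstdev(sim)
    cov = fmean([(o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)])
    r = cov / (sd_obs * sd_sim)   # linear correlation
    alpha = sd_sim / sd_obs       # variability ratio
    beta = mean_sim / mean_obs    # bias ratio
    return 1.0 - sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both functions are undefined when the observations have zero variance, and `kge` is also undefined for a constant simulation or a zero observed mean, so a production version would need to guard those cases.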
Save Outputs and Summaries for All Stations
The script exports:
- Model predictions for each station/scenario
- A combined summary of all performance metrics
- A monthly metrics summary
- Lists of included and excluded stations
- Counts of how many stations were successfully processed
This provides a complete dataset for comparing model skill and evaluating the effect of boundary vs buffer watershed definitions.
Block 3 — Loading Spatial Libraries and Watershed Data
- The script loads all spatial datasets (states, rivers, lakes, Great Salt Lake Basin, Upper Colorado Basin) along with USGS station metadata and the monthly MLR performance results (NSE and KGE).
- It filters the model results to include only the buffer-based watershed type, selects each scenario and month, and merges the station performance metrics with station coordinates.
- For every scenario and month, it creates spatial maps that display Kling-Gupta Efficiency (KGE) and Nash-Sutcliffe Efficiency (NSE) as colored point layers over the basins, allowing the user to visually assess how well each model scenario performs at each station.
- It saves each set of maps to a folder organized by watershed type, producing a complete spatial visualization suite that shows the performance of all model scenarios across all months.
Block 4 — Spatial Visualization of Model Performance
This portion reads in a CSV file containing performance results for each watershed (NSE, KGE). It then joins these numerical results with the watershed polygons and produces:
- Colored maps of model skill
- Basin-level comparison figures across months and model configurations
These maps allow the user to see where the forecasting model performs well and where it struggles.
Block 5 — Performance Visualization Across All Watersheds
- The script loads the monthly performance metrics for all buffer-based stations and prepares them for visualization, including consistent scenario naming and month labeling.
- A custom blue-green-red colormap is created to show model skill, where darker blues represent stronger performance and reds indicate weaker performance.
- For each scenario and each spring month, the script builds grouped box-and-dot plots that display the full distribution of KGE and NSE values across stations, allowing direct comparison of how each predictor combination behaves through time.
- The resulting figures summarize model stability and variability, helping highlight which scenarios consistently perform well and which are more sensitive to month-to-month changes.
Block 6 — Comparing Buffer vs Boundary Watersheds
This section loads two different MLR result files:
- Boundary-based results
- Buffer-based results
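A comparison of the two result sets could look like the following sketch. The data structure (a dictionary keyed by station and scenario, holding one skill score) and the function name are hypothetical; the actual result files and their columns are not described in the readme.

```python
def compare_results(boundary, buffer):
    """Return {key: buffer score - boundary score} for keys in both inputs.

    `boundary` and `buffer` are hypothetical dicts mapping
    (station_id, scenario) to a skill score such as KGE.
    """
    shared = boundary.keys() & buffer.keys()
    return {key: buffer[key] - boundary[key] for key in sorted(shared)}
```

A positive difference would mean the buffer-based definition scored higher for that station and scenario. Restricting the comparison to shared keys avoids crediting either definition for stations that only one of them could process.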
Block 7 — Collinearity Analysis of Predictors
This block inspects whether the predictors used in the MLR models are highly correlated with each other.
Steps include:
- Loading all predictor datasets
- Merging predictor tables for each station into a single combined dataset
- Calculating a correlation matrix
- Plotting a heatmap showing predictor relationships
- Identifying redundant variables that may degrade model performance
This helps clean and refine the predictor set to improve the regression model’s stability and accuracy.
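The core of such a check is a pairwise correlation scan. The sketch below is an illustration under assumptions: the 0.8 cutoff, the function names, and the dictionary input are all hypothetical, and the heatmap step is omitted.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(predictors, threshold=0.8):
    """List (name_a, name_b, r) for predictor pairs with |r| >= threshold.

    `predictors` maps a predictor name to its values; the threshold is an
    illustrative choice, not a value taken from the notebook.
    """
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson(predictors[a], predictors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged
```

Pairs returned by `correlated_pairs` are candidates for dropping or combining, because strongly correlated predictors inflate the variance of the fitted regression coefficients.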
Related Resources
| This resource is described by | Morovati, R. (2025), "Evaluating Use Of Multiple Hydrologic Storage Indicators To Enhance Streamflow Forecasting," MS Thesis, Civil and Environmental Engineering, Utah State University. |
Credits
Funding Agencies
This resource was created using funding from the following sources:
| Agency Name | Award Title | Award Number |
|---|---|---|
| National Science Foundation | HDR Institute: Geospatial Understanding through an Integrative Discovery Environment | 2118329 |
| Utah Water Research Laboratory | Graduate Research Assistantship | |
How to Cite
This resource is shared under the Creative Commons Attribution CC BY.
http://creativecommons.org/licenses/by/4.0/