Data Repository for 'Bootstrap aggregation and cross-validation methods to reduce overfitting in reservoir control policy search'

Authors:	Brodeur, Z. P., S. S. Steinschneider, J. D. Herman
Owners:		This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type:	Resource
Storage:	The size of this resource is 8.3 MB
Created:	Jan 20, 2020 at 7:42 p.m. (UTC)
Last updated:	Jun 24, 2020 at 1:46 p.m. (UTC) (Metadata update)
Published date:	Jun 24, 2020 at 1:46 p.m. (UTC)
DOI:	10.4211/hs.b8f87a7b680d44cebfb4b3f4f4a6a447

Sharing Status:	Published
Views:	2795
Downloads:	43

Abstract

Policy search methods provide a heuristic mapping between observations and decisions and have been widely used in reservoir control studies. However, recent studies have observed a tendency for policy search methods to overfit to the hydrologic data used in training, particularly the sequence of flood and drought events. This technical note develops an extension of bootstrap aggregation (bagging) and cross-validation techniques, inspired by the machine learning literature, to improve control policy performance on out-of-sample hydrology. We explore these methods using a case study of Folsom Reservoir, California, using control policies structured as binary trees and daily streamflow resampling based on the paleo-inflow record. Results show that calibration-validation strategies for policy selection and certain ensemble aggregation methods can improve out-of-sample tradeoffs between water supply and flood risk objectives over baseline performance given fixed computational costs. These results highlight the potential to improve policy search methodologies by leveraging well-established model training strategies from machine learning.

Coverage

Spatial

Coordinate System/Geographic Projection:

WGS 84 EPSG:4326

Coordinate Units:

Decimal degrees

Place/Area Name:

Folsom Reservoir Watershed

North Latitude

39.0000°

East Longitude

-120.0000°

South Latitude

38.0000°

West Longitude

-121.0000°

Content

README.txt

README file describing data and code in repository for:
	Manuscript: 'Bootstrap aggregation and cross-validation methods to reduce overfitting in reservoir control policy search'
	Authors: Brodeur, Z, Herman, J.D., and Steinschneider, S.

Current as of 3 May 2020

Main directory 'bagging_cross-val_policy-search' files:
'ensemble_optimize.py' Python code to optimize 30x policy trees for both historical and bootstrapped cases
'validate_pickbest.py' Python code to analyze scaled policy scores based on validation performance
'calibrate_pickbest.py' Python code to analyze scaled policy scores based on calibration performance
'simulate_totcost_metrics.py' Python code to produce summed/scaled total cost metrics
'simulate_metrics.py' Python code to produce separate water supply and flood overage metrics for primary figures
'single_runs.py' Python code to do individual runs of the Folsom model
'results_plot.r' R code to plot primary results and calculate significance values
'results_tot_plot.r' R code to plot additional summed/scaled cost results
'folsom_model_optimize.py' Folsom reservoir model configured for optimizing policies via 'ensemble_optimize.py'
'folsom_model_ensemble.py' Folsom reservoir model configured for output analysis for both ensemble mode and single policies
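
The 'ensemble mode' mentioned for 'folsom_model_ensemble.py' is not spelled out in this README. As a rough illustration only, the sketch below shows one generic way to aggregate the decisions of several policies (median or mean of their proposed releases). The function name 'ensemble_release', the callable policy interface, and the three threshold rules are all hypothetical stand-ins and are not taken from the repository code.

```python
from statistics import mean, median

def ensemble_release(policies, storage, inflow, how="median"):
    """Combine the release decisions of several policies into one value."""
    decisions = [p(storage, inflow) for p in policies]
    if how == "median":
        return median(decisions)
    if how == "mean":
        return mean(decisions)
    raise ValueError(f"unknown aggregation method: {how}")

# Three hypothetical threshold rules standing in for trained policy trees.
# Each maps (storage, inflow) to a release; units are arbitrary here.
policies = [
    lambda s, q: 10.0 if s > 500 else 5.0,
    lambda s, q: 12.0 if s > 600 else 5.0,
    lambda s, q: 30.0 if q > 100 else 6.0,
]

median_release = ensemble_release(policies, storage=550.0, inflow=150.0)
mean_release = ensemble_release(policies, storage=550.0, inflow=150.0, how="mean")
```

The median is less sensitive than the mean to a single policy that proposes an extreme release, which is one reason the two aggregation choices can behave differently.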

'data' repository
	'generate_paleo-bootstrap_training.R' R code to generate training period (1982-2016) paleo-bootstrap resampled datasets
	'generate_paleo-bootstrap_test.R' R code to generate test period (1923-1981) paleo-bootstrap resampled datasets
	'generate_paleo-bootstrap_test_review.R' same as 'generate..test.R' but with random resampling of paleo flows
	'folsom-daily-w2016.csv' Inflow, Outflow, Storage, Evap timeseries for Folsom Reservoir, 1922-2016
	'demand.txt' Average daily demand by day of water year for Folsom reservoir
	'Sacramento_paleo_inflow.csv' Paleo annual inflow data for American River at Folsom Reservoir, 900-2012 CE
	'plot-static-trees.py' Script to create graphical depictions of policy trees
	Sub-directory('resamp')
		'inflow_forecast_0.csv,... ""_29.csv' Paleo-bootstrapped inflow sequences based on training period, WY 1982-2016
	Sub-directory('resamp_test')
		'inflow_forecast_0.csv,... ""_29.csv' Paleo-bootstrapped inflow sequences based on test period, WY 1923-1981
	Sub-directory('resamp_test1','resamp_test2','resamp_test3')
		'inflow_forecast_0.csv,... ""_29.csv' same as 'resamp_test', but with paleo annual flows resampled in 1-, 2-, and 3-year blocks
	Sub-directory('figs')
		Output directory for 'plot-static-trees' script
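
The resampling itself is done by the R scripts listed above and is not reproduced here. For readers unfamiliar with block resampling, the following is a minimal generic sketch of a block bootstrap on an annual series, using 1-, 2-, and 3-year blocks. The function 'block_bootstrap' and the placeholder series are hypothetical, and the sketch does not implement the paleo-conditioned procedure used in the manuscript.

```python
import random

def block_bootstrap(values, block_len, rng):
    """Resample `values` with replacement in contiguous blocks of `block_len`."""
    n = len(values)
    out = []
    while len(out) < n:
        # Pick a random block start so the whole block fits inside the series.
        start = rng.randrange(0, n - block_len + 1)
        out.extend(values[start:start + block_len])
    # Trim the last block if it overshoots the original length.
    return out[:n]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Placeholder series: one value per water year, 1982-2016 (not real flow data).
annual_flows = [float(year) for year in range(1982, 2017)]

# One resampled series for each block length.
samples = {k: block_bootstrap(annual_flows, k, rng) for k in (1, 2, 3)}
```

Longer blocks preserve more of the year-to-year persistence in the original series, at the cost of fewer distinct block arrangements.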
		

'output' repository
	'results_2381.csv' Results csv file for primary figure plots via 'results_plot.r' for testing period
	'results_8216.csv' "" for training period
	'results_2381_rs1,rs2,rs3.csv' same as 'results_2381.csv' but with paleo annual flows resampled in 1-, 2-, and 3-year blocks
	'results_tcost_2381.csv' Results csv file for summed/scaled cost boxplots via 'results_tot_plot.r' for testing period
	'results_tcost_8216.csv' "" for training period
	Sub-directory('p1_ens')
		'snapshots-forecast-p1-gw0.0TAF-seed-0.pkl,...""-29.pkl' Policy trees 0-29 all trained on historical 1982-2016 data
	Sub-directory('p1_ens_resamp')
		'snapshots-forecast-p1-gw0.0TAF-seed-0.pkl,...""-29.pkl' Policy trees 0-29 each trained on bootstrapped 1982-2016 data
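
As a convenience, the 30 snapshot file names in either sub-directory can be built from the naming pattern above and read with Python's standard 'pickle' module. This is only a sketch: the helper names 'snapshot_paths' and 'load_snapshots' are hypothetical, the structure of the unpickled objects is not documented in this README, and unpickling will likely require the 'ptreeopt' modules to be importable if the snapshots contain objects defined there.

```python
from pathlib import Path
import pickle

def snapshot_paths(directory, n_seeds=30):
    """Build the expected snapshot paths for seeds 0..n_seeds-1."""
    return [
        Path(directory) / f"snapshots-forecast-p1-gw0.0TAF-seed-{i}.pkl"
        for i in range(n_seeds)
    ]

def load_snapshots(directory, n_seeds=30):
    """Unpickle every snapshot file in `directory`, in seed order."""
    loaded = []
    for path in snapshot_paths(directory, n_seeds):
        with open(path, "rb") as f:
            loaded.append(pickle.load(f))
    return loaded

# Paths for the historically trained ensemble (relative to the main directory).
paths = snapshot_paths("output/p1_ens")
```

Only unpickle files from a source you trust, since 'pickle.load' can execute arbitrary code.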

'pdf' repository
	Plots of paleo-bootstrap resampling diagnostics as described in R resampling code

'ptreeopt' repository
	Python modules to run tree-based evolutionary optimization algorithm

'graphvis'
	Python package for plotting graphical models (tree structures)

Steps to recreate results:
1) Generate bootstrap samples via 'generate_paleo..R' scripts
2) Optimize policies to both historical and bootstrapped training data via 'ensemble_optimize.py' script
3) Find best policies via calibration and validation procedures via 'calibrate_pickbest.py' and 'validate_pickbest.py' scripts
4) Run 'simulate_metrics.py' with manual selections of best policies from step 3
5) Plot results via 'results_plot.r' script
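
To illustrate the distinction in step 3, the sketch below contrasts picking a policy by calibration (in-sample) cost with picking it by validation (held-out) cost. It is one plausible reading of the idea, not a copy of 'calibrate_pickbest.py' or 'validate_pickbest.py'. The cost matrix, the assumption that policy i was trained on sample i, and both function names are hypothetical.

```python
from statistics import mean

def pick_by_calibration(cost):
    """Index of the policy with the lowest cost on its own training sample."""
    return min(range(len(cost)), key=lambda i: cost[i][i])

def pick_by_validation(cost):
    """Index of the policy with the lowest mean cost on samples it was not trained on."""
    def held_out(i):
        return mean(c for j, c in enumerate(cost[i]) if j != i)
    return min(range(len(cost)), key=held_out)

# Hypothetical cost matrix: cost[i][j] is the cost of policy i on sample j
# (lower is better); policy i is assumed to have been trained on sample i.
cost = [
    [1.0, 9.0, 8.0],  # policy 0: best in-sample, poor elsewhere (overfit)
    [3.0, 2.0, 3.5],  # policy 1
    [4.0, 3.0, 2.5],  # policy 2
]

cal_best = pick_by_calibration(cost)
val_best = pick_by_validation(cost)
```

In this toy matrix the calibration pick is the overfit policy 0, while the validation pick is policy 1, which performs more evenly across samples.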

Related Resources

This resource is referenced by

Brodeur, Z., Herman, J. D., & Steinschneider, S. S. (2020). Bootstrap aggregation and cross-validation methods to reduce overfitting in reservoir control policy search. Accepted in AGU Journal Water Resources Research, June 2020

How to Cite

Brodeur, Z. P., S. S. Steinschneider, J. D. Herman (2020). Data Repository for 'Bootstrap aggregation and cross-validation methods to reduce overfitting in reservoir control policy search', HydroShare, https://doi.org/10.4211/hs.b8f87a7b680d44cebfb4b3f4f4a6a447

This resource is shared under the Creative Commons Attribution CC BY.

http://creativecommons.org/licenses/by/4.0/
