Kristen L Underwood

University of Vermont | Research Associate Professor

Resources
Code and Data for CAMELS-Chem Concentration-Discharge Analysis
Created: July 17, 2023, 11:15 p.m.
Authors: Kincaid, Dustin · Kristen Underwood

ABSTRACT:

Here we provide the data and R scripts needed to complete the analyses and create the figures presented in the manuscript “Solute export patterns across the contiguous United States” by Kincaid et al. 2024 in Hydrological Processes. Importantly, this resource contains paired solute concentration (C) and discharge (Q) data for 11 solutes from CAMELS-Chem (Sterle et al. 2024; https://doi.org/10.5194/hess-28-611-2024). This relational database was built upon the CAMELS dataset (https://doi.org/10.5194/hess-21-5293-2017), an existing dataset of catchment and hydroclimatic attributes from relatively undisturbed catchments across the contiguous United States. The version of CAMELS-Chem provided here has US Geological Survey (USGS) National Water Information System (NWIS) C and Q data for 506 catchments. C and Q measurements span 1898 to 2020, with the first paired C-Q sample occurring in 1924. Solutes include aluminum (Al), calcium (Ca), chloride (Cl), dissolved organic C and N (DOC, DON), magnesium (Mg), nitrate (NO3), potassium (K), silica (Si), sodium (Na), and sulfate (SO4). Of note, a shorter version of the CAMELS-Chem database, which spans 1980 to 2018 but includes more stream water quality constituents as well as atmospheric deposition data, is described in Sterle et al. 2024 (https://doi.org/10.5194/hess-28-611-2024) and is available for download via HydroShare (http://www.hydroshare.org/resource/841f5e85085c423f889ac809c1bed4ac).

The R scripts and data files provided in this resource are intended to allow users to replicate the tables and figures in the Kincaid et al. manuscript. Specifically, we provide all files needed to complete the analyses coded in the R script 9_analyses_figures_for_manuscript.R. The other R scripts and data files provided should also allow users to replicate intermediate steps in the analyses. See the README file for more details. Analyses provided in the R scripts include: modeling C-Q relationships with the power-law function using data-driven Bayesian segmented regression; conducting hierarchical clustering to group catchments based on catchment attributes; building random forest models to select catchment attribute correlates of C-Q metrics; conducting flow-duration exceedance probability analyses; and general code for figures, tables, and other statistics presented in the Kincaid et al. manuscript.
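To make the power-law C-Q relationship mentioned above concrete, here is a minimal sketch of fitting C = a·Q^b by ordinary least squares in log-log space. It is not the Bayesian segmented regression used in the resource's R scripts: it fits a single segment, it is written in Python rather than R, and the function name `fit_power_law` and the synthetic data are assumptions made only for illustration.

```python
import math


def fit_power_law(q, c):
    """Fit C = a * Q**b by least squares on log10(Q) and log10(C).

    Hypothetical helper for illustration only; a single-segment fit,
    not the segmented Bayesian model used in the resource.
    """
    if len(q) != len(c) or len(q) < 2:
        raise ValueError("q and c must have equal length >= 2")
    if any(v <= 0 for v in q) or any(v <= 0 for v in c):
        raise ValueError("q and c must be strictly positive")

    x = [math.log10(v) for v in q]
    y = [math.log10(v) for v in c]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("q must contain at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sxy / sxx                       # slope in log-log space (exponent)
    a = 10 ** (mean_y - b * mean_x)     # intercept back-transformed
    return a, b


# Synthetic, noise-free data generated from C = 3.0 * Q**-0.2 (assumed values).
discharge = [0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
concentration = [3.0 * v ** -0.2 for v in discharge]

a, b = fit_power_law(discharge, concentration)
```

In this convention a negative exponent b means that concentration falls as discharge rises, and a value near zero means that concentration changes little with discharge.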

The metadata for the CAMELS-Chem dataset (camels_chem_all_2022-02-25.csv) are available in camels_chem_metadata.csv.

Tandem EVolutionary Algorithm (TEVA) of Hanley et al (2020)
Created: Dec. 20, 2024, 1:46 p.m.
Authors: Underwood, Kristen L. · Donna M. Rizzo · John P. Hanley

ABSTRACT:

Underwood et al. (2023) have recently introduced the tandem evolutionary algorithm (TEVA) of Hanley et al. (2020) to the water resources and ecology domains, and applied it to identify features (catchment-scale attributes) and feature interactions important in determining patterns in dissolved organic carbon (DOC) across the continental US. TEVA has particular advantages for feature selection in large, multivariate observational data sets of complex systems like riverscapes or ecosystems, and has been shown to outperform logistic regression and Random Forest for identifying feature interactions and equifinality (Hanley et al., 2020; Anderson et al., 2020). TEVA finds interactions between multiple variables that may result from either additive processes or feature interactions. It not only extracts features significantly associated with one or more outcome classes, but also identifies the specific value ranges associated with those features (Underwood et al., 2023; Hanley et al., 2020). The algorithm is also robust to mixed data types (continuous, categorical), missing data, censored data, skewed distributions, and unbalanced target classes or clusters (Hanley et al., 2020).

When presented with n observations of p features across a study domain and a target of one or more classes or outcomes, the algorithm identifies and archives two types of clauses below a given fitness threshold. In the first pass, TEVA identifies Conjunctive Clauses (CCs): combinations of variables that may or may not be correlated and that interact to produce an outcome. For example, an Extreme Flood may result from steep slopes + shallow soils + intense rainfall. A second pass of TEVA identifies Disjunctive Clauses (DCs): sequences of CCs that are linked with a logical “OR” statement. For example, an Extreme Flood may result from (steep slopes + shallow soils + intense rainfall) OR (high antecedent soil moisture + rainfall) OR (thick snow pack + high temperatures). DCs are multi-order, while the CCs comprising a DC can themselves range from first-order to multi-order (Underwood et al., 2023).
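The clause structure described above can be sketched in a few lines of Python. This is only an illustration of how a CC (logical AND of feature value ranges) and a DC (logical OR of CCs) would be evaluated against an observation; it is not the TEVA codebase and does not perform the evolutionary search. The class `RangeCondition`, the feature names, the value ranges, and the choice to treat a missing value as a non-match are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeCondition:
    """One feature restricted to a closed value range (hypothetical type)."""
    feature: str
    low: float
    high: float

    def matches(self, obs: dict) -> bool:
        value = obs.get(self.feature)
        # Assumption of this sketch: a missing value fails the condition.
        return value is not None and self.low <= value <= self.high


def cc_matches(cc, obs):
    """Conjunctive clause: every condition must hold (logical AND)."""
    return all(cond.matches(obs) for cond in cc)


def dc_matches(dc, obs):
    """Disjunctive clause: at least one member CC must hold (logical OR)."""
    return any(cc_matches(cc, obs) for cc in dc)


# Illustrative ranges that mirror the flood example; not TEVA output.
steep_shallow_intense = [
    RangeCondition("slope_pct", 15.0, 60.0),
    RangeCondition("soil_depth_m", 0.0, 0.5),
    RangeCondition("rain_mm_per_hr", 25.0, 200.0),
]
wet_soil_rain = [
    RangeCondition("antecedent_moisture", 0.8, 1.0),
    RangeCondition("rain_mm_per_hr", 5.0, 200.0),
]
extreme_flood_dc = [steep_shallow_intense, wet_soil_rain]

obs_a = {"slope_pct": 22.0, "soil_depth_m": 0.3,
         "rain_mm_per_hr": 40.0, "antecedent_moisture": 0.2}
obs_b = {"slope_pct": 3.0, "soil_depth_m": 1.5,
         "rain_mm_per_hr": 10.0, "antecedent_moisture": 0.9}
obs_c = {"slope_pct": 3.0, "soil_depth_m": 1.5,
         "rain_mm_per_hr": 2.0, "antecedent_moisture": 0.3}

flags = [dc_matches(extreme_flood_dc, o) for o in (obs_a, obs_b, obs_c)]
```

Here `obs_a` satisfies the first CC and `obs_b` satisfies only the second, yet both satisfy the DC. That is how a DC can represent equifinality: different feature combinations lead to the same outcome class.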

In this workshop, we illustrate the functionality of TEVA using a dataset of 54 catchment attributes, inferred to be important to DOC dynamics, for 91 forested catchments across the contiguous United States (CONUS). TEVA identified combinations of these catchment attributes in CCs and DCs that have a high probability of being linked to an outcome class of High or Low mean DOC concentration. Target classes were assigned using Jenks natural breaks for the 91 catchments that had sufficient (≥3) observations of DOC in stream water to calculate a mean value. TEVA was originally implemented in the MATLAB programming language; the codebase has now been ported to the open-source language Python and is accessed through CUAHSI JupyterHub.
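For readers unfamiliar with Jenks natural breaks, the two-class case (High versus Low) reduces to choosing the single split of the sorted values that minimizes the total within-class sum of squared deviations from the class means. The sketch below does this by exhaustive search. The function name `two_class_natural_break` and the mean DOC values are hypothetical, and this is not necessarily the implementation used in the workshop materials.

```python
def two_class_natural_break(values):
    """Return (max of Low class, min of High class) for the best 2-class split.

    Exhaustive search over split points of the sorted data, minimizing the
    summed within-class squared deviations (the Jenks criterion for k = 2).
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")

    def ssd(group):
        mean = sum(group) / len(group)
        return sum((x - mean) ** 2 for x in group)

    best_cost, best_i = None, None
    for i in range(1, len(data)):
        if data[i - 1] == data[i]:
            continue  # never split between identical values
        cost = ssd(data[:i]) + ssd(data[i:])
        if best_cost is None or cost < best_cost:
            best_cost, best_i = cost, i

    if best_i is None:
        raise ValueError("all values are identical; no break exists")
    return data[best_i - 1], data[best_i]


# Hypothetical mean DOC concentrations (mg/L) for eight catchments.
mean_doc = [2.3, 8.4, 1.2, 7.9, 2.0, 9.1, 1.5, 2.8]

low_max, high_min = two_class_natural_break(mean_doc)
classes = ["Low" if v <= low_max else "High" for v in mean_doc]
```

With these made-up values the break falls in the large gap between 2.8 and 7.9 mg/L, so five catchments are labeled Low and three High.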
