Statewide cumulative human health risk assessment of inorganics contaminated groundwater wells, Montana, USA - Data and Code

Authors:
Owners:		This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type:	Resource
Storage:	The size of this resource is 592.3 MB
Created:	May 07, 2024 at 12:46 a.m. (UTC)
Last updated:	Feb 17, 2025 at 3:28 p.m. (UTC)
Published date:	Feb 17, 2025 at 3:29 p.m. (UTC)
DOI:	10.4211/hs.11599c9474744b9299bc37754c12f117
Content types:	Geographic Feature Content

Sharing Status:	Published
Views:	909
Downloads:	52

Abstract

Human health risk from consumption of groundwater is widely documented and particularly challenging to address in private wells, where testing is not required and is infrequent. Furthermore, the common approach of assessing health risk based on whether individual contaminants exceed a health threshold does not account for how close a concentration is to the threshold nor for cumulative effects across contaminants. Assessing cumulative human health risk from drinking water is relatively new and has primarily been conducted on datasets collected from discrete sampling campaigns where all data produced have a common set of analytes and similar detection limits. These sampling campaigns are cost-prohibitive for many communities, and more efficient approaches for conducting tier 1 (screening-level) human health risk assessments are needed.

In this work, we leveraged a publicly available database for Montana groundwater and adapted methods developed by USGS to conduct a statewide cumulative human health risk assessment across 19 inorganic contaminants. This type of analysis requires decisions about which thresholds to apply, which data are most relevant to include, and what minimum data availability is considered sufficient. Sensitivity of results to each of these decisions was assessed, and results for many alternative analysis scenarios are provided so users can assess which scenarios might be best suited to their assessment needs. Also included is code/output for histograms of contaminant concentrations and of detection limits for non-detect concentrations. These histograms were important for identifying outliers from errant data and for informing what detection limits were considered adequately low for non-detect data to be included in the analysis. Histograms revealed that concentration data for some analytes are normally distributed, which could allow for exploration of alternative methods for handling non-detect data, such as the NADA package in R. The NADA package was not feasible in our analysis because non-detect concentrations outnumbered detections for 7 out of 19 analytes. For datasets with a lower frequency of non-detect data, users could re-examine the potential for using NADA to numerically represent non-detect concentrations for this kind of analysis.

For users specifically working with the Montana Bureau of Mines and Geology (MBMG) Groundwater Information Center (GWIC) database, the code provided here can be used to compile data and create metadata fields (detection limit, qualifiers, non-detect, etc.) from the somewhat cumbersome single field the database uses to store numeric results and metadata.

This data resource includes all data, code, and analysis products for the accompanying manuscript so that users can easily assess, apply, or adapt these methods for other datasets and applications.

Coverage

Spatial

Coordinate System/Geographic Projection:

WGS 84 EPSG:4326

Coordinate Units:

Decimal degrees

North Latitude

48.9936°

East Longitude

-104.1098°

South Latitude

44.2760°

West Longitude

-116.1786°

Content

readme.txt

Statewide cumulative human health risk assessment of inorganics contaminated groundwater wells, Montana, USA
- Data and Code

Citation
Kiekover, N., W. A. Sigler, M. J. Eggers (2025). Statewide cumulative human health risk assessment of inorganics contaminated groundwater wells, Montana, USA - Data and Code, HydroShare, http://www.hydroshare.org/resource/11599c9474744b9299bc37754c12f117

Software/Hardware
Code was run successfully in February 2025 with R version 4.4.2, on a Windows 10 (64-bit) PC with an Intel i7-8750H @ 2.2 GHz and 16 GB of RAM.  Packages include:
•	tidyverse (ggplot2, stringr, lubridate, dplyr, readr)
•	strex
•	reshape2
•	readr
•	sf
•	grid
•	scales
•	ggbreak
•	cowplot

Folder Structure and Organization
The primary data/code/results supporting the manuscript are in the ‘DataAndAnalysis’ folder with four main folders: 1_RawData, 2_Code, 3_IntermediateData, and 4_Results.  1_RawData consists of all data necessary for the analysis, including raw data downloaded from GWIC (slightly edited manually to address structure consistency issues), tabulated water quality threshold lists, GIS shapefiles of watershed and state boundaries, a csv specifying plot color symbology, a csv compiling variables to streamline the sensitivity analysis, and a few other files.  2_Code contains all R scripts used in the analysis for the generation of all tables and figures in the manuscript.  3_IntermediateData includes compiled and preprocessed data products, which are subsequently used for final outputs and analysis.  4_Results contains all analysis results (tables, figures, etc. in manuscript) for all analysis scenarios across combinations of thresholds, data filters, etc. All analysis scenario results are stored in a folder named for that scenario. The results folder is simplified here to contain only the two primary threshold scenarios discussed in the manuscript, but the code can be readily adjusted to output many more analysis scenarios with different thresholds, sample size requirements, data filtering, etc. 
There are two other top level folders. The ‘Supporting’ folder contains two files, with datapoints removed from analysis due to quality assurance issues. The ‘SensitivityAnalysis’ folder is a collection of results output encompassing all scenarios considered in the sensitivity analysis. It contains five folders, each addressing a specific aspect of sensitivity analysis, as well as the input files necessary within the primary code/folder structure to generate the associated scenarios results.
See the PDF version of the readme file for an easier-to-read bulleted list of file names.

Framework Outline and Component Description
-	DataAndAnalysis
-	1_RawData (All raw datasets and input data)
-	1_WaterQualityTXT
Tab-separated format .txt files containing water quality data, downloaded from the GWIC database
-	2_WellDataTXT
Tab-separated format .txt files containing water sample site data, downloaded from the GWIC database
-	3_Thresholds
.csv files listing chemical analytes, the numeric threshold values for each analyte, the maximum reporting limit above which results were omitted from analysis, and values assigned to non-detects 
-	T01_thresholds_MclgHa_2024-04-15_was.csv
	MCLG-HA thresholds for 19 analytes
-	T02_thresholds_MclgHa_OnlyMclAnalytes_2023-07-13_was.csv
	MCLG-HA thresholds for only the 13 analytes with MCL thresholds 
-	T03_thresholds_Mcls_2024-02-15_was.csv
	MCL thresholds for 13 analytes
-	4_MontanaHUC8_Clipped
GIS shapefile for 8-Digit HUC watersheds clipped to fit within the map of Montana.  A Raw Montana Shapefile can be found in the Supporting data/code folder 
-	AnalyteColors.csv
.csv table assigning colors to analytes, used to help maintain consistency with symbology across analyses with editing in one central location 
-	GWIC_Metadata_ZF.pdf
Metadata for GWIC data files, downloaded from GWIC website.
-	GwicBasinCodes.csv
Table of watershed names, 8-digit HUC codes, and corresponding two-letter basin codes used in GWIC to identify Montana watersheds
-	misplaced_wells.csv
A list of water sampling sites from the water quality dataset which had latitude/longitude coordinates in GWIC that fell outside the designated watershed. It was unknown whether the error was in the coordinates or the designated HUC8 code, so these points were omitted from the analysis. 
-	scenario_params.csv
Table listing all categories for analysis scenarios evaluated for the sensitivity analysis. This table is used in the script 4_CumulativeRiskLoop to conduct analysis scenarios and can be edited to alter which scenarios are assessed. 

-	2_Code (All .R scripts used for analysis and producing figures/tables)
-	1_WaterQualityCompile_FromTxt_2024-06-06.R
Compiles raw water quality .txt files into one large table and saves to .csv format; (~ 50 lines of code)
-	2_WellDataCompile_FromTxt_2024-06-06.R
Compiles groundwater monitoring site data .txt files into one .csv table (~50 lines of code)
-	3_DataReformatAndClean_2024-06-06.R
The raw water quality data downloaded from GWIC is in cross-tab format and has a single field for numeric results, qualifiers, and detection limit information. This script converts the data to long format, and separates numeric results, qualifiers, and detection limit information into separate fields, in addition to some initial filtering of abnormal/unreliable data. (~120 lines of code)
-	4_CumulativeRiskLoop_2024-06-06.R
Performs the steps for producing cumulative risk results in a loop across all (or a subset) of the analysis scenarios generated by scenario_params.csv.  By default, this outputs only results for the two primary scenarios discussed in the manuscript.  Editing the for loop at line 147 will allow for generation of other (or all) scenarios. (~500 lines of code)
Some important data frames created in this script which have no direct .csv outputs include:
o	long.data: filtered concentration results before averaging by well
o	long.data2: filtered concentration results averaged by well
o	scenarios:  table of all possible analysis scenarios given by the values entered in scenario_params.csv
o	exceedance.sub:  number and percentage of results exceeding threshold by analyte and by watershed
-	5_ExtraOutputs_2024-06-06.R
Script for creating tables and figures that are a compilation of results from specific analysis scenarios, specifically combining results from MCLG-HA and MCL threshold analysis into the same table or figure for the manuscript.  Since this script relies on specific scenarios and their outputs, it may not run if scenario parameters are altered or certain scenarios are not included.  Scenario folders include dates of generation, so the filenames called in this script may need to be updated before running. (~80 lines of code)
-	6_NewResultHist_2024-06-06.R
Outputs histograms of water quality results, with filtering steps mirroring 4_CumulativeRiskLoop_2024-06-06, with both normal and log-transformed data. (~230 lines of code)
-	7_DataFieldCounts_2024-06-06.R
Counts entries in the compiled water quality data by different categories and outputs them as tables to 4_Results/Other (~50 lines of code)
-	8_RL_Hist_2024-06-06.R
Produces histograms of reporting limit values for each analyte, with filtering steps mirroring 4_CumulativeRiskLoop_2024-06-06 (~300 lines of code)
-	999_PlotFunctions_2024-06-06.R
Contains functions for creation of plots based on cumulative risk, called in 4_CumulativeRiskLoop (~600 lines of code)
-	3_IntermediateData (Compiled and preprocessed data used for subsequent analysis)
-	1_GWIC_WaterQualityCompile_YYYY-MM-DD.csv
Water quality data in cross tab format, compiled from one .txt file from each of Montana’s 56 counties downloaded from GWIC (created with the script 1_WaterQualityCompile_FromTxt_2024-06-06.R).
-	2_GWIC_WellDataCompile_YYYY-MM-DD.csv
Groundwater monitoring site data, compiled from one .txt file from each of Montana’s 56 counties (created with the script 2_WellDataCompile_FromTxt_2024-06-06.R)
-	3_WaterQualityLongFormat_YYYY-MM-DD.csv
Water quality data in long format, created from the preceding file 1_GWIC_WaterQualityCompile_YYYY-MM-DD.csv using the script 3_DataReformatAndClean_2024-06-06.R
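
The field-splitting step performed by 3_DataReformatAndClean can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the repository's R code. The combined-field layout assumed here (an optional "<" marking a non-detect, a number, and optional trailing qualifier letters) is hypothetical; the actual GWIC encoding should be checked against GWIC_Metadata_ZF.pdf. The names ParsedResult and parse_result are invented for this example.

```python
import re
from typing import NamedTuple, Optional


class ParsedResult(NamedTuple):
    value: Optional[float]            # measured concentration, None for non-detects
    reporting_limit: Optional[float]  # limit taken from a "<" entry, otherwise None
    qualifier: str                    # trailing letter codes, "" if absent
    non_detect: bool


# Hypothetical layout: optional "<", a number, optional qualifier letters.
_PATTERN = re.compile(r"^\s*(<)?\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*$")


def parse_result(raw: str) -> Optional[ParsedResult]:
    """Split one combined result string into separate fields.

    Returns None when the string does not fit the assumed layout, so that
    unexpected entries can be reviewed rather than silently coerced.
    """
    match = _PATTERN.match(raw)
    if match is None:
        return None
    less_than, number, qualifier = match.groups()
    number = float(number)
    if less_than:
        # A "<" entry is treated as a non-detect reported at its limit.
        return ParsedResult(None, number, qualifier, True)
    return ParsedResult(number, None, qualifier, False)


for text in ["<0.5", "12.3 J", "n/a"]:
    print(repr(text), "->", parse_result(text))
```

Returning None for unmatched strings keeps abnormal entries visible, which mirrors the initial filtering of abnormal or unreliable data mentioned above.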


-	4_Results (All figures, and tables produced from analysis) 
-	1_ScenarioResults
Each folder (see “Analysis Scenario Name Conventions” below) includes an array of tables, figures, and maps for each analysis scenario, including those which are in the manuscript for the primary analysis scenarios. These results are created by the script 4_CumulativeRiskLoop_2024-06-06.R. 
-	2_ConcentrationHistograms
Contains histograms of concentration data by analyte, using the same scenario structure for data filtering as CR analysis (see “Analysis Scenario Name Conventions” below )
-	3_ReportingLimitHistograms
Contains histograms of reporting limits by analyte, using data filtered according to the same scenarios used in the CR analysis (see “Analysis Scenario Name Conventions” below )
-	4_ExtraOutputs
Contains tables and figures produced by 5_ExtraOutputs and 7_DataFieldCounts
-	BasinCR_Comparison_SS_5__NDMIN_G2_WBSP_DTR_Post1974_YYYY-MM-DD.csv
Combined cumulative risk data for both T01 and T03 threshold data, with scenario as indicated in filename
-	Exceedance_Plot_Comparison_SS_5_NDMIN_G2_WBSP_DTR_Post1974_YYYY-MM-DD.csv
	2-Way plot of percent exceedance by analyte for T01 and T03 scenarios; Figure 5 in the manuscript. 
-	HucCounts.csv
	Counts number of samples from each HUC (no filtering)
-	MinimumReportingLimits.csv
Table showing the smallest reporting limit for each analyte in pre-processed data.  The full dataset may have some smaller RLs that were filtered out in preprocessing
-	Quotient_Comparison_SS_5_G2_WBSP_DTR_Post1974_YYYY-MM-DD.csv
Combined risk quotient data from T01 and T03 threshold data, with the same scenario as above
-	RQ_Plot_Comparison_SS_5_G2_WBSP_DTR_Post1974_YYYY-MM-DD.csv
4-Way plot of risk quotient data by analyte for T01 and T03 scenarios, including both Q75 and median-based quotients.
-	SiteTypes.csv
Counts samples for each site type (no filtering) used to inform which site types were included in the analysis, as described in the manuscript supplement's methods section. 
-	WellUses.csv
Counts number of wells for each well use category (no filtering) which were used to create well use groups for analysis scenarios, as described in the methods section of the manuscript supplement.
-	Scenario_Values.csv
Table with key results metrics and sample sizes for all analysis scenarios generated with scenario_params.csv (see the following entry for metadata). This file is a consolidated location to evaluate analysis results across all analysis scenarios and each row corresponds to one output folder in 1_ScenarioResults.
-	ScenarioValues_Metadata.docx
This file describes the meaning of each column heading in Scenario_Values.csv 

-	ScenarioResults
Contains one folder for each analysis scenario.

-	[Scenario Filename] (See example image and “Analysis Scenario Name Conventions” below for filename guidelines)
Contains all tables, plots, and maps generated for one scenario.  Several outputs have multiple versions for different CR calculation methods.  Those using CR50 have a filename beginning with ‘MEDIAN’, while those using CR75 have ‘Q75’. Files with a CR-dependent filename will have [CR_Type] as a placeholder.
 
-	1_BasinAnalyteStats.csv
Table of analyte concentration summary statistics by watershed, along with their corresponding hazard quotients; used in Figure 3 in the manuscript. 	
-	2_BasinCR.csv
Table of Cumulative Risk by Watershed; this is the same data that is in Figures 2 and 6 in the manuscript for the primary scenario. 
-	3_BasinSampleSizes.csv
Table of sample sizes for each analyte by watershed
-	4_Hazard_Quotients.csv
Table of the average and median risk quotients by watershed, for both Q75 and median approaches.
-	5_PercentExceedanceByAnalyte.csv
Table of percent threshold exceedance by analyte, across all watersheds used in the corresponding scenario; used in figure 5 in the manuscript
-	CRBarChart.png
Plot of cumulative risk (CR) values calculated with 25/50/75th percentiles, shown as a barplot with whiskers.
-	CRBoxPlot.png
Plot of CR values based on different percentiles for each watershed. This is not a traditional boxplot; the extents of the box and whiskers represent CR calculated with different analyte concentration percentiles. 
-	ExceedanceByAnalyte.png
Barplot of % threshold exceedance by analyte, across the state.
-	ExceedanceStackedBar.png
Stacked bar chart of % threshold exceedance by watershed, separated by analyte
-	FourAnalyteWatershedExceedance.png
Four individual plots of % exceedance by HUC for four analytes: As, NO3 as N, F, and U; figure in the supplement of the manuscript. 
-	[CR_Type]_CR_StackedBarChart.png
Stacked bar chart of Cumulative Risk by watershed, with analyte contributions colored individually (colors may be edited via AnalyteColors.csv)
-	[CR_Type]_NEW_CR_MAP.png
Map of Montana watersheds colored by CR value
-	[CR_Type]_HazardQuotientBar.png
Bar chart of median risk quotient by analyte
-	[CR_Type]_HazardQuotientBoxplot.png
Plots distribution of hazard quotients by analyte as boxplot. Uses a log-transformed y-axis scale by default; this can be disabled by changing ‘log’ to ‘F’ in the function RiskQuotientBoxPlot() in the script 4_CumulativeRiskLoop.
-	HazardQuotientBoxPlot_OneWatershed.png
Plot of hazard quotients for each analyte with a format parallel to the preceding CRBoxPlot for one chosen HUC, which is given in the filename.  Similar to CRBoxPlot, the box and whiskers represent hazard quotients calculated from different percentiles.  This is a plot for a single watershed as an example.
-	WellSampleSizeWithBasinByAnalyte.png
Plot of the distribution of sample sizes for different watersheds by analyte
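
For readers unfamiliar with the MEDIAN (CR50) and Q75 (CR75) outputs listed above, the following minimal Python sketch shows one common way such values are computed. It is not the repository's R code. It assumes that a hazard quotient is the chosen percentile of an analyte's concentrations in a watershed divided by that analyte's threshold, and that cumulative risk is the sum of hazard quotients across analytes; the manuscript is the authority on the exact definitions. Analyte names, concentrations, and thresholds below are made-up illustration values.

```python
from statistics import median, quantiles


def percentile_concentration(values, kind):
    """Return the median or 75th percentile of a list of concentrations."""
    if kind == "MEDIAN":
        return median(values)
    if kind == "Q75":
        # method="inclusive" matches the default quantile definition in R (type 7).
        return quantiles(values, n=4, method="inclusive")[2]
    raise ValueError(f"unknown CR type: {kind}")


def hazard_quotients(conc_by_analyte, thresholds, kind):
    """Assumed definition: percentile concentration divided by threshold."""
    return {
        analyte: percentile_concentration(values, kind) / thresholds[analyte]
        for analyte, values in conc_by_analyte.items()
    }


def cumulative_risk(quotients):
    """Assumed definition: sum of hazard quotients across analytes."""
    return sum(quotients.values())


# Made-up concentrations for one watershed (same units as the thresholds).
watershed = {
    "analyte_A": [1, 2, 3, 4, 5],
    "analyte_B": [0.2, 0.4, 0.6, 0.8, 1.0],
}
thresholds = {"analyte_A": 10.0, "analyte_B": 2.0}

cr50 = cumulative_risk(hazard_quotients(watershed, thresholds, "MEDIAN"))
cr75 = cumulative_risk(hazard_quotients(watershed, thresholds, "Q75"))
print(f"CR50 = {cr50:.2f}, CR75 = {cr75:.2f}")
```

Under these assumptions, the per-analyte quotients are the segments of the stacked bar chart and their sum is the bar height.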

-	Supporting
-	Pace.csv
Table of samples from the GWIC water quality dataset with the ‘lab_name’ field as ‘PACE.’  Some entries from this lab had abnormally high results (higher than realistically feasible, and inconsistent with other samples from the same sites).  To ensure no errant data was present in the analysis, all data in this table is filtered out during preprocessing.
-	PSW_Water_Quality.csv
Table of water quality samples with ‘type’ listed as ‘PUBLIC WATER SUPPLY – GW’.  None of these sites appear in the well dataset, and as such are not included in the analysis.

-	SensitivityAnalysis
This folder consists of five comprehensive sets of outputs used for sensitivity analysis along with the input files which generate the scenarios for each output, in contrast to the primary analysis for the manuscript with only two scenarios (one for MCLG-HA thresholds and one for MCL standards).  Users who wish to do a comprehensive analysis of scenarios other than what is included in the primary code folder will have to make the following changes to code in the DataAndAnalysis main folder to output the content in SensitivityAnalysis.
Lines of code change:
•	In script “4_CumulativeRiskLoo…”, comment out the line just above the for loop (near line 145) that runs only the specified scenarios, and uncomment the line that runs all scenarios. This is the only script that needs to be run for the sensitivity analysis if intermediate results are retained from the original DataAndAnalysis folder.
•	To output histograms for all scenarios, change which line is commented out at the top of the loop for running scenarios (near line 115) in the “6_NewResultHist....” script 
•	Make the parallel change in the reporting limit script, 8_RL_Hist_2024-06-06.R

Below is a description of the parameter variations covered in each SensitivityAnalysis subfolder, and the different threshold files used.  See the discussion below on analysis scenario naming for definitions of the parameters being altered.

-	1_General
Scenario params:
•	Non-detect method = NDMIN
•	Sample size = 3, 5, 10, 15
•	Well use groups = G1, G2, G3
•	Cutoff year = 0, 1940, 1974
Threshold files:
•	T01
•	T01a-e
•	T02
•	T03

-	2_ND_Mult
Scenario params:
•	Non-detect method = NDMIN, NDHALF
•	Sample size = 3, 5, 10, 15
•	Well use groups = G2
•	Cutoff year = 1974
Threshold files:
•	T01
•	T03

-	3_SampleSize
Scenario params:
•	Non-detect method = NDMIN
•	Sample size = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
•	Well use groups = G2, G3
•	Cutoff year = 1974
Threshold files:
•	T01
•	T03

-	4_Years
Scenario params:
•	Non-detect method = NDMIN
•	Sample size = 5
•	Well use groups = G2
•	Cutoff year = 0, 1940, 1974, 1984, 1994, 2004, 2014
Threshold files:
•	T01
•	T03

-	5_Uses
Scenario params:
•	Non-detect method = NDMIN
•	Sample size = 5
•	Well use groups = G1, G2, G3, STCKWTR
•	Cutoff year = 1974
Threshold files:
•	T01
•	T03
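
The parameter lists above expand into individual scenarios. As a minimal Python sketch (not the repository's R code), the 4_Years settings can be crossed as follows. It assumes that every combination of values is generated and that site type and procedure type stay fixed at WBS and DTR; the dictionary keys are hypothetical and do not necessarily match the column names in scenario_params.csv.

```python
from itertools import product

# Values copied from the 4_Years listing; key names are hypothetical.
params = {
    "threshold": ["T01", "T03"],
    "sample_size": [5],
    "nd_method": ["NDMIN"],
    "well_use": ["G2"],
    "cutoff_year": [0, 1940, 1974, 1984, 1994, 2004, 2014],
}


def scenario_names(params, site_type="WBS", procedure="DTR"):
    """Build one scenario name (without the creation date) per combination."""
    names = []
    for thr, ss, nd, use, year in product(
        params["threshold"],
        params["sample_size"],
        params["nd_method"],
        params["well_use"],
        params["cutoff_year"],
    ):
        names.append(f"{thr}_SS_{ss}_{nd}_{use}_{site_type}_{procedure}_Post{year}")
    return names


names = scenario_names(params)
print(len(names), "scenarios, for example", names[0])
```

With full crossing, the scenario count is the product of the list lengths, which is why the subfolders that vary several parameters at once hold many more result folders.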

Each SensitivityAnalysis Subfolder consists of the following files, which function the same way as the files with the same name in the main folder:
-	1_ScenarioResults
Results specific to each scenario, with the exact same contents as 1_ScenarioResults in the main code results
-	Thresholds
Set of threshold files used to create scenarios
-	scenario_params.csv
Table of parameters being used to generate scenarios
-	Scenario_Values.csv
Table of summary statistics for each scenario
-	Scenario_Values_Metadata.docx
Metadata document corresponding to Scenario_Values.csv



Analysis Scenario Name Conventions
Results for each analysis scenario use shorthand names to indicate the attributes (data filtration, thresholds, etc.) defining the scenario, separated by underscores, followed by the date of file creation (based on system time of computer at the time the script is run).
Threshold Files:
•	T01 – uses 19 analytes with conservatively protective health risk thresholds (MCLG-HA), this file has reporting limit cutoffs for four analytes increased above the threshold (see manuscript).
•	T01a-e - parallel to T01 but with different MDL cutoff values set equal to the T01 threshold values and multiples of those values. 
o	a = MDL cutoffs 0.1 of thresholds
o	b = MDL cutoffs 0.5 of thresholds
o	c = MDL cutoffs equal to thresholds
o	d = MDL cutoffs 2 times thresholds
o	e = MDL cutoffs 4 times thresholds
•	T02 – same as T01, but only includes the 13 analytes with MCL thresholds (those in T03)
•	T03 – includes 13 analytes from T01 that have MCLs, uses MCLs as thresholds, and uses reporting limit cutoff values from T01
Sample Size Requirement (SS) – Required number of wells with results for each analyte for a watershed to be included in the analysis; the primary analysis uses 5.  Appears as SS_5, SS_10, etc.
Non-detect replacement value method:
•	NDMIN – uses the lowest reporting limits in the dataset for each analyte to determine the replacement concentration value for non-detect entries (these values are found in the threshold tables)
•	NDHALF – uses half the value of the reporting limit for each non-detect concentration as the replacement value
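
The difference between the two replacement methods can be shown with a minimal Python sketch for a single analyte (not the repository's R code). It assumes each record carries a value (None for a non-detect) and a reporting limit. For NDMIN, the sketch takes the minimum reporting limit from the records passed in, whereas the analysis reads those values from the threshold tables. The record layout and function name are hypothetical.

```python
def replace_non_detects(records, method):
    """Return a numeric concentration for every record of one analyte.

    records: list of dicts with keys 'value' (None for a non-detect)
             and 'reporting_limit' (None when not reported).
    method:  'NDMIN' or 'NDHALF'.
    """
    if method == "NDMIN":
        # One replacement value per analyte: the lowest reporting limit seen.
        limits = [r["reporting_limit"] for r in records if r["reporting_limit"] is not None]
        fill = min(limits) if limits else None
        return [r["value"] if r["value"] is not None else fill for r in records]
    if method == "NDHALF":
        # Each non-detect is replaced by half of its own reporting limit.
        return [
            r["value"] if r["value"] is not None else r["reporting_limit"] / 2
            for r in records
        ]
    raise ValueError(f"unknown method: {method}")


# Made-up records: one detection and two non-detects with different limits.
records = [
    {"value": 3.0, "reporting_limit": None},
    {"value": None, "reporting_limit": 1.0},
    {"value": None, "reporting_limit": 0.2},
]
print("NDMIN :", replace_non_detects(records, "NDMIN"))
print("NDHALF:", replace_non_detects(records, "NDHALF"))
```

In this sketch NDMIN gives every non-detect of an analyte the same value, while NDHALF lets the value vary with each sample's reporting limit.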
Data Filtering Components:
Well Use:
•	G1 – Group 1, contains wells with ‘Well_Use’ in the well dataset equal to ‘DOMESTIC’ or ‘PUBLIC WATER SUPPLY’
•	G2 – Group 2, contains G1 wells and wells whose usage is stockwater, irrigation, medical, commercial, or recreation
•	G3 – Group 3, contains all wells except those in the ‘injection’ category
•	STCKWTR – contains only wells in the ‘STOCKWATER’ category
Site Type:
•	WBS – Contains results from sites in the ‘WELL’, ‘BOREHOLE’, and ‘SPRING’ categories, based on the well data’s Site_Type category
•	WELLS – Contains only sites with a type of ‘WELL’ (Not currently in use, must be added to site type column in scenario_params.csv to use again)
Procedure Type
•	DTR – Contains results with procedure type listed as ‘Dissolved’ or ‘Total Recoverable’
•	Ultimately this was not a dynamic component of establishing analysis scenarios, so DTR was the only scenario.
Cutoff Year
•	Post0 – Uses data from all dates
•	Adding years under ‘year’ in scenario_params.csv will add scenarios where all data from the year entered and before is removed; this will appear as Post[year].
•	Due to some errant data being found, lead data from before 1993 is automatically removed in 3_DataReformatAndClean_2024-06-06.R

Scenario Filename Example:  T01_SS_5_NDMIN_G1_WBS_DTR_Post0_2024-02-25
•	T01 - Threshold file T01, i.e. MCLG-HA
•	SS_5 - sample size of five wells required for each analyte in every watershed
•	NDMIN - uses the minimum recorded reporting limit for each analyte as numeric replacement for non-detects
•	G1 - contains well use categories in group one, i.e. domestic/public water supply sites
•	WBS - Includes sites of type ‘WELL’, ‘SPRING’, or ‘BOREHOLE’
•	DTR - includes only samples with a procedure type of ‘Dissolved’ or ‘Total recoverable’ 
•	Post0 – year to filter data by is set to zero, allowing for inclusion of dates set with the year 1900 (Microsoft zero datum for dates), resulting in no removal of data by date.
•	2024-02-25 – File creation date, year, month, day
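
When working with many scenario folders, it can help to turn a folder name back into its components. The following is a minimal Python sketch (not part of the repository's R code) that follows the example above. It assumes the eight-component layout shown there, with no underscores inside any component; the Scenario class and parse_scenario_name function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Scenario:
    threshold_file: str
    sample_size: int
    nd_method: str
    well_use: str
    site_type: str
    procedure: str
    cutoff_year: int
    created: date


def parse_scenario_name(name: str) -> Scenario:
    """Split a name such as T01_SS_5_NDMIN_G1_WBS_DTR_Post0_2024-02-25."""
    parts = name.split("_")
    # "SS" and its number are separate pieces, so nine pieces are expected.
    if len(parts) != 9 or parts[1] != "SS" or not parts[7].startswith("Post"):
        raise ValueError(f"unexpected scenario name: {name!r}")
    return Scenario(
        threshold_file=parts[0],
        sample_size=int(parts[2]),
        nd_method=parts[3],
        well_use=parts[4],
        site_type=parts[5],
        procedure=parts[6],
        cutoff_year=int(parts[7][len("Post"):]),
        created=date.fromisoformat(parts[8]),
    )


scenario = parse_scenario_name("T01_SS_5_NDMIN_G1_WBS_DTR_Post0_2024-02-25")
print(scenario)
```

Names that deviate from this layout raise a ValueError rather than being parsed into misleading fields.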

Raw Data Abnormalities
Some edits were manually made to the text files downloaded from the GWIC database in order to properly work with them in R.  These include adding tabs to some rows where they were missing, which was causing misaligned cells when read into R.  Additionally, quotation marks were removed from some of the site names in the Hill county .txt file for well data to ensure it was being read properly.  Using newer versions of the GWIC data may require the same steps to be taken to ensure all data is input into R properly.
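
Before loading newer GWIC downloads, rows with missing tabs can be located with a quick check. The following is a minimal Python sketch (not part of the repository's R code) that assumes the first line of each tab-separated file is a header with the correct number of fields. The function name and the sample text are hypothetical.

```python
import io


def find_misaligned_rows(lines):
    """Return (line_number, field_count) for rows that differ from the header.

    lines: any iterable of text lines, such as an open file object.
    Line numbers are 1-based and count the header as line 1.
    """
    rows = [line.rstrip("\r\n") for line in lines]
    if not rows:
        return []
    expected = rows[0].count("\t") + 1
    problems = []
    for number, row in enumerate(rows[1:], start=2):
        fields = row.count("\t") + 1
        if fields != expected:
            problems.append((number, fields))
    return problems


# Made-up sample: the third line is missing one tab.
sample = io.StringIO("site\tanalyte\tresult\nA\tAs\t1.2\nB\tAs 3.4\n")
print(find_misaligned_rows(sample))
```

For a real file, the same function can be called on an open file handle, for example `with open(path, encoding="utf-8") as fh: find_misaligned_rows(fh)`, where the path and encoding are placeholders to adjust.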

Rights
The data presented here from GWIC are in the public domain under CC0
The code presented here is available under the MIT license.
https://opensource.org/licenses/MIT
Purpose
The purpose of this resource is: 1) to support and provide transparency of methods for the corresponding manuscript; and 2) to share a codebase for others to conduct parallel analysis or build on our work to advance cumulative human health risk analysis from public water quality datasets. 
Other Resources
Eggers, M. J., Sigler, W. A., Kiekover, N., Bradley, P. M., Smalling, K. L., Parker, A., Peterson, R. K. D., & LaFave, J. I. (2025). Statewide cumulative human health risk assessment of inorganics-contaminated groundwater wells, Montana, USA. Environmental Pollution, 125810. https://doi.org/10.1016/j.envpol.2025.125810
Montana Bureau of Mines and Geology, Groundwater Information Center database. https://mbmggwic.mtech.edu/
Shuangbin Xu, Chen M, Feng T, Zhan L, Zhou L, Yu G (2021). “Use ggbreak to effectively utilize plotting space to deal with large datasets and outliers.” Frontiers in Genetics, 12, 774846. doi:10.3389/fgene.2021.774846.

Data Services

The following web services are available for data contained in this resource. Geospatial Feature and Raster data are made available via Open Geospatial Consortium Web Services. The provided links can be copied and pasted into GIS software to access these data. Multidimensional NetCDF data are made available via a THREDDS Data Server using remote data access protocols such as OPeNDAP. Other data services may be made available in the future to support additional data types.

Web Map Service

https://geoserver.hydroshare.org/geoserver/HS-11599c9474744b9299bc37754c12f117/wms?request=GetCapabilities

Web Feature Service

https://geoserver.hydroshare.org/geoserver/HS-11599c9474744b9299bc37754c12f117/wfs?request=GetCapabilities

Related Resources

This resource requires	Shuangbin Xu, Chen M, Feng T, Zhan L, Zhou L, Yu G (2021). “Use ggbreak to effectively utilize plotting space to deal with large datasets and outliers.” Frontiers in Genetics, 12, 774846. doi:10.3389/fgene.2021.774846.
This resource has a related resource in another format	Montana Bureau of Mines and Geology, Groundwater Information Center database. https://mbmggwic.mtech.edu/
This resource is referenced by	Eggers, M. J., Sigler, W. A., Kiekover, N., Bradley, P. M., Smalling, K. L., Parker, A., Peterson, R. K. D., & LaFave, J. I. (2025). Statewide cumulative human health risk assessment of inorganics-contaminated groundwater wells, Montana, USA. Environmental Pollution, 125810. https://doi.org/10.1016/j.envpol.2025.125810

Credits

Funding Agencies

This resource was created using funding from the following sources:

Agency Name	Award Title	Award Number
Montana Water Center	Exploring water quality in Montana groundwater and understanding associated drivers of human health risk	#WA538
Montana Institute on Ecosystems	Uncovering and Addressing Environmental Health Risks Associated with Montana Groundwater

Contributors

People or Organizations that contributed technically, materially, financially, or provided general support for the creation of the resource's content but are not considered authors.

Name	Organization	Address	Author Identifiers
Venice Bayrd	Montana State University;Montana EPSCoR	MT, US	ORCID
John LaFave	Montana Bureau of Mines and Geology	MT, US
Al Parker	Montana State University	MT, US

How to Cite

Kiekover, N., W. A. Sigler, M. J. Eggers (2025). Statewide cumulative human health risk assessment of inorganics contaminated groundwater wells, Montana, USA - Data and Code, HydroShare, https://doi.org/10.4211/hs.11599c9474744b9299bc37754c12f117

The data presented here from GWIC are in the public domain under CC0

The code presented here is available under the MIT license.
https://opensource.org/licenses/MIT