Authors: | | |
---|---|---|
Owners: | This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource. | |
Type: | Resource | |
Storage: | The size of this resource is 1.2 GB | |
Created: | Dec 08, 2021 at 7:07 p.m. | |
Last updated: | Feb 06, 2024 at 2:14 p.m. (Metadata update) | |
Published date: | Aug 08, 2022 at 9:11 p.m. | |
DOI: | 10.4211/hs.9547035cf37940eb9b500b7994a378a1 | |
Citation: | See how to cite this resource |
Sharing Status: | Published |
Abstract
Water quality monitoring can inform policies that address pollution; however, inconsistent measurement and reporting practices render many observations incomparable across bodies of water, thereby impeding efforts to characterize spatial patterns and long-term trends in pollution. Here, we harmonized 9.2 million publicly available monitor readings from 226 distinct water monitoring authorities spanning the entirety of the Mississippi/Atchafalaya River Basin (MARB) in the United States. We created the Standardized Nitrogen and Phosphorus Dataset (SNAPD), a novel dataset of 4.8 million standardized observations for nitrogen- and phosphorus-containing compounds from 107 thousand sites during 1980–2018. To the best of our knowledge, this dataset represents the largest record of these pollutants in a single river network where measurements can be compared across time and space. We addressed numerous well-documented issues associated with the reporting and interpretation of these water quality data, heretofore unaddressed at this scale, and our approach to water quality data processing can be applied to other nutrient compounds and regions.
Content
README.md
Standardized Nitrogen and Phosphorus Dataset (SNAPD)
This document describes the code and data necessary to reproduce the harmonized water quality dataset from Krasovich et al., 2022 "Harmonized nitrogen and phosphorus concentrations in the Mississippi/Atchafalaya River Basin from 1980 to 2018."
Setup
All scripts are written in R. Throughout this README, when indicating paths to code and data, it is assumed that you’ll execute scripts from the folder structure provided in the Hydroshare Repository using R or RStudio.
Hydroshare Repository
You may view and download source code and data from our Hydroshare Repository. This repository contains data inputs as well as the code necessary to replicate our final harmonized dataset, SNAPD, and intermediate dataset, WQP_to_SNAPD_flagged.
Please cite as:
Krasovich, E., P. Lau, J. Tseng, J. Longmate, K. Bell, S. Hsiang (2022). Standardized Nitrogen and Phosphorus Dataset (SNAPD), HydroShare, http://www.hydroshare.org/resource/9547035cf37940eb9b500b7994a378a1
Hydroshare Folder Structure
Hydroshare Repository:
├── README.md
└── SNAPD
    ├── Code
    │   ├── _install_us_wq_packages.R
    │   ├── _master_workflow_and_setup.R
    │   ├── A00_us_raw_wqd_retrieval_workflow.R
    │   ├── A01download_wq_sites_from_WQP.R
    │   ├── A02create_and_clean_WQP_site_df.R
    │   ├── A03download_wqd_by_nutrient.R
    │   ├── A04merge_wqd_w_site_data_by_download.R
    │   ├── A05crop_wqp_sites_to_mrb.R
    │   ├── B00_us_wqd_processing_workflow.R
    │   ├── B01standardize_wq_org_names.R
    │   ├── B02recover_state_and_make_unique_sites.R
    │   ├── B03flag_sample_level_metadata.R
    │   ├── B04flag_raw_obs_w_unknown_chemical_form.R
    │   ├── B05flag_result_level_metadata.R
    │   ├── B06flag_and_convert_wqd_units.R
    │   ├── B07merge_nutrient_compounds_and_rename_RSFs.R
    │   ├── B08get_upper_DLs_and_merge_w_wqd.R
    │   ├── B09impute_non_detects.R
    │   ├── B10flag_potential_outliers.R
    │   ├── B11flag_duplicate_types.R
    │   ├── B12create_full_flagged_dataset.R
    │   ├── B13harmonize_duplicates.R
    │   ├── B14combine_parameters.R
    │   ├── B15final_cleaning.R
    │   ├── C00_us_wq_data_figures_and_tables_workflow.R
    │   ├── C01create_raw_wqd_summary_table.R
    │   ├── C02create_harmonization_process_table.R
    │   ├── C03create_final_wqd_summary_table.R
    │   ├── C04create_technical_validation_histograms.R
    │   └── C05make_sankey_plots.R
    └── Data
        ├── _A_workflow
        │   └── all_raw_wqd_and_sites.fst
        ├── _B_workflow
        │   ├── analytical_methods_and_chemical_forms_to_import.csv
        │   ├── DL_wqd_units_raw_to_import.csv
        │   ├── fips_gnis_state_codes.xlsx
        │   ├── WQP_to_SNAPD_flagged.fst
        │   ├── all_WQP_sites_df.fst
        │   ├── SNAPD.fst
        │   ├── WQP_to_SNAPD_flagged.fst.zip
        │   ├── wq_org_names_to_import.csv
        │   ├── wqd_units_raw_to_import.csv
        │   └── wqp_water_chars_handles.xlsx
        └── _C_workflow
            └── SNAPD_final_wqd_sites.csv
Replication Process
There are three stages to our data harmonization process. We have structured our code in three workflows that correspond with each stage:
- Data retrieval from the Water Quality Portal and minimal cleaning: A00_us_raw_wqd_retrieval_workflow.R
- Data processing and harmonization: B00_us_wqd_processing_workflow.R
- Figure creation: C00_us_wq_data_figures_and_tables_workflow.R
Run full pipeline
The entire pipeline can be run from the master workflow, _master_workflow_and_setup.R. The master workflow installs the necessary packages, loads required libraries, creates directories, sets file paths, and sources each workflow so that it may be run directly from the master workflow script. To run the master workflow, you must set the working directory to the analogous [SNAPD] folder so that the folder structure matches the Hydroshare repository. In addition, you must download two shapefiles before running any code. Both are cited in our manuscript and must be saved into the ['/Data/_A_workflow'] folder:
- Schwartz, Michael. (2015). USGS [Mississippi River Basin]. Retrieved from: https://www.sciencebase.gov/catalog/item/55de04d5e4b0518e354dfcf8
- U.S. Department of Commerce, U.S. Census Bureau, Geography Division. (2017). [TIGER/Line Shapefile, 2017, nation, U.S., Current State and Equivalent National]. Retrieved from: http://www2.census.gov/geo/tiger/TIGER2017/STATE/tl_2017_us_state.zip
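These prerequisites can be checked before launching the pipeline. The following is a minimal sketch, not part of the published code. The helper names `missing_shapefiles()` and `run_snapd_pipeline()` are hypothetical, the location of the master script under `Code/` is assumed from the folder structure above, and the Census file name assumes the zip archive was extracted in place. This README does not give the file name of the USGS basin shapefile, so you would supply it yourself.

```r
# Hypothetical helper: return the expected shapefile paths that are missing.
missing_shapefiles <- function(snapd_dir, shapefile_names) {
  paths <- file.path(snapd_dir, "Data", "_A_workflow", shapefile_names)
  paths[!file.exists(paths)]
}

# Hypothetical wrapper: stop early if inputs are missing, else run everything.
run_snapd_pipeline <- function(snapd_dir, shapefile_names) {
  missing <- missing_shapefiles(snapd_dir, shapefile_names)
  if (length(missing) > 0) {
    stop("Download these shapefiles first:\n", paste(missing, collapse = "\n"))
  }
  setwd(snapd_dir)  # the master workflow expects SNAPD as the working directory
  # Assumed location of the master script, based on the folder structure above.
  source(file.path("Code", "_master_workflow_and_setup.R"))
}

# Example call (not run here). Add the name of the USGS basin shapefile
# you downloaded to the vector of expected files.
# run_snapd_pipeline("path/to/SNAPD", c("tl_2017_us_state.shp"))
```

Failing early here avoids a long run that stops partway through Stage 1 because a shapefile is missing.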
If you instead wish to run the pipeline in stages, you can interact with each workflow and its scripts directly. Each script name begins with a letter that identifies its workflow and a number that gives the order in which it should be run within that workflow.
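The naming convention makes it easy to list one workflow's scripts in run order. The sketch below is illustrative only. The helper `workflow_scripts()` is hypothetical and relies solely on the letter-plus-two-digit prefix visible in the folder structure.

```r
# Hypothetical helper: list one workflow's scripts in run order.
# `workflow` is the letter prefix ("A", "B", or "C").
workflow_scripts <- function(code_dir, workflow) {
  # Matches names such as "B01standardize_wq_org_names.R"; the "B00_..."
  # workflow file matches as well and sorts first.
  pattern <- paste0("^", workflow, "[0-9]{2}.*\\.R$")
  sort(list.files(code_dir, pattern = pattern))
}

# Example call (not run here), from the SNAPD working directory:
# workflow_scripts("Code", "B")
```

The first element returned is the `00` workflow file for that stage, followed by the numbered step scripts.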
Stage 1. Data retrieval from the Water Quality Portal and minimal cleaning
This stage of the pipeline downloads raw water quality site and sample data from the Water Quality Portal into specified directories. Raw site and sampling data are combined and minimal cleaning is performed. You can run this stage of the workflow from _master_workflow_and_setup.R or directly from A00_us_raw_wqd_retrieval_workflow.R.
Note that the Water Quality Portal is frequently updated, so if you run this stage, the retrieved data may differ from the data we retrieved for our harmonized dataset. We therefore recommend not running A00_us_raw_wqd_retrieval_workflow.R unless new water quality data are desired. We have provided the output of this stage as an fst file, all_raw_wqd_and_sites.fst, in [/Data/_A_workflow], which is the input to Stage 2 of our pipeline. This dataset contains all the raw water quality site and sample data for each nitrogen and phosphorus compound downloaded from the Water Quality Portal. In this stage, we drop observations that are outside the boundary of the Mississippi/Atchafalaya River Basin or outside our timeframe of interest (1980–2018).
If new data are desired, run A00_us_raw_wqd_retrieval_workflow.R. Note that downstream stages may then raise errors because of differences in the newly downloaded data, and some code adjustments may be required.
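The 1980 to 2018 time filter mentioned above can be illustrated with a small base R sketch. This is not the authors' code, which lives in the A-workflow scripts. The function name and the column name `sample_date` are hypothetical, and dropping rows with missing dates is an assumption made for the example. The spatial crop to the basin is omitted because it depends on the shapefiles.

```r
# Keep only rows whose sampling date falls within 1980-2018 (inclusive).
# `date_col` is a hypothetical column name used for illustration.
filter_to_study_period <- function(df, date_col = "sample_date",
                                   start = as.Date("1980-01-01"),
                                   end = as.Date("2018-12-31")) {
  dates <- as.Date(df[[date_col]])
  # Assumption: rows with a missing date are dropped.
  keep <- !is.na(dates) & dates >= start & dates <= end
  df[keep, , drop = FALSE]
}

# Toy data with dates on either side of both boundaries.
toy <- data.frame(
  site = c("a", "b", "c", "d"),
  sample_date = c("1979-12-31", "1980-01-01", "2018-12-31", "2019-01-01")
)
filtered <- filter_to_study_period(toy)
print(filtered$site)  # sites "b" and "c" remain
```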
Stage 2. Data processing and harmonization
This stage of the pipeline roughly follows Table 2 in (Krasovich et al., 2022), which entails a number of cleaning actions that ensure the comparability of water quality data. You can run this stage of the workflow from _master_workflow_and_setup.R or directly from B00_us_wqd_processing_workflow.R.
Outputs from this stage of the pipeline are in [/Data/_B_workflow], including both the intermediate flagged dataset called 'WQP_to_SNAPD_flagged.fst' as well as the final harmonized dataset called 'SNAPD.fst'. We publish both datasets so that secondary users may decide which best suits their data needs. Variable definitions for both datasets can be found in the Data Records section of our manuscript.
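If you only want to use the published datasets, you can read them without running any workflow. The sketch below is one possible way to do so and assumes the `fst` package is installed. The helper name `load_snapd_output()` is hypothetical, and the paths follow the folder structure above.

```r
# Hypothetical helper: load one of the published datasets from _B_workflow.
# Requires the fst package (install.packages("fst")).
load_snapd_output <- function(snapd_dir, file_name = "SNAPD.fst",
                              columns = NULL) {
  path <- file.path(snapd_dir, "Data", "_B_workflow", file_name)
  if (!file.exists(path)) stop("File not found: ", path)
  # `columns = NULL` reads every column; pass a character vector to read fewer.
  fst::read_fst(path, columns = columns)
}

# Example calls (not run here):
# snapd   <- load_snapd_output("path/to/SNAPD")
# flagged <- load_snapd_output("path/to/SNAPD", "WQP_to_SNAPD_flagged.fst")
```

Reading a subset of columns can be useful for the flagged dataset, which is the larger of the two files.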
Stage 3. Figure creation
This stage of the pipeline creates most of the figures and tables in (Krasovich et al., 2022). You can run this stage of the workflow from _master_workflow_and_setup.R or directly from C00_us_wq_data_figures_and_tables_workflow.R. This workflow requires that the first two stages of the pipeline have been completed.
Specifically, this stage outputs information and plots used for Table 1, Table 2, Table 5, Figure 4, Figure 5 (A and B), and Figure 6 (A and B). Figures 1 and 3 are created in QGIS, but use the WGS data above along with the 'SNAPD_final_wqd_sites.csv' that is output from this workflow. Tables and figures in the paper not listed here are not created programmatically. Outputs of this stage of the pipeline will be saved to [/Data/_C_workflow]. After export from R, final edits to all figures are made in Adobe Illustrator.
Related Resources
The content of this resource is derived from | U.S. Department of Commerce, U.S. Census Bureau, Geography Division. (2017). [TIGER/Line Shapefile, 2017, nation, U.S., Current State and Equivalent National]. Retrieved from: http://www2.census.gov/geo/tiger/TIGER2017/STATE/tl_2017_us_state.zip |
The content of this resource is derived from | National Water Quality Monitoring Council. Water Quality Portal. https://www.waterqualitydata.us/ (2019). |
The content of this resource is derived from | United States Environmental Protection Agency, the United States Geological Survey & Water Quality eXchange. Best Practices for Submitting Nutrient Data to the Water Quality eXchange (WQX) https://www.epa.gov/sites/default/files/2017-06/documents/wqx_nutrient_best_practices_guide.pdf (2017). |
Credits
Funding Agencies
This resource was created using funding from the following sources:
Agency Name | Award Title | Award Number |
---|---|---|
Tūaropaki Trust | ||
Royal Society Te Apārangi Rutherford Postdoctoral Fellowship |
Contributors
People or Organizations that contributed technically, materially, financially, or provided general support for the creation of the resource's content but are not considered authors.
Name | Organization | Address | Phone | Author Identifiers |
---|---|---|---|---|
Sandy Sum | Bren School of Environmental Science & Management, University of California, Santa Barbara | |||
Daniel Allen | Global Policy Laboratory, Goldman School of Public Policy, University of California, Berkeley |
How to Cite
This resource is shared under the Creative Commons Attribution CC BY.
http://creativecommons.org/licenses/by/4.0/