| | |
| --- | --- |
| Authors: | |
| Owners: | This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource. |
| Type: | Resource |
| Storage: | 39.5 GB |
| Created: | Feb 27, 2025 at 5:09 a.m. |
| Last updated: | Feb 27, 2025 at 5:40 a.m. |
| Sharing Status: | Public |
Abstract
This resource is a backup of the Water Quality Portal (WQP), a database of water quality samples from the U.S. Geological Survey and the U.S. Environmental Protection Agency. This resource includes:
1. An R script (run.R) that downloads the data from WQP web services.
2. A hierarchical archive of zipped CSV files, organized by geographic area (in general, Country/State/County or equivalent)
3. A data dictionary for the Water Quality Portal CSV exports
Subject Keywords
Content
README.md
Backup the Water Quality Portal
Description
This is a backup of the Water Quality Portal (Legacy WQX 2.2 profile, including USGS NWIS water quality samples through March 11, 2024, and EPA STORET water quality samples through February 2025). Files are zipped CSVs organized into directories by Country (or State for the US). Each Country/State.zip file contains county directories, each holding a separate zipped CSV for each WQP "data profile":
- Organization Data (`organizations.zip`)
- Site Data Only (`sites.zip`)
- Project Data (`projects.zip`)
- Project Monitoring Location Weighting (`weighting.zip`)
- Sample Results (Physical/Chemical) (`physChem.zip`)
- Sample Results (Biological) (`biological.zip`)
- Sample Results (Narrow) (`narrowResult.zip`)
- Sampling Activity (`activity.zip`)
- Sampling Activity Metrics (`activityMetric.zip`)
- Biological Habitat Metrics (`biologicalMetric.zip`)
- Result Detection Quantitation Limit Data (`resultDetectionQuantitationLimit.zip`)
A data dictionary for all fields in each of these profiles is additionally provided in `WQX_Data_Dictionary.zip`.
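As an illustration of how the archive can be read back, the following is a minimal sketch that extracts and reads one profile CSV for a single county. The state zip path, county directory name, and the layout of files inside the zips are assumptions, not taken from this resource; only the Country/State/County organization and the profile zip names above come from the description.

```r
# Minimal sketch: read the physical/chemical results for one county back out
# of the archive. The state zip, county directory, and CSV layout are
# assumptions; adjust them to match the actual archive contents.
library(readr)

# 1. Extract one state archive (it contains one directory per county).
state_zip <- "US/06_California.zip"      # hypothetical path within the resource
state_dir <- tempfile("wqp_")
utils::unzip(state_zip, exdir = state_dir)

# 2. Inside a county directory, each data profile is its own zipped CSV.
county_zip <- file.path(state_dir, "001_Alameda", "physChem.zip")
csv_name   <- utils::unzip(county_zip, list = TRUE)$Name[1]
utils::unzip(county_zip, files = csv_name, exdir = state_dir)

phys_chem <- read_csv(file.path(state_dir, csv_name), show_col_types = FALSE)
head(phys_chem)
```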
The Non-US "Countries" are as follows:
FM (Federated States of Micronesia), CA (Canada), GT (Guatemala), IN (India), LE (Lake Erie), LH (Lake Huron), NI (Nicaragua), OA (Atlantic Ocean), OI (Indian Ocean), OP (Pacific Ocean), QO (Lake Ontario), QS (Lake Superior), MX (Mexico), RM (Marshall Islands), PS (Palau), YT (Mayotte), ZC (Caribbean Sea)
Script
An R script for archiving the Water Quality Portal (WQP), together with the resulting files. The script systematically downloads data as zipped CSVs by the lowest administrative unit possible (typically county, but this varies by country) to minimize server timeouts and improve archive indexing, organizing the output into a hierarchical directory structure. It does the following (a sketch of a single county-level request appears after this list):
- Downloads data from each WQP Web Service endpoint
- Handles countries both with and without county-level administrative divisions
- Creates organized directory structure based on geographic hierarchy
- Includes retry logic
- Comprehensive logging and progress tracking
- Rate limiting to respect API endpoints
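A minimal sketch of one such county-level request with httr is shown below. It assumes the standard public WQP web-service query parameters (statecode, countycode, zip, mimeType) and the Station/search endpoint; the endpoint choice and FIPS codes are illustrative and not taken from run.R.

```r
# Minimal sketch of one county-level download, assuming the standard WQP
# web-service parameters (statecode, countycode, zip, mimeType). The endpoint
# and FIPS codes are illustrative and not taken from run.R.
library(httr)
library(fs)

out_dir <- "locations/US/06_California/001_Alameda"
dir_create(out_dir)

resp <- GET(
  "https://www.waterqualitydata.us/data/Station/search",
  query = list(
    statecode  = "US:06",      # California
    countycode = "US:06:001",  # Alameda County
    zip        = "yes",        # request a zipped CSV
    mimeType   = "csv"
  ),
  write_disk(path(out_dir, "sites.zip"), overwrite = TRUE),
  timeout(600)
)
stop_for_status(resp)
```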
Directory Structure
The script creates a hierarchical directory structure based on geographic divisions:
For countries with county systems (US, FM, PS, RM):
```
locations/
├── US/
│   ├── 06_California/
│   │   ├── 001_Alameda/
│   │   │   ├── sites.zip
│   │   │   ├── organizations.zip
│   │   │   └── ...
│   │   └── 003_Alpine/
│   └── 36_New_York/
└── FM/
    └── ...
```
For countries without county systems:
```
locations/
├── CA/
│   ├── 01_Alberta/
│   │   ├── sites.zip
│   │   ├── organizations.zip
│   │   └── ...
│   └── 02_British_Columbia/
└── MX/
    └── ...
```
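For reference, a location directory following this hierarchy could be assembled along the lines of the sketch below; the helper shown here is illustrative and is not part of run.R.

```r
# Illustrative helper (not from run.R): build and create the directory for one
# location, following the geographic hierarchy shown above.
library(fs)

location_dir <- function(base_dir, country, state, county = NULL) {
  p <- path(base_dir, country, state)
  if (!is.null(county)) p <- path(p, county)
  dir_create(p)  # creates intermediate directories as needed
  p
}

location_dir("locations", "US", "06_California", "001_Alameda")
#> locations/US/06_California/001_Alameda
location_dir("locations", "CA", "01_Alberta")
#> locations/CA/01_Alberta
```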
Requirements
- R >= 4.0.0
- Required R packages:
- tidyverse
- httr
- fs
- jsonlite
- furrr
- progressr
- parallelly
Install dependencies:
```r
install.packages(c("tidyverse", "httr", "fs", "jsonlite", "furrr", "progressr", "parallelly"))
```
Usage
- Clone the repository:

  ```bash
  git clone https://github.com/ksonda/wqp-backup.git
  cd wqp-backup
  ```

- Run the script:

  ```r
  source("run.R")
  ```
The script will:
1. Create the complete directory structure
2. Download data for each endpoint in sequence
3. Process locations in parallel within each endpoint, as sketched below
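The sequential-over-endpoints, parallel-over-locations pattern could look roughly like the following. The `download_location()` helper, the `locations` table, and the progress wiring are placeholders, not the actual objects defined in run.R.

```r
# Rough sketch (placeholder names, not the actual run.R objects): endpoints
# are processed one at a time, while the locations for each endpoint are
# downloaded in parallel with furrr, with progress reporting via progressr.
library(future)
library(furrr)
library(progressr)

plan(multisession, workers = parallelly::availableCores() - 1)

download_endpoint <- function(endpoint, locations) {
  with_progress({
    p <- progressor(steps = nrow(locations))
    future_walk(seq_len(nrow(locations)), function(i) {
      download_location(endpoint, locations[i, ])  # hypothetical per-location worker
      p()
    })
  })
}

# Endpoints in sequence; locations within each endpoint in parallel.
for (endpoint in names(CONFIG$endpoints)) {
  download_endpoint(endpoint, locations)
}
```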
Configuration
The tool's behavior can be customized by modifying the `CONFIG` list in the script:
```r
CONFIG <- list(
  base_url = "https://www.waterqualitydata.us/data",
  endpoints = list(...),
  base_dir = "locations",
  location_types = list(
    county_countries = c("US", "FM", "PS", "RM"),
    state_countries = c("CA", "MX")
  ),
  parallel = list(
    workers = parallelly::availableCores() - 1,  # Use all cores except one
    chunk_size = 100  # Number of locations to process in each chunk
  )
)
```
- `base_url`: Base URL for the Water Quality Portal API
- `endpoints`: List of endpoints and their configurations
- `base_dir`: Base directory for downloaded data
- `location_types`: Geographic division configurations
- `parallel`: Parallel processing settings
  - `workers`: Number of parallel workers to use
  - `chunk_size`: Number of locations to process in each chunk
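To make `chunk_size` concrete, locations can be split into fixed-size groups before being handed to the parallel workers; the snippet below is illustrative only and uses made-up location identifiers rather than anything from run.R.

```r
# Illustrative only: split location identifiers into chunks of
# CONFIG$parallel$chunk_size so each pass handles at most 100 locations.
chunk_size   <- 100
location_ids <- sprintf("US:06:%03d", seq(1, 115, by = 2))  # made-up county FIPS codes
chunks       <- split(location_ids, ceiling(seq_along(location_ids) / chunk_size))
length(chunks)  # 1 chunk here; 3,143 US counties would give 32 chunks
```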
Logging
- Failed downloads logged to `download_errors.log`
Rate Limiting
To respect the API's resources:
- 1-second delay between requests
- Exponential backoff on failures
- Maximum of 3 retry attempts per download
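A download wrapper implementing these rules might look like the sketch below; the function and argument names are illustrative and not the actual code in run.R.

```r
# Illustrative sketch (not the actual run.R code): download one file with a
# 1-second pause before each request, exponential backoff between failed
# attempts, at most 3 attempts, and an entry in download_errors.log when
# every attempt fails.
library(httr)

download_with_retry <- function(url, query, dest, max_attempts = 3) {
  for (attempt in seq_len(max_attempts)) {
    Sys.sleep(1)  # base rate limit: 1 second between requests
    resp <- tryCatch(
      GET(url, query = query, write_disk(dest, overwrite = TRUE), timeout(600)),
      error = function(e) NULL
    )
    if (!is.null(resp) && status_code(resp) == 200) return(TRUE)
    Sys.sleep(2 ^ attempt)  # exponential backoff: 2, 4, 8 seconds
  }
  cat(sprintf("%s FAILED %s\n", format(Sys.time()), dest),
      file = "download_errors.log", append = TRUE)
  FALSE
}
```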
License
This project is licensed under the MIT License - see the LICENSE file for details.
References
- Water Quality Portal. Washington (DC): National Water Quality Monitoring Council, United States Geological Survey (USGS), Environmental Protection Agency (EPA); 2021. https://doi.org/10.5066/P9QRKUVJ.
Related Resources
The content of this resource is derived from: Water Quality Portal. Washington (DC): National Water Quality Monitoring Council, United States Geological Survey (USGS), Environmental Protection Agency (EPA); 2021. https://doi.org/10.5066/P9QRKUVJ.
How to Cite
This resource is shared under the Creative Commons Attribution (CC BY 4.0) license.
http://creativecommons.org/licenses/by/4.0/