DROMEDARY US


Authors:
Owners: This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource.
Type: Resource
Storage: The size of this resource is 629.0 MB
Created: Nov 12, 2024 at 6:59 p.m. (UTC)
Last updated: Apr 14, 2026 at 6:30 p.m. (UTC)
Content types: CSV Content 
Sharing Status: Public
Views: 1907
Downloads: 1

Abstract

The DROMEDARY US dataset is a publicly accessible collection covering 3,246 basins throughout the contiguous United States. It combines several data sources: the Gages II dataset (Falcone, 2011), the Cropland Data Layer from the CropScape web platform (Han et al., 2012), meteorological data from the Daymet dataset (Thornton et al., 2021), and streamflow data published by the USGS, accessible via their data retrieval tool (De Cicco et al., 2024).

This dataset was created to train Long Short-Term Memory (LSTM) neural networks to predict daily streamflow time series at the outlets of these basins. It aims to provide a comprehensive sample of basins across various hydroclimatological contexts in the US, including those that have undergone significant changes in land use and land cover.

The dataset is divided into static attributes and time series. The time series are stored in .nc files, with the 3,246 .nc files compressed into an archive that is split into three parts to facilitate uploading.

More details are provided in the article "Using Long Short-Term Memory Neural Networks to Assess Streamflow Alteration from Land-Use and Land Cover Changes: Application to Fallowed Land Across the United States" by Baptiste Francois, Samson Zhilyaev, and Casey Brown, which has been submitted to the journal Water Resources Research. The full reference will be added here if and when the article is accepted for publication.

Falcone, J., 2011. GAGES-II: Geospatial Attributes of Gages for Evaluating Streamflow. Reston, VA. https://doi.org/10.3133/70046617

De Cicco, L.A., Hirsch, R.M., Lorenz, D., Watkins, D., Johnson, M., 2024. dataRetrieval: R packages for discovering and retrieving water data available from Federal hydrologic web services, v.2.7.15. https://doi.org/10.5066/P9X4L3GE

Thornton, P.E., Shrestha, R., Thornton, M., Kao, S.-C., Wei, Y., Wilson, B.E., 2021. Gridded daily weather data for North America with comprehensive uncertainty quantification. Sci. Data 8, 190. https://doi.org/10.1038/s41597-021-00973-0

Han, W., Yang, Z., Di, L., Mueller, R., 2012. CropScape: A Web service based application for exploring and disseminating US conterminous geospatial cropland data products for decision support. Comput. Electron. Agric. 84, 111–123. https://doi.org/10.1016/j.compag.2012.03.005

Content

README.md

LSTM input data: DROMEDARY basin sample with dynamic land use (Francois et al., 2026)

This resource contains NeuralHydrology-style inputs for continental-U.S. streamflow modeling with time-varying cropland/land-cover (CDL) fractions and static basin attributes. It supports the LSTM experiments that pair DayMet meteorology, USGS streamflow, Gages II basin descriptors, and USDA NASS Cropland Data Layer (CDL) categories aggregated to each basin.


Contents overview

| Path | Description |
| --- | --- |
| time_series/ | One NetCDF (.nc) file per basin: daily forcings, CDL-derived land-use fractions, and observed discharge. |
| attributes/ | static_attributes.csv: one row per basin with Gages II–based static features, climatology derived from DayMet (2008–2023), and a single-year (2015) CDL snapshot used as static covariates. |

The basin list included here has 3246 gauges (see time_series/list_DROMEDARY_basins.txt).


time_series/

Files

  • {USGS_ID}.nc — NetCDF-4 dataset for basin USGS_ID (8-digit USGS streamgage identifier, zero-padded where applicable). There is one file per basin in the sample.
  • list_DROMEDARY_basins.txt — Text list of all basin IDs (one ID per line), matching the expected .nc filenames without the extension.
  • list_all_gages.py — Optional helper script: if run from inside time_series/ after all .nc files are present, it scans *.nc and regenerates list_DROMEDARY_basins.txt.
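
The contents of list_all_gages.py are not reproduced on this page, so the following is only a minimal sketch of what such a helper could look like. The function names, the sorted output order, and the extra consistency check are assumptions, not the authors' script.

```python
from pathlib import Path


def regenerate_basin_list(time_series_dir, list_name="list_DROMEDARY_basins.txt"):
    """Scan a folder for *.nc files and write their basin IDs, one per line.

    Assumption: IDs are written in sorted order; the original helper may
    use a different ordering.
    """
    folder = Path(time_series_dir)
    # The file stem is the basin ID: "<USGS_ID>.nc" -> "<USGS_ID>".
    basin_ids = sorted(path.stem for path in folder.glob("*.nc"))
    (folder / list_name).write_text("".join(f"{basin_id}\n" for basin_id in basin_ids))
    return basin_ids


def ids_without_file(time_series_dir, list_name="list_DROMEDARY_basins.txt"):
    """Return listed basin IDs that have no matching .nc file in the folder."""
    folder = Path(time_series_dir)
    listed = [line.strip() for line in (folder / list_name).read_text().splitlines()]
    return [basin_id for basin_id in listed if basin_id and not (folder / f"{basin_id}.nc").exists()]
```

Calling regenerate_basin_list("time_series") after extracting the archive would rebuild the list, and ids_without_file("time_series") should then return an empty list.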

Temporal coverage and coordinate

  • Time dimension: date — daily timestamps from 2008-01-01 through 2023-12-31 (inclusive), aligned across variables.
  • Spatial unit: Each file corresponds to a single gaged catchment; variables are basin averages (meteorology from basin-averaged DayMet; CDL classes as percent of basin area).

Variables in each NetCDF

Meteorology (DayMet, basin average) — variable names in the file:

| Variable | Description (summary) |
| --- | --- |
| dayl | Day length |
| prcp | Precipitation |
| srad | Shortwave radiation |
| swe | Snow water equivalent |
| tmax, tmin | Maximum / minimum air temperature |
| vp | Vapor pressure |
| pet | Potential evapotranspiration |

Units follow the DayMet product conventions for gridded DayMet variables (see the DayMet documentation for exact units and definitions).

CDL-derived land use / land cover (daily time series) — each field is the percent of basin area in that CDL aggregate class (the classes sum to 100% at each time step). Values come from annual CDL summaries; on the daily time axis they stay fixed for long stretches and change roughly around the turn of the calendar year (end of December into January). Variable names use underscores (e.g. Grassland_Pasture, Developed_Open_Low for “Grassland/Pasture”, “Developed Open/Low”).

Corn, Cotton, Rice, Sorghum, Soybeans, Oilseed, Barley, Spring_Wheat, Winter_Wheat, Other_Cereals, Alfalfa, Other_Hay, Nuts, Peas_Beans, Tree_Crops, Melons, Berries, Herbs, Roots, Vegetables, Double_Crops, Aquaculture, Fallow, Developed_Open_Low, Developed_Med_High, Forest, Wetlands, Shrubland, Grassland_Pasture, Open_Water, Perennial_Ice_Snow, Barren
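
To illustrate how an annual CDL summary becomes a step-wise daily series, here is a small sketch that repeats each year's percentage on every day of that year and checks that a set of class percentages sums to 100. It assumes the value switches exactly on January 1 (the description above only says "roughly around the turn of the calendar year"), and the numbers are illustrative, not taken from the dataset.

```python
from datetime import date, timedelta


def expand_annual_to_daily(annual_percent, start, end):
    """Repeat each year's value on every day of that year, from start to end inclusive.

    Assumption: the value changes exactly on January 1.
    """
    daily = {}
    day = start
    while day <= end:
        daily[day] = annual_percent[day.year]
        day += timedelta(days=1)
    return daily


def classes_sum_to_100(class_percents, tol=0.01):
    """Check that the class percentages at one time step add up to 100 within tol."""
    return abs(sum(class_percents.values()) - 100.0) <= tol


# Illustrative annual percentages for one class (not real data).
annual_fallow = {2008: 12.5, 2009: 14.0}
daily_fallow = expand_annual_to_daily(annual_fallow, date(2008, 1, 1), date(2009, 12, 31))
print(len(daily_fallow))                 # 731 (2008 is a leap year)
print(daily_fallow[date(2008, 12, 31)])  # 12.5
print(daily_fallow[date(2009, 1, 1)])    # 14.0
```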

Streamflow (target)

| Variable | Description |
| --- | --- |
| QObs | Observed discharge in millimeters per day (mm d⁻¹), computed from USGS discharge in cubic feet per second using the basin drainage area from Gages II (DRAIN_SQKM) and the standard unit conversions used in the project's NeuralHydrology preprocessing. |
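
The project's preprocessing code is not included here, but the conversion described for QObs can be sketched from first principles: turn cubic feet per second into a daily volume in cubic meters, divide by the drainage area, and express the resulting depth in millimeters. The function name and the example numbers are illustrative assumptions.

```python
CUBIC_FEET_TO_CUBIC_METERS = 0.3048 ** 3  # 1 ft = 0.3048 m exactly
SECONDS_PER_DAY = 86_400


def cfs_to_mm_per_day(q_cfs, drain_sqkm):
    """Convert discharge in ft^3/s to an area-normalized depth in mm/day."""
    if drain_sqkm <= 0:
        raise ValueError("drainage area must be positive")
    volume_m3_per_day = q_cfs * CUBIC_FEET_TO_CUBIC_METERS * SECONDS_PER_DAY
    area_m2 = drain_sqkm * 1_000_000.0           # km^2 -> m^2
    depth_m_per_day = volume_m3_per_day / area_m2
    return depth_m_per_day * 1000.0              # m -> mm


# Illustrative values: 100 ft^3/s from a 250 km^2 basin.
print(round(cfs_to_mm_per_day(100.0, 250.0), 3))  # 0.979
```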

attributes/

File: static_attributes.csv

  • Rows: One per basin; the first column is the USGS basin ID (same identifier as the {USGS_ID}.nc filenames). When reading with pandas, use index_col=0 and treat IDs as strings (zero-pad to 8 digits if needed).
  • Columns: a concatenation of three groups:
      • Gages II basin characteristics (e.g. drainage area, gage coordinates, dam density, withdrawals, hydro modification and morphology indices, elevation, slope, soil hydrologic group fractions HGA through HGVAR, and encoded basin CLASS).
      • Climatology computed from DayMet basin-average daily time series over 2008–2023: long-term mean annual precipitation and PET, aridity index (PET / precipitation), precipitation seasonality (PREC_SEAS), snow fraction (SNOW_FRAC), and metrics of high/low precipitation frequency and duration (HPF, HPD, LPF, LPD).
      • CDL snapshot for a single reference year (2015): the same cropland/land-cover class columns as in the dynamic NetCDF files, representing percent of basin area for that year (used as static inputs in configurations that do not feed the full daily CDL stack).
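
As a rough illustration of the two simplest climatology entries, the sketch below computes a mean annual total and the aridity index (PET / precipitation) from daily series. It assumes both series are daily depths in the same units (for example mm per day) over the same period; the exact definitions used for the published columns may differ.

```python
def mean_annual_total(daily_values, n_years):
    """Long-term mean annual total: sum of daily depths divided by the number of years."""
    if n_years <= 0:
        raise ValueError("n_years must be positive")
    return sum(daily_values) / n_years


def aridity_index(daily_pet, daily_prcp):
    """Aridity index as the ratio of total PET to total precipitation."""
    total_prcp = sum(daily_prcp)
    if total_prcp == 0:
        raise ValueError("total precipitation is zero; aridity index is undefined")
    return sum(daily_pet) / total_prcp


# Illustrative two-year series (not real data): 2 mm/day of rain, 3 mm/day of PET.
prcp = [2.0] * 730
pet = [3.0] * 730
print(mean_annual_total(prcp, n_years=2))  # 730.0
print(aridity_index(pet, prcp))            # 1.5
```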

Column names match the headers in the CSV (e.g. DRAIN_SQKM, LAT_GAGE, LNG_GAGE, DDENS_2009, … through the CDL aggregates). For soil-group codes (HGA, HGB, …), percentages describe the share of each NRCS hydrologic soil group within the basin (see Gages II documentation).
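
The note above about treating basin IDs as strings matters because a numeric parse drops leading zeros. A standard-library sketch of reading the file with string IDs is shown here; the function names are hypothetical, attribute values are left as strings, and IDs longer than 8 digits are kept unchanged.

```python
import csv


def normalize_usgs_id(raw_id):
    """Return the ID as a string, zero-padded on the left to at least 8 characters."""
    return str(raw_id).strip().zfill(8)


def read_static_attributes(csv_path):
    """Read the attributes CSV into {basin_id: {column_name: value_as_string}}.

    The first column is taken as the basin ID, whatever its header is.
    """
    attributes = {}
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        for row in reader:
            if not row:
                continue  # skip blank lines
            basin_id = normalize_usgs_id(row[0])
            attributes[basin_id] = dict(zip(header[1:], row[1:]))
    return attributes
```

A lookup such as read_static_attributes("attributes/static_attributes.csv")[basin_id]["DRAIN_SQKM"] then returns the drainage area as text, which can be converted with float() where needed.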


Data provenance (short)

  • Streamflow: USGS NWIS (processed Gages II–style daily extracts used in the project).
  • Meteorology: DayMet, spatially averaged to the basin (basin-mean CSVs / workflow described in the repository).
  • Static physiographic attributes: Gages II basin characteristics (USGS Gages II).
  • Land cover: USDA NASS Cropland Data Layer (CDL), aggregated to catchments and organized into the crop/land-cover classes above (https://www.sciencedirect.com/science/article/abs/pii/S0168169912000798?via%3Dihub).

Using this dataset with NeuralHydrology

Set the following keys in the NeuralHydrology configuration file:

  • dataset: generic
  • data_dir: path to this dataset root
  • dynamic_inputs, static_attributes, and target_variables: names consistent with the variables above (training may use a subset of the columns present in the files).

Point data_dir to the folder that contains time_series/ and attributes/ as siblings.
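
As a hedged illustration, the snippet below renders only the configuration keys named above as YAML text. The path and the variable subset are placeholders, the render_config_fragment helper is hypothetical, and a complete NeuralHydrology configuration needs further keys (basin files, date ranges, model settings) that are not shown.

```python
def render_config_fragment(data_dir, dynamic_inputs, static_attributes, target_variables):
    """Render the dataset-related keys as a YAML fragment (plain string).

    Assumption: names and the path contain no characters that need YAML quoting.
    """
    lines = ["dataset: generic", f"data_dir: {data_dir}"]
    for key, values in (
        ("dynamic_inputs", dynamic_inputs),
        ("static_attributes", static_attributes),
        ("target_variables", target_variables),
    ):
        lines.append(f"{key}:")
        lines.extend(f"  - {value}" for value in values)
    return "\n".join(lines) + "\n"


# Placeholder path and an example subset of the variables documented above.
fragment = render_config_fragment(
    data_dir="/path/to/dataset_root",
    dynamic_inputs=["prcp", "tmax", "tmin", "Fallow"],
    static_attributes=["DRAIN_SQKM", "LAT_GAGE", "LNG_GAGE"],
    target_variables=["QObs"],
)
print(fragment)
```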


Citation and contact

When publishing this resource (e.g. on HydroShare), cite the associated paper / DOI and this dataset’s HydroShare identifier once it is assigned. For questions about field definitions or preprocessing, email baptiste@tova.earth

Credits

Contributors

People or Organizations that contributed technically, materially, financially, or provided general support for the creation of the resource's content but are not considered authors.

| Name | Organization |
| --- | --- |
| Casey Brown | University of Massachusetts, Amherst |
| Samson Zhilyaev | University of Massachusetts, Amherst |
| Baptiste Francois | Tova Earth Inc. |

How to Cite

Francois, B. (2026). DROMEDARY US, HydroShare, http://www.hydroshare.org/resource/0148df2833b24d989592c94164cb735d

This resource is shared under the Creative Commons Attribution CC BY.

http://creativecommons.org/licenses/by/4.0/
