Supporting data and tools for "Impact of data temporal resolution on quantifying residential end uses of water"
Field | Value |
---|---|
Owners | This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource. |
Type | Resource |
Storage | 16.1 MB |
Created | Apr 19, 2022 at 5:10 p.m. |
Last updated | Aug 08, 2022 at 4:47 p.m. (Metadata update) |
Published date | Aug 08, 2022 at 4:46 p.m. |
DOI | 10.4211/hs.6625bdbde41c45c2b906f32be7ea70f0 |
Sharing Status | Published |
Views | 1122 |
Downloads | 59 |
Abstract
The files provided here are the supporting data and code files for the analyses presented in "Impact of data temporal resolution on quantifying residential end uses of water", an article submitted to the Water journal (https://www.mdpi.com/journal/water). The paper assessed how the temporal resolution at which water use data are collected affects our ability to identify water end use events, calculate features of individual events, and classify events by end use. We also explored the data management implications of collecting this type of data, as well as methods and tools for analyzing and extracting information from it. The data were collected in the cities of Logan and Providence, Utah, USA in 2022 and are included in this resource. The code and data allow replication of the analyses presented in the paper, and the raw data allow the analyses to be extended.
Content
readme.md
Products included in this HydroShare resource:
- Code to reproduce all analyses described within the related manuscript.
- The anonymized pulse data used for analyses described in the manuscript.
Files are organized as follows:
- FileSize_TempAg contains:
  - A folder named pulsedata with daily CSV files with pulse data for sites 1 and 2. These files are named s_mmdd.csv, where s represents the site number (001 or 002) and mmdd are the month and day. All data were collected in 2022.
  - A folder named tmp_agg with daily CSV files of temporally aggregated data for sites 1 and 2. These files are named opn_2022-mm-dd_df_st_tr_ts.csv, where n represents option 1 or 2 (as defined in the article), mm-dd are the month and day, st represents the site number (001 or 002), and t is the temporal aggregation of the data (1 s, 4 s, 5 s, or 10 s).
- LogEvents contains:
  - EventLog_001.csv: Original CSV file with events logged by Site 1 residents. Columns: datetime (date and time when the event was identified by the user), userlabel (exact label given by the residents), counterval (event number), and label (end use).
  - EventLog_002.csv: Original CSV file with events logged by Site 2 residents. Columns: datetime (date and time when the event was identified by the user), userlabel (exact label given by the residents), counterval (event number), and label (end use).
  - UserLabelledEvents.csv: Processed event file containing the events labelled by users at both sites that were found in the pulse data. Columns:
    - datetime: event start date and time, YYYY-MM-DD HH:MM:SS in MST
    - id: an event counter
    - duration: event duration, in seconds
    - volume: event volume, in liters
    - average_fr_LPM: average flow rate, in liters per minute (LPM)
    - median_fr_LPM: median flow rate, LPM
    - maximum_fr_LPM: maximum flow rate, LPM
    - mode_fr_LPM: mode flow rate, LPM
    - mode_freq_perc: percentage of values that are equal to the mode
    - iqr_fr_LPM: interquartile range of the flow rate, LPM
    - sd_fr_LPM: standard deviation of the flow rate, LPM
    - range_fr_LPM: maximum minus minimum flow rate, LPM
    - ValuesCount: number of values recorded in the event
    - label_datetime: date and time when the event was labelled by the resident
    - userlabel: label assigned by the residents
    - counterval: an event counter
    - label: end use
    - site: site where the event was labelled
- MeterReading_Logs contains:
  - MeterReadingsLog_001.xlsx: Manual meter readings conducted at site 1. Columns: MR (a counter of meter readings), DateTime (date and time when the meter was read), Reading (actual meter reading), and Volume (volume since last reading).
  - MeterReadingsLog_002.xlsx: Manual meter readings conducted at site 2. Columns: MR (a counter of meter readings), DateTime (date and time when the meter was read), Reading (actual meter reading), and Volume (volume since last reading).
- PulseData_Processed contains:
  - site_001_AllData.csv: Original CSV file with all the pulse data for site 1 used in the article. Columns: datetime (exact date and time when a pulse was logged by the Pulse Datalogger) and pulse_spacing (time since the last pulse, in milliseconds).
  - site_002_AllData.csv: Original CSV file with all the pulse data for site 2 used in the article. Columns: datetime (exact date and time when a pulse was logged by the Pulse Datalogger) and pulse_spacing (time since the last pulse, in milliseconds).
- RawMagnetometerData contains:
  - RawData_Magnetometer.csv: Raw magnetic field data. The magnetic field is sampled at 155 Hz. The magnetic field is expressed as an unsigned number that varies from 0 to 65,535 in the assigned range (± 4 gauss).
- RawPulseData contains:
  - The original CSV files recorded by the Pulse Datalogger at both sites.
  - The files are named s_mmdd.csv, where s represents the site number (001 or 002) and mmdd are the month and day. All data were collected in 2022. Each file has a three-line header with 1) Date: exact time when logging started, 2) Site: site where the data were logged, and 3) ID: a datalogger ID. The files contain a single variable: time since the last pulse, in milliseconds. The first value is the time since logging started (given in the first line of the header).
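
The raw pulse file layout described above (a three-line header followed by pulse spacings in milliseconds, with the first value measured from the start of logging) implies that a timestamp for each pulse can be rebuilt with a running sum. The following is a minimal Python sketch of that idea, independent of the R scripts provided in this resource. It assumes one numeric value per line after the header. Because the exact header text is not reproduced in this readme, the logging start time is passed in by the caller rather than parsed. The function name `pulse_timestamps`, the sample content, and the start time are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import accumulate


def pulse_timestamps(file_text, start, header_lines=3):
    """Return one datetime per pulse from the text of a raw pulse file.

    Assumptions (not confirmed by the readme): one numeric spacing per line
    after the header, and `start` is the logging start time taken from the
    first header line by the caller.
    """
    lines = file_text.splitlines()[header_lines:]
    spacings_ms = [float(line) for line in lines if line.strip()]
    # The running sum of the spacings is the elapsed time since logging started.
    elapsed_ms = accumulate(spacings_ms)
    return [start + timedelta(milliseconds=ms) for ms in elapsed_ms]


# Hypothetical file content: placeholder header lines, then three spacings (ms).
sample = "header line 1\nheader line 2\nheader line 3\n1500\n250\n250\n"
start = datetime(2022, 1, 1, 0, 0, 0)  # hypothetical logging start time
stamps = pulse_timestamps(sample, start)
```

Keeping the start time as an explicit argument avoids guessing the header's date format; a reader who has inspected an actual file can parse the first header line and pass the result in.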
All personally identifiable information was removed from the files published here to protect the identities of the study participants.
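
As an illustration of the flow-rate statistics listed above for UserLabelledEvents.csv, the sketch below computes them in Python from a list of flow rate values (LPM) for a single event. It is not the code used in the article (the resource's own scripts are in R). The quartile convention (`method="inclusive"`) and the use of the sample standard deviation are assumptions, and the function name `flow_rate_features` and the example values are hypothetical. Event volume and duration are omitted because they depend on details of the pulse data that this readme does not state.

```python
import statistics


def flow_rate_features(flow_lpm):
    """Summarize the flow rate values (LPM) recorded during one event."""
    if len(flow_lpm) < 2:
        raise ValueError("at least two values are needed for sd and quartiles")
    values = sorted(flow_lpm)
    # For multimodal data, statistics.mode returns the first mode encountered
    # (Python 3.8+); on sorted input that is the smallest mode.
    mode = statistics.mode(values)
    # Assumption: inclusive quartiles; the article may use another convention.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "average_fr_LPM": statistics.mean(values),
        "median_fr_LPM": statistics.median(values),
        "maximum_fr_LPM": max(values),
        "mode_fr_LPM": mode,
        "mode_freq_perc": 100 * values.count(mode) / len(values),
        "iqr_fr_LPM": q3 - q1,
        "sd_fr_LPM": statistics.stdev(values),  # sample standard deviation
        "range_fr_LPM": max(values) - min(values),
        "ValuesCount": len(values),
    }


# Hypothetical flow rates for one event, in LPM.
features = flow_rate_features([4.0, 6.0, 6.0, 8.0])
```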
The R code provided in this resource was developed using R version 4.1.2 (2021-11-01). Platform: x86_64-apple-darwin17.0 (64-bit). Running under: macOS Monterey 12.0.1.
The following R packages are required for running the provided scripts:
- lubridate - Version 1.8.0. Functions for working with dates/times.
- tidyverse - Version 1.3.1. A collection of R packages designed for data science.
- readxl - Version 1.3.1. Makes it easy to get data out of Excel and into R.
- scales - Version 1.1.1. Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.
- cowplot - Version 1.1.1. Provides various features that help with creating publication-quality figures, such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and/or mix plots with images.
- ggh4x - Version 0.2.1. Provides some utility functions that don’t entirely fit within the ‘grammar of graphics’ concept (they can be a bit hacky) but can nonetheless be useful in tweaking your ggplots.
Instructions for Reproducing Results
To reproduce the results:
- Download the entire folder. Leave the files together in the folder to ensure the paths to the files remain correct.
- Execute FileSizing.R, DataAnalysis.R, DataVerification_QC.R, or RawData_Magnetometer.R using R (https://cran.r-project.org/) or RStudio (https://rstudio.com/). The script Functions.R contains functions used in the other scripts and does not produce any output.
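
Because the scripts rely on the files staying together, it can help to confirm the layout before running anything. The Python sketch below lists any expected folder or script that is missing from a download directory. The folder and script names come from this readme. Placing the scripts at the top level of the downloaded folder is an assumption, and the helper name `missing_items` is hypothetical.

```python
from pathlib import Path

# Folder names as listed in this readme.
EXPECTED_FOLDERS = [
    "FileSize_TempAg",
    "LogEvents",
    "MeterReading_Logs",
    "PulseData_Processed",
    "RawMagnetometerData",
    "RawPulseData",
]

# Script names as listed in this readme; their location at the top level of
# the downloaded folder is an assumption.
EXPECTED_SCRIPTS = [
    "FileSizing.R",
    "DataAnalysis.R",
    "DataVerification_QC.R",
    "RawData_Magnetometer.R",
    "Functions.R",
]


def missing_items(root):
    """Return the expected folders and scripts that are not found under root."""
    root = Path(root)
    missing = [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_SCRIPTS if not (root / name).is_file()]
    return missing
```

Calling `missing_items` with the path to the downloaded folder returns an empty list when everything named here is present.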
Related Resources
The content of this resource references | GitHub repository for the pulse datalogger used to collect the data in this study: https://github.com/UCHIC/CIWS-Pulse-Logger |
This resource is referenced by | Bastidas Pacheco, C.J., Horsburgh, J.S., Beckwith, A.J. (2022). Impact of temporal resolution on data for quantifying residential end uses of water. Submitted for publication in the Water journal. |
Credits
Funding Agencies
This resource was created using funding from the following sources:
Agency Name | Award Title | Award Number |
---|---|---|
National Science Foundation | Cyberinfrastructure for Intelligent Water Supply (CIWS): Shrinking Big Data for Sustainable Urban Water | 1552444 |
Utah Water Research Laboratory | | |
How to Cite
This resource is shared under the Creative Commons Attribution CC BY.
http://creativecommons.org/licenses/by/4.0/