Volunteer Accuracy in a Benthic Macroinvertebrate Participatory Science Project - Data and Code

Created: Feb 03, 2024 at 10:33 p.m.
Last updated: Nov 11, 2024 at 2:01 p.m. (Metadata update)
Published date: Nov 11, 2024 at 2:01 p.m.
DOI: 10.4211/hs.3fa46a0a96cb47219aab2230ad141a42
The data and R code provided here underpin a manuscript in the journal Citizen Science: Theory and Practice (Volunteer Accuracy in a Benthic Macroinvertebrate Participatory Science Project). Volunteer-derived aquatic macroinvertebrate identifications and the resulting water quality metrics are compared to results from a professional entomologist. The assessment included a total of 357 benthic macroinvertebrate quality control (QC) samples collected by volunteers using leaf packs, kick nets, and visual assessments between 2011 and 2016 for the Environmental Quality Institute (EQI) in North Carolina, USA. Of the 357 total samples, 284 were of sufficient quality to be used in the analysis. The data include counts of organisms conducted by volunteers on each sample and counts conducted by an entomologist. Macroinvertebrate index values are calculated from the volunteer and entomologist counts and are compared using linear regression and Bray-Curtis dissimilarity methods.
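
As a rough illustration of the Bray-Curtis comparison mentioned above, the following base R sketch computes the dissimilarity between one volunteer count vector and one entomologist count vector. The counts are made-up values, the function name bray_curtis is hypothetical, and taking similarity as one minus dissimilarity is an assumption rather than a statement of how the project script does it.

```r
# Hypothetical counts for four taxa groups in a single sample
# (illustrative values only, not taken from the dataset).
volunteer    <- c(10, 5, 0, 3)
entomologist <- c(8, 5, 2, 3)

# Bray-Curtis dissimilarity: the sum of absolute count differences divided
# by the total number of organisms counted in both vectors.
# bray_curtis is a hypothetical helper name used only for this sketch.
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

dissimilarity <- bray_curtis(volunteer, entomologist)

# Assumption: similarity is taken as 1 minus dissimilarity.
similarity <- 1 - dissimilarity

print(round(c(dissimilarity = dissimilarity, similarity = similarity), 3))
```

The vegan package listed below provides the same measure through vegdist(x, method = "bray"), which operates on a matrix with one row per sample; the sketch uses base R only so that it runs without additional packages.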

Volunteer Accuracy in a Benthic Macroinvertebrate Participatory Science Project - Data and Code Summary

This code was successfully run in February 2024 using R version 4.3.2 on a PC running the 64-bit Windows 10 Enterprise operating system. R packages used in the analysis include vegan and reshape2.

W. Adam Sigler
Check my ORCiD for my current email address

The original data presented here are available under CC-BY 4.0

Folder Structure and File Descriptions

This file (48 columns and 715 rows, including column headings) contains all taxa group counts by the volunteers and the biologist for all sites, seasons, years, and collection methods. Columns are taxa groups, and each row holds either the volunteer counts or the entomologist counts for one site visit. This file is an output of the workflow component of the resource, where it was compiled from 11 raw data Excel files.

This file is an inventory of sample quality (n = 357), which was created by Virginia Hamilton through a manual assessment of the biologist report indicating condition for each sample. A quality of 1 indicates no issues. A quality of 2 was a rough initial accounting of samples with 3 or fewer taxa observed, but this assignment was not consistently applied and was not used in the analysis. A quality of 3 indicates an issue with preservation or sample labeling that precluded use in the analysis. 

This file contains a list of the 43 taxa under consideration with SMIE name, SMIE tolerance score, trophic group, and Order. This input table provided the taxa group names that were the foundation of Table 1 in the manuscript. Observation counts and similarities were added to this template table during analysis to create Table 1.

This R script contains code to conduct all analyses and generate all tables and figures for the manuscript. The script imports data from the 1_Data folder and exports results to the 3_Results folder.

This R script contains code for fitting linear regression models to data and adding statistics to regression plots. It is used as source code by the primary AllAnalysis_2024-02-03_1237_was.R script.
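
The kind of regression described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the contents of the project script: the index values are made up, and the variable names (entomologist_index, volunteer_index, stat_label) are hypothetical.

```r
# Made-up macroinvertebrate index values for six samples
# (illustrative only, not taken from the dataset).
entomologist_index <- c(2.1, 3.4, 4.0, 5.2, 6.3, 7.1)
volunteer_index    <- c(2.4, 3.1, 4.3, 5.0, 6.6, 6.9)

# Fit a simple linear regression of volunteer values on entomologist values.
fit <- lm(volunteer_index ~ entomologist_index)

# Extract the statistics that are commonly annotated on a regression plot.
slope     <- unname(coef(fit)["entomologist_index"])
intercept <- unname(coef(fit)["(Intercept)"])
r_squared <- summary(fit)$r.squared

# Build a text label that could be passed to legend() or mtext()
# after plotting the points and calling abline(fit).
stat_label <- sprintf("y = %.2fx + %.2f, R2 = %.3f", slope, intercept, r_squared)
print(stat_label)
```

A slope near 1 and an intercept near 0 would indicate that volunteer-derived index values track the entomologist-derived values closely.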


Plots Folder
Figure 3 from the manuscript - boxplot of similarity scores for the 284 samples used in the analysis.

Same as Figure 3 from the manuscript, with mean similarity printed on plot.

Figure 1A from the manuscript - regressions of macroinvertebrate index values for volunteer versus entomologist identified samples; includes all samples (including visual assessment).

Figure 1B from the manuscript - regressions of macroinvertebrate index values for volunteer versus entomologist identified samples; includes only leaf pack and kick net samples (excludes visual assessment).

Figure 2 from the manuscript - similarity scores by taxa group; this version omits labels to facilitate manual labeling of points.

Same as Figure 2 from the manuscript, with labels added automatically in R for reference; the label placement is not polished enough for the manuscript.

This file is created by the AllAnalysis_2024-02-03_1237_was.R script and has one row for each sample analyzed (284). Its columns hold the macroinvertebrate index values based on the volunteer counts and on the entomologist counts, as well as the Bray-Curtis similarity between the volunteer and entomologist counts.

This file is created by the AllAnalysis_2024-02-03_1237_was.R script and is the data for Table 1 in the manuscript. 

This file is created by the AllAnalysis_2024-02-03_1237_was.R script. It includes a row for each taxa group in Table 1 and has columns with organism counts for each site visit date for each sampling method for volunteers and the entomologist. 

This resource is referenced by Hamilton, V., K.F. Stepenuck, R.A. Zinna, A.M. Traylor, D. Penrose, W.A. Sigler. (in press) Volunteer Accuracy in a Benthic Macroinvertebrate Participatory Science Project. Citizen Science: Theory and Practice. DOI:

How to Cite

Sigler, W. A., V. Hamilton, A. M. Traylor (2024). Volunteer Accuracy in a Benthic Macroinvertebrate Participatory Science Project - Data and Code, HydroShare,

This resource is shared under the Creative Commons Attribution CC BY.


