Saikumar Payyavula

Oklahoma State University

Subject Areas: water management

ABSTRACT:

This dataset was developed to support research on predicting alum dosage in small water treatment plants. It combines daily plant records with weather data, including maximum temperature (TMAX). To make the data reliable for analysis and modeling, outliers and incorrect readings were carefully removed using logical and domain-based rules.

Records with clearly impossible or erroneous values, such as extremely high or negative numbers, were deleted. Each variable was kept within realistic operating limits: for example, alum between 0 and 3500 mg/L, hardness between 5 and 1000 mg/L, and alkalinity between 2 and 1000 mg/L. Implausible readings such as pH = 0.54 were also removed. Rows with missing values were removed entirely.

Through this cleaning process, the dataset became consistent, accurate, and ready for machine-learning models that can better predict chemical dosing and support safer, more efficient water treatment operations.
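
The range rules described above could be sketched roughly as follows. This is an illustration only, not the authors' script: the column names (`alum`, `hardness`, `alkalinity`), the inclusive bounds, and the sample rows are assumptions, and only the three limits quoted in the abstract are encoded (the abstract gives no pH range, so none is invented here).

```python
import math

# Limits quoted in the abstract, in mg/L. Column names are hypothetical,
# and treating both bounds as inclusive is an assumption.
LIMITS = {
    "alum": (0.0, 3500.0),
    "hardness": (5.0, 1000.0),
    "alkalinity": (2.0, 1000.0),
}


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_rows(rows, limits=LIMITS):
    """Keep only rows with no missing values and every variable in range."""
    kept = []
    for row in rows:
        # Drop the whole row if any limited variable is missing.
        if any(is_missing(row.get(col)) for col in limits):
            continue
        # Drop the row if any variable falls outside its operating limits.
        if all(lo <= row[col] <= hi for col, (lo, hi) in limits.items()):
            kept.append(row)
    return kept


# Made-up sample rows, for illustration only.
sample = [
    {"alum": 45.0, "hardness": 120.0, "alkalinity": 90.0},    # plausible
    {"alum": -3.0, "hardness": 120.0, "alkalinity": 90.0},    # negative
    {"alum": 45.0, "hardness": None, "alkalinity": 90.0},     # missing
    {"alum": 45.0, "hardness": 120.0, "alkalinity": 5000.0},  # too high
]
cleaned = clean_rows(sample)
```

Keeping the limits in a single dictionary means each rule can be reviewed or adjusted without touching the filtering logic.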

ABSTRACT:

This HydroShare collection contains three resources. The two datasets in the collection were used in the study ‘Artificial Intelligence to Assist Small Water Treatment Plant Operations.’ The first dataset (Raw Data) contains unprocessed historical water treatment and Oklahoma Mesonet weather records from 2011–2024. The second dataset (Cleaned Data) is the processed and merged version of the same data, with duplicates and rows with missing values removed. Together, they provide a transparent data pipeline from raw input to an AI-ready dataset for modeling alum dosing.

ABSTRACT:

This dataset contains daily raw water treatment plant operational data and Oklahoma Mesonet weather data collected from 2011–2024. It includes inflow, pH, turbidity, alkalinity, alum dosage, and daily aggregated weather attributes such as TMAX, TMIN, humidity, and pressure. Data is provided in raw, pre-cleaning form for reproducibility.
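
As a rough illustration of what "daily aggregated" can mean for attributes like TMAX and TMIN, the sketch below reduces timestamped temperature readings to one maximum and minimum per day. The abstract does not say how the aggregation was actually done or at what resolution the source observations arrive; the ISO-style timestamps, the function name `daily_extremes`, and the sample readings are all assumptions.

```python
from collections import defaultdict


def daily_extremes(observations):
    """Group (timestamp, temperature) pairs by calendar day and
    return the daily maximum (TMAX) and minimum (TMIN)."""
    by_day = defaultdict(list)
    for timestamp, temp in observations:
        # Assumes ISO-style timestamps, so the first 10 characters are the date.
        by_day[timestamp[:10]].append(temp)
    return {
        day: {"TMAX": max(temps), "TMIN": min(temps)}
        for day, temps in sorted(by_day.items())
    }


# Made-up readings, for illustration only.
obs = [
    ("2020-01-01T06:00", 2.0),
    ("2020-01-01T15:00", 12.5),
    ("2020-01-02T06:00", -1.0),
    ("2020-01-02T15:00", 10.0),
]
daily = daily_extremes(obs)
```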

Python Script for Cleaning Alum Dataset
Created: Oct. 14, 2025, 3:39 a.m.
Authors: Payyavula, Saikumar · Sadler, Jeff

ABSTRACT:

This resource contains a Python script used to clean and preprocess the alum dosage dataset from a small Oklahoma water treatment plant. The script handles missing values, removes outliers, merges historical water quality and weather data, and prepares the dataset for AI model training.
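
The script itself is not reproduced here. As a hedged sketch of the kind of steps the abstract lists (merging plant and weather records, dropping duplicates, dropping rows with missing values), one minimal standard-library version might look like this. The `date` key, the field names, the choice of an inner join, and the sample values are assumptions rather than details taken from the resource.

```python
def merge_daily(plant_rows, weather_rows, key="date"):
    """Join plant and weather records on a shared date key, keeping the
    first record per date and skipping rows with missing values."""
    # Index weather by date; the first record for a date wins.
    weather_by_date = {}
    for w in weather_rows:
        weather_by_date.setdefault(w[key], w)

    merged, seen = [], set()
    for p in plant_rows:
        day = p[key]
        # Skip duplicate dates and days without weather data (inner join).
        if day in seen or day not in weather_by_date:
            continue
        seen.add(day)
        row = {**p, **weather_by_date[day]}
        # Drop the whole row if any field is missing.
        if any(value is None for value in row.values()):
            continue
        merged.append(row)
    return merged


# Made-up records, for illustration only.
plant = [
    {"date": "2020-01-01", "ph": 7.8, "alum": 40.0},
    {"date": "2020-01-01", "ph": 7.8, "alum": 40.0},  # duplicate
    {"date": "2020-01-02", "ph": None, "alum": 42.0},  # missing pH
    {"date": "2020-01-03", "ph": 7.6, "alum": 38.0},  # no weather record
]
weather = [
    {"date": "2020-01-01", "TMAX": 12.5},
    {"date": "2020-01-02", "TMAX": 10.0},
]
merged = merge_daily(plant, weather)
```

An inner join is used here so that every output row carries both plant and weather values; a pipeline that tolerates gaps would need a different join.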