Over the past year, PWSA collaborated with the University of Pittsburgh’s School of Computing and Information (SCI) and University Center for Social and Urban Research (UCSUR) on a machine learning model to predict the locations of the remaining lead service lines in PWSA’s water distribution system. Through this partnership, PWSA will be better able to estimate where lead lines are likely to be when no reliable record exists, avoiding costly excavations and reducing disruption to our customers.
Understanding the Inventory
The ultimate goal is to replace older water mains that have a high percentage of lead service lines attached and that are located in areas with high concentrations of at-risk populations. To gain a clearer picture of the inventory, we have used various historical records and investigations to find lead service lines. PWSA began this collaboration with SCI and UCSUR to create a machine learning model that would help make sense of these various streams of information.
Determining Probability
The goal of the model is fairly simple: to provide a statistical probability of a property having a lead service line, taking into consideration all the different data points available for a given property. To do this, the research team at the University of Pittsburgh tested different predictive models to find the one that provided the most accurate predictions and “meticulously interpolated missing data, balanced the data set, and pruned weak predictors,” said University of Pittsburgh authors Saeed Hajiseyedjavadi, Dr. Michael Blackhurst, and Dr. Hassan A. Karimi.
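The three preparation steps named in the quote above can be illustrated with a minimal sketch. This is not the research team's code: the column names, the toy records, the mean-fill and oversampling methods, and the 0.2 correlation cutoff are all assumptions chosen only to make each step concrete.

```python
import random
from statistics import mean

# Hypothetical property records; None marks a missing value.
COLUMNS = ["year_built", "tap_lead_ppb", "meter_size_in"]
LABEL = "has_lead"
records = [
    {"year_built": 1920, "tap_lead_ppb": 12.0, "meter_size_in": 0.625, "has_lead": 1},
    {"year_built": 1935, "tap_lead_ppb": None, "meter_size_in": 0.625, "has_lead": 1},
    {"year_built": 1975, "tap_lead_ppb": 1.0, "meter_size_in": 0.625, "has_lead": 0},
    {"year_built": 1988, "tap_lead_ppb": 0.5, "meter_size_in": 0.625, "has_lead": 0},
    {"year_built": None, "tap_lead_ppb": 2.0, "meter_size_in": 0.625, "has_lead": 0},
    {"year_built": 2001, "tap_lead_ppb": 0.0, "meter_size_in": 0.625, "has_lead": 0},
]


def impute_missing(rows, columns):
    """Fill each missing value with its column mean (a simple stand-in
    for whatever interpolation the researchers actually used)."""
    filled = [dict(r) for r in rows]
    for col in columns:
        col_mean = mean(r[col] for r in rows if r[col] is not None)
        for r in filled:
            if r[col] is None:
                r[col] = col_mean
    return filled


def balance_by_oversampling(rows, label, seed=0):
    """Randomly duplicate minority-class rows until both classes match in size."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label] == 1]
    negatives = [r for r in rows if r[label] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def prune_weak_predictors(rows, columns, label, min_abs_corr=0.2):
    """Keep only columns whose correlation with the label clears the cutoff."""
    labels = [r[label] for r in rows]
    return [
        c for c in columns
        if abs(correlation([r[c] for r in rows], labels)) >= min_abs_corr
    ]


filled = impute_missing(records, COLUMNS)
balanced = balance_by_oversampling(filled, LABEL)
kept = prune_weak_predictors(balanced, COLUMNS, LABEL)
print(kept)  # the constant meter_size_in column carries no signal and is dropped
```

In this toy data, the meter size is identical for every property, so it cannot help distinguish lead from non-lead lines and is pruned. A real pipeline would then fit and compare candidate models on the remaining predictors.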
Findings
After running the model and removing any data points that were not increasing accuracy, researchers found that curb box inspections and tap water lead levels were the most useful indicators of whether a property has a lead service line. In other words, the various historical data points alone may or may not point to a lead line, but a recent elevated water sample and a curb box inspection showing lead provide the most useful results. Geographical location, building characteristics, and available historical records were also among the most useful metrics.
The Future of Lead Service Line Removal
Over the next four years, PWSA will invest over $250 million replacing aging water mains and all lead service lines attached to those old mains. To plan replacement locations, we will need to combine water main age with the findings of the machine learning model so that ratepayer dollars are invested wisely.
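One way to picture combining water main age with the model's findings is a simple ranking, sketched below. This is an illustration only, not PWSA's planning method: the `WaterMain` record, the scoring formula, the 150-year age cap, the reference year, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WaterMain:
    main_id: str
    install_year: int
    # One model output per attached service line: probability that it is lead.
    lead_probabilities: list[float]


def expected_lead_lines(main: WaterMain) -> float:
    """Expected number of lead lines on a main: the sum of per-line probabilities."""
    return sum(main.lead_probabilities)


def priority_score(main: WaterMain, current_year: int, max_age: int = 150) -> float:
    """Hypothetical score: (age as a fraction of max_age) x (expected share of lead lines)."""
    if not main.lead_probabilities:
        return 0.0
    age_factor = min(current_year - main.install_year, max_age) / max_age
    lead_share = expected_lead_lines(main) / len(main.lead_probabilities)
    return age_factor * lead_share


# Made-up mains for illustration.
mains = [
    WaterMain("B", 1960, [0.9, 0.9, 0.9]),  # newer main, lead very likely
    WaterMain("C", 1905, [0.1, 0.2, 0.1]),  # old main, lead unlikely
    WaterMain("A", 1900, [0.9, 0.8, 0.7]),  # old main, lead likely
]

# 2020 is an arbitrary reference year for the example.
ranked = sorted(mains, key=lambda m: priority_score(m, current_year=2020), reverse=True)
for m in ranked:
    print(m.main_id, round(priority_score(m, current_year=2020), 3))
```

Under these assumptions, an old main with many likely lead lines ranks ahead of both a newer main with likely lead and an old main with little lead. A real plan would also weigh factors this sketch leaves out, such as the concentration of at-risk populations.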
To fulfill PWSA’s goal of replacing all public lead lines by 2026, we need to use all sources of information to target ratepayer dollars effectively. The findings of the model will help PWSA crews find lead where they dig and avoid costly excavations where no lead is present.
To learn more, read the press release on our website.