Validation of water main failure predictions: A 2‐year case study

Recent studies have shown that U.S. water mains are failing at an accelerating rate. In the meantime, water utilities are challenged by limited funding. It is important that water mains with much higher likelihood of failure (LOF) are replaced before they fail to avoid possible high consequences, such as public safety threats, high financial losses, and environmental damages. This article presents a model to evaluate the LOF of water mains using data available in geographic information systems (GIS). A case study is presented comparing 2 years of actual water main break data with the results of the model. The comparison shows a strong correlation between the model prediction and the actual break rates of main pipes; thus, it validates the robustness of the model and shows that funding can be used more efficiently by focusing on the water mains with a high LOF as predicted by the GIS model. This model has been used in New Jersey American Water's distribution systems. It can be used in other water systems to help guide water main replacement efforts.


| INTRODUCTION
The water infrastructure in North America is deteriorating. Recent studies show that water main break rates increased by 27% between 2012 and 2018 (Folkman, 2018). Among the pipe materials used in distribution systems, cast iron showed a break rate increase of more than 40% (Folkman, 2018). With pipe failures accelerating, more funding is needed for pipe replacement. An AWWA study estimated that more than US$1.7 trillion will be needed over the coming decades through 2050. Deferring pipe replacement could lead to more frequent main breaks and water service disruptions (AWWA, 2012).
In the face of high demand for pipe replacement, water utilities struggle with limited funding and the pressures of raising their customers' water rates. As not all pipes fail at the same time, it makes sense to allocate funding to pipes that are more likely to fail, thereby improving capital efficiency while maintaining or increasing the level of service to the customers.
Significant efforts have been made to predict and prioritize water main replacements, using both top-down and bottom-up approaches. Among top-down approaches, Nessie curve tools, such as the Buried No Longer (BNL) tool released by AWWA, forecast long-term pipe replacement needs from basic factors such as pipe age, size, and material (AWWA, 2012). While the BNL tool is valuable for providing an overall view of pipe replacement needs, it is not designed for granular, project-level determinations such as where specifically to replace pipe. The KANEW model is a more advanced top-down model based on the cohort survival model proposed by Herz (Deb, Hasit, Grablutz, & Herz, 1998). It has three parameters, which are estimated from historical main break or pipe replacement data (also called pipe "death" data). Like AWWA's BNL tool, the KANEW model does not estimate failure for individual pipes.
Considerable research has been conducted on statistical models for estimating pipe condition and failures. Kleiner and Rajani (2001) conducted a comprehensive review of statistical models used to predict the deterioration of water mains. More recent literature (Kimutai, Betrie, Brander, Sadiq, & Tesfamariam, 2015; Nishiyama & Filion, 2013; Osman & Bainbridge, 2011; St. Clair & Sinha, 2012; Wilson, Filion, & Moore, 2017) has also reviewed and applied statistical models, for example, the Weibull distribution curve, which likewise has three parameters that define the shape of the curve so that it can be fitted to historical main break data (Osman & Bainbridge, 2011). In general, these statistical models are used to project main breaks at the system level but cannot provide detailed pipe-level predictions.
New Jersey American Water Company, Inc. (the "Company") owns and operates roughly 9,000 miles of water mains and maintains extensive condition data on them. The Company operates a US$350 million annual capital investment program, which in 2019 included more than US$117 million in distribution system improvements. The program replaces 75 to 100 miles of water mains annually (~1% of the statewide system). Informed decision-making is central to ensuring that investments have the greatest positive impact while limiting the removal of infrastructure that may have many years of useful life remaining. It is imperative that Company asset managers make consistent, measurable, and comparable investment decisions across multiple construction offices. Managers must also be able to measure the results of those decisions to determine which investments most reduce the operating and maintenance (O&M) costs and customer disruptions associated with main breaks, ultimately extending the useful life of existing infrastructure. A Water Research Foundation study (WaterRF, 2017) showed that 75% of water utilities used main breaks as the key factor in prioritizing pipe replacements. Replacing only pipes that break is a reactive posture that, while necessary, can be improved. From a high-level planning perspective, basing pipeline replacement decisions on pipe age is generally acceptable. However, because pipes fail for various reasons, many pipes remain in good condition even though they may be classified as "beyond useful life" on the basis of age alone. The Company believed that a reliable and granular failure prediction model for its distribution main asset replacement program would help it quantify and reduce the resources spent "reacting" to breaks and shift resources toward strategic capital investment.
Efficient replacement planning is a complex process that involves many variables, several of which are outside the scope of this study, such as paving schedules, customer impact, water quality, hydraulic requirements, and safety. The goal of the model development was to analyze nearly 9,000 miles of small-diameter water main (<16 in. diameter) and produce granular, reliable, and measurable predictive results that could quantify the relationship between prioritization decisions and a reduction in emergency O&M work, thereby improving capital renewal efficiency. The goal of this study is to verify the predictive ability of the geographic information system (GIS)-based model in evaluating the LOF of pipes by presenting actual main break rates within each of the GIS model's LOF prediction cohorts for the 2 years following the model run. Although model guidance was provided to Company decision makers after the model run, the decision makers were permitted to override it for several operational priorities. This flexibility gave the model authors an opportunity to observe, for two calendar years, main break failures on water mains that the Company did not replace following the model run.
The authors validated the GIS-based model across all districts and all pipe material types. Correlation was present in all districts in both study years, but its strength varied by geographic district, sample size, and study year. The Company's districts are not homogeneous in terms of climate, age, hydraulics, or many other factors. The Company's smallest and youngest system analyzed showed the weakest correlation; however, this district accounted for only 4.3% of the cast iron water main and 2.3% of statewide cast iron breaks during the study period. The remaining, larger districts within the Company's heavily populated suburban Northeast Corridor (the zone between Philadelphia and New York) showed consistent and strong correlations. System-by-system correlation graphs are presented in Appendix A. Water utilities should expect some variation in the strength of correlation and may need to adjust variables, scores, and/or weights to account for local influencing factors, such as predominant material types.
For brevity, this article focuses on the LOF of small-diameter (<16 in. diameter) cast iron pipes statewide. It demonstrates that, by using a specific GIS-based modeling approach with well-cleaned GIS data, water utilities can effectively identify the small group of water mains that has a much higher likelihood of failing earlier than its peers. Furthermore, the article demonstrates that replacing the highest-LOF cohort of water mains first can significantly reduce annual O&M expenses directly related to main breaks, freeing more capital for proactive main replacement and reducing disruptions to customers.

| Study hypotheses
• GIS-based modeling can classify water mains into cohorts according to LOF. Higher-LOF cohorts will experience significantly higher break rates over time. Lower-LOF cohorts contain a larger percentage of main, while higher-LOF cohorts contain a much smaller percentage.
• By aligning more water main replacement opportunities with higher-LOF cohorts, water utilities can reduce O&M costs and service disruptions related to emergency main break repair.
The study tracked pipeline performance following a GIS-based model run in September 2017. This article presents the results of the predictive model run and the actual failure rates of water mains within each of the five LOF cohorts generated by the model. Specifically, the study tracked failures on each water main for the two full calendar years following the model run. Mains that were replaced during those years were excluded from the annual failure rate data.

| GIS data preparation
Reliance on model guidance requires high confidence in the validity of the input data. Most utilities use GIS to track some portion of their asset inventory (U.S. Environmental Protection Agency, 2013). The time required to bring GIS and field transactional data (e.g., break locations) to an acceptable level of accuracy for prediction will vary by utility. Data cleansing is a prerequisite for accurate predictive results; however, the Company has found that its labor cost is small compared with the value delivered. For example, the initial cleansing of 5 years of main break points for approximately 9,000 miles of main required approximately one GIS analyst full-time equivalent (FTE). The initial cleansing of material, install date, and pipe diameter required approximately one additional GIS analyst FTE. In contrast, the potential annual O&M savings from the GIS model results were greater than 5 FTE, and the potential annual capital opportunities created were greater than 40 FTE. Maintaining data cleanliness after the initial phase requires nominal effort, closely related to routine GIS work at the Company.

| Case study areas and data quality description: New Jersey
The Company had 3,779 miles of cast iron main in New Jersey at the beginning of the study period, spanning five geographic districts. Cast iron is prevalent throughout most of the 9,000+ mile study area and accounts for the largest share of main breaks, which makes it the largest sample available. Following the GIS model run, 43 miles of cast iron main were retired in calendar year 1, and an additional 54 miles in calendar year 2. None of this main was replaced with cast iron.
GIS data within the study area are considered excellent. More than 97% of assets have an assigned installation date, more than 99% have a populated diameter, and more than 99% have a populated material type, each with a reasonable expectation of accuracy. More than 99% of the GIS assets have sub-foot Global Positioning System (GPS) coordinates associated with their location. Every main break in the system, dating back to 5 years before the model run, has been manually snapped to the exact failed main segment within GIS. This effort was undertaken to avoid the assignment of failure data to "good" mains that occurs with commonly used automated methods such as geocoding. Geocoding frequently places points close to the true location, but such points are often not useful in a segment analysis. Water quality complaints were similarly reviewed to ensure that they were related to water quality rather than to temporary maintenance activity, such as hydrant flushing. Fire flow and velocity data were appended to the GIS-based model from the Company's hydraulic model.

| VARIABLE WEIGHTING
In the model, each variable has a corresponding GIS layer. Every main segment in the model is assigned a value for each variable associated with it (if applicable). Table 1 displays the variables and corresponding weights within the GIS-based model. Weights are applied to variable scores on a segment-by-segment basis within the model. Note that consequence of failure (COF) variables are available if desired. The authors considered most COF variables, apart from water quality complaints, to have no influence on water main failure predictions (LOF). As such, COF has been mostly excluded from this case study. COF values and other considerations are addressed in later portions of the capital investment planning process, which are outside the scope of this article. Available fire flow (FF) was treated as an LOF variable in this analysis, although it can be argued that FF is a COF. Regardless of its effect on failure, the Company has very complete FF data, and low flows are undesirable, so the Company decided to prioritize FF. Readers should interpret FF and water quality complaints as neutral or, perhaps, diluting predictive variables in this case study (Figures 1-3).
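As a minimal sketch of the segment-by-segment weighting described above, the function below combines per-variable scores into one weighted score and rescales it linearly onto the 1-1,000 range that the model is later described as using before cohort classification. The variable names, the weights, the 0-10 score scale, and the linear rescaling are all hypothetical placeholders, not the values in Table 1.

```python
from __future__ import annotations

# Hypothetical weights (percent of total); the actual Table 1 weights are not reproduced here.
HYPOTHETICAL_WEIGHTS = {
    "break_frequency": 25.0,
    "break_kernel_density": 30.0,
    "age": 10.0,
    "material": 10.0,
    "survival_probability": 15.0,
    "velocity": 4.0,
    "water_quality_density": 3.0,
    "fire_flow": 3.0,
}

MAX_VARIABLE_SCORE = 10.0  # assumed top of each variable's score scale


def weighted_lof_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-variable scores into a 1-1,000 LOF score; missing variables score 0."""
    total_weight = sum(weights.values())
    raw = sum(weight * scores.get(name, 0.0) for name, weight in weights.items())
    # Fraction of the maximum attainable weighted score, mapped linearly onto 1-1,000.
    fraction = raw / (total_weight * MAX_VARIABLE_SCORE)
    return 1.0 + 999.0 * fraction
```

A segment with no scored variables lands at 1, and one scoring the maximum on every variable lands at 1,000; anything in between depends entirely on the (here hypothetical) weights.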

| Main breaks (and leaks)
The GIS staff at New Jersey American Water developed two mechanisms for including failure data (main breaks or leaks) in the LOF prediction model. "Frequency" accounts for prior failures of a specific main, while "kernel density analysis" (Environmental Systems Research Institute [ESRI], 2019a) accounts for microgeographies (neighborhoods) where local conditions may be adversely affecting mains of a certain material type, making those mains particularly vulnerable to near-term failure. Densities describe a predicted distribution of a phenomenon over a surface (ESRI, 2019c).
• Break frequency per segment (count of breaks and/or leaks by segment). Break frequency identifies individual segment failure counts. It is not a perfect measure because of the arbitrary nature of pipe segmentation within a given GIS; however, it is far more appropriate for segment analysis than a per-segment break rate. The American Water GIS contains segment lengths that are often very small, as well as arbitrary, so a single break on a very short segment would produce an inflated rate; individual segment break rates are therefore not a useful variable for this study. The scores assigned to each pipe based on break frequency are shown in Table 2.
• Break kernel density (rate of main breaks by neighborhood and material type). Kernel densities are the most important part of the GIS methodology because they rank the anticipated performance of a water main within a neighborhood whose boundaries are governed by a phenomenon rather than delimited by a superimposed "zone." The kernel density variable infers from the data that something (the authors are not concerned with "what") is negatively affecting a material in a small neighborhood, which places the remaining main of that material within the neighborhood at a significantly elevated risk of failure. Break kernel densities are mathematical surfaces describing the spatial density of main break activity. The kernel function used by ESRI® GIS software is based on the quadratic kernel function described by Silverman (1986, p. 76, eq. 4.5). Kernel density is a common spatial statistical tool used to understand the intensity and spatial patterns of point phenomena (King, Thornton, Bentley, & Kavanagh, 2015). GIS kernel densities can add nuance to the ranking and capture contributing factors that humans cannot otherwise easily observe or infer from the data.
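The two failure variables in the list above can be sketched in plain Python, assuming each break has already been snapped to a segment ID (for frequency) and projected to planar coordinates tagged with the material of its main (for density). The density function follows the form ESRI documents for point kernel density without a population field: a quartic (biweight) kernel, summed over breaks within the search radius and divided by the radius squared. The search radius and the `(x, y, material)` tuple layout are assumptions, and this is an illustration rather than a substitute for the ESRI tool.

```python
import math
from collections import Counter


def break_frequency(break_segment_ids):
    """Count breaks/leaks per segment ID (each break already snapped to one segment)."""
    return Counter(break_segment_ids)


def quartic_kernel_density(x, y, break_points, radius):
    """Break density at (x, y) using the quartic (biweight) kernel.

    Each break closer than `radius` contributes (3/pi) * (1 - (d/r)**2)**2; the sum is
    divided by r**2, so the result is breaks per unit area in the coordinate units.
    """
    total = 0.0
    for bx, by in break_points:
        d = math.hypot(x - bx, y - by)
        if d < radius:
            u = d / radius
            total += (3.0 / math.pi) * (1.0 - u * u) ** 2
    return total / (radius * radius)


def material_density(x, y, breaks, material, radius):
    """Density from breaks on mains of `material` only, mirroring per-material surfaces."""
    points = [(bx, by) for bx, by, mat in breaks if mat == material]
    return quartic_kernel_density(x, y, points, radius)
```

Because the kernel integrates to 1 over the search disk, each break adds exactly one unit of "mass" to the surface; clusters of same-material breaks therefore stack into the hotspots described next.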
In effect, kernel densities isolate extremely high break rates. Rather than measuring at the system, town, or district meter zone level, kernel densities measure the intensity of the problem where it has been occurring. The extremely high break rates within these very small "hotspot" zones are the most influential factor on break rates in systems of any size.
If a main of the matching material is within a hotspot, the GIS-based model does not automatically rank it as "high LOF," but if the main also has other negative variable scores, it will almost certainly fall within the high-LOF cohort.
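The hotspot rule above, together with the Figure 3 caption's note that a matching-material main receives the "hottest" score it touches, can be sketched as follows. The class upper bounds would come from a classification such as Jenks' natural breaks; here they, the `density_at` callable, and the vertex-sampling approach are hypothetical, and sampling only vertices is an approximation that assumes long segments have been densified first.

```python
import bisect


def density_class(density, class_upper_bounds):
    """Map a density to a 1-based class using ascending class upper bounds; 0 means no density."""
    if density <= 0:
        return 0
    index = bisect.bisect_left(class_upper_bounds, density)
    # Values above the top bound fall into the highest class.
    return min(index + 1, len(class_upper_bounds))


def hotspot_score(main_material, main_vertices, hotspot_material, density_at, class_upper_bounds):
    """Hottest density class 'touched' by a main; mains of other materials score 0.

    `density_at(x, y)` is any callable returning the break density for `hotspot_material`.
    """
    if main_material != hotspot_material or not main_vertices:
        return 0
    return max(density_class(density_at(x, y), class_upper_bounds) for x, y in main_vertices)
```

Taking the maximum rather than an average means that a long main grazing the edge of an intense hotspot is still flagged, which matches the "hottest score it touches" wording.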

| Break kernel density (rate)
Break kernel densities are calculated for material types that make up more than 5% of total breaks within a district. Density ranges are relative to the break population within each material and are classified by GIS with the Jenks' optimization (ESRI, 2019b) formula. Also called Jenks' natural breaks, the formula minimizes each class's average deviation from the class mean while maximizing each class's deviation from the means of the other groups (Chen, Yang, Li, Zhang, & Lv, 2013). The density scoring chart in Table 3 is an example. Relative scores vary by location and are determined by the density formula, which is driven by location and frequency of breaks.
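Jenks' natural breaks can be computed exactly with the Fisher-Jenks dynamic program, which minimizes the total squared deviation of values from their class means; for a fixed total variance, this is equivalent to maximizing the between-class sum of squares. The sketch below also applies the more-than-5%-of-breaks rule for deciding which materials receive a density surface. It is an assumption that ESRI's implementation yields identical class boundaries on every dataset; tie handling and rounding may differ.

```python
from collections import Counter


def materials_for_density(break_materials, min_share=0.05):
    """Materials whose share of a district's breaks exceeds `min_share` (more than 5% by default)."""
    counts = Counter(break_materials)
    total = sum(counts.values())
    if total == 0:
        return set()
    return {material for material, count in counts.items() if count / total > min_share}


def jenks_breaks(values, n_classes):
    """Ascending class upper bounds minimizing total within-class squared deviation (Fisher-Jenks)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice data[i:j] in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(data):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def ssd(i, j):
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal total SSD for splitting data[:j] into k classes.
    # start[k][j]: index where the last (k-th) class begins in that optimal split.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j] = candidate
                    start[k][j] = i

    # Walk backward from the full dataset to recover each class's upper bound.
    bounds = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = start[k][j]
    return bounds[::-1]
```

The exact program runs in O(k·n²) time, which is fine for a few thousand density values; larger rasters are usually sampled before classification.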
Note: Score classes for materials other than cast iron may be displayed in some tables but are not part of the present study.

| Pipe age
Several variable weights were attempted in early model testing, and the results were viewed side by side in test neighborhoods. After iterative review, the best and most granular results were determined to come from including age, material, and survival probability as separate variables in the model. Table 4 shows the age variable scoring chart.

| Cohort survival probability (Weibull proportional hazard model)
Survival probability curves are applied within the GIS model and are scored and weighted to reflect their generalized impact on predictive value. The Weibull proportional hazard model (WPHM), in particular, has been shown to be a strong indicator of cast iron and ductile iron breaks (Kimutai, Betrie, Brander, Sadiq, & Tesfamariam, 2015). Cast iron and ductile iron are the predominant material types within the Company. The cohort survival probabilities used in the Company's GIS model are produced from the Weibull survival function.
In general, a survival function is one minus the cumulative distribution function (CDF; Washington, 2019). The CDF, in turn, is the integral of the failure probability density function. For the Weibull density $w(t) = \frac{a}{b}\left(\frac{t}{b}\right)^{a-1} e^{-(t/b)^{a}}$, evaluating $f(x) = 1 - \int_0^x w(t)\,dt$ produces the survival function $f(x) = e^{-(x/b)^{a}}$, where a and b are coefficients (the shape and scale parameters, respectively), x is age, and f(x) is cohort survival probability (Innovyze, 2019a, 2019b).
For this analysis, the percent chance of survival over the 5 years following the model run (i.e., through 2022) was chosen as a variable in the GIS LOF model. The survival probability from the WPHM is classified into variable scores using Jenks' natural breaks method, as applied by ESRI® GIS software (ESRI, 2019b). Figure 4 is an example set of WPHM survival probability curves with all material types displayed; it is not the actual data from this study.
Table 4 (excerpt) Age variable scores by installation period: 1910-1930, 9; 1931-1950, 7; 1951-1970, 6; 1971-1990, 4; 1991-, …
FIGURE 3 Break kernel density results revealing a localized material break "hotspot." The geographic information system model will assign scores from the hotspot only to water mains of this material type. The matching material main receives the "hottest" score that it "touches." Light gray mains in this figure are of different material types, so they do not receive a negative score from this hotspot.
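The Weibull survival function above translates directly into code. One plausible reading of "percent chance of survival over 5 years" is the conditional probability that a main of current age x survives to x + 5, that is, f(x + 5)/f(x); that interpretation, and any coefficient values passed in, are assumptions rather than the Company's fitted parameters.

```python
import math


def weibull_survival(age, a, b):
    """Weibull survival probability f(x) = exp(-(x / b) ** a); a = shape, b = scale, x = age in years."""
    if age <= 0:
        return 1.0
    return math.exp(-((age / b) ** a))


def five_year_survival(current_age, a, b, horizon=5.0):
    """Assumed reading: P(main survives `horizon` more years | it has survived to `current_age`)."""
    alive_now = weibull_survival(current_age, a, b)
    if alive_now == 0.0:
        return 0.0
    return weibull_survival(current_age + horizon, a, b) / alive_now
```

With a = 1 the Weibull reduces to the exponential distribution, so the 5-year conditional survival is the same at every age; for a > 1 the hazard rises with age and the 5-year survival falls, which is what lets this variable separate older cohorts.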

| Velocity
High flow velocity in a distribution main can indicate a potential risk of failure due to increased general stress or sudden pressure fluctuations (water hammer). Table 6 shows the velocity scoring chart.

| Water quality complaints
This variable is scored by kernel density (Table 7), using a similar workflow as that of the other variables that use relative rates. Water quality complaint points are turned into hotspots. The classification of the scores is performed by the Jenks' natural breaks method as applied by ESRI software (ESRI, 2019b). Ranges are driven by location and frequency of complaints. Water quality complaints are arguably a consequence variable; however, they can be indicative of condition problems in a system that are otherwise not present in data records.

| Available fire protection
The FF scoring chart (Table 8) registers the severity of low flows below the generally desired 3,500 gpm. Future plans include a more granular FF score based on local zoning layers, but this is not possible with currently available data. Although FF is more strongly a COF variable than an LOF variable, high velocities during a fire event can lead to water main failure. Company leadership, however, chose to include the variable in the analysis primarily because low flows are associated with poor interior pipe condition. In addition, FF data (typically used as a COF variable) are widely available and simple to model. The effect of the FF variable on predictive ability is still unknown.
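A back-of-the-envelope continuity check (V = Q/A) shows why fire flows can stress small mains. The sketch treats nominal diameter as the inside diameter and uses 1 US gallon = 231 in³; in practice, the Company's velocities come from its hydraulic model. The deficit function is a hypothetical way to express the shortfall below 3,500 gpm; the Table 8 score breakpoints are not reproduced.

```python
import math

GALLON_FT3 = 231.0 / 1728.0  # 1 US gallon = 231 cubic inches; 1 cubic foot = 1,728 cubic inches


def velocity_fps(flow_gpm, diameter_in):
    """Mean velocity (ft/s) = Q / A for a flow in gpm through a main of the given diameter (in.)."""
    flow_cfs = flow_gpm * GALLON_FT3 / 60.0
    area_ft2 = math.pi * (diameter_in / 12.0) ** 2 / 4.0
    return flow_cfs / area_ft2


def fire_flow_deficit(available_gpm, target_gpm=3500.0):
    """Hypothetical severity measure: shortfall below the target as a fraction of the target."""
    return max(0.0, (target_gpm - available_gpm) / target_gpm)
```

For example, 3,500 gpm through a 6-in. main corresponds to roughly 40 ft/s, versus about 10 ft/s through a 12-in. main; velocity scales with the inverse square of diameter.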

| CASE STUDY: STATEWIDE RESULTS
Break rates are reported using the industry-standard unit of breaks/100 miles/year. Colors within Figure 5 represent the five LOF cohorts created by the GIS model (low to high risk of failure). As discussed earlier, there are many valid reasons to replace a water main besides LOF. A low-LOF GIS cohort does not necessarily indicate a "healthy" pipe or a pipe that is properly sized for the system. It simply indicates that the GIS LOF model does not consider the pipe to be at imminent risk of near-term failure or among the highest priorities for replacement compared with its peers. The authors acknowledge that including FF and water quality complaints as variables may have weakened the correlation between near-term failures and LOF risk cohorts; however, the Company's priorities dictated their inclusion because they are indicators of potentially poor interior pipe condition. Figure 5 illustrates that, over the 2-year study period, more than 77% of the Company's cast iron pipes experienced a significantly lower break rate than the 2018 national average of 34.8 breaks/100 miles/year for cast iron pipe (Folkman, 2018). In addition, averaged over the study period, more than 90% of the Company's cast iron mains experienced lower break rates than the 2018 national average. The table in Figure 5 indicates that age alone is not a reliable indicator of large increases in break rates. This is not to say that age does not matter; rather, age as a single variable is not a reliable indicator of failure. Age is included in the GIS LOF model both directly, as an individual variable, and indirectly, as part of the Weibull survival probability variable. The similarity in average age between some GIS LOF prediction cohorts, combined with large jumps in break rates between those cohorts, indicates that the GIS LOF model predicts failure more precisely than age-based guidance alone.
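The break-rate unit can be computed as below. Following the Figure 5 caption, rates use the miles remaining after replacement activity; subtracting each year's retirements before computing that year's rate, and averaging the two annual rates (as the Figure 6 trend line does), are assumptions about the exact bookkeeping. The inputs in the tests are hypothetical.

```python
NATIONAL_CI_RATE_2018 = 34.8  # breaks/100 miles/year for cast iron (Folkman, 2018)


def break_rate(breaks, miles_in_service, years=1.0):
    """Breaks per 100 miles per year."""
    if miles_in_service <= 0 or years <= 0:
        raise ValueError("miles_in_service and years must be positive")
    return breaks / (miles_in_service / 100.0) / years


def cohort_average_rate(breaks_by_year, initial_miles, retired_miles_by_year):
    """Average of annual rates, each computed on the miles remaining after that year's retirements.

    Breaks on retired mains are assumed to be excluded from `breaks_by_year` already.
    """
    remaining = initial_miles
    rates = []
    for breaks, retired in zip(breaks_by_year, retired_miles_by_year):
        remaining -= retired
        rates.append(break_rate(breaks, remaining))
    return sum(rates) / len(rates)
```

Dividing by remaining rather than initial miles keeps retired main from diluting the denominator, which would otherwise understate the rates of cohorts that saw heavy replacement.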
Each LOF cohort in Figure 6 has two bars representing the actual failure rates of years 1 and 2, respectively. The purple axis (the y axis) measures the actual failure rate. The purple trend line displays the actual break rate average of years 1 and 2 within each GIS prediction cohort. The trend line illustrates the potential of the GIS model to refine asset replacement strategies. The AWWA Buried No Longer Tool and other similar high-level age-based guidelines are not granular enough to inform individual replacement decisions, leaving professionals to call on institutional knowledge, system records, and human processes to guide prioritization with limited funds available.
The GIS model, however, performs a fundamentally different type of assessment than age-based guidance methods are designed to provide. It applies simple mathematics to a limited number of important variables for each pipe segment in a "bottom-up" manner. Note that the total number of breaks is higher in lower-LOF cohorts. This is expected random-failure behavior given the far larger population of main in these cohorts. Year 1, which experienced a much harsher Arctic cold spell than Year 2, had 32% more total breaks. With the (unmodeled) weather stressor removed in Year 2, the break rates of all cohorts except the high-risk cohort were lower. In fact, the high-risk cohort's break rate accelerated in Year 2, presumably because the conditions being modeled persisted while the weather stressor was removed; the weather stressor appeared to have little, if any, effect on the high-risk cohort. More climate monitoring over several years would be required to establish the weather relationship. The case study indicates an overall high degree of correlation between actual failure rates and GIS model LOF prediction cohorts. The high-LOF cohort is small enough in length that utilities can plan on replacing all pipes within it in the near term. This will reduce service disruptions, save on operating expenses associated with repairing breaks, and enable installation of more water mains with little to no impact on customers.
FIGURE 5 Statewide geographic information system (GIS) model cohort prediction performance of the cast iron main. The charts indicate a strong correlation between the GIS model's higher likelihood of failure cohorts and elevated near-term break rates. Break rates are calculated using the miles of main remaining after replacement activity. Avg., average
During the study period, the Company's decision makers had access to the model guidance but were allowed to "override" it for a variety of operational priorities, because the Company did not yet have empirical evidence of how robust the model's risk cohorts would be as predictors. When the study was completed in fall 2019, the Company began limiting the number of valid operational justifications for overriding the GIS model guidance and implemented automatic notifications for projects requiring an override. This protocol is intended to encourage accelerated replacement of the higher-risk GIS model cohorts when and where operationally feasible.
During the study period, the Company replaced approximately 32% of the highest-risk cohort. The authors felt it important to illustrate how a small change in decision behavior (replacing the remaining 68% of the high-LOF cohort by the end of Year 1) could have resulted in more water main replaced, with little to no impact on customers, while reducing service disruptions.
To illustrate this point, consider the following hypothetical:
• Assume the average cost of an unplanned repair is $4,000 (it can be higher).
As a result of careful attention to GIS data quality, the model can identify trends within very small groups of pipes that will experience failure first. Based on the hypothetical, the annual value delivered by adherence to the model guidance is more than 20× the cost of the initial data cleanup effort of two FTEs. To fully realize this potential, decision makers would, of course, need to defer replacement of main in lower-risk cohorts. Lower-LOF main might still need to be replaced for any number of reasons outside the scope of this article (such as a new hydraulic demand). There is also a small possibility that main deferred from lower-LOF cohorts experiences failures during Year 2. In this hypothetical, however, the statewide probability of any 1 mile of main deferred from the low-LOF cohort failing during Year 2 is significantly lower than that of a mile in the high-LOF cohort. Furthermore, the GIS model can offer practical guidance in choosing which mains to defer. Although there are only five LOF cohorts, the GIS model scores each main segment from 1 to 1,000 before classification into the five cohorts. The model can therefore isolate the lowest-scoring individual mains, informing managers which mains are least likely to fail and guiding granular deferment decisions.
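The trade-off in the hypothetical can be sketched as an annual avoided-repair estimate plus a helper for picking deferral candidates by segment score. Only the $4,000 repair cost comes from the text; the mileage and break rates passed in are placeholders, and the estimate assumes the observed cohort break rates persist while ignoring differences in replacement cost between the two sets of mains.

```python
def net_avoided_repair_cost(miles_shifted, high_lof_rate, low_lof_rate, cost_per_repair=4000.0):
    """Annual repair cost avoided by replacing `miles_shifted` of high-LOF main instead of an
    equal length of low-LOF main. Rates are breaks/100 miles/year."""
    avoided_breaks = miles_shifted * (high_lof_rate - low_lof_rate) / 100.0
    return avoided_breaks * cost_per_repair


def deferral_candidates(segment_scores, n):
    """IDs of the n segments with the lowest 1-1,000 LOF scores (least likely to fail first)."""
    return sorted(segment_scores, key=segment_scores.get)[:n]
```

The estimate is driven by the gap between cohort break rates, not their absolute values, which is why a small, sharply separated high-LOF cohort makes the shift worthwhile.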

| CASE STUDY CONCLUSIONS
The GIS-based model and case study confirmed the two hypotheses:
• GIS-based modeling can identify small cohorts of water main at high LOF for near-term failure. Higher-LOF cohorts will experience higher break rates over time. High-risk LOF cohorts will have a small, manageable population.
• By aligning more main replacement opportunities with higher-LOF cohorts, water utilities can reduce O&M costs and service disruptions related to emergency main break repair.
It must be noted that, while this study focused solely on the results for cast iron main within LOF cohorts, the GIS-based model used for this study scored mains of all material types. Cast iron mains represent approximately 42% of the statewide mains ranked by the model. Cast iron main is also considered stronger than many other main types and is scored as such in the model. In effect, cast iron main in this study competed for a higher LOF rating against all other material types, for example, poorly performing cement main. Although it is outside the scope of this article, the correlation between failure rates and cast iron GIS LOF cohorts could be strengthened if no other material type were modeled. The question, however, is purely academic because, in practice, all pipeline material types compete against one another for replacement funding.
The GIS-based model performed a far more granular and comprehensive assessment of all pipes than traditional age-based industry guidance and assessment methods are designed to provide. Although there is year-to-year variation in the total number of breaks and in the magnitude of break rates within cohorts, the case study suggests a high degree of correlation between actual failure rates and GIS LOF prediction cohorts. It is imperative, however, that GIS data be properly maintained and curated, because they are central to utilities' ability to predict LOF reliably. Although GIS data cleansing is largely manual, its labor costs are a fraction of the value delivered.
New Jersey American Water's GIS-based LOF model removes a significant amount of subjectivity from baseline prioritization by providing a comprehensive "bottom-up" score for each main. No artificial intelligence is used. The variables considered are not novel; the attention to precision and accuracy within the GIS system is. The study results show that, when the variables that affect the condition of a main are attributed to the correct mains within a GIS system, the model can produce remarkably accurate, reproducible, and granular failure predictions.

APPENDIX A: DISTRICT MODEL RESULTS
This appendix contains the trend line graphs for each of the five geographic districts in New Jersey that were combined to create the statewide results presented in the study. The central, southwestern, northern coastal, and north portions of the state displayed stronger correlation than the southeast. The coastal southeast region is significantly smaller and younger than the other four districts. This district is very flat, experiences a seasonal population, and is served predominantly by ductile iron main (Figures A1-A5).
FIGURE A1 Over the 2-year study period: 587 breaks and 852 initial miles of cast iron
FIGURE A2 Over the 2-year study period: 578 breaks and 1,378 initial miles of cast iron