Definitions and Concepts
This chapter serves as a reference, presenting terms and concepts used throughout this text.
Pipe, pipeline, component, facility
As used in this book, a pipeline segment can be any length of pipe, not necessarily a ‘joint’ length. A component is a part of a pipeline that is other than a pipe segment and can be a flange, valve, fitting, tank, pump, compressor, separator, filter, regulator, or any of many other portions of a typical pipeline. A pipeline is a collection of pipe segments and components. A facility is a collection of components. A system is one or more pipelines and associated facilities. See also the discussion of segmentation for purposes of assessing risk under .
Risk concepts covered in this book are meant to apply to any segment of pipe, component, entire pipeline, facility, or system. While pipe is often used to illustrate a concept, the concept also applies to any other component.
As a convenience, the terms component and segment will be used most often in this book.
The basic risk concepts also apply to all component material types. While steel is often the focus of discussion, risks associated with all other materials of construction such as plastic, cast iron, concrete, and others, can be efficiently assessed using these same methods.
The terms owner and operator are used interchangeably here, both referring to the decision-makers who control choices in pipeline design, operations, and maintenance.
Types
Pipeline systems are often categorized into types such as transmission, distribution, gathering, offshore, and others, as discussed in . All types are appropriately assessed using the same methodology.
Facility
Facility, station, and similar terms refer to one or more occurrences of, and often a collection of, equipment, piping, instrumentation, and/or appurtenances at a single location, typically where at least some portion is situated above ground (unburied). Facilities and their subparts are efficiently assessed using the same methodology.
System
The word ‘system’ has many uses in this text. It is used in contexts such as safety system, control system, management system, procedure system, and training system to indicate a collection of parts or sub-systems. While no set definition exists, a pipeline system normally refers to a large collection of pipeline segments and related stations/facilities.
Hazards and Risk
As detailed in many references, risk is most commonly defined as the probability of an event that causes a loss and the potential magnitude of that loss. Risk changes with changes in either the probability of the event or the magnitude of the potential loss (the consequences of the event). In common use, the term ‘hazard’ generally refers more to the consequence. It has commonly been said that the hazard associated with a thing or an action is unchangeable but the risk is changeable. Transportation of products by pipeline entails the hazards of the pipeline failing, releasing its contents, and causing damage (in addition to the potential loss of the product itself). The risk associated with this transportation is highly changeable by numerous means.
The most commonly accepted definition of risk is often expressed as a mathematical relationship:
Risk = (event likelihood or probability) × (event consequence)
Risk is best expressed as a measurable quantity such as the expected frequency of certain types of incidents, human fatalities or injuries, or economic loss.
A complete understanding of the risk requires that three questions be answered:
- What can go wrong?
- How likely is it?
- What are the consequences?
The risk assessment approach recommended here measures risk in terms of expected loss (EL).
Expected Loss
A powerful approach to measuring and reporting risk is to combine the range of possible consequence scenarios, and their respective probabilities of occurrence, into a single value representing all potential losses over time. Risk expressed in this fashion is called “expected loss” (EL). It encompasses the classical definition of risk—probability x consequences—but expresses risk as a probability of various potential consequences over time. While expected loss is not a new concept in risk analysis, especially in financial matters, it is perhaps unfamiliar to many practitioners of pipeline risk assessment.
EL measurement units present the risk as a loss over time, often based on average expected behavior—dollars per year, for instance, for a particular pipeline system. The value is intended to embody all possible consequences (losses) with their respective likelihoods. This value can be viewed as the amount of potential future loss that has been created by the presence of the facility. Costs are a convenient common denominator for all types of losses, and monetized losses are used in the examples presented here.
An EL analysis captures the high-consequence, extremely improbable scenarios; the low-consequence, higher-probability scenarios; and all variations between. It does this without overstating the influence of either end of the range of possibilities. The use of probabilities ensures that no scenario has an over- or under-stated impact on the results. All scenarios are considered with appropriate ‘weight’ for more objective decision support.
Each point on a pipeline produces its own unique set of potential probability-consequence pairings and hence its own expected loss. Theoretically, each possible dollar consequence scenario is multiplied by a probability of occurrence to arrive at a probability-adjusted consequence value (dollars) for each possible consequence scenario. Each point on the pipeline therefore has a distribution of possible failure and consequence scenarios. For practical reasons, a subset of all possible scenarios is used to approximate the distribution of all possible scenarios. This distribution can be expressed as a single point estimate—the expected loss at that location.
The individual expected values for all scenarios at all points along the pipeline can then be combined to produce an expected loss for the entire pipeline (or any portion of any pipeline). Multiple pipelines can have their EL’s combined for a measure of the risk of an entire operation. These values show decision-makers the overall risks and suggest levels of appropriate risk management actions, as will be discussed later.
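To make the roll-up concrete, here is a minimal sketch of the EL calculation for a hypothetical two-segment pipeline. All scenario probabilities and dollar consequences are invented for illustration only.

```python
# Expected loss (EL) roll-up: each location carries a set of
# (annual probability, $ consequence) scenario pairs; EL at a location is
# the probability-weighted sum, and the pipeline EL is the sum over locations.

locations = {
    "MP 0-1": [(1e-3, 50_000), (1e-5, 2_000_000)],    # (events/yr, $ loss)
    "MP 1-2": [(5e-4, 80_000), (2e-6, 10_000_000)],
}

def expected_loss(scenarios):
    """Probability-weighted sum of consequences, in $/yr."""
    return sum(p * c for p, c in scenarios)

for name, scenarios in locations.items():
    print(f"{name}: EL = ${expected_loss(scenarios):,.0f}/yr")

print(f"Pipeline EL = ${sum(expected_loss(s) for s in locations.values()):,.0f}/yr")
```

Note how a rare, catastrophic scenario and a frequent, minor one can contribute comparable amounts to the single EL value.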
Annualizing all potential consequences into an EL is a modeling convenience. A $100,000 loss event that occurs once every 10 years is mathematically equivalent to an expected loss of $10,000 per year. However, a uniform loss rate—X dollars of loss each period—is really not the expectation. Only the long-term expected losses over time—the behavior of the population—are thought to be fairly represented by the average annual expectation. This presents some financial planning challenges when one considers that while the expected loss on an annualized basis might be acceptable to an organization, that cost might actually occur in a tremendous one-year event and then no other losses occur for decades—no doubt a much less acceptable situation. Similarly, from a risk-tolerance perspective, a once-every-10-years $100,000 event is usually quite different from an annual $10,000 event. While the mathematical equivalence is valid, other considerations challenge the notion of equivalency.
The phrase ‘expected loss’ carries some emotionalism. It implies that a loss—including injuries, property damages, and perhaps even fatalities—is being forecast as inevitable. This often leads to the question: ‘why not avoid this loss?’ Most can understand that there is no escaping the fact that risks are present. Society embraces risk and even specifies tolerable risk levels through its regulatory and spending habits. EL is just a measure of that risk. Nonetheless, such terms should be used very carefully, if at all, in risk communications to less-technical audiences. This is more fully discussed elsewhere.
In summary, the EL, as it is proposed here, will represent an average rate of loss from the combination of all loss scenarios at a specific location along a pipeline. An $11K/year EL may represent a $100K loss every ten years and an annual $1K loss ($100K / 10 yrs + $1K/yr = $11K/yr). It is therefore a point estimate representing a sometimes wide range of potential consequences. The EL sets the stage for cost/benefit analyses of possible projects and courses of action as is discussed under .
Other Risk Units
The most compelling presentation of risk is perhaps in EL values—that is, in monetary terms. They are easily recognizable and provide context that most will understand.
However, they are not without controversy, as discussed later. When alternate, non-monetary presentations of risk are required, options are available.
As an example, consider a table of risk estimates presented in PRMM, based on a real evaluation of a 700-mile gasoline pipeline. That table presents risk as the expected frequencies of certain consequences. Careful examination of this presentation shows many different aspects of risk being considered:
- Leak count as consequence—2.6 leaks over the project life is, itself, an expression of risk
- Receptor damages as consequence—the frequencies of specific damages are shown, recognizing that not all receptors are exposed to all miles and that only some of the 2.6 leaks will result in measurable damage (ie, some will be too small, be rapidly contained, or otherwise not really cause damage)
- Risk presented by the entire project, ie 700 miles of pipeline operating for 50 years. While accurate as a summary value, this will compare unfavorably to most other facilities (non-pipelines), operating within fenced boundaries and having very limited geographical impact potential.
- Length effects. While 700 miles of pipeline actually exists and does expose receptors along its route, a risk value based on total length can be misleading. Each potential receptor is only exposed to a certain length. In this example, usually 2,500 ft of pipe is conservatively assumed to expose a certain point location. So, a 100 ft creek crossing would be exposed to leak/rupture potential from 2,500 + 100 = 2,600 ft of pipeline.
- Annual risks versus lifetime risk (50 years, in this example) is another presentation choice that is potentially misleading.
Many measures of acceptable risk are linked to fatality, specifically annual individual fatality risk. See related discussions under value of human life and risk acceptability criteria under .
Failure
As detailed in PRMM, answering the question of “what can go wrong?” begins with defining a pipeline failure. A failure implies a loss or consequence.
The unintentional release of pipeline contents is one common definition of a failure. Loss of integrity is a type of pipeline failure also implying leak/rupture. The difference between the two may lie in some scenarios such as tank overfill that may include the former but not the latter.
A more general definition of failure is ‘no longer able to perform its intended function’. The risk of service interruption includes failure from all scenarios resulting in the pipeline not meeting its delivery requirements (its intended purpose).
The concept of limit state can be useful here. In structural engineering, a limit state is a threshold beyond which a design requirement is no longer satisfied (CSA Z662 Annex O). The structure is said to have failed when it fails to meet its design intent, which in turn is an exceedance of a limit state. Typical limit states include ‘ultimate’—corresponding to a rupture or large leak—‘leakage’, and ‘serviceability’.
Complicating the quest for a universal definition of failure in the pipeline industry is the fact that municipal pipeline distribution systems (water, wastewater, natural gas) tolerate some amount of leakage. Failure may be defined as ‘excessive’ leakage in contrast to pipelines where any amount of leakage is considered ‘failure’.
The most used definition of failure in this book will be leak/rupture. The term leak implies that the release of pipeline contents is unintentional, distinguishing a failure from a venting, de-pressuring, blow down, flaring, or other deliberate product release.
Failure mechanism, failure mode, threat
Digging deeper, we often need a definition of ‘failure’ from a material science point of view. Loss of load carrying capacity is a good working definition of material failure. ‘Load carrying capacity’ is also an appropriate definition for resistance, as measured in a risk assessment. In this text, a failure mechanism is the driving force that can cause a failure.
The failure mode is the manner in which the material fails. Common failure mode categories are ductile (yield), brittle (fracture), or a combination, with subcategories of tensile, compressive, and shear. The failure mode is the end state.
The failure mechanism is the process that leads to the failure mode. Failure mechanisms include corrosion, impact, buckling, and cracking.
A failure scenario is the complete sequence of events that, when combined, result in the failure.
A failure manifesting as a leak is included in the ‘load carrying capacity’ definition for most pipeline components, since the load of internal pressure is no longer completely carried once a leak of any size forms.
As detailed in PRMM, the ways in which a pipeline can fail can be categorized according to the behavior of the failure mechanisms relative to the passage of time. When the failure rate tends to vary only with a changing environment, the underlying mechanism is considered time-independent and should exhibit a constant failure rate as long as the environment stays constant. When the failure rate tends to increase with time and is logically linked with an aging effect, the underlying mechanism is time-dependent.
Pipelines tend to avoid early-life leak/rupture failures by commonly used techniques such as manufacture/construction quality control (for example, pipe mill pressure testing, weld inspection) and post-installation pressure test.
Pipelines are often constructed of materials such as steel, which has no known degradation mechanisms other than corrosion and cracking. By controlling these, a steel pipeline is thought to have an indefinite life-span. See discussion under ‘design life’.
Estimates of pipe strength are essential in risk assessment. This is discussed in .
Probability
PRMM provides a compelling discussion of probability as it applies to pipeline risk management. The most useful definition of probability is a degree of belief. Estimating the probability of anything beyond simple ‘systems’ such as games of chance (coin flip, poker, roulette, dice, etc) requires analysis beyond simple examination of historical event rates and their accompanying statistics. It includes engineering judgment, expert opinion, and an understanding of the underlying physical phenomena of the ‘event’ whose probability is being assessed.
Probability of Failure
When we speak of the probability of a pipeline failure, we are expressing our belief regarding the likelihood of an event occurring in a specified future period. Probability is most often expressed as a decimal ≤ 1.0 or a percentage ≤ 100%. Historical data, usually in the form of summary statistics, often partially establishes our degree of belief about future events. Such data is not, however, the only source of our probability estimates.
Probability is often expressed as a forecast of future events. In this application, the expression has the same units as a measured event frequency, i.e. events per time period. When event frequencies are very small, they are, for practical purposes, interchangeable with probabilities: 0.01 failures per year is essentially the same as a 1% probability of one or more failures per year, for purposes here. When event frequencies are larger, a mathematical relationship—reflecting an assumed underlying distribution—is used to convert them into probabilities, ensuring that probabilities are always between 0 and 100%.
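A minimal sketch of this conversion, assuming the underlying distribution is Poisson (one common modeling choice), follows.

```python
import math

def freq_to_prob(rate_per_year: float) -> float:
    """Probability of one or more events in a year, assuming a Poisson process."""
    return 1.0 - math.exp(-rate_per_year)

print(freq_to_prob(0.01))  # ~0.00995: small frequencies are essentially probabilities
print(freq_to_prob(3.0))   # ~0.95: a large frequency still yields a probability < 100%
```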
The pipeline risk assessment model described here is designed to incorporate all conceivable failure mechanisms that can contribute to probability of failure. Emerging or yet to be identified failure causes are readily added to this framework, once they are understood. The risk assessment is calibrated using appropriate historical incident rates, tempered by knowledge of changing conditions. This results in estimates of failure probabilities that are realistic, utilize all available information appropriately, and match the judgments and intuition of those most knowledgeable about the pipelines.
PoF Triad
In risk assessment, there is the need for a very specific approach to measuring failure probability (PoF). Three factors must be independently measured/estimated in order to fully understand PoF. The reasoning here is that the PoF is being examined in distinct pieces—a reductionist approach—prior to their aggregation into a PoF estimate.
Regardless of the definition of ‘failure’ being used, failure occurs only when a failure mechanism is present, preventive measures are insufficient, and resistance to the failure mechanism is insufficient. All three conditions must exist before failure occurs. This is the genesis of the proper way to measure PoF.
We also recognize that there is more than one potential failure mechanism that can lead to failure. These two basic concepts lead to one of the most important of the essential elements of pipeline risk assessment:
All plausible failure mechanisms must be included in the assessment of PoF. Each failure mechanism must have each of the following three aspects measured or estimated in verifiable and commonly used measurement units:
Exposure (attack)— an exposure[1] is defined as an event which, in the absence of any mitigation, can result in failure, if insufficient resistance exists. The type and unmitigated aggressiveness of every force or process that may precipitate failure is an exposure.
Mitigation (defense)—the type and effectiveness of every mitigation measure designed to block or reduce an exposure.
Resistance—a measure or estimate of the ability of the component to absorb the exposure force without failure, if the exposure reaches the component.
For each time-dependent failure mechanism, a theoretical remaining life estimate must be produced and expressed in a time unit.

- Exposure, Mitigation, Resistance
An analogous naming convention is ‘attack’, ‘defense’, and ‘survivability’, respectively, for these three terms. The evaluation of these three elements for each threat to each pipeline component within a segment results in a PoF estimate for that segment.
Measuring exposure—attack—independently generates knowledge of the ‘area of opportunity’ or the aggressiveness of the attacking mechanism. Then, the separate estimate of mitigation—defense—effectiveness shows how much of that exposure should be prevented from reaching the component being assessed. Finally, the resistance estimate shows how often the component will fail—its survivability—if the exposure actually reaches the component.
This three-part assessment also helps with model validation and, most importantly, with risk management. Fully understanding the exposure level, independent of the mitigation and the system’s ability to resist the failure mechanism, puts the whole risk picture into clearer perspective. Then, the roles of mitigation and system vulnerability are both known independently and also with regard to how they interact with the exposure. Armed with these three aspects of risk, the manager is better able to direct resources appropriately.
In risk management, where decision-makers contemplate possible additional mitigation measures, additional resistance, or even a re-location of the component (often the only way to change the exposure), this knowledge of the three key factors will be critical.
The simple equation for PoF shows two ways to reduce PoF—either increase mitigation—blocking the failure mechanism—or increase resistance—making the structure stronger to absorb more forces. This independent evaluation of exposure and mitigation also captures the idea that “no exposure” will inherently have less risk than “mitigated exposure,” regardless of the robustness of the mitigation measures. As well, the notion that a very stout component is intrinsically safer is captured.
In estimating future exposures, it is important to first list all potentially damaging mechanisms that could occur at the subject location. Then, numerical exposure values should be assigned to each.
Pre-dismissal of exposures should be avoided—the risk assessment will show, via low PoF values, where threats are insignificant. It will also serve as documentation that all threats are considered.
For example, falling trees, walls, utility poles, etc are often overlooked in a pipeline risk assessment. This is an understandable result of discounting such threats via an assumption that a buried component is virtually immune to such damage. While this is normally an appropriate assumption, the risk assessment errs when such threat dismissal occurs without due process. Pre-screening of threats as insignificant weakens the assessment. The independent evaluation of exposure and mitigation ensures that relevant scenarios—for example, a change in depth of cover, relocation of the component above grade, or a particular falling object that can indeed penetrate to the buried pipeline—are not lost to the assessment.
- Swiss Cheese Analogy: More Slices and/or Fewer Holes Reduces Event Probability
Units of Measurement
Units of measurement should always be transparent and intuitive. In one common application of the exposure, mitigation, resistance triad, units are as follows. Each exposure is measured in one of two ways—either in units of ‘events per time and distance’, ie events/mile-year, events/km-year, etc, or in units of degradation—metal loss or crack growth rates, ie mpy, mm per year, etc. An ‘event’ is an occurrence that, in the absence of mitigation and resistance, will result in a failure. To estimate exposure, we envision the component completely unprotected and highly vulnerable to failure (think ‘tin can’ wall thickness). So, an excavator working over a buried pipeline is an event. This is counted as an event regardless of depth of burial, use of one-call, signs/markers, patrol, etc.
Units of measure, beginning with exposure estimates and carried through until final risk estimates, include time and distance. As time periods and distances increase, so too does risk. This is intuitive—more miles and more years of operation logically suggests that more things can go wrong—a greater area of opportunity. The probability (future frequency) of a corrosion leak at any location may only be 0.001 leaks per mile-year, but with hundreds of miles and/or decades of operation, the probability grows to almost 100% of at least one corrosion leak somewhere along the route within the time period.
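Under the same assumed Poisson relationship, the accumulation over length and time can be sketched as follows; the leak rate and mileages are hypothetical.

```python
import math

rate = 0.001  # corrosion leaks per mile-year (hypothetical)
for miles, years in [(1, 1), (100, 1), (100, 30), (500, 50)]:
    p = 1.0 - math.exp(-rate * miles * years)
    print(f"{miles} mi x {years} yr: P(at least one leak) = {p:.1%}")
```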
Mitigation and Resistance are each measured in units of % representing ‘fraction of damage or failure scenarios avoided’. A mitigation effectiveness of 90% means that 9 out of the next 10 exposures will not result in damage. Resistance of 60% means that 40% of the next damage scenarios will result in failure, 60% will not.
For assessing PoF from time-independent failure mechanisms—those that appear random and do not worsen over time—the top level equation can be as simple as:
PoF_time-independent = exposure x (1–mitigation) x (1–resistance)
With the above example units of measurement, PoF values emerge in intuitive and common units of ‘events per time and distance’, ie events/mile-year, events/km-year, etc.
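A minimal sketch of this top-level equation, with hypothetical triad values, follows; the probability-of-damage intermediate anticipates the next subsection.

```python
def prob_damage(exposure, mitigation):
    """Exposure events that get past mitigation, in events/mile-year."""
    return exposure * (1.0 - mitigation)

def pof_time_independent(exposure, mitigation, resistance):
    """PoF = exposure x (1 - mitigation) x (1 - resistance), events/mile-year."""
    return prob_damage(exposure, mitigation) * (1.0 - resistance)

# Hypothetical: 3 events/mile-yr exposure, 90% mitigation, 60% resistance
print(prob_damage(3.0, 0.90))                 # 0.30 damage events/mile-yr
print(pof_time_independent(3.0, 0.90, 0.60))  # 0.12 failures/mile-yr
```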
A risk assessment measures the aggressiveness of potential failure mechanisms and effectiveness of offsetting mitigation measures and design features. The interplay between aggressiveness of failure mechanisms and mitigation/resistance effectiveness yields failure potential estimates.
Damage Versus Failure
Another benefit emerges from the exposure/mitigation/resistance triad. Probability of Damage—damage without immediate failure—can be measured independently from PoF. Using the first two terms without the third—exposure and mitigation, but not resistance—yields the probability of damage.
Probability of Damage (PoD) = f (exposure, mitigation)
Probability of Failure (PoF) = f (PoD, resistance)
Damage results from an exposure that reaches the component but does not cause failure.
Damage that does not result in immediate failure may cause reduced resistance against future failure mechanisms. Some damage may also trigger or accelerate a time-dependent failure mechanism. Calculation of both PoD and PoF values creates better understanding of their respective risk contributions and provides the ability to better respond with risk management strategies.
From TTF to PoF
Estimation of PoF for time-dependent failure mechanisms requires an intermediate calculation of time-to-failure (TTF).
PoF_time-dependent = f(TTF_time-dependent)
TTF_time-dependent = resistance / [exposure x (1–mitigation)]
The relationship between an estimated TTF and the probability of failure can be complex and warrants special discussion. The PoF is normally calculated as the chance of one or more failures in a given time period. In the case of time-dependent failure mechanisms, TTF estimates are first produced. The associated failure probability assumes that at least one point in the segment is experiencing the estimated degradation rate and no point is experiencing a more aggressive degradation rate.
The TTF estimate is expressed in time units and is calculated by using the estimated pipe wall degradation rate and the theoretical pipe wall thickness and strength, as was shown above. In order to combine the TTF with PoF from all other failure mechanisms, it is necessary to express the time-dependent failure potential as PoF. This requires a conversion of TTF to PoF. It is initially tempting to use the reciprocal of this time-to-failure number as a leak rate—failures per time period. For instance, 20 years to failure implies a failure rate of once every twenty years, perhaps leading to the assumption of 0.05 failures per year. However, a logical examination of the TTF estimate shows that it is not really predicting a uniform failure rate. The estimate is actually predicting a failure rate of ~0 for 19+ years and then a nearly 100% chance of failure in the final year. Nonetheless, use of a uniform failure rate is conservative and helps overcome potential difficulties in expressing degradation rate in probabilistic terms. This is discussed later.
An exponential relationship can be used to link PoF in year one to the failure rate. Using the conservative relationship of [failure frequency] = 1/TTF, a possible relationship to use, at least in the early stages of the risk assessment, is:
PoF = 1 - EXP(-1/TTF)
Where
PoF = probability of failure in year one
TTF = time to failure
This relationship ensures that PoF never exceeds 1.0 (100%). As noted, this does not really reflect the belief that PoF’s are very low in the first years and reach high levels only in the very last years of the TTF period. The use of a factor in the denominator will shift the curve so that PoF values are more representative of this belief. A Poisson relationship or Weibull function can also better show this, as can a relationship of the form PoF = 1 / (fctr x TTF^2) with a logic trap to prevent PoF from exceeding 100%. The relationship that best reflects real-world PoF for a particular assessment is difficult, if not impossible, to determine. Therefore, the recommendation is to choose a relationship that seems to best represent the peculiarities of the particular assessment, chiefly the uncertainty surrounding key variables and confidence in results. The relationship can then be modified as the model is tuned or calibrated towards what is believed to be a representative failure distribution.
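The sketch below compares a few candidate TTF-to-PoF relationships. The shift factor and the inverse-square form are tuning assumptions of the kind described above, not prescriptions.

```python
import math

def pof_uniform(ttf):
    """Conservative constant failure rate: PoF = 1 - EXP(-1/TTF)."""
    return 1.0 - math.exp(-1.0 / ttf)

def pof_shifted(ttf, fctr=2.0):
    """A factor in the denominator shifts the curve toward lower early-year PoF."""
    return 1.0 - math.exp(-1.0 / (fctr * ttf))

def pof_inverse_square(ttf, fctr=1.0):
    """PoF = 1 / (fctr x TTF^2), with a logic trap capping PoF at 100%."""
    return min(1.0, 1.0 / (fctr * ttf ** 2))

for ttf in (2, 5, 20, 100):
    print(ttf, f"{pof_uniform(ttf):.3f}", f"{pof_shifted(ttf):.3f}",
          f"{pof_inverse_square(ttf):.4f}")
```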
The relationship between TTF and PoF includes segment length as a consideration. PoF logically increases as segment length increases since a longer length logically means more opportunity for active failure mechanisms, more uncertainty about variables, and more opportunities for deviation from estimated degradation rates. This is discussed more fully in a later section. See also and for a continuation of the TTF to PoF discussion.
Age as a Risk Variable
Age-based or historical leak-rate based estimates are readily generated when data is available and can be useful for quick or initial risk estimates. Statistical examination of historical leak and break data provides insights into behaviors of populations of components over long periods of time. When such populations are similar in characteristics and environment to a collection of components being assessed, such statistical analyses have some predictive capability. This is often an approach for general predicting of leaks in larger distribution systems.
While age is often used as a gross indicator of leak/break likelihood, especially on distribution systems where some amount of leakage is tolerable and is tracked over time, neither age nor historical leak rates indicate the presence of degradation mechanisms at any specific location.
Age is rarely a direct indicator of risk. It does, however, suggest indirect risk indications related to issues such as era of manufacture/construction and extent of degradation where time-dependent mechanisms are active. Location-specific failure probability is best estimated by assessment of relevant exposure, mitigation, and resistance characteristics at that location and system-wide deterioration is best estimated by accumulating all location-specific damage potentials. The more useful risk assessment will evaluate the actual mechanisms possibly at work at any location and then supplement this with population statistical data.
The ‘Test of Time’ Estimation of Exposure
In the absence of more compelling evidence, an appropriate starting point for the exposure estimation may be the fact that a component or collection of components has not failed after x years in service. This involves the notion of having ‘withstood the test of time’. A component having survived a threat, especially for many years, is evidence of the exposure level. This is best illustrated by example. If 10 miles of pipe, across an area with landslide potential, has been in place for 30 years without experiencing any landslide effects, then a failure tomorrow perhaps suggests an event rate of 1/(10 miles x 30 years) = 1/300 mile-years.
This simple estimate will not address the conservatism level. The estimator will still need to determine if this value represents more of a P50 estimate or perhaps a more conservative P90+ value.
In some cases, the evidence is actually of the mitigated exposure level. That is, the component has survived the threat, but perhaps at least partially due to the presence of effective mitigation. This makes the separation of exposure more challenging.
Despite the lack of complete clarity, this ‘test of time’ rationale can be a legitimate part of an exposure estimate.
Time-Dependent vs Time-Independent
Risk assessment begins with understanding potential failure scenarios. While both types of failure mechanism—time-dependent and time-independent—can be involved in a failure scenario, it might not be immediately obvious how to treat the combined-effect scenario in a risk assessment. Fortunately, in a good risk assessment methodology, the contributions from each type are automatically and intuitively considered. All exposures should be included and any degradation effects should be factored into the ability to resist all corresponding stress levels.
There should not be any confusion regarding when a time-dependent mechanism is involved in a failure scenario. Consider an investigation of a failed component. The dominant failure mechanism type can normally be determined by simply answering the question; ‘why did it fail today and not yesterday?’. If the component performed without failure for some previous period of time, and was not subjected to new stresses, then logically, some degradation occurred to cause the failure ‘today’ and not ‘yesterday’. Degradation indicates a time-dependent failure mechanism at work.
In other words, unless the component has never before been subjected to the failure stress, the fact that it fails ‘today’ versus ‘yesterday’ implies a time factor—ie, some time-dependent mechanism was active and weakened the pipe since the previous application of that stress level. If the stress level is simply ‘recently new’, ie, hasn’t been experienced lately, then degradation is still likely the dominant mechanism. Reductions in resistance (effective wall thickness, as detailed in ) hasten time to failure and increase failure potential upon application of stress.
Even if the failure scenario does not involve a typical degradation process, but a time element is nonetheless inferred, the assessment can efficiently include it as a time-dependent failure mechanism. Consider a leak at a threaded connection, where no corrosion or cracking is found. If the connection was leak free at one time and no new stresses were applied, the loosening of the connection can still be efficiently modeled as a degradation mechanism (see discussion in ).
Probabilistic Degradation Rates
Degradation rates are among the most difficult aspects of risk assessment to accurately estimate. Rates are highly variable over time and even in very localized areas. For instance, an aggressive pitting corrosion rate of 50 mpy can commonly exist within fractions of a millimeter of areas where virtually zero degradation is occurring. It can also reach 50 mpy for some period of time and then become inactive for long periods. Our understanding of many of the even more common mechanisms requires us to model a degree of randomness in the occurrence locations and possible rates. We use probabilities to recognize this randomness.
It would be convenient to model a 10% chance of a 50 mpy degradation as a 5 mpy degradation. But these two values have different implications. If it takes 50 mils of wall loss to cause a leak in a component, then a 10% chance of 50 mpy suggests a 10% chance of a leak as early as the first year. However, a 5 mpy rate would not result in a leak until 10 years have passed.
Both scenarios can be accommodated in the assessment by appropriate treatment of the conversion from TTF to PoF.
Capturing “Early Years’ Immunity”
Using the basic relationship employing some form of PoF as a function of 1/TTF described above can result in excessive conservatism. Consider a very new, thick-walled component whose early years are virtually unthreatened by any plausible degradation rate. Even a 100 mpy degradation rate should not threaten the year one (or even year three) integrity of a 0.400” pipe. New components, those with heavy wall thicknesses, those in very benign environments, those with very accurate and recent inspections, etc, all have some amount of immunity to failure from slow-acting degradations, at least in the early years of exposure.
However, this immunity is uncertain and temporary for most components. Using a relationship such as a lognormal or Weibull to show failures only in the later years of the TTF estimate risks missing the often small but real chance of very aggressive degradation or an unexpectedly thin component wall. Recall the example where a 10% chance of 50 mpy can suggest a real chance of a leak in the next year.
A two-part relationship between PoF and TTF solves this issue and is often warranted. By adding an extreme value analysis to the basic TTF analysis, early year TTF’s can be dismissed in certain scenarios.
The extreme value analysis requires the creation of a variable called TTF99. TTF99 is the minimum plausible TTF—a value that is lower than any actual value will be, 99-99.9% of the time—for example, the subject matter expert (SME) is 99+% confident that the TTF cannot be worse than this value, even considering a highly improbable coincidence of very unlikely factors. Establishing this extreme value can be done by taking the best pipe wall thickness estimate and degrading that by the highest plausible unmitigated corrosion/cracking rate. Alternatively, statistical methods can be used to establish the 99% confidence level, when data is available.
Using both TTF and TTF99 creates four scenarios, each with its own relationship to PoF. These scenarios involving TTF (best estimate of current time to failure) and TTF99 (lowest plausible TTF) are examined to arrive at an estimate of PoF:
The scenarios are summarized as follows, assuming the time of interest is 1 year[2]—a year one PoF is sought (what is the probability of failure in the next 12 months?). Note that TTF is the best estimate—ie, thought to be the most likely value—and TTF99 is the very conservative estimate:
If TTF99 less than 1 year AND TTF less than 1 year, then PoF = 99+%
If TTF99 less than 1 year AND TTF greater than 1 year, then use constant failure rate, basically the reciprocal of the TTF, to estimate PoF
If TTF99 greater than 1 year AND TTF greater than 1 year, then use a more optimistic relationship (such as lognormal(TTF99)) to estimate PoF from TTF
Scenario 1. If it is plausible to have a year one failure AND the best estimate of TTF is also less than one year.
If TTF is less than 1 year, then failure during year one is likely and PoF is assigned 99%. Pipeline segments are conservatively assigned this value when little information is available and a very short TTF cannot be ruled out.
Scenario 2. If it is plausible to have a year one failure AND the best estimate of TTF is greater than one year.
If TTF > 1 year but TTF99 is < 1 year, then we believe year one failure is unlikely but cannot be ruled out. PoF needs to reflect the probabilistic mpy embedded in the TTF estimate. Probabilistic mpy means that, for instance, a 10 mpy estimate includes a scenario of ‘10% chance of a 100 mpy degradation rate’. To ensure that the PoF estimate captures the small chance of a 100 mpy rate actually occurring next year, a constant and conservative failure probability—PoF = 1/TTF—is associated with the 10 mpy. Pipeline segments will fall into this analysis category when very short TTF is possible but the most probable TTF values exceed the year for which PoF is being assessed.
Scenario 3. If it is not plausible to have a year one failure, even using extreme values
If TTF99 > 1 year then we believe that, even under worst case scenarios, failure in year one will not happen. TTF99, rather than the actual TTF, governs PoF. The relationship between TTF99 and PoF can be assumed to be lognormal or Weibull or some other distribution, with parameters selected from actual data or from judgments as to distribution shapes that are reflective of the degradation mechanism being modeled. Very low year one PoF’s will emerge. A new pipeline, even with a high plausible degradation rate, will have a PoF governed by this analysis (for example, a 0.250” thick wall will not experience a through-wall leak in year one even with a 100 mpy pitting corrosion rate).
Scenario 4. TTF is very high
Consider yet one more scenario: when TTF is very high, it may override TTF99 for PoF. This is again logical. Even if TTF99 is close to one—PoF approaching 100%—TTF might indicate that the segment’s actual TTF (best estimate) is so far from this low probability event, that it should govern the final PoF estimate. A pipeline segment with very high confidence in both current pipe wall and a low possible degradation rate will have a high TTF. Even if a short TTF is theoretically possible—as shown by TTF99—a sufficiently high confidence in the estimated TTF can govern. Such high confidence is often obtained via repeated, robust inspections and when the degradation rate required for early failure would be an extreme aberration.
Scenarios 3 & 4 are appropriate only when TTF99 > 1 year or can be dismissed as implausible—virtually no chance of failure in year one. Then the worst case between scenario 3 and scenario 4 governs.
See the figure below showing the two-part curve, where PoF is on the vertical axis and time is on the horizontal axis.
- TTF to PoF
Again, the rationale for use of a two-part curve is intuitive. A new pipeline has little chance of corrosion leak in the early years, even when aggressive corrosion rates are possible. Therefore, even if a worst case TTF is 5 years, the new pipeline enjoys a very low PoF in year one. Use of the simple PoF = 1/TTF does not show this. It yields a 20% chance of failure in year one, requiring an extreme value analysis to demonstrate that this is over-conservative.
Alternatively, when conditions or uncertainty suggest a plausible near-term failure due to degradation, the use of TTF as a direct mean-time-to-failure link to PoF is more appropriate.
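A minimal sketch of this two-part logic follows. The ‘optimistic’ curve standing in for a lognormal, and its shift factor of 10, are placeholder assumptions that a real model would calibrate.

```python
import math

def year_one_pof(ttf, ttf99, high_confidence=False, horizon=1.0):
    """
    ttf   : best-estimate time to failure, years
    ttf99 : minimum plausible TTF (99%+ confidence bound), years
    """
    if ttf99 < horizon and ttf < horizon:
        return 0.99                        # Scenario 1: year-one failure is likely
    if ttf99 < horizon <= ttf:
        return 1.0 - math.exp(-1.0 / ttf)  # Scenario 2: conservative constant rate
    # Scenarios 3 & 4: no plausible year-one failure; a more optimistic
    # TTF99-driven curve governs, unless very high confidence in the best
    # estimate justifies letting TTF govern instead.
    pof = 1.0 - math.exp(-1.0 / (10.0 * ttf99))
    if high_confidence:
        pof = min(pof, 1.0 - math.exp(-1.0 / (10.0 * ttf)))
    return pof

print(year_one_pof(ttf=0.5, ttf99=0.2))                         # Scenario 1: 0.99
print(year_one_pof(ttf=10.0, ttf99=0.8))                        # Scenario 2: ~0.095
print(year_one_pof(ttf=40.0, ttf99=5.0))                        # Scenario 3: ~0.020
print(year_one_pof(ttf=40.0, ttf99=5.0, high_confidence=True))  # Scenario 4: ~0.0025
```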
Example Application of PoF Triad
As an example (part of full example shown in Chap 1.4) of applying the PoF triad to a time-independent and a time-dependent failure mechanism, consider the following. For failure potential from third party excavations, the following inputs are identified for a hypothetical pipeline segment:
- Exposure (unmitigated) is estimated to be 3 excavation events per mile-year. A previous section discusses how these estimates can be made.
- Using a mitigation effectiveness analysis, SME’s estimate that 1 in 50 of these exposures will not be successfully kept away from the pipeline by the existing mitigation measures. This results in an overall mitigation effectiveness estimate of 98%.
- Of the exposures that result in contact with the pipe, despite mitigations, SME’s perform an analysis to estimate that 1 in 4 will result in failure, not just damage. This estimate includes the possible presence of weaknesses due to threat interaction and/or manufacturing and construction issues. So, the pipeline in this area is judged to be 75% resistive to failure from these excavation events, once contact occurs.
These inputs result in the following assessment:
(3 excavation events per mile-year) x (1–98% mitigated) x (1–75% resistive) = 0.015 failures per mile-year
This suggests an excavation-related failure about every 67 years along this mile of pipeline.
This is a very important estimate. It provides context for decision-makers. When subsequently coupled with consequence potential, it paints a valuable picture of this aspect of risk.
Note that a useful intermediate calculation, probability of damage (but not failure) also emerges from this assessment:
(3 excavation events per mile-year) x (1–98% mitigated) = 0.06 damage events/mile-year
This suggests excavation-related damage occurring about once every 17 years.
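The arithmetic of this example, reproduced as a short sketch:

```python
exposure   = 3.0    # excavation events per mile-year (unmitigated)
mitigation = 0.98   # 49 of 50 exposures kept off the pipe
resistance = 0.75   # of contacts, 3 of 4 cause damage only, not failure

pod = exposure * (1 - mitigation)   # 0.06 damage events/mile-yr (~1 per 17 yr)
pof = pod * (1 - resistance)        # 0.015 failures/mile-yr    (~1 per 67 yr)
print(pod, 1 / pod)
print(pof, 1 / pof)
```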
This damage estimate can be verified by future inspections. The frequency of new top-side dents or gouges, as detected by an ILI, may yield an actual damage rate from excavation activity. Differences between the actual and the estimate can be explored: for example, if the estimate was too high, was the exposure overestimated, mitigation underestimated, or both? This is a valuable learning opportunity.
This same approach is used for other time-independent failure mechanisms and for all portions of the pipeline.
For assessment of PoF for time-dependent failure mechanisms—those involving degradation of materials—the previous algorithms are slightly modified to yield a time-to-failure (TTF) value as an intermediate calculation en route to PoF.
PoF_time-dependent = f(TTF_time-dependent)
TTF_time-dependent = resistance / [exposure x (1–mitigation)]
As an example, SME’s have determined that, at certain locations along a pipeline, soil corrosivity creates a 5 mpy external corrosion exposure (unmitigated). Examination of coating and cathodic protection effectiveness leads SME’s to assign a mitigation effectiveness of 90%. Recent inspections, adjusted for uncertainty, result in a pipe wall thickness estimate of 0.220” (resistance). This includes allowances for possible weaknesses or susceptibilities, modeled as equivalent to a thinning of the component’s wall thickness.
Use of these inputs in the PoF assessment is shown below:
TTF = 220 mils / [5 mpy x (1–90%)] = 440 years.
Next, a relationship between TTF and PoF for the future period of interest, is chosen. For example, a simple and conservative relationship yields the following.
PoF = 1 / TTF = [5 mpy x (1–90%)] / 220 mils ≈ 0.23% per year.
In this example, an estimate for PoF from the two failure mechanisms examined—excavator damage and external corrosion—can be approximated by 1.5% + 0.2% = 1.7% per mile-year. If risk management processes deem this to be an actionable level of risk, then the exposure-mitigation-resistance details lead the way to risk reduction opportunities.
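The corrosion calculation, and its combination with the excavation result above, as a sketch. Simple addition is an adequate approximation here because both probabilities are small; the OR gate discussed next is the general method.

```python
rate       = 5.0    # mpy, unmitigated external corrosion exposure
mitigation = 0.90   # combined coating and CP effectiveness
wall       = 220.0  # mils, uncertainty-adjusted effective wall (resistance)

ttf = wall / (rate * (1 - mitigation))   # 440 years
pof_corrosion  = 1.0 / ttf               # ~0.23% per year, conservative
pof_excavation = 0.015                   # failures/mile-yr, from the prior example
print(ttf, pof_corrosion, pof_excavation + pof_corrosion)
```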
AND Gates, OR Gates
Combining variables often involves the choice of multiplication versus addition. Each has advantages. Multiplication allows variables to independently have a great impact on a result. Addition better illustrates the layering of adverse conditions or mitigations. In formal probability calculations, multiplication usually represents the AND operation.
Probabilistic math is used to combine variables to represent real-world phenomena. This means capturing various relationships among variables using “OR” and “AND” gates. This OR/AND terminology is borrowed from flowchart techniques. The use of OR/AND math in pipeline risk assessment modeling represents a dramatic improvement over most older methods that used simple summations, averages, maximums, and other summary mathematics or statistics that often masked critical information.
OR Gates
OR gates combine independent events, any one of which can produce the outcome of interest. The OR function calculates the probability that any of the input events will occur. If there are i input events, each assigned a probability of occurrence Pi, then the probability that any of the i events occurs is:
P = 1 – [(1-P1) * (1-P2) * (1-P3) *…*(1-Pi)]
This is the same as 1 – (the probability that none of the i events occur)
OR gates are extremely useful in that they capture, in a ‘real-world’ way, both the effects of single, large contributors as well as the accumulation of lesser contributors. With an OR gate, there is no ‘averaging away’ effect. In a pipeline risk assessment, this type of math better reflects reality since it uses probability theory of accumulating impacts to:
- Avoid masking some influences;
- Capture single, large impacts as well as the accumulation of lesser effects;
- Show diminishing returns;
- Avoid the need for a pre-set, pre-balanced list of variables;
- Provide an easy way to add new variables; and
- Avoid the need for re-balancing when new information arrives.
When summarizing the PoF of a component, the central question of ‘what is the PoF?’ is actually asking ‘what is the PoF from either PoF1 or PoF2 or PoF3 or…?’ where 1, 2, 3, etc represent all the ways in which the component can fail, ie, external corrosion, outside forces, human error, etc. The overall PoF can therefore be relatively high if any of the sub-PoF’s are high or if the accumulation of small sub-PoF’s adds up to something relatively high.
This is consistent with real-world risk. The question of overall PoF does NOT presume that all PoF’s must ‘fire’ before the overall PoF is realized—it only takes one. A segment survives only if failure does not occur via any of the failure mechanisms. So, the probability of surviving is (third-party damage survival) AND (corrosion survival) AND (design survival) AND (incorrect operations survival). Replacing the ANDs with multiplication signs provides the relationship for probability of survival. Subtracting this resulting product of multiplication from one (1.0) gives the probability of failure.
OR Gate Example:
To estimate the overall probability of failure based on the individual probabilities of failure for stress corrosion cracking (SCC), external corrosion (EC), and internal corrosion (IC), the following formula can be used.
P_failure = OR[P_SCC, P_EC, P_IC] = P_SCC OR P_EC OR P_IC
= OR [1.05E-06, 7.99E-05, 3.08E-08] (using some sample values)
= 1- [(1-1.05E-06)*(1-7.99E-05)*(1-3.08E-08)]
= 8.10E-05
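A sketch of the OR gate as a reusable function, reproducing the sample values above; the second call anticipates the mitigation roll-up shown next.

```python
import math

def or_gate(probs):
    """P(at least one of i independent events) = 1 - product(1 - Pi)."""
    return 1.0 - math.prod(1.0 - p for p in probs)

print(f"{or_gate([1.05e-06, 7.99e-05, 3.08e-08]):.2E}")  # 8.10E-05
print(f"{or_gate([0.40, 0.10, 0.05]):.0%}")              # 49% mitigation effectiveness
```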
The OR gate is also used for calculating the overall mitigation effectiveness from several independent mitigation measures. This function captures the idea that probability (or mitigation effectiveness) rises due to the effect of either a single factor with a high influence or the accumulation of factors with lesser influences (or any combination).
Mitigation % = M1 OR M2 OR M3…..
= 1–[(1-M1) * (1-M2) * (1-M3) *…*(1-Mi)]
= 1 – [(1-0.40) * (1-0.10) * (1-0.05)]
= 49%
or examining this from a different perspective,
Mitigation % = 1 – [remaining threat]
Where
[remaining threat] = (remnant from M1) AND (remnant from M2) AND (remnant from M3)

The OR gate math assumes independence among the values being combined. While not always precisely correct, the advantages of assuming independence as a modeling convenience will generally outweigh any loss in accuracy.
The independence is often difficult to visualize, especially when assigning effectiveness values to mitigation. For instance, the effectiveness of a line locating program (see ) should be judged by estimating the fraction of future damaging events that are avoided by the line locating program ONLY—ie, imagining no depth of cover (but still out of sight), no signs, no markers, no public education, no patrol, etc.
AND Gates
AND gates imply “dependent” measures that should be combined by multiplication. With an AND gate, any sub-variable can alone have a dramatic influence. This is captured by multiplying all sub-variables together. In measuring mitigation, when all things have to happen in concert in order to achieve the mitigation benefit, a multiplication is used—an AND gate instead of OR gate. This implies a dependent relationship rather than the independent relationship that is implied by the OR gate.
AND Gate Example[3]:
The modeler is assessing a variable called “CP Effectiveness” (cathodic protection effectiveness) where confidence in all sub-variables is necessary in order to be confident of the CP Effectiveness—[good pipe-to-soil voltage readings] AND [readings close to segment of interest] AND [readings are recent] AND [proper consideration of IR was done] AND [low chance of interference] AND [low chance of shielding]… etc. If any sub-variable is not satisfactory, then overall confidence in CP effectiveness is dramatically reduced. This is captured by multiplying the sub-variables.
When the modeler wishes the contribution from each variable to be slight, the range for each contributor is kept fairly tight. Note that four things done pretty well, say 80% effective each, result in a combined effectiveness of only ~40% (0.8 x 0.8 x 0.8 x 0.8) using straight multiplication.
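A one-line sketch of the AND gate, reproducing the four-variable illustration:

```python
import math

def and_gate(factors):
    """All sub-variables must hold together; multiplication lets any weak one dominate."""
    return math.prod(factors)

print(f"{and_gate([0.8, 0.8, 0.8, 0.8]):.0%}")  # ~40% (0.4096)
```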
Nuances of Exposure, Mitigation, Resistance
In most instances, the categorization of each piece of information into one of these three is obvious—most variables are clearly telling us more about either the exposure, the mitigation, or the resistance. To some, the surrogate terms of ‘attack’, ‘defense’, and ‘survivability’ add clarity. Focusing on PoF only, here are some examples to help solidify the categorization:
The obvious
Variables can inform multiple aspects of a risk assessment, but usually, one category is more directly influenced by the variable. Soil corrosivity, excavator activity, vehicle traffic, seismic activity, flood potential, surge potential, and landslides are examples of phenomena that obviously inform exposure estimates. They tell us about the frequency and severity of ‘attack’.
Coatings, depth of cover, training, procedures, and maintenance pigging are examples that, to most, are clearly defenses against damage. They are best modeled as mitigation measures. When the same mitigation measure protects against multiple exposures, it is valid to include its benefits in all relevant threats. For instance, depth of cover protects against impacts, excavations, and some types of geohazards.
Metal loss, cracks, lack of toughness, SMYS, and wall thickness are examples of variables that inform resistance estimates.
The less obvious
Casings: a casing (see full discussion later) sometimes causes confusion when one focuses on corrosion problems potentially caused by their presence and loses sight of the original intent. Casings are usually installed as mitigation to external forces. They also serve other purposes such as consequence reduction, but they are mostly intended to protect a carrier pipe. Their role in a risk assessment should show their benefit in preventing excavation damages, traffic loads, and others. However, a casing’s role as a corrosion issue should also be acknowledged. A casing changes the external corrosivity exposure (electrolyte in the annular space and possible electrical connections) and the ability to apply CP. Both should appear in the risk assessment. So, the presence of a casing is captured as a mitigation against external forces, an influencing factor for external corrosion exposure and mitigation (shielding of CP), and perhaps also in CoF.
ILI: some may initially think protection occurs with the activity of performing an ILI. Actually, as with other inspections and tests, neither the exposure nor the mitigation nor the resistance has changed because of the ILI. What has changed is the evidence—knowledge of resistance has increased, often dramatically, and uncertainty regarding exposure and mitigation is different because of the ILI. For instance, at every identified location of external metal loss on a buried pipeline, we know that both coating and CP have failed, so mitigation is reduced, perhaps to zero, pending repairs. We usually do not know when mitigation failed, so might not be able to directly modify exposure (mpy rate of corrosion) estimates without more information. So, the role of ILI is first in resistance estimates and secondarily in exposure and mitigation estimates. Of course, action prompted by the ILI will often change exposure and mitigation.
Laminations, wrinkle bends, and arc burns are resistance issues. They are not ‘attacking’ the pipe, nor do they contribute to or impair mitigation. They represent potential weaknesses, sometimes only under the influence of exacerbating factors such as certain loadings (for example, causing stress concentrations) or environment (for example, sources of H2 that aggravate laminations and facilitate blistering or HIC). They are best modeled as potential losses of strength—ie, as resistance issues.
Additional Gray Areas
When information can logically be categorized in more than one place, the choice is usually a matter of preference and does not weaken the assessment. The choice of role for the information usually leads to the same mathematical result. So, the choice is often not critical to the PoF estimate. Some examples of such choices are discussed below.
Note that while several ‘gray area’ examples are discussed here, the vast majority of information is very easily and intuitively categorized into its appropriate place in the risk assessment. The reader should not leave this section believing that any more than a very few scenarios have some ambiguity regarding modeling choices.
What Constitutes ‘Exposure’? Normalizing Exposure and Resistance
Since PoF measures ‘failure’, the definition of exposure is linked to that of failure. An exposure must be able to cause a ‘failure’ if it is truly an exposure. If failure is defined, for instance, as ‘permanent deformation’, then exposures that could cause that to a pipe component are counted. If failure is defined as ‘loss of integrity’, events causing immediate leaks/ruptures are obviously counted, but so are damage-only events. In fact, most assessments will appropriately include all events that can at least cause damage. Even when immediate failure from the event is not possible, the damage may contribute to a subsequent failure and is therefore of interest to the measurement of PoF.
Should excavation by hand shovel be considered an exposure for a steel pipeline? Yes, if any structural damage at all is possible—even a scratch. This scratch may directly reduce resistance to some future failure mechanism, although it is often an immeasurably small reduction. The scratch can also theoretically occur exactly at some point of pre-existing weakness, resulting in immediate failure.
Excavation by a plastic shovel probably cannot cause even minor scratch damage to a steel pipeline and need not be counted as an exposure. However, the indirect role of a ‘hand shovel contact’ event must be considered. Both the metal and plastic shovel should be counted as causes of damage to corrosion coating systems. Since coating is a mitigation measure, damage to a coating reduces mitigation effectiveness. This is different from an exposure. If concrete coating or rock shielding is present, it provides mitigation against coating damage.
Vandalism can be considered a type of sabotage. However, defacing (for example, spray painting) or minor theft of materials are actions that are readily resisted by most pipeline components. If the sabotage exposure count includes vandalism events, then resistance estimates must consider the fraction of exposure events that are vandalism spray-paint-type events and therefore 100% resisted by the component.
Exposure and resistance estimates for risk assessments of failure = ‘service interruption’ similarly revolve around the definition of failure. Just as with leak/rupture assessments, a probability of damage also emerges from the service interruption assessment. See full discussion in .
This nuance—what constitutes an ‘exposure’—revolves around failure definition and also the choice of baseline resistance, which warrants further discussion.
Continuous Exposure
Unlike the discrete events measured in other time-independent failure mechanisms, some aspects of failure potential involve continuous exposure—ie, there is a constant force present that can fail the component, rather than an intermittent threatening force. A common example is a component connected to a pressure source that can create pressure in excess of the component’s capability to withstand. This is not an uncommon scenario for pipelines since they are routinely connected to wells, pumps, compressors, foreign pipelines, and other sources whose pressures, at times, can be too high for the connected components. The potentially damaging pressure source does not cause damage because control and safety systems protect downstream components.
Even desirable or normal loads can be viewed as continuous exposures. Any amount of internal pressure becomes a damage potential as resistance decreases; any span can be too long for a pipe with no resistance to gravity forces (weight). Pressure as a constant exposure is generally only mitigated when excessive, since some pressure is a desirable part of operability. Intended pressure does not lead to failure only because resistance prevents it. Gravity as a constant exposure is mitigated by having uniform support and, if mitigation fails, is resisted by the bending and shear capacities of the ‘structure’.
Measuring this type of exposure appropriately in a risk assessment model requires the correct coupling of the continuous exposure with a corresponding mitigation effectiveness. A high-demand or continuous exposure requires mitigation with very high reliability. The modeling issue with continuous exposure is the choice of time units in which to express the rate of exposure. How do you express ‘continuous’? In units of events per year? Or per day? Or even per minute? Since ‘continuous’ means an infinite number of occurrences per unit time, it is difficult to capture numerically.
For purposes of modeling, any unit can be chosen, so long as the mitigation is calibrated to the same unit. The continuous exposure can be counted as one event per day, once per hour, once per minute, once per second, or even less. Any of these is appropriate as long as the corresponding mitigation—for example, the regulator or relief valve effectiveness—is measured in the same per day, per hour, per minute, etc units of reliability. For instance, choosing units of one event per second to represent continuous exposure from a high pressure connecting pipeline requires that the pressure regulating valve’s reliability be expressed in the same units—ie, failure rate for each second in service. If one exposure event per day is chosen to represent continuous, then the regulator’s reliability must also be expressed in the context of how many days between failures of such regulators to prevent overpressure.
In some estimates, the use of a ‘probability of failure on demand’ estimate for a safety or control device will automatically make the exposure and mitigation units of measure equivalent. However, in the above example, the regulator’s ‘demand’ is continuous—its function is being continuously demanded—requiring attention to the units of each. Therefore, the regulator’s mitigation effectiveness—its reliability—must be expressed in similar units: failures per day, per hour, per minute, per second, or even smaller. Then, when exposure is multiplied by (1 – mitigation), the resulting PoD is appropriate.
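A brief sketch can make this unit-matching requirement concrete. The following Python fragment is purely illustrative; the 100-year mean time between regulator failures is a hypothetical value chosen for the example, not a representative reliability figure.

```python
# Illustrative sketch: any time unit can represent 'continuous' exposure,
# provided the mitigation reliability is expressed in the same unit.
# Assumes a hypothetical regulator failing once per 100 years of service.

YEARS_BETWEEN_REGULATOR_FAILURES = 100.0  # hypothetical value

def annual_pod(events_per_year: float) -> float:
    """Annual PoD = exposure rate x (1 - mitigation), units matched."""
    # Probability the regulator fails during one exposure 'event'
    # (one day, one hour, one second, ...), in the same unit as the count.
    p_fail_per_event = 1.0 / (YEARS_BETWEEN_REGULATOR_FAILURES * events_per_year)
    return events_per_year * p_fail_per_event

for label, rate in [("per day", 365.0), ("per hour", 8760.0),
                    ("per second", 365.0 * 24 * 3600)]:
    print(f"{label}: annual PoD = {annual_pod(rate):.4f}")
# Each line prints 0.0100: the chosen unit cancels when both sides match.
```

The arithmetic shows why the choice of unit is immaterial: as long as the exposure count and the mitigation reliability share the same unit, the unit cancels in the multiplication.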
Spans
An interesting nuance arises in a risk assessment involving spans. A span makes the component susceptible to the effects of gravity. While the exposure of ‘gravity’ has always been present, its role goes unnoticed in a fully supported pipeline segment. If an event can result in loss of support, but not failure, how is it to be modeled? Has the span created an exposure—ie, a new attack? Or is it causing the loss of some resistance to an attack (gravity) that has always been there?
This warrants some discussion. The frequencies of exposures should include all events that can damage the theoretical component. Technically, only events that cause excessive stress cause damage. So, only spans of a certain length, given pipe and contents weight, buoyancy, lateral forces, vibration potential, other stresses, etc, are events that potentially result in damage. Rigid pipe and mechanical couplers generally have less resistance to spans compared to flexible, welded systems.
The full solution is to discriminate among events that cause varying amounts of span length to the component. This involves an initial measurement of the PoF of the supported pipe in terms of continuous exposure to gravity which is fully mitigated by the uniform support, with resistance available but uninvolved so long as the mitigation is in place. PoF from gravity effects would logically be nearly 0% as long as the support remains. If any portion of the length becomes unsupported, then the mitigation against the force of gravity is zero and damage is theoretically possible at that location. Realistically, only spans of a certain minimum length can result in damage for most pipeline components. Minor spans will typically have no effect on either damage or failure potential. A few inches of span rarely causes damage to any component.
As span length increases, damages become possible and then eventually failure occurs. Assigning probability estimates to each possible span length will be challenging in many real-world applications. Furthermore, determining minimum span lengths for various damage and failure scenarios involves structural calculations that are redundant to the resistance estimations.
Therefore, a modeling choice emerges. An exposure count may include all span-producing events or only those events generating potentially damaging span lengths. The former results in an over-estimation of damage producing events, since even the insignificant spans are counted. The latter requires a pre-determination of damaging span lengths. This is not a trivial exercise since the following considerations are important: material characteristics, dimensions, contents, internal pressure, lateral forces, etc.
A simplification may be appropriate for some risk assessments. From a modeling perspective, it may be simpler to count any span-producing event as an exposure rather than pre-determine what span length is critical for each set of conditions. With a conservative assumption that any span length can cause damage, the inaccuracy that is generated is the production of a PoD that is conservatively overstated. Components that are unharmed by loss of support will show low PoF after resistance estimates are applied. However, they may show inappropriate PoD levels due to the over-counting of exposures (ie, including exposure events that can’t cause damage). Perhaps this is tolerable in exchange for modeling convenience.
As an example of this simplification: consider a soil erosion event creating a one-foot span beneath an otherwise continuously supported 12″ steel pipeline. If the erosion event is counted as an exposure (an ‘attack’) with a frequency of 0.1/year and no mitigation is provided, the model reports a 0.1 frequency of damaging events, even though damage is realistically not occurring with only a one-foot span. The PoF will not be impacted by this inaccuracy in the intermediate PoD estimate. In the absence of severe weakness, the resistance prevents failure virtually 100% of the time. The resistance of the 12″ steel pipe shows that essentially none of the 0.1 spans per year will result in failure.
Longer span lengths would generally require more resistance. Since some resistance is now being used to resist gravity, some load carrying capacity may no longer be available to resist other loads. So, a third modeling approach may state a definition of exposure as only events that can produce at least, say, a 20-ft span (or whatever the calculation determines is a potentially damaging span, under a set of assumed component characteristics). A related solution is to create categories of span-producing events based on the length of span potentially produced. Each is assigned an exposure frequency. Some will exceed the point where damage is possible and some will be insignificant, from a structural damage perspective. A version of this approach is to begin with an exposure frequency that captures all span-creating events and then assign fractions to create categories of longer-span events. For example, 0.3 span-creating events per mile-year are expected; 55% of those will produce spans less than 3 ft in width; 40% produce spans greater than 3 ft but less than 10 ft; and 5% produce spans greater than 10 ft in width.
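The categorized approach can be expressed as simple bookkeeping. The sketch below uses the example numbers from the preceding paragraph; the flags indicating which span lengths are potentially damaging are assumptions for a hypothetical component, not calculated thresholds.

```python
# Sketch of span-length exposure categories, using the example figures above.

total_span_events = 0.3  # span-creating events per mile-year (from the example)

# (category, fraction of events, potentially damaging? -- assumed flags)
categories = [
    ("< 3 ft",       0.55, False),
    ("3 ft - 10 ft", 0.40, False),
    ("> 10 ft",      0.05, True),
]

for label, fraction, damaging in categories:
    rate = total_span_events * fraction
    note = "counted as exposure" if damaging else "structurally insignificant"
    print(f"{label}: {rate:.3f} events/mile-yr ({note})")
```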
Mitigation vs Resistance
Some methods of protection from mechanical damage present a rare case where mitigation and resistance become a bit blurred. A concrete coating or casing reduces the frequency of contact with the pipe steel. That is a reduction in PoD and therefore can be thought of as a mitigation. This requires that the protection be viewed as independent from the component—it is something added to the component as a protective measure. That is clear for slabs and even casings, but a coating, even concrete coating, is often viewed as part of the component, especially when used as a buoyancy control. In that case, contacting the coating counts as contacting the component. This is also influenced by the definition of ‘damage’ implicit in the PoD. Does damage to a concrete coating constitute damage to the component? This is a matter of perspective and definition. The loss of a buoyancy control feature is analogous to the challenge of modeling spans, as previously discussed.
For consistency, the sample assessments offered here consider slabs, casings, and concrete coatings to be distinct from the component and therefore best treated as mitigation measures. Under this view, the component is not damaged when only the protection is damaged. Alternative views may be more appropriate for certain risk assessment situations.
Mitigation-by-others
Because mitigations can originate at facilities not under the control of the pipeline operator, there may be both foreign (owner of the origination point of the exposure) mitigations and operator (of the segment being assessed) mitigations. For instance, the highway department and law enforcement agencies will mitigate some of the threat of vehicle impact to nearby pipelines via barriers, speed limits, road configuration, etc. An operator of nearby facilities will mitigate the potential for rupture or explosion of their facilities, reducing the exposure to the assessed component.
From the perspective of the pipeline operator, the protective measures employed by others reduce the exposure to the pipeline. These actions taken by others are additive to the protective measures installed and maintained by the pipeline operator. Since these mitigations-by-others effectively change the rate of pipeline exposures, and since it will often be difficult to assess and track changes in mitigations of non-owned facilities, it is usually more efficient to include foreign mitigations in the exposure rate estimate assigned to the non-owned facility. Otherwise, the risk assessment tends to expand into an assessment of non-owned systems. The mitigations done by others are often still important to understand and perhaps quantify, but keeping them separate from mitigations applied by the assessed component owner is a modeling convenience.
Other examples include natural mitigation measures and indirect actions taken by others. Consider traffic impact potential where trees, berms, ditches, fences, etc are de facto barriers (mitigations) to vehicle impacts. Treatment of these features as mitigation-by-others, and including their role as exposure-reducers, is the simplest approach. However, should the trees be removed or die, the ground leveled, or the fence be removed, having the rates of ‘vehicle leaves roadway’ separate from the benefits of the features would be useful.
Similarly, when water depth is sufficient to preclude anchoring, dredging, fishing, and other third-party activities as possible damage sources, damage probability to offshore components is reduced. Just as with other natural barriers, the water depth can be treated as a mitigation in the risk assessment. The fact that the water depth may also preclude certain other activities can be factored into the exposure estimate without triggering an inappropriate ‘double-counting’ effect in the risk assessment.
A general rule of thumb may be to include all features and actions not under the control of the component owner as influences to exposure rates. Actions and features that are controlled by the component operator are treated as mitigation measures. That is, if foreign, then exposure, otherwise mitigation. An exception may be cases where it is desirable to develop an argument, via cost/benefit analyses, for a change in mitigation activities, even if performed by others.
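This rule of thumb reduces to a simple classification convention. The following sketch is a hypothetical illustration; the features and ownership labels are invented for the example.

```python
# Sketch of the 'if foreign, then exposure; otherwise mitigation' convention.

def classify(owner_controlled: bool) -> str:
    # Owner-controlled measures are modeled as mitigation; everything
    # else adjusts the exposure rate estimate.
    return "mitigation" if owner_controlled else "exposure-rate adjustment"

features = [
    ("highway barrier (highway dept.)", False),
    ("tree line / berm (landowner)",    False),
    ("concrete slab (operator)",        True),
    ("depth of cover (operator)",       True),
]
for feature, controlled in features:
    print(f"{feature}: {classify(controlled)}")
```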
Resistance Baseline
There is an interesting interplay between exposure and resistance since both are sensitive to the exact definition of ‘failure’. Exposure measurement implicitly involves a theoretical baseline for resistance since an exposure is defined as an event that causes ‘failure’ and resistance is a measure of invulnerability to ‘failure’. So, the definition of ‘failure’ is a component of resistance, just as it is for exposure. This is again best illustrated by examples. If failure = ‘permanent deformation’, then resistance measures the invulnerability to permanent deformation, given the presence of a force (an exposure) that can cause permanent deformation if there is insufficient resistance. If failure = ‘leak/rupture’, then resistance measures the invulnerability to leak/rupture, given the presence of a force (an exposure) that can cause leak/rupture if there is insufficient resistance.
If resistance is to be measured in simple terms of percentage or fraction of mitigated exposure events that do not result in failure, there is a need to define a starting point or baseline. That baseline must be consistent with the definition of the exposure event. If the baseline is to be ‘zero resistance’, then exposure counting involves imagining that there is no resistance at all. A thin-walled aluminum can, cardboard tube, or egg-shell vessel—something crushable between two fingers—is the right mental image for an almost complete lack of resistance. So, the image of an unprotected beverage can or cardboard tube sitting atop the ground is the correct image to estimate exposure event frequencies when a ‘zero resistance’ baseline is chosen. If such a can may be broken/crushed/deformed by the event, then it should be counted as an exposure.
There are obviously many more exposure events that could break an aluminum beverage can compared to a steel pipeline. So exposure counts are dramatically increased when zero resistance is assumed. As a matter of fact, the number of potentially damaging events always increases when the threshold for damage is lowered.
If the risk assessment designer feels that zero resistance results in excessive exposure counts, he can define the resistance baseline as something other than zero. For instance, he may set the resistance baseline as the fraction of exposures above ‘normal’, which do not result in failure. Then resistance is the amount of ‘extra’ stress carrying capacity once ‘normal’ loads have been accommodated. This can theoretically lead to negative values. Perhaps failure has not yet occurred in a weakened component only because the upper limits of ‘normal’ have not recently occurred. If there is not only no ‘extra’ resistance, but not even ‘sufficient’ resistance, then a negative value is warranted.
This is a modeling choice. A changing resistance baseline—potentially different for each component under varying ‘normal’ loads—may be confusing to some. On the other hand, the imagineering of a no-resistance component and the associated need to count many seemingly minor exposures might be more troublesome for others.
Exposure Influenced by Resistance
When a resistance baseline other than ‘zero resistance’ is used, exposure varies, as was suggested in the previous section. Exposure rates are sensitive to changing resistance. When material characteristics degrade or are changed, a greater number of exposure scenarios can cause failure. Examples of such material changes include:
- creation of a HAZ (heat-affected zone),
- extreme temperature effects reducing material stress-carrying capability,
- UV degradation,
- hydrogen embrittlement.
Other examples of changing resistance include metal loss by corrosion, crack progression through a component wall, unanticipated or intermittent external loadings such as debris impingement in flowing water or gravity effects when support is lost, and others.
The most robust assessment can provide for a continuous updating of exposure estimates based on changing resistance. That is, if a resistance baseline other than zero has been chosen, then the count of exposures—events that can cause failure—will increase as resistance decreases.
Similarly, when modeling time-dependent failure mechanisms like cracking, the TTF shortens when either the modeled rate of cracking increases or the effective wall thickness is reduced. If material degradation or change (for example, creation of a HAZ) causes the material toughness/brittleness to change, is that better modeled as an increased crack propagation rate (ie, more exposure) or as a reduced effective wall thickness (ie, less resistance)? Fortunately, the suggested mathematics ensures the same result regardless of the chosen approach. While either will work in the proposed PoF model, it may be more intuitive to model this as a change in effective wall thickness. That way, this potential change in a material’s property is readily seen alongside any other potential change in component strength.
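Under a simple linear-growth view of TTF, the equivalence of the two choices is easy to demonstrate. The sketch below is a simplified stand-in, not the full PoF model; the wall thickness, growth rate, and toughness penalty factor are hypothetical.

```python
# Simplified TTF sketch: TTF = effective wall available / crack growth rate.
# A toughness change can be modeled on either side with the same result.

wall = 0.250     # in, remaining effective wall (hypothetical)
rate = 0.005     # in/yr, modeled crack growth rate (hypothetical)
penalty = 1.25   # hypothetical factor for a HAZ / toughness change

ttf_base = wall / rate                         # 50.0 yr, no penalty
ttf_more_exposure = wall / (rate * penalty)    # model as faster cracking
ttf_less_resistance = (wall / penalty) / rate  # model as thinner wall

print(ttf_base, ttf_more_exposure, ttf_less_resistance)  # 50.0 40.0 40.0
# The last two are identical: the modeling choice is presentational.
```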
As another example of the modeling choices for exposure-resistance interaction, consider the role of an expansion loop in a pipeline. If the expansion loop is present to reduce thermal stresses and fatigue, most would agree that resistance has been improved rather than exposure reduced or mitigation improved. After all, the changes in temperature still occur and the pipe is not protected from the resulting forces. Only the pipe’s reaction, its ability to absorb the forces without damage, has changed. However, a counterargument could be that each temperature cycle is now no longer imparting the same stresses and, hence, exposure estimates should be reduced. Again, either choice yields the same PoF under the suggested modeling approach.
Aspects such as inclusion of suspect weaknesses will always be necessary in the risk assessment. Other aspects will be discretionary. The risk assessor can decide, in the context of desired PXX and trade-offs between complexity and robustness, the optimum way to handle resistance and resistance-exposure issues such as:
- Yield vs ultimate stress levels.
- Inclusion of intermittent loadings.
- The extent of simultaneous consideration of changing resistance with loadings potentially causing exceedance of stress-carrying capability. See discussions of unanticipated spans and loss of buoyancy control features in .
Frequency, statistics, and probability
There is a difference between ‘frequency’ and ‘probability’ even though, in some uses, they are mostly interchangeable. As used in this book, frequency refers to a count of events over time while probability refers to the likelihood of one or more events over some future time period. Either frequency or probability is a suitable metric in a risk assessment. If values are small, the two are numerically equivalent, ie, at very low frequencies of occurrence, the probability of failure will be numerically equal to the frequency of failure.
The actual relationship between failure frequency and failure probability is often modeled by assuming an underlying distribution from which probabilities can be determined. For example, the Poisson equation relating spill probability and frequency is
P(X)SPILL = [(f × t)^X / X!] × exp(−f × t)

where
P(X)SPILL = probability of exactly X spills in the pipeline segment of interest
f = the average spill frequency for the segment of interest (spills/year)
t = the time period for which the probability is sought (years)
X = the number of spills for which the probability is sought.

The probability of one or more spills is evaluated by setting X = 0:

P(one or more spills) = 1 − P(0)SPILL = 1 − exp(−f × t)
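For readers who prefer a computational form, the Poisson relationship above transcribes directly. In the sketch below, the spill frequency of 0.01 spills/year is an assumed example value.

```python
# Direct transcription of the Poisson spill-probability relationship.
from math import exp, factorial

def p_spills(x: int, f: float, t: float) -> float:
    """Probability of exactly x spills, given frequency f (spills/yr)
    over a period of t years."""
    return (f * t) ** x / factorial(x) * exp(-f * t)

f, t = 0.01, 1.0                 # assumed example: 0.01 spills/yr, one year
p_one_or_more = 1.0 - p_spills(0, f, t)
print(p_one_or_more)             # ~0.00995, nearly equal to f*t = 0.01
# At low frequencies, probability and frequency are numerically similar,
# as noted earlier in this section.
```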
Frequency may be more useful when conservative risk assessments produce high probabilities. For example, a P99 risk assessment will often be more useful if estimates are expressed as frequencies rather than probabilities, due to the high number of 90+% probability estimates that commonly emerge in initial, conservative assessments. Frequencies are able to discriminate between, say, 10 events/year and 20 events/year, while a per-year probability estimate (of one or more events per year) based on 10 and 20 events/year yields high and virtually indistinguishable values (99+%, dependent upon the relationship between frequency and probability used). Large probability numbers typically also emerge in a pipeline risk assessment for high exposure rates—for example, 8 excavations per mile-year—and in other values generated from very conservative assumptions. These too are better captured as frequencies.
A statistic is a value calculated from a set of numbers—it is not a probability. Statistics refers to the analysis of data; the most compelling definition of probability is “degree of belief,” which normally utilizes statistics but is rarely based entirely on them.
Statistics are methods of analyzing numbers or the numbers emerging from the analyses. While they are usually an important ingredient in predictions, statistics are based on past observations—past events. Statistics from historical incidents do not imply anything about future events until inductive reasoning is employed. As discussed in PRMM, historical failure frequencies—and the associated statistical values—are normally used in a risk assessment but must be used carefully. Extrapolating future failure probabilities from historical information alone can lead to significant under- or over-estimations of risk.
Failure rates
A failure rate is simply a count of failures over time, by some definition of ‘failure’.
Pipeline failure rates have historically been starting points for determining absolute risk values. Past failures on the pipeline being assessed are often pertinent to future performance. Beyond that, representative data from other pipelines are sought. Failure rates are commonly derived from historical failure rates of similar pipelines in similar environments. That derivation is by no means a straightforward exercise. In most cases, the evaluator must first find a general pipeline failure database and then make assumptions regarding the best “slice” of data to use. This involves attempts to extract from an existing database of pipeline failures a subset that approximates the characteristics of the pipeline being evaluated. Ideally, the evaluator desires a subset of pipelines with similar products, pressures, diameters, wall thicknesses, environments, age, operations and maintenance protocols, etc. It is very rare to find enough historical data on pipelines with enough similarities to provide data that can lead to confident estimates of future performance for a particular pipeline type. Even if such data are found, estimating the performance of the individual from the performance of the group presents another difficulty. In many cases, the results of the historical data analysis will only provide starting points or comparison points for detailed estimates of future failure frequency.
As a common damage state of interest, fatality rates are a subset of pipeline failure rates. Very few pipeline failures result in a fatality. A rudimentary frequency-based assessment will simply identify the number of fatalities or injuries per incident and use this ratio to predict future human effects. For example, even in a database with much missing detail (as is typically the case in pipeline failure databases), one can extract an overall failure rate and the number of fatalities per length-time (i.e., mile-year or km-year). From this, a “fatalities per failure” ratio can be calculated. These values can then be scaled to the length and design life of the subject pipeline to obtain some very high-level risk estimates on that pipeline. Samples of high-level data that are useful in frequency estimates for failure and fatality rates are given in PRMM Tables 14.1 through 14.4.
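The scaling described above amounts to a few multiplications. In this sketch, all rates are hypothetical placeholders rather than values taken from the PRMM tables.

```python
# Rudimentary frequency-based sketch of fatality-rate scaling.
# All input values are hypothetical placeholders.

failure_rate = 0.0005    # failures per mile-year (hypothetical)
fatality_rate = 0.00002  # fatalities per mile-year (hypothetical)
fatalities_per_failure = fatality_rate / failure_rate  # 0.04

miles, design_life_yr = 120.0, 40.0  # subject pipeline (hypothetical)
expected_failures = failure_rate * miles * design_life_yr         # 2.4
expected_fatalities = expected_failures * fatalities_per_failure  # ~0.10

print(expected_failures, expected_fatalities)
```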
Several sources of failure data are cited and their data presented in this book. In most instances, details of the assumptions employed and the calculation procedures used to generate these data are not provided. Therefore, it is imperative that data tables not be used for specific applications unless the user has determined that such data appropriately reflect that application. The user must decide what information may be appropriate to use in any particular risk assessment. The evaluator will usually need to make adjustments to the historical failure frequencies in order to more appropriately capture a specific situation.
The recommendation is to make use of historical failure rates as calibration or benchmark tools. A risk assessment of a collection of components can be compared to relevant historical failure data as a validation tool. This is further discussed in later chapters.
Additional failure data
Historical failure rate data are also sometimes used to suggest statistical distinctions for pipeline characteristics such as wall thickness, diameter, depth of cover, and potential failure hole size. Such distinctions drawn from historical incident data are useful but can be misleading. The general warning regarding cause-and-effect before using any results of statistical analyses is germane: if a suggested correlation does not make sense, perhaps it does not really exist.
Several studies estimate the benefits of particular mitigation measures or design characteristics. These estimates are often based on statistical analyses of historical incidents. A study may often rely solely on the historical failure rate of pipelines with a particular characteristic, such as a particular wall thickness or diameter or depth of cover. To be useful, this type of analysis must isolate the factor from other confounding factors and should also produce a rationale for the observation. For example, if data suggest that a larger diameter pipe ruptures less often on a per-length, per-year basis, is there a plausible explanation? In that particular case, higher strength due to geometrical factors, better quality control, and a higher level of attention by operators are plausible explanations, so the premise could be tentatively accepted even though the diameter does not cause all of the effect seen. In other cases, the benefit from a mitigation is derived from engineering models or simply from logical analysis with assumptions. Observations from various studies are sometimes available and useful in assigning mitigation values.
Inferences drawn from statistical examinations of large populations of pipelines must be used carefully. They will not reflect conditions at specific locations of certain pipelines. Since risk management is ultimately interested in the specific locations, generalized data can be misleading.
Note discussions of individual behavior versus population behavior in many places in this text. If a certain population of pipeline segments does indeed ‘behave’ differently, that is useful insight, especially when a segment to be assessed can be assigned to that population.
Potential risk reduction benefits from several mitigation measures, as suggested by various references, have been compiled in PRMM Table 14.11. These are often based on statistical examinations of large populations of pipelines and may not reflect conditions at specific locations.
Other examples of statistical relationships include the possible mitigative effects of depth of cover and resistive benefits of pipe diameter, as discussed in ref [1043].
Consequences
Implicit in any risk assessment is the potential for consequences. This is the last of the three risk-defining questions: If something goes wrong, what are the consequences?
Consequence implies a loss of some kind. The loss or damage state of interest must be pre-determined for a risk assessment.
Consequences that are commonly measured in a risk assessment include:
- Leaks and ruptures.
- Leaks and ruptures beyond a pre-specified threshold of loss.
- Results of leaks and ruptures:
- Fatalities and injuries,
- Property loss,
- Environmental harm,
- Monetary losses, including service interruption costs.
Some losses are more readily quantified than others. Both direct and indirect costs are often included in a modern risk assessment. See and PRMM for further discussion.
Risk assessment
Risk assessment is a measuring process capturing both the probability and consequences of the potential events of interest. The most useful risk assessment results are expressed in verifiable measurement units such as incidents per year, dollars per mile-year, and many others.
Risk is not a static quantity. Along the length of a pipeline, conditions are changing. As they change, the risk is also changing in terms of what can go wrong, the likelihood of something going wrong, and/or the potential consequences. Because conditions also change with time, risk is not constant even at a fixed location. When we perform a risk evaluation, we are actually taking a snapshot of the risk picture at a moment in time.
It is important to recognize what a risk assessment can and cannot do, regardless of the methodology employed. The ability to predict pipeline failures—when and where they will occur—would obviously be a great advantage in reducing risk. Unfortunately, this cannot be done except in extreme cases. Pipeline accidents are relatively rare and often involve the simultaneous failure of several safety provisions. This makes accurate failure predictions almost impossible. So, modern risk assessment methodologies provide a surrogate for such predictions. Assessment efforts by pipeline operating companies are normally not attempts to predict how many failures will occur or where the next failure will occur. Rather, efforts are designed to systematically and objectively capture everything that can be known about the pipeline and its environment, to put this information into a risk context, and then to use it to make better decisions.
A common incompleteness in risk assessment is to characterize a risk solely in terms of an average from a population distribution thought to generally represent the pipeline being assessed. While it is appropriate to seek an understanding of the distribution from which this individual is likely a part, the individual’s position within that distribution must be characterized. The distribution of, for instance, event frequencies, will often include values that are orders of magnitude higher and lower than the average.
Risk assessment vs risk analyses tools
A risk assessment for a pipeline should meet a minimum set of requirements before it is labeled an assessment rather than a more limited analysis of risk. There are many risk analysis techniques that are better labeled as tools—ingredients or supplements to a risk assessment. See full discussion in .
Measurements and Estimates
Proper risk assessment uses all available information. Information used in a risk assessment takes two general forms—measurements and estimates. An often-used aspect of this assessment methodology is the simultaneous use of both actual measurements and inferential estimates. For purposes here, a measurement is a reading or value obtained using an instrument on a specific component while an inferred estimate emerges from secondary or indirect information, often produced based on the underlying physics or even from engineering judgment.
Measurements include inspections for corrosion feature dimensions, corrosion rates from coupons, crack depths, metal toughness, soil resistivity, and many others. Inferential or indirect information is often based on material science—for example, possible corrosion rates associated with metals in certain soils, potential crack growth rates in certain materials exposed to certain loadings, etc. For example, obtaining a pipe wall thickness by UT instrument is a measurement, while estimating wall thickness based on the pipe date and possible degradations (for example, mpy by corrosion) is an estimate. Inferential estimates are normally applied to all components for which a measurement is not available.
The final value to be used in the risk assessment emerges from an examination of both, after adjustments for information age and accuracy have been made to each. The assessment chooses the best value based on the strength of evidence—newer and more accurate information is chosen over older, less accurate information.
It is common for information collection on long, linear assets such as pipelines to be non-uniform. The disparities in information availability along the route are accommodated by this simultaneous use of measurements and estimates.
Examples of measurements used in a typical risk assessment include:
- Visual and NDE inspections performed on accessible components
- CIS[4] inspections for CP effectiveness
- DCVG/ACVG coating holiday surveys
- Coupon corrosion rates
- ILI anomalies, especially when UT is providing a direct pipe wall thickness measurement[5]
- Test lead readings of pipe-to-soil potential.
In addition to instrument inaccuracies and operator errors, additional nuances have to be considered in using measurements. For example, a measurement may be highly accurate but taken some distance from the point of interest (as with internal corrosion coupons and test lead readings). Where conditions are not consistent, extrapolations can be very inaccurate. The age of the measurement is also important. The pipe wall thickness measured at the pipe mill 20 years ago may have little relevance to the actual wall thickness of the buried pipe 20 years later.
Each measurement and each inferential estimate requires an adjustment for its age and accuracy. The superior value—for example, the value among all possible measurements and estimates with the best age/accuracy combination—should determine the value used in the risk assessment. A process is used to adjust each of these to reflect the confidence in the current (for example, age-adjusted and accuracy-adjusted) validity of their values.
Once adjusted, selection of the more optimistic value ensures that better information overrides lesser information in a conservative risk assessment. This same technique compares and chooses better measurement data over lesser measurement data when multiple measurements are available at the same location (for example, multiple ILIs or multiple overline surveys on the same segments).
Again, with a consistent application of conservatism in uncertainty estimates, the more optimistic value—the information suggesting the best wall thickness—will usually govern.
Inappropriate overrides of inspection/test information are avoided by the careful application of consistent confidence values. When the confidence value is based upon ‘damages since inspection/test’, it must be ensured that equivalent PXX levels are used everywhere. For instance, it would not be correct to adjust P99 estimates from a 5-year old inspection by using P50 estimates of what may have happened in the last 5 years (for example, the wall thickness could be as low as 0.200” but let’s assume that only 0.1 mpy of corrosion could possibly have occurred.)
Nevertheless, even with carefully chosen uncertainty values, the more optimistic value will not always be the best value to use.
This is illustrated in a specific example:
One set of measurements/estimates shows a 10% certainty that no more than 20 mils of wall loss has occurred in a certain thick-walled component. This uncertain estimate implies that a wall loss of up to 20 mils / 10% = 200 mils could actually have occurred. Taken from an original 0.500″ wall thickness, this suggests a current wall thickness of 0.300″.
As another piece of evidence, a recent ILI shows, with 90% confidence (including general- and run-specific inaccuracies), a 300-mil wall loss feature, leading to a current wall estimate of 0.500″ – 0.300″ = 0.200″.
In this case, the ILI value should obviously govern. The original estimate, while seemingly conservative (tending to overestimate actual risk) was actually not conservative enough, as demonstrated by the recent ILI. A real-world example of this occurred when an operator installed a new gathering system which, after only a few years in service, experienced internal corrosion leaks. Upon investigation, corrosion rates in excess of 200 mpy were discovered—far exceeding what was thought possible in the design phase.
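The arithmetic of this example can be laid out explicitly. In the sketch below, the final selection is stated directly rather than computed, to emphasize that the ILI governs on strength of evidence, not because it happens to be the more optimistic or pessimistic value.

```python
# Sketch of the two-evidence wall thickness example, using the text's numbers.

original_wall = 0.500  # in

# Evidence 1: 10% certainty of no more than 20 mils loss implies that up
# to 20 mils / 10% = 200 mils could actually have occurred.
implied_loss_1 = 0.020 / 0.10                # 0.200 in
wall_est_1 = original_wall - implied_loss_1  # 0.300 in

# Evidence 2: recent ILI, 90% confidence, 300-mil metal loss feature.
wall_est_2 = original_wall - 0.300           # 0.200 in

# The ILI governs: it is newer, direct, and of higher confidence.
governing = wall_est_2
print(wall_est_1, wall_est_2, governing)     # 0.300 0.200 0.200
```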
Differences in measurements/estimates compared to actual values will be influenced by both uncertainty and conservatism. Uncertainty causes an unintentional, undesirable difference that must be tolerated. In conservatism, the difference is intentional, in acknowledgment of the natural (random) variability inherent in real-world phenomena. Both are discussed in following sections.
Uncertainty
The role of uncertainty in risk management is multifaceted as noted in PRMM:
- Risk assessment measurement error and uncertainty arise as a result of the limitations of the measuring tool, and the processes of taking the measurement, including the skills of the person performing the measurement. Pipeline risk assessment also involves the compilation of many other measurements (pipe strength, component wall thickness, depth of cover, pipe-to-soil voltages, pressure, etc.) and hence absorbs all of those measurement uncertainties. Risk assessment also makes use of engineering and scientific models (corrosion rates, stress formulae, thermal effects and overpressure estimates, etc.) each with accompanying errors and uncertainties.
- Adding to the uncertainty is the fact that the thing being measured in pipeline risk assessment is undergoing continuous change due to changing surroundings, as well as sometimes changing service conditions and possible degradation.
- A risk assessment must identify the role of uncertainty in its use of assumptions including how the condition of “no information” is to be handled in the assessment. For many applications of risk assessment results, it is advantageous to incorporate a conservative underlying philosophy of:
Uncertainty = increased risks
This not only encourages the frequent acquisition of information, but it also enhances the risk assessment’s credibility. Unless a conservative ‘guilty until proven innocent’ approach is used, there will be no incentive to regularly inspect and verify conditions that influence risk. Riskier conditions may only be discovered when incidents occur. Investigating the incident will inevitably find that the risk assessment had assumed favorable, low risk conditions in the absence of confirmatory information. This often implicitly discredits all other results of the risk assessment.
Conservatism (PXX)
Conservatism is generally taken to mean an intentional bias towards over-estimation of the true risk. A risk assessment incorporating a high level of conservatism will tend to overstate the risks, perhaps by several orders of magnitude. This occurs through the use of input values and calculations that are based on worst-case or at least ‘higher risk’ assumptions. A risk assessment conducted with no conservatism assumes the most likely values and uses calculations whose results are most often true to the most common actual conditions.
Conservatism is a useful characteristic in many applications of risk management. However, conservatism may also be excessive, leading to inefficient and costly choices when not properly acknowledged in decision-making.
A risk assessment should be performed with a target level of conservatism. As used here, the PXX designations indicate a level of confidence that actual experience will be no worse than estimated. For instance, P90 is the point where 90% of future performance should be at or below this value. It is the point where one would be negatively surprised 10% of the time—once out of every ten episodes.
A P90+ assessment intentionally contains layers of conservatism. This is often done to encourage future data collection as a means of risk reduction and, more importantly, to ensure that risks are not underestimated.
For simplicity, the PXX refers to the conservatism of inputs rather than to the resulting conservatism of the assessment. Each risk assessment is obtained via a collection of inputs, each with an estimated level of uncertainty equal to PXX. The actual conservatism of final risk estimates often increases dramatically due to the layering of conservatism across inputs. This layering also produces increasingly unlikely scenarios, since multiple low probability events are assumed to occur simultaneously. Therefore, the PXX refers to the intended level of uncertainty associated with each input rather than with the risk estimates. The PXX level of the final risk assessment is identified and managed in the calibration/validation phase (see ).
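A toy calculation illustrates the layering effect, assuming three independent inputs, each set at its P90 value (exceeded only 10% of the time):

```python
# Toy illustration of layered conservatism, assuming independence.
p_exceed_each = 0.10   # each input set at its P90
n_inputs = 3
p_all_exceeded = p_exceed_each ** n_inputs
print(p_all_exceeded)  # 0.001: the combined input set behaves like ~P99.9
```

Real inputs are rarely fully independent, so the effect is usually less extreme, but the direction of the bias is the same: stacked conservative inputs describe an increasingly rare scenario.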
Less conservative assumptions are sometimes needed for practical reasons. For instance, a defect over 95% through a pipe wall could exist and survive a pressure test or be undetected in an inspection. It would be counter-productive to assume that such rare defects exist everywhere, even though such an assumption would be very conservative. Rather, the wall thickness implied by a Barlow stress calculation (perhaps adjusted by a factor showing some localized thinning could have occurred) can be used as the primary means to estimate the probable—and still conservative—wall thickness when no other confirmatory integrity information is available.
P84, representing approximately one standard deviation of a one-sided normal distribution, might be appropriate as a target level of conservatism and has some consistency with certain design practices.
Some practitioners also produce P10 or similar estimates, reflecting best case or at least more optimistic inputs. As with the more conservative layering, choosing multiple optimistic inputs produces a combination with even more optimism, as well as more rarity.
The user should determine the level of conservatism appropriate to his needs. Often a P99 level—negative surprises only 1% of the time—or higher is warranted for assessments supporting new projects or presentations in public forums. A P50 to P70 level of analysis might be more appropriate for budget setting or long range planning.
See also the discussion of calibration in .
Risk Profiles
While a profile can mean different things—risk changes over time, types of events possible (see FN curve discussion), etc—the focus here is on risk changes over ‘space’. Generation of a profile of changing risks along a pipeline is essential to the understanding of risk and the subsequent management of those risks.
The PoF profile is produced by the risk assessment and shows location-specific PoF values. PoF changes along a route in response to dozens of factors that might change, including pipe/component specification (age, wall thickness, diameter, etc), coating condition, soil corrosivity, pressure, road crossings, foreign pipeline crossings, AC electrical power lines, depth of cover, and many more. Per-incident consequence costs can vary dramatically along a pipeline, changing with differences in pressure, flowrate, topography, receptor proximities, and, to a lesser degree, differences in pipe characteristics (ie, age, coating, wall loss).
(Figure: two very different risk profiles, but perhaps with the same cumulative risk.)
Risk profiles are critical aspects of risk assessment, as discussed in and in risk management, as noted in .
Cumulative risk
While the profile is an essential element in understanding, presenting, and managing risk, it is not an efficient tool for setting higher-level risk management strategies. Higher level risk management strategizing is distinct from foot-by-foot risk management. It involves risk summaries and comparisons of sometimes long segments.
Cumulative risk is a metric used to gauge the risk posed by any length of pipeline or any collection of components. Because risk values are very location specific along the pipeline, a method of ‘rolling up’ or aggregating all of the risks for any portion of a pipeline is important.
A typical pipeline risk assessment should show the level of risk that each point along the pipeline presents to its surroundings. Two pipeline segments, say 100 ft and 1,800 ft long, respectively, may have the same ‘rate-of-risk’, expressed in units such as $/mile or incidents/mile-year, for all portions. So each point along the 100-ft segment presents the same risk as does each point along the 1,800-ft length. Of course, the 1,800-ft length presents more overall risk than does the 100-ft length, because it has many more risk-producing locations. Alternatively, two pipelines may have exactly the same total risk, in which case the shorter line has a much higher rate of risk than does the longer.
Longer pipeline lengths logically have higher risk values, since a longer line logically has a higher ‘area of opportunity’ for failure and generally exposes more receptors to consequences. Both the risk and the rate-of-risk are important to risk management.
In reality, both the 100-ft and 1,800-ft segments will be composed of multiple components, each with its own length and contribution to risk. Many pipelines will have short lengths of relatively higher risk among long lengths of lower risk, as demonstrated in their risk profiles. In summarizing the risk for the entire pipeline, a simple average or median will hide the shorter, higher risk sections. A ‘weak-link-in-the-chain’ strategy—focusing on the maximum risk or rate-of-risk alone—will similarly not reflect the full risk. A cumulative risk—each portion with its respective length aggregated into a summary number—will produce the most meaningful measure. This is the area under the risk profile curve. As with any area-under-the-curve summarization, the shape of the curve—the profile, in this case—remains critical to the understanding.
The cumulative risk characteristic is also measured in order to track risk changes over time or to compare widely different types of risk mitigation projects. Projects such as public education, ROW maintenance, and patrol are not usually assigned large mitigation benefits on a per-foot basis, but can impact many miles of pipe and hence have a large impact on risk. Suppose, for example, we want to compare the risk benefit of clearing 20 miles of pipeline ROW and installing new signs to the value of lowering and re-coating 100 feet of pipeline. On one hand, the failure potential can be reduced significantly along a short stretch of pipeline. On the other hand, a more modest mitigation could be broadcast over a long length of pipeline. The comparison is not intuitive unless an accurate method of aggregation is established.
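A brief sketch shows why the area-under-the-profile aggregation is preferred over a simple average or a maximum. The segment lengths and rates-of-risk below are hypothetical.

```python
# Sketch of 'rolling up' a risk profile: cumulative risk is the sum of
# (rate-of-risk x segment length), ie, the area under the profile.
# Segment values are hypothetical.

profile = [  # (length in miles, rate-of-risk in $/mile-year)
    (5.0, 200.0),
    (0.02, 15000.0),  # a short, high-risk span
    (12.0, 150.0),
]

cumulative = sum(length * rate for length, rate in profile)   # $3,100/yr
avg_rate = cumulative / sum(length for length, _ in profile)  # ~$182/mile-yr
max_rate = max(rate for _, rate in profile)                   # $15,000/mile-yr

print(f"cumulative: ${cumulative:,.0f}/yr")  # captures the short spike
print(f"average: ${avg_rate:,.0f}/mile-yr")  # hides the spike
print(f"maximum: ${max_rate:,.0f}/mile-yr")  # ignores length entirely
```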
See for discussion of proper measurement of cumulative risk.
Changes over time
Note that the cumulative risk values can also demonstrate the natural risk increase over time. Recall the entropy analogies—risk will increase over time unless offsetting energies are applied. The risk assessment measures the risk at all points along a pipeline, at a specific point in time. The risk numbers are therefore a snapshot. They represent all conditions and activities at the time of the snapshot. If inspections and maintenance are not done, safety degrades. The most meaningful measure of changes in the risk situation will be how the risk for the length of interest changes over time.
Changes in risk are easily tracked by comparing risk snapshots. This can be done for a specific point on a pipeline, an entire pipeline, or any collection of components from any pipeline. It can also be done for any set of pipelines, such as “all pipelines in Texas,” “all propane lines,” “all mainlines > 12 inches,” “all lines older than 20 years,” and so on.
The cumulative risk calculation also remedies the difficulties encountered in tracking risk changes when segment boundaries change after every assessment. The cumulative risk can be calculated for any length of pipe, regardless of segment boundaries.
Valuations (cost/benefit analyses)
Note that a superior risk assessment can show the value of changes in practice by estimating the corresponding changes in failure potential and/or consequence. Many common practices are intuitively important and necessary, but rely on subjective choices regarding level of rigor. Examples of practices whose roles are otherwise difficult to estimate include:
- Instrument maintenance/calibration—can be linked to outage rates
- Training—can be linked to human error rates
- Procedures—can be linked to human error rates
- Monitoring—can be linked to intervention opportunities
- Marking/labeling of critical equipment—can be linked to human error rates.
While the risk-assessment-generated estimates of benefits will also contain some subjectivity, the reductionist approach allows many more opportunities for concurrence among stakeholders regarding specifics of the role of the practice in risk reduction, thereby helping to ensure more objective results.
For example, incident investigations frequently cite the role of inadequate procedures as an aspect of the incident. Absent such incidents, the role of procedures, and an argument to improve the practice within a company, may generate widely differing beliefs regarding expected benefits. However, the risk assessment approach that dissects the specific aspects of procedures and their role in incident prevention, as discussed in , allows all parties to identify specific points of divergence of opinion and opportunities to collect pertinent information or otherwise come to an agreement on appropriate valuations.
Risk Management
Risk management is the intentional changing of risk levels. As a reaction to perceived risk, it is the set of actions undertaken in support of a strategy towards a level of risk deemed acceptable. Like all management initiatives, risk management involves establishing priorities and making judgments about trade-offs such as cost vs. benefit. Even with very accurate risk assessment, risk management can be challenging, involving socio-economic and political decisions around acceptable or tolerable risk, the urgency with which risks may need to be reduced, and many others.
Since risk is the product of probability of failure (PoF) and consequence of failure (CoF), either or both can be changed to change the risk. Typically, PoF offers more opportunities for controlling risk than CoF. For this reason, effective risk management programs generally concentrate more on PoF aspects.
Practically speaking, our objective is not the elimination of risk, but the management of it for an acceptable result. We cannot eliminate risk without sacrifices that would be unacceptable—like halting the benefits derived from the use of a pipeline.
See for a discussion of pipeline risk management.
[1] This can be confusing to some since ‘exposure’ is a term also commonly applied to a location on an originally buried pipeline that has experienced a depletion of cover, rather than as one of the essential elements of a PoF measurement.
[2] Any future time can be used; producing risk estimates for the following year is common and used as an example here.
[3] This example assumes some basic knowledge of protection of buried steel pipeline by cathodic protection. See chapter 6 and PRMM if more background is needed.
[4] Some may argue that overline surveys such as CIS, DCVG, etc are inferential, ie, inferring conditions on a buried pipe some distance from the actual measurement. For purposes here, these surveys are considered measurements, recording actual values that represent a condition, even if that condition is used to infer other characteristics. Error rates increase with influences such as surveyor proficiency and surface conditions.
[5] ILI by MFL can similarly be said to be an inferential measurement, but is also treated as a measurement for purposes here.