Assessing Risk

As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.

Albert Einstein

The risk management process comprises five basic steps:

  1. Risk modeling
  2. Data collection and preparation
  3. Segmentation
  4. Assessing risks
  5. Managing risks.

The first four are actually components of assessing risk while the last is the reaction to what the assessment has revealed. This section provides some background to the assessment of risk with a focus on applications to pipeline systems.

Risk assessment building blocks

Risk assessment practitioners have varying ideas of how to understand and measure risk. Many tools and techniques are available to help. While almost all can improve understanding, few should be considered to be comprehensive risk assessment techniques. There is a real difference between identifying elements of risk and performing a risk assessment.

Ref [1052] provides a list and discussion of “risk assessment techniques”:

  • Brainstorming
  • Structured or semi-structured interviews
  • Delphi
  • Checklists
  • Primary hazard analysis
  • Hazard and operability studies (HAZOPS)
  • Hazard Analysis and Critical Control Points
  • Environmental risk assessment
  • What if? analysis
  • Scenario analysis
  • Business impact analysis
  • Root cause analysis
  • Failure mode effect analysis
  • Fault tree analysis
  • Event tree analysis
  • Cause and consequence analysis
  • Cause-and-effect analysis
  • Layer of protection analysis (LOPA)
  • Decision tree
  • Human reliability analysis
  • Bow tie analysis
  • Reliability centered maintenance
  • Sneak circuit analysis
  • Markov analysis
  • Monte Carlo simulation
  • Bayesian statistics and Bayes nets
  • FN Curves
  • Risk indices
  • Consequence/probability matrix
  • Cost/benefit analysis
  • Multi-criteria decision analysis.

Each is described in the reference, along with a complexity rating and an opinion as to whether each can produce ‘quantitative’ results.

For improved clarity, these techniques should be categorized according to the role they play in risk assessment. Several ways to group them could be appropriate but for discussion purposes here, the following categories are suggested:

  • Risk Assessment techniques—full risk assessment methodologies, meeting all requirements of an actual risk assessment.
  • Risk Tools—ingredients or supplements to a risk assessment.

Where ‘tools’ can be further categorized into:

  • Hazard/threat identification—techniques focused on listing or confirming hazards or threats to a system. Examples include HAZOPS, brainstorming, and checklists.
  • Scenario identification—techniques focused on the chain of events leading to a failure or unfolding once failure has occurred. Examples include event trees, fault trees, cause-effect analyses.
  • Analyses support—usually statistically based, these techniques work with a risk assessment model to improve outputs. Techniques are applied both to risk assessment inputs and outputs (results). Examples include Monte Carlo simulation, Bayesian statistics, and Markov analyses.
  • Visualization—techniques, usually with a strong graphical nature, used to support presentation or visualization of risk results or inputs. Examples include bowtie, matrix, FN curves.

Since many techniques can be used in differing ways, not all fit neatly into one of these categories. This does not detract from the central idea here that risk tools play various roles in a risk assessment, but are NOT complete risk assessment methodologies.

Tools vs Models

An important distinction has been drawn between risk assessments—meaning methodologies, techniques, etc. that produce complete risk estimates—and risk analysis tools that play a more limited role, such as hazard identification or analyses of specific cause-consequence pairings.

One of the simplest discriminators between a risk model and risk tool is the ‘map point’ test. This test simply means that, using a real risk assessment approach, one can pick any point on any pipeline and should have access to all pertinent risk information for that location. If the so-called risk assessment cannot support this straightforward and intuitive task, then it is probably a risk tool rather than a complete risk assessment model, at least for purposes of this discussion. This and other ways to identify a true risk assessment are presented in a later section. But first is an examination of some of the more popular risk tools.

Hazard Identification/Evaluation Techniques

In addition to the techniques from ref [1012], eleven hazard evaluation procedures used in the chemical industry have been identified [9]. Each of these tools has strengths and weaknesses, including amount of benefit derived from the application, costs of applying the tool, and appropriateness to a situation.

Some of the more formal risk tools in common use by the pipeline industry include:

  • HAZOP
  • Fault-tree/event-tree analysis.

See PRMM for details on these.

Analyses Support Tools

Some risk techniques, often noted as stand-alone risk assessments, are actually processes that can be applied to ‘real’ risk assessment models. Bayesian, Markov, Monte Carlo, and others are better viewed as processing techniques rather than risk assessments themselves. They supplement a risk assessment by providing better understanding of patterns and numerical ‘behavior’ of the data.

Other tools such as Layer of Protection Analysis (LOPA) typically focus on an aspect of risk such as control and safety equipment/instrumentation analyses.

Visualization Tools

Matrix

One of the simplest risk visualization structures is a matrix. It displays risks in terms of the likelihood and the potential consequences associated with an asset or process. The vertical and horizontal scales may be qualitative, using a simple scale such as high, medium, or low, or using detailed descriptors to guide the assignment of matrix positions. The scales may also employ numbers, often relative—for example, from 1 to 5—or categories of absolute risk values. See .

Events or collections of events are assigned to cells of the matrix based on perceived or estimated likelihood and consequence. Risks with both a high likelihood and a high consequence appear in one corner, usually the upper right part of the matrix. This approach may rely simply on expert opinion, or a more complicated application might use quantitative information to rank risks. While this tool cannot incorporate all pertinent factors and their relationships, it may help to crystallize thinking by at least displaying the risk as two parts (probability and consequence) for separate examination.
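
To make the mechanics concrete, the short sketch below (in Python) places hypothetical events on a simple 3 x 3 qualitative matrix; the category labels and cell assignments are illustrative assumptions, not values prescribed here.

    # Hypothetical 3 x 3 consequence/probability matrix; cell assignments are
    # illustrative only and would normally come from corporate risk criteria.
    MATRIX = {
        ("low", "low"): "low",       ("low", "medium"): "low",       ("low", "high"): "medium",
        ("medium", "low"): "low",    ("medium", "medium"): "medium", ("medium", "high"): "high",
        ("high", "low"): "medium",   ("high", "medium"): "high",     ("high", "high"): "high",
    }

    def matrix_tier(likelihood, consequence):
        """Return the qualitative risk tier for an event placed on the matrix."""
        return MATRIX[(likelihood, consequence)]

    print(matrix_tier("high", "medium"))  # -> high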

Some may believe that risks with, say, the highest consequences but low probability require different management from those with lower consequences but higher probability, even if both scenarios show equal risk (see further discussion in ). A risk matrix therefore sometimes supports corporate decision-making or risk tolerance guidance, whereby response urgencies to manage risk emerge from the various combinations of probability and consequence.

While sometimes an interesting presentation/visualization tool, a matrix is not a risk assessment model—it arguably fails all of the tests proposed to determine whether a technique can serve as an assessment. Even as only a presentation tool, a matrix is rather clumsy, since it cannot appropriately illustrate many important considerations. For example, it masks differences in risk due solely to differences in pipeline length; whether risk is due to a handful of peaks or to a consistent but high level; the range of possible consequence scenarios; etc. There is a certain disservice in presenting risk information in a way that is incomplete or potentially misleading.

  1. Example of qualitative risk criteria matrix

Others

Other visualization tools often found in risk assessment presentations include FN curves and the bowtie. An FN curve is a variation on the matrix, showing both the probability and the consequence of various event scenarios. The bowtie combines a fault tree—the events leading to the ‘knot’—and an event tree—the events unfolding from the ‘knot’—where the knot is the event or asset whose risk is being displayed.

Risks at specific locations are often shown on FN curves, where the relationship between event frequency and severity is shown. FN curves display failure count or frequency (F) versus consequence, where consequences are often a number of fatalities (N). This type of risk presentation, often called a depiction of societal risk, is usually a plot of the frequency, F, at which N or more persons are expected to be fatally injured.
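
As a minimal illustration of how FN-curve data are assembled, the sketch below cumulates hypothetical scenario frequencies into the frequency F of N or more fatalities; the scenario values are placeholders, not data from any actual system.

    # Hypothetical failure scenarios: (expected frequency per year, fatalities N)
    scenarios = [(1e-3, 1), (2e-4, 3), (5e-5, 10), (1e-6, 50)]

    def fn_curve(scenarios):
        """Return (N, F) pairs, where F is the cumulative frequency of events
        producing N or more fatalities."""
        ns = sorted({n for _, n in scenarios})
        return [(n, sum(f for f, m in scenarios if m >= n)) for n in ns]

    for n, f in fn_curve(scenarios):
        print(f"N >= {n}: F = {f:.1e} per year")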

Event and fault tree analyses also serve as visualization tools. The distinction blurs when values are assigned to branches and nodes—the tree then becomes more than a simple visualization tool, but it is still unable to serve as a complete risk assessment methodology for an entire pipeline.

Presentation graphics/charts are further discussed in . GIS also has a strong visualization aspect, as noted in later sections.

What is a risk assessment model?

Although we understand the underlying engineering concepts related to pipeline failure, predicting failures beyond a laboratory in a complex “real” environment can prove impossible. No one can definitively state where or when an accidental pipeline failure will occur. However, the more likely failure mechanisms and the more susceptible locations can be identified in order to focus risk management efforts.

An assessment of failure probability requires the independent estimation of the three elements of PoF for each failure mechanism: exposure, mitigation, and resistance (see ).
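
One common way to combine the three elements is sketched below, assuming exposure is expressed as an event frequency (events per mile-year) and mitigation and resistance as fractional effectiveness values between 0 and 1; the numbers shown are placeholders, not recommended values.

    def pof(exposure, mitigation, resistance):
        """Expected failure frequency (per mile-year) for one failure mechanism:
        exposure reduced first by mitigation, then by resistance."""
        return exposure * (1.0 - mitigation) * (1.0 - resistance)

    # e.g. 0.5 threat events/mile-year, 95% stopped by mitigation,
    # 90% of the remaining hits survived by the pipe
    print(pof(exposure=0.5, mitigation=0.95, resistance=0.90))  # 0.0025 failures/mile-year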

The potential consequences must also be assessed. Risk assessments can incorporate dose–response and exposure analyses into the risk evaluation by considering the possible pathways, the intensity of exposures, and the amount of time a receptor could be vulnerable. The possible effects of these, overlaid with possible receptor types and quantities, leads to consequence estimations.

A full and complete risk assessment captures all of these aspects for every portion of the system being assessed. The risk estimates produced should provide understanding and insights far beyond what can be done informally or with lesser tools.

Model scope and resolution

Assessment scope and resolution issues complicated previous risk assessment techniques. In both relative risk models and classical QRA, choices in the ranges of certain risk variables were required. The assessments of relative risk characteristics were especially sensitive to the range of possible characteristics in the pipeline systems to be assessed. If only natural gas transmission pipelines were to be assessed, then the model was not set up to capture liquid pipeline variables such as surge potential and contamination potential. The model designer had a choice of either keeping such variables and scoring them as “no threat” or redistributing the weighting points to other variables that do impact the risk.

As another example, earth movements often pose a very localized threat on relatively few stretches of pipeline. When the vast majority of a pipeline system to be evaluated is not exposed to any land movement threats, relative risk points assigned to earth movements did not help to make risk distinctions among most pipeline segments. To some, it appeared beneficial to reassign those points to other variables that warranted fuller consideration. However, without direct consideration of this variable, comparisons with the small portions of the system that are exposed, or with future acquisitions of systems that have the threat, became problematic. Classical QRA had similar limitations, since the historical data often forced the land-movement threat to be kept low, even for short segments for which it was the dominant threat. This is further discussed elsewhere in this text.

In a relative risk assessment, the ability to discriminate differences in risk was also sensitive to the characteristics of the systems to be assessed. A model that was built for parameters ranging from, say, a 40-inch, 2000-psig propane pipeline to a 1-inch, 20-psig fuel oil pipeline was not able to make many risk distinctions between a 6-inch natural gas pipeline and an 8-inch natural gas pipeline. Similarly, a model that was sensitive to differences between a pipeline at 1100 psig and one at 1200 psig might have to treat all lines above a certain pressure/diameter threshold as the same [PRMM]. Classical QRA’s had an analogous issue in determining the representative population of pipelines upon which to base the statistical future estimates.

Fortunately, such issues of model scope and resolution disappear with the advent of a physics-based approach to risk assessment. By mirroring real-world phenomena as closely as practical, the assessment automatically and appropriately responds to all changes in factors.

Historical Approaches

  1. Pipeline Risk Modeling Options

Sidebar

Perspectives—Is Formal Risk Management Helping Me?

Ever consider that true risk management sometimes occurs only at the lower levels of some pipeline organizations? That is, personnel performing field activities are in effect setting risk levels for the company. Their choices of day-to-day activities are essentially driving risk management and thereby establishing corporate risk levels. This is not just theoretical—real choices are being made. While there are regulations and company-specific procedures to control certain actions, the on-the-ground team is often relied upon to prioritize, allocate, act, and request additional resources based solely on their perceptions.

Fortunately, we have a generally savvy work force that usually makes good choices. But why would top company executives choose to delegate company-wide risk management decision-making, in effect abdicating their own power to manage the risk of the organization?

In at least one sense, this delegation of risk management decision-making is a good thing. Those most knowledgeable in location-specific conditions/characteristics are often in the best position to make certain decisions. They are the subject matter experts in the pipeline’s often-highly-variable immediate environment.

But such distributed control also has its weaknesses. In their risk ‘assessments’, the field team may not utilize all of the available information, for example, ILI details, operational data, learnings from other pipelines, etc. They also may not use a formal structure to find and manage the non-obvious risks. Even if they do use formal techniques, without a centralized view of risks across the entire organization, imbalances are certain to occur.

So, if the alternative is not superior, then why is centralized risk management not the standard? At least one explanation lies in the perceived accuracy and usefulness of risk assessments. Some risk issues are very apparent and no formal assessment is needed to understand them. Good inspection techniques take much subjectivity out of certain resource allocations—a list of identified critical anomalies is like a ringing telephone that must be answered. The ‘fix-the-obvious’ opportunities for risk management are hopefully fully addressed in inspection follow-ups and in the day-to-day O&M. A regional approach can be very efficient in managing obvious risk issues.

However, there are other risks and risk reduction opportunities that are not so obvious. Humans can judge a thing based on a subjective and simultaneous interpretation of a handful of factors—maybe 3-5. Real risk scenarios may involve a dozen or more factors. Remember, many modern pipeline incidents are of the ‘perfect storm’ type. Rare chains of events, often involving multiple improbable and non-apparent factors, lead to the incident. This is where formality is needed. The formal risk assessment, when done properly, finds those highly improbable scenarios, involving multiple, non-intuitive, overlapping issues that can generate the perfect storm event. The previously unrecognized event is now revealed and quantified.

A Portfolio View

How can upper levels in the organization gain the risk understanding required to be fully engaged in risk management? By knowing the risk associated with every asset. The corporate-level decision-maker should seek a portfolio view of the company’s assets, showing all costs of ownership. Just as with a portfolio of stocks and bonds, each asset ties up capital and has carrying costs. The revenue streams, capital cost, the O&M costs, tax liabilities, etc, have always been well understood. The risk cost?—perhaps not as much. Most know that risk is part of the cost of ownership but how many really use that knowledge in everyday decision-making? The key lies in reliable risk assessments whose results truly represent real-world cost of ownership risks. Then, and only then, is the top level decision-maker in a position to most efficiently allocate resources across the entire organization.

So, in a moment of self-evaluation, perhaps this question arises: is your risk assessment helping you? Some may answer “sure, I get a checkmark on my regulatory audit form.” But most recognize that so much more is at stake. Beyond regulatory compliance, how much value emerges from the risk assessment effort? Some must admit that their assessments are mostly window-dressing—not really helping decision-making. Perhaps their risk assessment is only documenting what is already perceived. There is some value in such documentation. But there should also be some ‘ah-ha’ moments. After all, the whole point of a formal risk assessment is to provide the structure that can and does reveal the otherwise unknown.

Formal vs. informal risk management

PRMM discusses the transition from informal risk management to the formal processes. Some background on the maturation of the formal techniques is offered in the following section.

Scoring/Indexing models

Prior to the availability of an efficient physics-based approach to pipeline risk assessment, perhaps the most popular pipeline risk assessment technique was the index or some similar scoring technique. Scoring systems are common in many applications, particularly where there is limited data or the information is subjective. Examples include sports and other competitive activities; finance and economics; and credit rating.

In the pipeline risk assessment application of this approach, numerical values (scores) were assigned to conditions and activities on the pipeline system thought to contribute to the risk picture. This included both risk-reducing and risk-increasing factors, or variables. The assessments were often a simple summation of numbers assigned to conditions and activities expected to influence risk. When more risk-increasing conditions were present along with fewer risk-reducing activities, risk was shown to be relatively higher. As risky conditions decreased or were offset by more risk-reduction measures, risk was shown to be relatively lower.

Weightings were usually assigned to each risk variable or to groupings of factors. The relative weight reflected the importance of the item in the risk assessment and was based on statistics where available and on engineering judgment where data was not available. Pipeline sections were scored based on their attributes. The various pipe segments’ scores were then available for uses such as ranking according to their relative risk scores in order to prioritize repairs, inspections, and other risk mitigating efforts. This technique ranged from a simple one- or two-factor model (where only factors such as leak history and population density are considered) to models with dozens of factors considering numerous aspects of risk influences.

The form of these pipeline assessments was normally some variation on:

CondA + CondB + … + CondN = Relative Probability of Failure (or Relative Consequence of Failure)

Or sometimes:

(CondA x WeightA) + (CondB x WeightB) + … + (CondN x WeightN) = Probability of Failure

Where

CondX represents some condition or factor believed to be related to risk, evaluated for a particular piece of pipeline.

WeightX represents the relative importance or weight placed on the corresponding condition or factor—more important variables have a greater impact on the perceived risk and are assigned a greater weight.

Even if the quantification of the risk factors was imperfect, the results were believed to give a reliable picture of where risks were relatively lower (fewer “bad” factors present) and where they were relatively higher (more “bad” factors present).
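
A minimal sketch of the weighted-sum form shown above appears below; the variable names, weights, and 0-10 scores are hypothetical illustrations of how such a relative score was typically computed.

    # Hypothetical weights for one failure mechanism's variables (summing to 1.0)
    weights = {"activity_level": 0.35, "cover_depth": 0.25, "one_call": 0.25, "patrol": 0.15}

    def relative_score(condition_scores, weights):
        """Weighted sum of 0-10 condition scores; here, higher = higher perceived
        risk (scale direction is a modeling choice, as noted in the text)."""
        return sum(condition_scores[var] * w for var, w in weights.items())

    segment = {"activity_level": 7, "cover_depth": 4, "one_call": 2, "patrol": 6}  # 0-10 scores
    print(relative_score(segment, weights))  # a relative number with no absolute meaning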

Early published works from the late 1980s and early 1990s on pipeline scoring-type risk assessments are well documented.[1] Such scoring systems for specific pipeline operators can be traced back even further, notably to the early 1980s, when gas distribution companies faced repair-replace decisions involving problematic cast iron pipe.

Variations on this type of scoring assessment were in common use by pipeline operators for many years. The choices of categorization into failure mechanisms, scale direction (higher points = higher risk, or vice versa), variables, and the math used to combine factors are some of the differences among these types of models.

The scoring approach was often chosen for its intuitive nature, ease of application, and ability to incorporate a wide variety of data types. Prior to the year 2000, such models were used primarily by operators seeking more formal methods for resource allocation—how to best spend limited funds on pipeline maintenance, repair, and replacement. Risk assessment was not generally mandated, and model results were seldom used for purposes beyond this resource allocation. There are of course some notable exceptions where pipeline operators incorporated very rigorous risk assessments into their business practices, notably in Europe, where such risk assessments were an offshoot of applications in other industries or were already mandated by regulators.

The use of indexing/scoring methodologies came into question in the US with new regulations focusing on pipeline integrity management. The role of risk assessment expanded significantly in the early 2000s when the DOT Office of Pipeline Safety (OPS)—now part of the Pipeline and Hazardous Materials Safety Administration (PHMSA)—began mandating risk analyses of all jurisdictional gas and hazardous liquid pipelines that could affect a High Consequence Area (HCA). Identified HCA segments were then scheduled for integrity assessment and application of preventive and mitigative measures depending on the integrity threats present. The entire integrity management process was intended to be risk-driven, with pipeline operators choosing risk assessment methodologies that could produce the required integrity management decision support.

The simple scoring assessments were generally neither designed nor intended for use in applications where outside parties were requesting more rigorous risk assessments. Due in part to the US IMP regulations, risk assessment is now commonly used in project presentation and acceptance in public forums, legal disputes, setting design factors, addressing land use issues, etc., whereas previously the assessment was typically used for internal decision support only.

Given their intended use, the earlier models did not really suffer from “limitations,” since they met their design intent. The compromises appear as limitations only now that the new uses are factored in. Those still using older scoring approaches recognized the limitations brought about by the original modeling compromises.

In an attempt to simplify, these models actually introduced an extra and now unnecessary level of complexity. The real-world phenomena being modeled had to first be understood. Then a surrogate—the scoring process—for the actual phenomena was created and had to be maintained. The surrogate also had to keep up with a potentially evolving understanding of the underlying phenomenon.

Some of the more significant compromises arising from the use of the simple scoring type assessments included:

  • Without an anchor to absolute risk estimates, the assessment results were useful only in a rather small analysis space. The results offered little information regarding risk-related costs or appropriate responses to certain risk levels. Results expressed in relative numbers were useful for prioritizing and ranking but were limited in their ability to forecast real failure rates or costs of failure. They could not be readily compared to other quantified risks to judge acceptability.
  • Assessment inputs and results could not be directly validated against actual occurrences of damage or other risk indicators. Even with the passage of time and the gaining of more experience, which normally improve estimates, the scoring models’ inputs generally were not tracked and improved.
  • Results did not normally include a time-to-failure, without which there is no technical defense for integrity assessment scheduling. Without additional analyses, the scores did not suggest appropriate timing of ILI, pressure testing, direct assessment, or other required integrity verification efforts.
  • Potential for masking of effects when simple expressions could not simultaneously show influences of large single contributors and accumulation of lesser contributors. An unacceptably large threat—very high chance of failure from a certain failure mechanism—could be hidden in the overall failure potential if the contributions from other failure mechanisms were very low. This was because, in some scoring models, failure likelihood only approached the highest levels when all failure modes were coincident. A very high threat from only one or two mechanisms would only appear at levels up to their pre-set cap (weighting). In actuality, only one failure mode will often dominate the real probability of failure. Similarly, in the scoring systems, mitigation was generally deemed ‘good’ only when all available mitigations were simultaneously applied. The benefit of a single, very effective mitigation measure was often lost when the maximum benefit from that measure was artificially capped. See note 1.
  • Some relative risk assessments were unclear as to whether they were assessing damage potential or failure potential. For instance, the likelihood of corrosion occurring versus the likelihood of pipeline failure from corrosion is a subtle but important distinction, since damage does not always result in failure.
  • Some previous approaches had limited modeling of interaction of variables, a requirement in some regulations. Older risk models often did not adequately represent the contribution of a variable in the context of all other variables. Simple summations would not properly integrate the interactions of some variables.
  • Some models forced results to parallel previous leak history—maintaining a certain percentage or weighting for corrosion leaks, third party leaks, etc.—even when such history might not be relevant for the pipeline being assessed.1
  • Balancing or re-weighting was often required as models attempted to capture risk in terms that represent 100% of the threat, mitigation, or other aspect. The appearance of new information or new mitigation techniques required re-balancing, which in turn made comparison to previous risk assessments problematic.
  • Some models could only use attribute values that were bracketed into a series of ranges. This created a step-change relationship between the data and risk scores. This approximation of the real relationship was sometimes problematic.
  • Some models allowed only mathematical addition, where other mathematical operations (multiply, divide, raise to a power, etc) would better parallel underlying engineering models and therefore better represent reality.
  • Simpler math did not allow order-of-magnitude scales, and such scales better represent real-world risks. Important event frequencies commonly range, for example, from many times per year to less than a 1-in-ten-million chance per year. An underlying difficulty in the calibration of any scoring-type risk assessment is the set of limitations inherent in such methodologies. Since the scoring approaches usually make limited use of distributions and equations that truly mirror reality (see the previous discussion on limitations), they will not always closely track real-world experience. For example, a minor 1 or 2% change in a risk score may represent an equivalent change in absolute estimates for one threat but a 100-fold change for another.
  • Lack of transparency. A scoring system adds a layer of complexity and interferes with understanding of the basis of the risk assessment. Underlying assumptions and interactions are concealed from the casual observer and require an examination of the ‘rules’ by which inputs are made, consumed by the model, and results generated.

Note:

  1. See cautions against the use of weightings, . The assumption of a predictable distribution of future leaks predicated on past leak history might be realistic in certain cases, especially when enough events are available and conditions and activities are constant. However, in some segments, a single failure mode will dominate the risk assessment and result in a very high probability of failure rather than only some small percentage of the total. Even if the assumed distribution is valid in the aggregate, there may be many locations along a pipeline where the pre-set distribution is not representative of the particular mechanisms at work there, leading to incorrect conclusions.

Serious practitioners always recognized these “limitations” and worked around them when more definitive applications were needed.

Classical QRA Models

Numerical techniques are required in order to obtain estimates of absolute risk values, expressed in fatalities, injuries, property damages, etc., per specific time period. The more rigorous and complex risk assessment approaches in common use in many industries are typically referred to as probabilistic risk assessment (PRA), quantitative risk assessment (QRA), or numerical risk assessment (NRA). While some recognize differences among these labels, they are often used interchangeably.

Recall the earlier discussion on Classical QRA—the statistics-driven approach to risk assessment. For discussion purposes here, currently documented methodologies labeled PRA, QRA, and NRA, including their common supporting processes such as Monte Carlo simulation, Markov analyses, Bayesian statistics, and other statistics-centric approaches, are treated as variations on a single technique, which we will call Classical QRA for convenience. Classical QRA will be compared to a physics-based approach—the preferred approach in pipeline risk assessment—in an upcoming discussion on ‘myths’. Here, the discussion will examine the practice itself.

These techniques are assembled together under the premise that they all use statistics as the primary driver in understanding risk. The applicability of the oft-used supporting techniques further illustrates this point: Bayesian begins with statistics, sometimes modified by physics (a priori info). Markov links a future state with a current state, through initial state probabilities and probabilities of change. These are in contrast to an approach that begins with physics and then refines preliminary results using historical event frequencies. Both approaches benefit from the use of statistics but the primary focus is different.

Classical QRA is a technique used in the nuclear, chemical, and aerospace industries and, to some extent, in the petrochemical industry. The output of a classical QRA is usually in a form whereby it can be directly compared to other risks such as motor vehicle fatalities or tornado damages. It can be thought of as a statistical approach to the quantification of risks, emerging from numerical analyses applied to scenario structures such as event trees and fault trees (see discussion in ).

Classical QRA is a rigorous mathematical and statistical technique that relies heavily on historical failure data and event-tree/fault-tree analyses. Initiating events such as equipment failure and safety system malfunction are flow-charted forward to all possible concluding events, with probabilities being assigned to each branch along the way. Failures are backward flow-charted to all possible initiating events, again with probabilities assigned to all branches. All possible paths can then be quantified based on the branch probabilities along the way. Final accident probabilities are achieved by chaining the estimated probabilities of individual events.
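
The sketch below illustrates the chaining of branch probabilities along event-tree paths; the initiating frequency and branch probabilities are hypothetical values chosen only to show the arithmetic.

    initiating_frequency = 1e-3  # e.g. loss-of-containment events per year (hypothetical)

    # Each outcome path is a list of (branch, conditional probability) pairs
    paths = {
        "immediate ignition -> jet fire": [("immediate ignition", 0.05)],
        "delayed ignition -> flash fire": [("no immediate ignition", 0.95),
                                           ("delayed ignition", 0.10)],
        "no ignition -> dispersion only": [("no immediate ignition", 0.95),
                                           ("no ignition", 0.90)],
    }

    for outcome, branches in paths.items():
        p = initiating_frequency
        for _, prob in branches:
            p *= prob  # chain the conditional probabilities along the path
        print(f"{outcome}: {p:.2e} per year")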

This technique, when applied robustly, is usually very data intensive. It attempts to provide risk estimates of all possible future failure events based on historical experience. The more elaborate of these models are generally more costly than other risk assessments. They can be technologically more demanding to develop, require trained practitioners (statisticians), and need extensive data. A detailed classical QRA is usually the most expensive of the risk assessment techniques due to these issues.

The classical QRA methodology was first popularized through opposition to various controversial facilities, such as large chemical plants and nuclear reactors [88]. In addressing the concerns, the intent was to obtain objective assessments of risk that were grounded in indisputable rigorous analyses. The technique makes extensive use of failure statistics of components as foundations for estimates of future failure probabilities.

However, it was also recognized that statistics paints an incomplete picture at best, and many probabilities must still be based on expert judgment. In attempts to minimize subjectivity, applications of this technique became increasingly comprehensive and complex, requiring thousands of probability estimates and like numbers of pages to document. Nevertheless, variation in probability estimates remains, and the complexity and cost of this method does not seem to yield commensurate increases in accuracy or applicability. In addition to sometimes widely differing results from “duplicate” classical QRAs performed on the same system by different evaluators, another criticism includes the perception that underlying assumptions and input data can easily be adjusted to achieve some predetermined result [88]. Of course, this latter criticism can be applied to any process involving much uncertainty and the need for assumptions.

Myths

While the practice of formal pipeline risk assessment has been on-going for many years, the practice is by no means mature (as of this writing). There still exist some common misconceptions and myths. This is not unexpected, given the difficult nature of risk concepts themselves and the absence of detailed guidance documents (prior to this textbook).

Myth 1: Some risk assessment models are better able to accommodate low data availability

Reality:

Strong data + strong model = most meaningful/useful results

Weak data + strong model = uncertain results

Weak data + weak model = meaningless results

First, let’s address the myth that low information suggests the use of a simple risk assessment—one that does not really quantify risk. Using a lesser risk assessment process in an attempt to compensate for low information is an error. Pairing weak data with a weak model generates nothing useful. The proper approach is to begin with a full risk assessment structure, make conservative assumptions where necessary, and then work on ‘back-filling’ the data that will ultimately drive the risk management.

So, we should use a robust risk assessment, regardless of the current data availability. There are two choices. Let’s compare how the statistics-based and the physics-based approaches solve a typical risk assessment problem: how often will a specific segment of pipeline experience failure from outside excavator force (third party damage)?

Statistics-centric Approach:

In this approach, we focus on historical event frequencies. Let’s say that a slice of the national pipeline incident database shows that US transmission pipelines average 0.0003 reportable third party damage incidents per mile per year. With some investigation, we can get averages for ranges of pipe diameters, product types, or other characteristics, should we believe that they are discriminating factors for third party damage. We can assume that some of the historical ‘unknown’ causes of failure (a significant proportion of the data) were also third party damage related. We can further assume that the entire population of third party failures is higher than the reportable-only count. At the end of this exercise, we have a decent estimate of a historical failure rate for an ‘average’ pipe segment.
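
The arithmetic might look like the sketch below; the 0.0003 rate comes from the example above, while the unknown-cause and non-reportable adjustments are purely assumed fractions for illustration.

    reportable_rate = 0.0003        # reportable third party incidents per mile-year (example above)
    unknown_cause_rate = 0.0001     # assumed generic rate of 'unknown cause' incidents
    unknown_share = 0.10            # assume 10% of those were actually third party damage
    nonreportable_multiplier = 3.0  # assume total incidents are 3x the reportable count

    estimated_rate = (reportable_rate + unknown_share * unknown_cause_rate) * nonreportable_multiplier
    print(f"Estimated 'average segment' third party failure rate: {estimated_rate:.1e} per mile-year")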

Physics Approach:

In this approach, we focus on the physical phenomena that influence pipeline failure potential. We first make a series of estimates that show the individual contributions from exposure, mitigation, and resistance. For exposure, we ask ‘how often is there likely to be an excavator working near this pipeline?’ We perhaps examine records in planning and permitting departments; take note of nearby utilities, ditches, waterways, public works, etc that require routine excavation maintenance; and tap into other sources of information. Then we estimate the role of mitigation measures as applied to this particular segment of pipe. We ask: “what fraction of those excavators will have sufficient reach to damage the pipe (suggesting the benefit of cover depth)?” “What fraction of excavators will halt their progress due to one-call system use, recognition from signs, markers, and briefings?” “What fraction will halt their work due to intervention by pipeline patrol?” and others.

Finally, we discriminate among the fraction of excavation scenarios with sufficient force potential to puncture the pipe, based on pipe characteristics and the types of forces likely to be applied. This tells us the resistance—how often is there damage, but not failure? This discrimination between damage likelihood and failure likelihood is essential to our understanding.

All of these estimates can come from simple reasoning, at one extreme, to literature searches, market analyses, database mining, finite element analyses, and scenario analyses, at the other extreme. The level of effort should be proportional to the perceived contribution of the issue to the total risk picture.
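
A minimal sketch of this physics-based estimate is shown below, treating the mitigation layers as independent for simplicity; every numeric value is a hypothetical placeholder, and a real assessment would tie each fraction to location-specific evidence.

    exposure = 0.2  # excavator events near this segment, per mile-year (hypothetical)

    # Fraction of events stopped or rendered harmless by each mitigation layer
    mitigations = {"one-call/locating": 0.80, "depth of cover": 0.60,
                   "signs and markers": 0.30, "patrol intervention": 0.20}

    reaching_pipe = 1.0
    for effectiveness in mitigations.values():
        reaching_pipe *= (1.0 - effectiveness)   # fraction surviving each layer

    damage_rate = exposure * reaching_pipe       # hits on the pipe, per mile-year
    resistance = 0.70                            # fraction of hits the pipe withstands
    failure_rate = damage_rate * (1.0 - resistance)

    print(f"Damage rate:  {damage_rate:.1e} per mile-year")
    print(f"Failure rate: {failure_rate:.1e} per mile-year")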

Approach Comparisons:

Both of these approaches have merit and yield useful insight. But only the latter provides the location-specific insights we need to truly manage risk. The statistics-only approach yields an average value, suggesting how a population of pipeline segments may behave over time. There are huge differences among all the pipeline segments that go into a summary statistic. Therefore, we cannot base risk management on such a summary value derived from generic historical data. Risk, and hence ‘risk management’, ultimately occurs at very specific locations, whose risk may be vastly different from the population average. Stated even more emphatically: using averages will always result in missing the ‘generally rare, but critical at this location’ evidence. For example, most pipelines are not threatened by landslide, but in the few locations where they are, this apparently rare threat may well dominate the risk.

So, we use the physics-based approach to drive risk management. The statistics-based approach remains very useful for calibrating risk estimates across populations of pipe segments. More about that in a later section.

Myth 2: QRA requires vast amounts of incident histories

Reality:

QRA ‘requires’ no more data than other techniques

All assessments work better with better information

This is related to Myth 1 but merits a bit of independent discussion. Some classical QRA does over-emphasize history, as noted in the discussion of statistician-designed risk assessment. Excessive reliance on history is an error in any methodology. The past is a relevant predictor of the future only in certain cases, as is also detailed elsewhere.

Choosing a risk assessment approach

Understanding the differences between tools and assessment models, as well as the strengths and weaknesses of the different risk assessment techniques, is important in choosing approaches. A case can be made for using some techniques in certain situations. For example, a simple bowtie analysis approach helps to organize thinking and is a first step toward formal risk assessment. If the need is to evaluate specific events at any point in time—for example, an incident investigation—a narrowly focused scenario risk analysis (event tree or fault tree) might be the tool of choice.

Scoring or ranking type pipeline risk assessments have served the pipeline industry for many years. However, risk assessments are being routinely used today in ways that were not common even a few years ago. For example, many operators are asking questions today such as:

  • How to make full use of inspection data in a risk assessment
  • How to generate results that directly suggest timing of integrity assessments
  • How to quantify the risk reduction benefit of integrity assessment and other mitigations.
  • Beyond the prioritization, how big is the risk? Is it actionable?
  • How widespread is a particular risk issue?
  • How can subjectivity be reduced?
  • How to use past incident results in a risk assessment.

These questions are neither consistently nor accurately answered by the relative models. As previously noted, those models were designed and created to answer a different set of questions.

Similarly, classical QRA techniques have been in use in some industries for decades. But, especially in the pipeline industry, with so much variation between and within collections of pipeline segments, these solutions are sub-optimal (see previous discussion).

The new roles of risk assessments have prompted some changes to the way risk algorithms are being designed. The changes lead to more robust risk results that better reflect reality and, fortunately, are readily obtained from data that was also used in previous assessments.

New Generation Risk Assessment Algorithms

The focus of this book is on a comprehensive risk assessment methodology that is both robust and cost-effective to establish and maintain.

While the previous generation of relative algorithms served the industry well, the technical compromises made can be troublesome or unacceptable in today’s environment of increasing regulatory and public oversight. Risk assessments commonly become the centerpiece of legal, regulatory, or public proceedings. This prompts the use of assessment techniques that more accurately model reality and also produce risk estimates that are anchored in absolute terms: “consequences per mile-year,” for example. Fortunately, a new approach to algorithm design can do this while making use of all previously collected data and not increasing the costs of risk assessment. The advantages of the new algorithms are that they can overcome the previously noted limitations of both competing methodologies:

  • More intuitive;
  • Models reality more closely;
  • Removes much subjectivity;
  • Eliminates masking of significant effects;
  • Makes more complete and more appropriate use of all available and relevant data;
  • Greatly enhances ability to demonstrate compliance with U.S. IMP regulations;
  • Distinguishes between unmitigated exposure to a threat, mitigation effectiveness, and system resistance—this leads directly to better risk management decisions;
  • Eliminates need for unrealistic and expensive re-weighting of variables for new technologies, emerging/previously-unknown threats, or other changes; and
  • Flexibility to present results in either absolute (probabilistic) terms or relative terms, depending on the user’s needs.

The breakthrough evolution in overcoming both the scoring system limitations and the classical QRA limitations was the dissection of PoF phenomena into three separately measurable components. This allowed for a physics-based rather than statistics-based approach. It simultaneously ends the need for a secondary scoring system and the need for (often misleading) reliance on historical event rates—generic data.

The risk modeling approach recommended here falls into several common category labels of models. Generally, this methodology is a type of quantitative model since it numerically quantifies risks in a rigorous way (not a simple numerical scoring approach). It is a deterministic (or ‘mechanistic’) model, since it is a mathematical representation of physical processes that has been constructed from the modeler’s understanding of the science underlying the processes. It has probabilistic components since the real-world processes it mirrors are best represented by probabilities. It expresses results in absolute terms.

In this book, the term ‘physics-based’ is chosen to classify the new generation of risk assessment algorithms. Since physics as a science includes mechanics, energy, force, and even chemistry to some extent, it captures the fact that this type of risk assessment relies on such underlying science. This hopefully carries a connotation beyond terms like ‘mechanistic’ or ‘deterministic’, that the methodology is based on widely-accepted first principles of science and engineering.

Risk Assessment Specific to Pipelines

The recommended practice—a new generation of risk assessment algorithms classified as physics-based—is the result of years of development efforts [2]. Distinguishing characteristics of this risk assessment methodology, compared with other past and current approaches, are worth examining. Especially for the practicing risk assessor, some evidence will be sought that justifies a change to his or her current practice.

The primary distinguishing characteristic of these physics-based algorithms is the breakdown of PoF into its exposure, mitigation, and resistance components. This is an essential aspect of risk assessment that is not clearly and completely employed in any alternative methodology. It is a differentiating characteristic compared to all other methodologies and a critically important aspect of modern pipeline risk assessment, discussed here and elsewhere in this book.

Additional discriminating features of this recommended methodology, compared to alternative approaches, include the following:

Differences from classical QRA:

  • Profiles—i.e., the ability to pass the ‘map point’ test of risk assessment sufficiency, described earlier.
  • Reduced reliance on generic historical event frequencies.
  • Directly integrates more relevant, location-specific information.

Differences from Indexing/Scoring Approaches:

  • Only verifiable measurements are used
  • Mathematics to fully represent real-world phenomena
  • Data-driven segmentation—full resolution
  • Improved use of information
  • More transparent—no need for protocols to assign point values.

Differences from HAZOPS/SIL/LOPA/FMEA

  • Profiles
  • Measurements instead of categories
  • Improved use of information
  • Aggregations for summarizing risk.

Differences from reliability based design (RBD)

  • RBD typically relies excessively on classical QRA, see list above.

Note that comparisons are not offered to other techniques that are more appropriately labeled as tools rather than risk assessments. This includes event/fault tree analyses, matrices, checklists, bowtie, dose-response assessments, probit equations, dispersion modeling, hazard zone estimations, human reliability analyses, task-based assessments, what-if analyses, Markov analyses, Bayesian statistical analyses, root cause analyses, and the Delphi technique.

The criteria discriminating tools from complete risk assessments are detailed earlier in this chapter. Compared to selected techniques more often labeled as risk assessments—PHA, HAZOPS, the matrix, and event tree/fault tree analyses—the recommended methodology differs in the following ways:

  • Ability to broadcast a risk assessment over long, complex systems
  • Profiles
  • Aggregations for summarizing risk
  • Only verifiable measurements are used
  • Improved use of information.

Most alternative methodologies suffer from an inability to create a risk profile—changes in risk along a pipeline route. While a profile can also mean changes over time, the inability to capture risk changes along a route is often the limiting factor of competing risk assessment techniques. Techniques that rely on specific cause-consequence pairings, without an ability to aggregate all such pairings, cannot produce a complete profile and therefore cannot present an accurate risk picture. Without a risk profile, understanding, and hence optimum risk management, is compromised.

  1. Modeling of Pipeline Risk

Quality, Reliability, and Risk Management

Risk management embodies and overlaps principles of quality assurance, quality control, and reliability. An interesting background discussion on these concepts and their relationship to risk can be found in PRMM.

Risk assessment issues

Quantitative vs. qualitative models

Modeling as a part of the scientific method is discussed in PRMM. As noted, labeling of modeling approaches has caused some confusion. Terms including quantitative, qualitative, semi-quantitative, scoring, indexing, and others have been used to describe types of pipeline risk assessment. There are no standard definitions in common use for these terms. Therefore, they often carry different meanings and different implications to various members of an audience hearing them.

The advice here is to always obtain clarifications when faced with this terminology.

Additional labels of probabilistic, mechanistic, and deterministic are also sometimes seen. These have more standardized definitions but can still cause confusion.


Absolute vs. relative risks

Closely paralleling the quantitative vs. qualitative distinction is the issue of risk presented in absolute vs. relative terms. Unlike the previous discussion, which highlights potential confusion arising from terminology, the absolute vs. relative distinction adds clarity—giving any audience a strong indication of what type of assessment has been performed.

Risks can be expressed in absolute terms—risk estimates expressed in fatalities, injuries, property damages, or some other measure of consequence, in a certain time period and for a specific collection of components such as a pipeline system. For example, “number of fatalities per mile-year for permanent residents within one-half mile of pipeline…”. This requires concepts commonly seen in probabilistic risk assessments (PRAs), also called numerical risk assessments (NRAs) or quantitative risk assessments (QRAs), or deterministic or mechanistic models. Absolute risk assessment generates a frequency-based measure that estimates the probability of a specific type of failure consequence at a point in time and space. A relative risk assessment methodology is also available, whereby results support comparisons among components that have undergone the same assessment. Common relative risk measurement systems have been called scoring or indexing systems. Ref [PRMM] presented such a system. The relative risk measurement models are no longer recommended, since they have many limitations and no advantages over a properly crafted absolute risk assessment.

The term ‘absolute’ should not be construed to indicate a level of certainty. It only speaks to the units with which risk estimates are produced.

The “absolute scale” offers the benefit of comparability with other types of risks and a more accurate representation of actual risk, while the “relative scale” was historically used as a compromise solution to avoid what was previously considered a challenging quantification of pipeline risks.

The absolute scale previously suffered from its heavy reliance on historical estimates. This criticism has been mitigated by the methodology presented here.

The absolute and relative risk assessment scales are not mutually exclusive. The absolute scale can be readily converted to relative scales by simple mathematical relationships, should this be deemed worthwhile. For instance, 1.6E-7 failures per year can be normalized to, say, 15 on a 100-point PoF scale, as in the sketch below. A relative risk scale can theoretically be converted to an absolute scale by correlating relative risk scores with appropriate historical failure rates or other risk estimates expressed in absolute terms. In other words, the relative scale can be calibrated to some absolute numbers. This is often a problematic exercise, given the mathematical and other limitations commonly associated with the relative risk models. For instance, orders-of-magnitude differences in real risk are difficult to show on a simple point scale. However, when much information has been collected into an older relative risk assessment, that information can be salvaged and efficiently used in a migration to a modern absolute risk assessment. See .
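
A minimal sketch of one such conversion is shown below, mapping an absolute failure rate onto a 0-100 relative PoF scale with an order-of-magnitude (logarithmic) transform; the scale endpoints (1E-8 to 1 failure per year) are arbitrary choices for illustration.

    import math

    LOW, HIGH = 1e-8, 1.0  # failure rates (per year) mapped to scores 0 and 100 (assumed endpoints)

    def relative_score(failures_per_year):
        """Map an absolute failure rate onto a 0-100 relative PoF scale (log scale)."""
        clipped = min(max(failures_per_year, LOW), HIGH)
        return 100.0 * math.log10(clipped / LOW) / math.log10(HIGH / LOW)

    print(round(relative_score(1.6e-7)))  # ~15, matching the example in the text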

A possible consideration underlying the presentation of any numerical modeling result is a common misconception that a precise-looking number, expressed in scientific notation, is more accurate than a simple number. A numerical scale can imply a precision that is simply not available. This effect has been called ‘the illusion of knowledge’.

A good risk assessment will require the generation of sufficient scenarios to represent all possible event sequences that lead to possible damage states (consequences). Each event in each sequence is assigned a probability—actually, an expected future frequency. The assigned probabilities are best assigned in absolute terms, leading to final risk estimates that are also in absolute units, for example: leaks per mile-year, dollars per km-year, fatalities per year, etc. The expression in absolute terms widens the uses of risk results and avoids the complications of the relative scales.

A damage state or consequence level of interest is identified and becomes part of the measurement units in an absolute estimate of risk. Most risk acceptability or tolerability criteria are based on fatalities as the consequence of interest.

Verification, Calibration, and Validation

Given enough time, a risk assessment can be proven by comparing predicted pipeline failures against actual failures. This is the basis of testing the risk assessment as a diagnostic tool, as discussed elsewhere. Pipeline failures on any specific system are usually not frequent enough to provide sample sizes sufficient to test the assessment’s performance. In most cases, initial examination of the assessment is best done by ensuring that risk estimates are consistent with all available information (including actual pipeline failures and near-failures) and consistent with the experiences and judgments of the most knowledgeable experts. The latter can be at least partially tested via structured testing sessions and/or model sensitivity analyses (discussed in and ). Additionally, the output of a risk model can be carefully examined for the behavior of the risk values compared with our knowledge of the behavior of numbers in general.

More formal examinations of the risk assessment are also possible. The processes of verification, calibration, and validation are likely not familiar to most readers and, based on a brief literature search, are not even standardized among those who more routinely deal with them. Some background discussion of these processes, especially as they relate to pipeline risk management, is warranted.

In this text, a distinction is made between verification, validation, and calibration. Verification is the process of ‘de-bugging’ a model—ensuring that functions operate as intended. Calibration is tuning model output so that it mirrors actual event frequencies. This is a practical necessity when knowledge of underlying factors is incomplete (as it almost always is in natural systems). Validation is ensuring consistent and believable output from the model by comparing model prediction with actual observation. Defining these terms in the context of this discussion is important since they seem to have no universally accepted definitions.

An important aspect of proving a risk assessment is agreement with SME beliefs. Users should be vigilant against becoming too confident in using any risk assessment output without initial and periodic ‘reality checks’. But users should also recognize that SME beliefs can be wrong. Disconnects between risk assessment results and SME beliefs are opportunities for both to improve, as is discussed in .

Note also that the conclusions of any risk assessment can be no stronger than the inputs used. Especially when confidence in inputs is low, calibration to a judged performance is warranted.

Verification

Especially where software is used, verification ensures that the model has been programmed correctly and is, to the extent tested, error-free (no bugs). In a pre-acceptance review of a risk assessment, confirmation of calculations should be performed. Verification—checks to ensure that the risk algorithms produce the intended results—confirms that the intended routines are functioning properly.

To ensure that all equations and point assignments are working as intended, some tools can be developed to produce test results using random or extreme value inputs.
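One such tool might look like the sketch below, which drives a placeholder PoF function with random and extreme-value inputs and asserts some basic expected behaviors; the placeholder function, input ranges, and checks are assumptions for illustration, not the algorithms of this text.

```python
import random

def pof_model(exposure, mitigation, resistance):
    """Placeholder PoF algorithm standing in for the real risk model."""
    return exposure * (1.0 - mitigation) / resistance

def verify(trials=10_000):
    """Feed random and extreme-value inputs and confirm outputs stay in bounds."""
    extremes = [0.0, 1e-9, 1.0]
    for _ in range(trials):
        exposure   = random.choice(extremes + [random.uniform(0.0, 10.0)])
        mitigation = random.choice([0.0, 1.0, random.random()])
        resistance = random.choice([1e-6, 1.0, random.uniform(0.1, 100.0)])
        pof = pof_model(exposure, mitigation, resistance)
        assert pof >= 0.0, f"negative PoF for inputs {(exposure, mitigation, resistance)}"
        assert mitigation < 1.0 or pof == 0.0, "full mitigation should zero the PoF"

verify()
print("verification checks passed")
```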

Calibration

Risk assessment should be performed on individual pipe segments due to the changes along a pipeline route. These individual risk estimates can be combined (into ‘populations’) and compared to the known behavior of similar populations. For a variety of reasons, discrepancies in predicted population behavior will usually exist. Calibration serves to rectify the inappropriate discrepancies by adjusting the individual estimates en masse so that credible population characteristics emerge.

The process of calibrating risk assessment results begins with establishing plausible future leak rates of populations based on relevant historical experience, adjusted for relevance and other considerations. These rates become ‘targets’ for risk assessment outputs, with the belief that large populations of pipeline segments, over long periods of time, would have their overall failure estimates approach these targets. The risk assessment model is then adjusted so that its outputs do indeed approximate the target values for behavior of populations.

The choice of representative population is challenging. It is difficult to find a collection of components similar enough to the system being assessed and with a long enough history to make comparisons relevant. A selection of a population that is not sufficiently representative will weaken the calibration process.

Calibration is done using both a representative population and a target level of conservatism. Both are required, as illustrated by this thought exercise. Imagine you could run experiments on real pipelines over long periods of time. Say you chose a 70-mile pipeline operating for 50 years. You would run multiple, maybe hundreds or thousands, of trials to see how the 70-mile pipeline performs over many different 50-year lifetimes. Each trial—that is, each 50-year lifetime—is shaped by random variations in exposures, mitigations, resistance, and consequence scenarios over its 50 years in service. In some of those lifetimes, there would be no incidents, so no actual consequences at all. Choosing these trial results as representative of future behavior of the next trial might reflect a P10 level of conservatism. Some of your multiple trials would result in dozens of leaks and ruptures, some producing very consequential results. Using this set of trials to represent future behavior would be choosing a P90 or so level of conservatism. The results of the majority of your trials would form the P50 portion of the distribution of all results, perhaps the center point of a normal or bell-shaped distribution.
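That thought exercise can be mimicked numerically. The sketch below assumes incidents arrive as a Poisson process at an illustrative rate and tallies incident counts over many simulated 50-year lifetimes; the assumed rate is a placeholder only, not a recommended value.

```python
import math
import random

MILES, YEARS, TRIALS = 70, 50, 10_000
ASSUMED_RATE = 1.0e-4     # incidents per mile-year; illustrative placeholder only

def poisson(lam):
    """Knuth's method for drawing a Poisson-distributed incident count."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

lifetimes = sorted(poisson(ASSUMED_RATE * MILES * YEARS) for _ in range(TRIALS))
p10, p50, p90 = (lifetimes[int(TRIALS * q)] for q in (0.10, 0.50, 0.90))
print(f"P10={p10}, P50={p50}, P90={p90} incidents per simulated 50-year lifetime")
# With this low assumed rate, many lifetimes show zero incidents while a few
# show several, illustrating how a chosen percentile embeds a level of conservatism.
```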

With an appropriate comparison population, the chief goal of a calibration will often be the removal of unwanted conservatism. As discussed, conservatism plays an important and useful role in risk assessment for individual components. P90+ inputs are recommended for many initial risk assessments. However, the need for estimates as close as possible to actual risk levels is also important, especially for populations—collections of individuals. A decision-maker gains more insight from a P50 type risk assessment of a pipeline system than from a system summary incorporating multiple P90+ inputs. The P50 estimate can become a part of company-wide strategic planning while the P90+ estimates ensure proper attention to risk management for each component.

In a simple calibration exercise, we seek a single factor representing the amount of conservatism included in a risk assessment’s P90+ estimates. This factor can be used to reduce the conservative estimates of each component’s risk to best-estimates of risk. The resulting collection of ‘best estimates’ should be close to the representative population’s historical risk levels.

One can track differences between P50 and P99 estimates to see, at least partially, the reduction in uncertainty. P50 and P90+ estimates reflect both natural variability (apparent randomness) and uncertainty. Each PXX produces a distribution. The difference between, for instance, the midpoints of the P50 and P90+ distributions can be called the conservatism bias multiple.

Both P50 and P90+ risk assessments will often be needed—the former to represent likely system wide behavior and the latter to use in risk management. Some practitioners choose to run parallel P50 and P90+ assessments. Others perform the P90+ assessment, estimate the conservatism bias, and then use it to ‘back calculate’ P50 results.
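A minimal sketch of the 'back calculate' option, assuming a single multiplicative conservatism bias and invented segment values, might look like this:

```python
# Conservative (P90+) PoF estimates for individual segments, failures per mile-year.
# All numbers below are invented for illustration.
p90_estimates = {"seg A": 3.0e-4, "seg B": 1.2e-3, "seg C": 6.0e-5, "seg D": 4.5e-4}
miles =         {"seg A": 10.0,   "seg B": 2.0,    "seg C": 25.0,   "seg D": 8.0}

# Assumed target rate for a judged-representative population (adjusted historical rate).
target_rate = 1.5e-4          # failures per mile-year

total_miles = sum(miles.values())
modeled_rate = sum(p90_estimates[s] * miles[s] for s in miles) / total_miles

conservatism_bias = modeled_rate / target_rate     # single calibration factor
p50_estimates = {s: p90_estimates[s] / conservatism_bias for s in p90_estimates}

print(f"conservatism bias multiple = {conservatism_bias:.1f}")
for s, pof in p50_estimates.items():
    print(f"{s}: P90+ {p90_estimates[s]:.1e} -> best-estimate {pof:.1e} per mile-year")
```

The single-factor treatment is the simplest possible choice; threat-specific or segment-specific factors are equally workable within the same framework.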

Once calibrated, estimates could represent a wide range of possibilities. For example, a US natural gas transmission pipeline may have components with P50 PoF estimates ranging from perhaps 0.00001 to 0.1 reportable events per mile-year, assuming that segments’ actual PoFs could range from about 100 times lower to 100 times higher than the US average for reportable incidents on natural gas pipelines.

A similar process can be performed on overall risk values or any intermediate calculations. More calibration—calibrating to lower level algorithms—should produce more confidence in the overall correlation. This essentially provides more intermediate correlating points from which a correlation curve can be better developed.

Validation

Validation of a model is achieved by ensuring that appropriate relationships exist among input data and that produced outputs are representative of real-world experience. Validation seeks to confirm that the model produces risk estimates that are accurate.

While pipeline industry documents do not generally detail these processes, examples of how the pipeline industry typically uses the term ‘validation’ are noted in PHMSA and PRCI documents:

US Gas IMP Protocol C.04

Verify that the validation process includes a check that the risk results are logical and consistent with the operator’s and other industry experience. [§192.917(c) and ASME B31.8S-2004, Section 5.12] (http://primis.phmsa.dot.gov/gasimp/QstHome.gim?qst=145)

From PRCI, discussing validation of a risk-based model for pipelines:

The fault tree model and basic event probabilities were validated by analyzing a representative cross-country gas transmission pipeline and confirming that the results are in general agreement with relevant historical information.

Validation of risk assessment is also noted in US IMP documents.

ASME B31.8s

“…experience-based reviews should validate risk assessment output with other relevant factors not included in the process, the impact of assumptions, or the potential risk variability caused by missing or estimated data.”

As a part of the validation effort, the general relationship between model output and reality should be examined. When new or altered theories are proposed as part of a model, examination of those must be included in the validation process.

Theories applicable to pipeline risk assessment typically include:

  • Metallic corrosion
  • Mitigation of metallic corrosion—coatings and cathodic protection
  • Stresses in a shell structure (pipe)
  • Effect of wall loss on pressure-containing capability
  • Component rupture potential
  • Probability theory
  • Probability distributions as applied to observed phenomena
  • Structural theory
  • Materials science
  • Plastics and coatings performance.

The risk assessment methodology described in this book does not propose new theories of failure mechanisms. It relies upon thoroughly documented models of the above theories including widely accepted beliefs about impacts of certain factors on certain aspects of risk; for example, ‘increases in Factor X lead to increased risk’.

SME Validation

Similar to the use of a benchmark for model calibration, a carefully structured interview with SMEs can also identify model weaknesses (and often also be a learning experience for the SMEs). If an SME reaches a risk conclusion that is different from the risk assessment results, a drill down (that is, a deeper examination) into both the model and the SME’s basis of belief should be done. Any disconnect between the two represents either a model error or an inappropriate conclusion by the SME. Either can be readily corrected. The key is to identify exactly where the model and the SME first diverge in their assumptions and/or conclusions.

An important step in validation is therefore to identify and correct ‘disconnects’ between subject matter experts’ beliefs and model outputs. This is similar to the calibration discussed previously but differs in that validation should occur after calibration has been done. In the absence of calibration of risk results, validation can still be performed on intermediate calculations, but the role of conservatism must be factored in. For relative (scoring) models, validation can only be done in general terms, where SMEs can agree on the relative changes to risk accompanying certain changes in inputs.

SME concurrence with assessment outputs should be a part of model validation. Risk assessment-identified higher- and lower-risk segments should comport with SME-identified higher- and lower-risk segments.

SME review should include concurrence with aspects such as:

  • Direction and magnitude of risk changes accompanying changes in factors and groups of factors
  • Identified locations of higher and lower threats, considering each threat independently
  • Identified locations of higher and lower consequences.

A good objective of risk assessment should be to have the risk assessment model capture the collective knowledge of the organization—anything that anyone knows about a pipeline’s condition or environment, or any new knowledge of how risk variables actually behave and interact, can and should be included in the analysis protocol.

Predictive Capability

Implicit in the notions of validation and verification is the idea of predictive capability. A good risk assessment always produces some estimate of failure probability. Theoretically, this can forecast, to some degree of accuracy, future failures on specific pipeline segments. Except in extreme cases, this is not a realistic expectation. A more realistic expectation is for the assessment to forecast behavior of populations of segments rather than individuals. A good risk assessment will, however, highlight areas where probability and consequence combinations warrant special attention.

Leak/break rate is related to estimated failure probability. In most transmission pipelines, insufficient system-specific information exists to build a meaningful prediction model solely from leak/break rate—events are so rare that any such prediction will have very large uncertainty bounds. Distribution systems, where leaks are precursors to “failures,” are often more viable candidates for producing predictions directly from leak/break rates.

A leak/break rate assessment may show both time-dependent failure mechanisms such as corrosion and fatigue and more random failure mechanisms such as third-party damages and seismic events. The random events will normally occur at a relatively constant rate over time for a constant set of conditions.

A leak/break rate is called a “deterioration” rate by some, but that term seems best applied specifically to time-dependent failure mechanisms only (corrosion and fatigue).

Even though they are commonly expressed as a single value, each failure probability estimate really represents an underlying distribution—all possible failure rates with associated probability of occurrence—with an average, median, and standard deviation. This distribution describes the range of failure rates that would accompany any pipeline section with a particular predicted failure rate.

Nonetheless, to test the predictive power of the risk assessment model, the incident and inspection history in recent years could be examined. Knowing what the risk assessment ‘thought’ about the risk on the day before the incident (or the day before an inspection) would provide insight into the predictive power of the assessment. Given the role of probability, spot samples from individual segments may appear to show inaccurate predictions, but actual accuracy can only be verified after sufficient data has been accumulated to compare the predicted versus actual long term behavior of a large population.

Evaluating a risk assessment technique

Note: Locating this discussion in this book was challenging. On one hand, a reader is often not terribly interested in this aspect until he is an active practitioner. On the other hand, a reader who has an existing risk assessment approach may need early incentivization to investigate alternative approaches. This latter rationale determined the placement for purposes of organizing this book. The early discussion has a further advantage of setting the stage—arming the reader with criteria that will later determine the quality of his assessments, even as he works his way through this text to learn about pipeline risk assessment.

In general, proving or confirming a risk assessment methodology addresses the extent to which the underlying model represents and correctly reproduces the actual system being modeled. Another view is that validation involves two main aspects:

1) ensuring that the model correctly uses its inputs, and

2) ensuring that the model produces outputs that are useful representations of the underlying real-world processes being modeled.

Ref [1046] focuses on the need for transparency in any risk assessment:

Transparency provides explicitness in the risk assessment process. It ensures that any reader understands all the steps, logic, key assumptions, limitations, and decisions in the risk assessment, and comprehends the supporting rationale that leads to the outcome. Transparency achieves full disclosure in terms of:

  1. the assessment approach employed
  2. the use of assumptions and their impact on the assessment
  3. the use of extrapolations and their impact on the assessment
  4. the use of models vs. measurements and their impact on the assessment
  5. plausible alternatives and the choices made among those alternatives
  6. the impacts of one choice vs. another on the assessment
  7. significant data gaps and their implications for the assessment
  8. the scientific conclusions identified separately from default assumptions and policy calls
  9. the major risk conclusions and the assessor’s confidence and uncertainties in them
  10. the relative strength of each risk assessment component and its impact on the overall assessment (e.g., the case for the agent posing a hazard is strong, but the overall assessment of risk is weak because the case for exposure is weak)

Making the process transparent and the risk characterization products clear, consistent, and reasonable (TCCR) became the underlying principle for a good risk characterization. [1046]

To properly support risk management, the superior risk assessment process will have additional characteristics, including:

  • QA/QC and error checking capabilities, perhaps automated
  • Ability to rapidly integrate new information and refresh risk estimates
  • Ability to rapidly incorporate new information on emerging threats, new mitigation opportunities, or any other changing aspect of risk
  • Seamless integration with other databases and legacy data systems
  • Accessibility and understandability to all decision-makers.

Diagnostic tool—Operating Characteristic Curve

For those seeking a more structured approach to proving a risk assessment, techniques are available. A pipeline risk assessment is really a diagnostic tool. Similar to a diagnostic test used by a doctor, the idea is to determine, with the least amount of cost and patient discomfort, whether the patient has the disease or doesn’t. He knows that in any population, a certain fraction of individuals will have the disease and most won’t. For a diagnosis to be successful, he must correctly determine into which group to place his patient. In making this determination, the doctor can choose a whole battery of expensive and intrusive tests and procedures in order to have the highest confidence of his diagnosis. On the other hand, he can choose minimal tests and accept a higher error rate in diagnoses. The most accurate test or set of tests will minimize the rate of false positives and false negatives. But there is a cost associated with such testing.

In the case of pipeline risk management, the manager is trying to determine which pipeline segments and components have the ‘disease’ of higher risk among the hopefully many which do not. His choice of tests to help in the diagnosis goes beyond the risk assessment itself. He can request surveys and inspections to improve the diagnosis, but with an accompanying expense and the potential for inefficient use of resources. The latter occurs when expensive ‘tests’ do not add much certainty to the assessment.

Both the doctor and the risk manager will be balancing the costs of the diagnostics and the costs of being wrong—false positives and false negatives.

  1. Risk assessment as a diagnostic tool; trade-offs among true positives, true negatives, false positives, and false negatives (TP, TN, FP, FN)

Possible Outcomes from a Diagnosis

The tuner of a leak detection system is well aware of the false alarm phenomenon. In order to find smaller leaks, it is necessary to alarm on and investigate apparent smaller leaks that later prove to be only transient conditions. After too many false alarms, the investigators grow weary of responding and are less attentive, thereby increasing their error rate when an actual leak does appear. It is standard to sacrifice some leak detectability in order to avoid too many nuisance alarms. A modern leak detection system will state a probability associated with the indication, to assist the investigator in setting his response urgency.

Advanced applications of these ideas are found in signal detection theory, receiver operating characteristic curves, artificial intelligence (machine learning), and others. For our purposes here, it is useful to simply bear in mind the diagnostic intent behind a risk assessment and the corresponding ability to test its diagnostic power over time.
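As outcomes accumulate, the diagnostic framing can be tested directly. The sketch below sweeps a decision threshold across hypothetical risk scores and observed outcomes and reports the resulting TP/FP/FN/TN counts; all scores and outcomes are invented for illustration.

```python
# Hypothetical relative risk scores assigned to segments before the fact, paired
# with what was later observed (1 = failure or confirmed significant damage).
scores   = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.2, 0.15, 0.1]
observed = [1,   1,   0,    1,   0,    0,   0,    1,   0,    0  ]

def confusion(threshold):
    tp = sum(1 for s, o in zip(scores, observed) if s >= threshold and o == 1)
    fp = sum(1 for s, o in zip(scores, observed) if s >= threshold and o == 0)
    fn = sum(1 for s, o in zip(scores, observed) if s < threshold and o == 1)
    tn = sum(1 for s, o in zip(scores, observed) if s < threshold and o == 0)
    return tp, fp, fn, tn

for threshold in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion(threshold)
    tpr = tp / (tp + fn)          # true positive rate (sensitivity)
    fpr = fp / (fp + tn)          # false positive rate (false alarms)
    print(f"threshold {threshold}: TP={tp} FP={fp} FN={fn} TN={tn}  TPR={tpr:.2f} FPR={fpr:.2f}")
```

Plotting TPR against FPR across thresholds traces the familiar operating characteristic curve; the 'best' threshold depends on the relative costs of false alarms and missed detections.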

Risk model performance

Some sophisticated routines can be used to evaluate risk assessment outputs. For instance, a Monte Carlo simulation uses random numbers as part of the assessment inputs in order to produce distributions of all possible outputs from a set of risk algorithms. The resulting distribution of risk estimates might help evaluate the “fairness” of the assessment. In many cases a normal, or bell-shaped, distribution would be expected since this is a very common distribution of properties of materials and engineered structures as well as many naturally occurring characteristics. Alternative distributions are also common, such as those often used to represent rare events. All distributions that emerge should be explainable. If some implausible distribution appears, further examination may be warranted. For instance, excessive tails or gaps in the distributions might indicate discontinuities or biases in the results being generated.
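A minimal illustration of the idea, using a placeholder risk function and assumed input distributions rather than any actual model from this text:

```python
import random
import statistics

def risk_score(exposure, mitigation, resistance, consequence):
    """Placeholder algorithm standing in for the risk model under review."""
    return exposure * (1.0 - mitigation) / resistance * consequence

samples = sorted(
    risk_score(
        random.lognormvariate(-2.0, 0.5),   # assumed exposure distribution
        random.betavariate(8, 2),           # assumed mitigation effectiveness
        random.uniform(1.0, 5.0),           # assumed resistance
        random.lognormvariate(0.0, 0.8),    # assumed consequence
    )
    for _ in range(50_000)
)

print("median:", round(statistics.median(samples), 4))
print("P05 / P95:", round(samples[int(0.05 * len(samples))], 4),
      round(samples[int(0.95 * len(samples))], 4))
# Unexplained gaps, spikes, or extreme tails in this distribution would prompt
# a closer look at the algorithms generating the results.
```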

Sensitivity analysis

The algorithms that underlie a risk assessment model must react appropriately—neither too much nor too little—to changes in any and all variables. In the absence of reliable data, this appropriate reaction is gauged to a large extent by expert judgment as to how the real-world risk is really impacted by a variable change.

A single variable can play a role as both risk increaser and risk reducer. A casing protects a pipe segment from external force damage but complicates corrosion control; in the offshore environment, water depth is a risk reducer when it makes anchoring damage less likely but it is a risk increaser when it heightens the chance for buckling. So the same variable, water depth, is a “good” thing in one part of the model and a “bad” thing somewhere else.

See the discussion of data collection in for a deeper examination of the types and roles of information.

Some variables such as pressure and population density impact both the probability (often linked to lower resistance and higher activity levels) and consequence (larger hazard zone and more receptor damage) sides of the risk algorithm. In these cases, the impact on overall risk is not always obvious. When a variable is used in a more complex mathematical relationship, such as those sometimes used in resistance estimates, then influences of changes on final risk estimates will also not be apparent.

Sensitivity quantifications can be utilized for evaluating the effects of changing factors but require fairly sophisticated analysis procedures. It is important to recognize that many variables will usually play lesser roles in overall risk but may occasionally be the single greatest determinant of risk.
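One simple form of such a procedure is a one-at-a-time sweep, sketched below with a placeholder risk function; the variable names, base values, and perturbation sizes are all assumptions for illustration.

```python
def risk(v):
    """Placeholder risk function standing in for the full algorithm set."""
    return v["activity"] * (1.0 - v["mitigation"]) / v["wall_thickness"] * v["population_density"]

base = {"activity": 0.3, "mitigation": 0.8, "wall_thickness": 0.25, "population_density": 50.0}
base_risk = risk(base)

for name in base:
    for factor in (0.9, 1.1):                       # perturb one variable at a time by +/-10%
        perturbed = dict(base, **{name: base[name] * factor})
        change = (risk(perturbed) - base_risk) / base_risk
        print(f"{name} x{factor}: risk changes {change:+.0%}")
```

In this toy example the mitigation variable dominates the response, a behavior that would need to be checked against expert judgment of how the real-world risk actually reacts to such a change.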

Weightings

The use of ‘weightings’ should be a target of critical review of any risk assessment practice. Weightings have been used in some older risk assessments to give more importance to certain factors. They were usually based on a factor’s perceived importance in the majority of historical pipeline failure scenarios. For instance, the potential for AC induced corrosion is usually very low for many kilometers of pipeline, so assigning a low numerical weighting appeared appropriate for that phenomenon. This was intended to show that AC induced corrosion is a rare threat.

Used in this way, weightings steer risk assessment results towards pre-determined outcomes. Implicit in this use is the assumption of a predictable distribution of future incidents and, most often, an accompanying assumption that the future distribution will closely track the past distribution. This practice introduces a bias that will almost always lead to very wrong conclusions for some pipeline segments.

The first problem with the use of weightings is finding a representative basis for the weightings. Weightings were usually based on historical incident statistics—“20% of pipeline failures from external corrosion”; “30% from third party damage”; etc. These statistics were usually derived from experience with many kilometers of pipelines over many years of operation. However, different sets of pipeline kilometer-years show different experience. Which past experience best represents the pipeline being assessed? What about changes in maintenance, inspection, and operation over time? Shouldn’t those influence which data sets are most representative of future expectations?

It is difficult if not impossible to know what set of historical population behavior best represents the future behavior of the segments undergoing the current risk assessment. If weightings are based on, say, average country-wide history, the non-average behavior of many miles of pipeline is discounted. Using national statistics means including many pipelines with vastly different characteristics from the system you are assessing.

If the weightings are based on a specific operator’s experience, then (hopefully) only a very limited amount of failure data is available. Statistics based on small data sets are always problematic. Furthermore, a specific pipeline’s accident experience will probably change with the operator’s changing risk management focus. When an operator experiences many corrosion failures, he will presumably take actions to specifically reduce corrosion potential. Over time, a different mechanism should then become the chief failure cause. So, the weightings would need to change periodically and would always lag behind actual experience, therefore having no predictive contribution to risk management.

The bigger issue with the use of weightings is the underlying assumption that the past behavior of a large population will reliably predict the future of an individual. Even if an assumed distribution is valid for the long term population behavior, there will be many locations along a pipeline where the pre-set distribution is not representative of the particular mechanisms at work there. In fact, the weightings can fully obscure the true threat. The weighted modeling of risk may fail to highlight the most important threats when certain numerical values are kept artificially low, making them virtually unnoticeable.

The use of weightings as a significant source of inappropriate bias in risk assessment is readily demonstrated. One can easily envision numerous scenarios where, in some segments, a single failure mode should dominate the risk assessment and result in a very high probability of failure rather than only some percentage of the total.

Consider threats such as landslides, erosion, or subsidence as a class of failure mechanisms called geohazards. An assumed distribution of all failure mechanisms will almost certainly assign a very low weighting to this class since most pipelines are not significantly threatened by the phenomena and, hence, incidents are rare. For example, to match a historical record that shows 30% of pipeline incidents are caused by corrosion and 2% by geohazards, weightings might have been used to make corrosion point totals 15 times higher than geohazard point totals (assuming more points means higher risk) in an older scoring methodology.

But a geohazard phenomenon is a very localized and very significant threat for some pipelines. It will dominate all other threats in some segments. Assigning a 2% weighting masks the reality that, perhaps 90% of the failure probability on this segment is due to geohazards. So, while the assumed distribution may be valid on average, there will be locations along some pipelines where the pre-set distribution is very wrong. It would not at all be representative of the dominant failure mechanism at work there. The weightings will often completely mask the real threat at such locations.
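The masking effect is easy to demonstrate with a small numerical example; the threat rates and weights below are invented for illustration.

```python
# Location-specific, unweighted PoF estimates for one geohazard-prone segment
# (failures per mile-year; invented values).
segment_pof = {"corrosion": 1.0e-5, "third party": 2.0e-5, "geohazard": 9.0e-4}

# Fixed weightings derived from system-wide incident history, as used in some
# older scoring models (30% corrosion, 2% geohazard, etc.).
weights = {"corrosion": 0.30, "third party": 0.35, "geohazard": 0.02}

total = sum(segment_pof.values())
for threat, pof in segment_pof.items():
    share = pof / total
    print(f"{threat}: actual share {share:.0%} of segment PoF, "
          f"but capped at {weights[threat]:.0%} of the weighted score")
```

On this hypothetical segment the geohazard drives roughly 97% of the failure probability, yet the weighted scheme can never show it as more than 2% of the score.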

This is a classic difficulty in moving between behaviors of statistical populations and individual behaviors. The former is often a reliable predictor—hence the success of the insurance actuarial analyses—but the latter is not.

In addition to masking location-specific failure potential, use of weightings can force only the higher weighted threats to be perceived ‘drivers’ of risk, at all points along all pipelines. This is rarely realistic. Risk management can become driven solely by the pre-set weightings rather than actual data and conditions along the pipelines. Forcing risk assessment results to resemble a pre-determined incident history will almost certainly create errors.

Since weightings can obscure the real risks and interfere with risk management, their use should be discontinued. Using actual measurements of risk factors avoids the incentive to apply artificial weightings (see the previous discussion on the need for measurements). Therefore, migration away from older scoring or indexing approaches to a modern risk assessment methodology will automatically avoid the misstep of weightings.

Diagnosing Disconnects Between Results and ‘Reality’

PRMM provides a useful discussion of types of disconnects between reality and assessed results that may arise in a risk evaluation. Disconnects discussed there include those that may emerge from:

  • New inspection results, including visual inspections
  • Incident investigations, including root cause analyses
  • Leak history analyses
  • Populations vs individuals disconnects.

An important step in validation is to identify and correct ‘disconnects’ between sources such as subject matter experts’ beliefs and risk assessment outputs. Two types of potential disconnects should be explored. The first is comparisons of populations—the behavior of an assessed collection of components (for example, a pipeline system) with a representative population of similar components (other pipeline systems). The representative population will be called a benchmark for purposes here. Common benchmarks include average incident rates for many km of pipelines over several years, often country-wide (for example, US, Canadian, European, etc).

The second comparison disconnect type involves a risk assessment of a component or several components whose risk estimates do not comport with SME beliefs or other evidence. Other evidence includes results of inspections not available prior to the risk assessment.

If assessment results are not consistent with a benchmark believed to closely represent future performance of the system or when a discrepancy arises in a comparison of a component- or location-specific assessment with an SME belief or other evidence, any of several things might be happening:

  • The benchmark is not representative of the assessed segments
  • Effects of conservatism are not being fully considered
  • Both are correct (i.e., within the range of expectations), but probability effects make them appear contradictory
  • Exposure estimates were too high or too low
  • Mitigation effectiveness was judged too high or too low
  • Resistance to failure was judged too high or too low
  • Consequence estimates were too high or too low
  • The SME belief or contrarian evidence is flawed.

The distinction between PoF and probability of damage (damage without failure) can be useful in diagnosing where the assessment is not reflecting perceived reality. If damages are predicted but not occurring, then the exposure is overestimated and/or the mitigation is underestimated. Alternatively, consider a situation where damage potential is modeled as being very low but an inspection (perhaps ILI) discovers certain damages. It is often difficult to determine which estimate—exposure or mitigation—was most contributory to the damage underestimate, but insight has been gained nonetheless.

Mitigation measures have several aspects that can be tuned. The orders of magnitude range established for measuring mitigation is critical to the result, as is the maximum benefit from each mitigation, and the currently judged effectiveness of each. More research is becoming available and can often be used directly in judging the effectiveness of a mitigation measure.

Note that calibration might also be contributing to such disconnects. Calibrating to a target population of pipeline segments includes ‘outliers’ in the target distribution. So, disconnects involving very few segments may be only due to the outlier effect. More widespread disconnects may indicate that the target population used in calibration is not representative of the pipeline segments being assessed.

A trial and error procedure might be required to balance all these aspects so the assessment produces credible results for all inputs.

Incident Investigation

Incident investigation is both a useful input into a risk assessment and a consumer of risk assessment results. In the former, learnings from the incident are almost always relevant to other portions of other pipelines. In the latter, especially when responsibility (blame) is to be assigned, what should have been known, via risk assessment, prior to the incident is almost always relevant. On that basis, the risk management decision-making will normally be challenged by parties having suffered damage from the incident.

Retro-fitting a risk assessment for this type of application uses the same steps as any other risk assessment. Care must be exercised not to introduce hindsight bias, if the assessment is to truly reflect what was, or should have been, known immediately prior to the incident.

When evaluating what should have or ‘could have’ been known and what should have (or ‘could have’) been done prior to an accident, the investigation often seeks to determine if decision-makers acted in a reasonable and prudent manner. For more extreme behavior, the legal concept of negligence may also be applicable and some investigations will seek to demonstrate that.

The risk aspect of the investigation can focus on these issues by including the following:

  1. List of evidence available prior to incident. This includes information that was readily available to decision-makers prior to the incident. Less available information—determining to what extents research, data collection, investigation, etc, should have been done—is a later consideration.
  2. Risk implications of this evidence. This can be demonstrated via a translation, showing how each piece of evidence is translated into a measurement of exposure, mitigation, resistance, or consequence.
  3. P50 and P90+ risk assessments prior to incident, using all available information, again, prior to incident. The assessment should model uncertainty as increased risk, reflecting a prudent decision-making practice of erring on the side of over-protection.
  4. Decision-making context. Here, the risk report puts the assessment results into context for the reader. This can include at least two types of context:

Relative: how did the risk of the subject segment—the failed component—compare to other risks under the control of the risk manager, immediately prior to the incident? Should this have been a priority segment for the decision-makers? Did the failure mechanism that actually precipitated the event appear as a dominant threat? Should it have, given the information available at the time?

Acceptability Criteria: immediately prior to the incident, would the risk from this segment have been deemed ‘acceptable’ by any common measure of risk acceptability? Even when numerical criteria for ‘risk acceptability’ or ‘tolerable risk’ are unavailable for a specific pipeline, inferred and comparative criteria are always available. Examples are numerous and include:

  • Risk criteria used in similar applications; for example, siting of pipelines near public schools [1048].
  • General industrial risk criteria used in other countries; for example, ALARP
  • Land use and setback criteria suggested in some guidelines [1047] and applied in some municipalities
  • Risk criteria employed in other industries
  • Suggested target reliability levels. [95, 333]

Risk criteria often use fatalities as the consequence of interest. So, even if not directly applicable to the subject pipeline, the fact that a fatality-based risk level is tolerable (or not) in a similar area or for a similar application, may be relevant to the subject incident.

Care should be exercised to emphasize the probabilistic nature of a risk assessment. A risk assessment can easily fail to highlight a threat that later turns out to cause the next failure. But that does not mean that the assessment is incorrect. A 1% probability event can occur before a 90% probability event, but they may still be accurately depicted as 1% and 90% probability events, respectively. Of course, if several events assessed at 1% each happen before the 90% event, the assessment results should become increasingly suspect.
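The arithmetic behind that caution is straightforward. With invented numbers: if 200 segments each carry a roughly 1% annual failure probability, seeing at least one of them fail in a year is actually the most likely outcome.

```python
p_low, n_low = 0.01, 200      # invented: 200 segments each assessed at 1% per year

p_at_least_one = 1 - (1 - p_low) ** n_low
expected_events = p_low * n_low
print(f"P(at least one '1%' event in a year) = {p_at_least_one:.0%}")      # about 87%
print(f"Expected number of '1%' events per year = {expected_events:.1f}")  # about 2
```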

  5. Mitigation options prior to the incident. A listing of all risk reduction opportunities available to decision-makers prior to the incident will be useful to the analyses. The reasonableness of each should not be a consideration at this stage—rather, the focus should be on a comprehensive list.
  6. Cost/benefit analyses of available mitigation prior to the incident. This addresses reasonableness and is also captured in ALARP. See . While spending to prevent consequences that are difficult to monetize (for example, fatality, threatened and endangered species harm, etc.) evokes emotionalism in decision-making, there is nonetheless a concept of reasonableness in spending to prevent any type of potential loss. Monetization of all types of consequence is becoming more common. But even expressed in qualitative (non-monetized) ways, the costs of opportunities for consequence avoidance prior to the incident will still be of use in the investigation.

Use of Inspection and Integrity Assessment Data

The first and primary use of inspection and integrity assessment data, including investigations from failures and damage incidents, is in determining resistance. This is detailed in . A secondary, but also very important use of this information is in revisiting previous assumptions used in the risk assessment. Since this latter use permeates so many inputs into a risk assessment, this topic is explored here in an early chapter.

When inspection does not find damages where they had been predicted by the risk assessment, a common cause is conservatism in the risk estimates. However, one should not discount the possibility of damages present but undetected by the inspection. In the case of ILI, such disconnects may warrant a re-examination of factors such as:

  • Assumed detection capabilities of various ILI tool types with regard to various anomaly types and configurations
  • Assumed reductions in detection capability due to various types of ILI excursions.

When an inspection detects corrosion or cracking damage, it is logical to conclude that damage potential existed at one time and may still exist. When there is actual damage, but risk assessment results do not indicate a significant potential for such damage, then a conflict seemingly exists between the direct and the indirect evidence. Such conflicts are discussed in , especially .

Identifying the location of the inconsistency is necessary. The conflict could reflect an overly optimistic assessment of effectiveness of mitigation measures (coatings, CP, etc.) or it could reflect an underestimate of the harshness of the environment. Another possibility is that detected damages do not reflect active mechanisms but only old and now-inactive mechanisms. For instance, replacing anode beds, increasing current output from rectifiers, eliminating interferences, and re-coating are all actions that could halt previously active external corrosion. Finally, the apparent disconnect might not be a disconnect at all. It could simply be an actually very rare occurrence whose time had come. Even very low probability events will occur eventually.

The degradation estimates in a risk assessment should always include the best available inspection information. The risk assessment should preferentially use recent direct evidence over previous assumptions, until the conflicts between the two are investigated.

For example, suppose that, using information available prior to an ILI, the assessment concluded a low probability of subsurface corrosion because both coating and CP were estimated to be fully effective. If the recent ILI inspection indicates that some external metal loss has occurred, then the subsurface corrosion assessment would be suspect, pending an investigation. The previous assessment, based on indirect evidence, should probably be overridden initially by the results of the ILI, pending an investigation to determine the cause of the damage—how the mitigation measures may have failed and how the risk assessment failed to reflect that.

If the risk assessment is modified based upon unverified ILI results, it can later be improved with results from more detailed examinations, that is, excavation, inspection, and verification that anomalies are present and represent loss of resistance. If a root cause analysis of the detected damages concludes that active corrosion is not present, the original risk assessment may have been correct. The root cause analysis might demonstrate that the corrosion damage is old and the corrosion has since been mitigated, in which case values may again have to be revised.

A similar approach is used for integrity assessments such as pressure tests. If test results were not predicted by the risk assessment, investigation is warranted.

Techniques to assimilate ILI and other direct inspection information into risk estimates are discussed in .

Types of Pipeline Systems

An underlying premise in this book is that only one risk assessment methodology should be used, regardless of variations in system type and components within each system–regardless as well of variations in product transported, geography, pressures, flowrates, materials, etc. This methodology should be consistently applied. This way, even the most diverse collection of system types, components, products transported, geographies, etc. can be compared and managed appropriately. Even very specialized, rare pipeline designs, such as long, encased pipe—pipe-in-pipe configurations or special materials—are efficiently assessed by the same methodology.

The following chapters of this book discuss system-specific differences when such differences require special consideration in the assessment. In the following paragraphs, facility types are discussed and some general differences among pipeline systems are highlighted. Again, this does not suggest that alternate risk assessments are required to deal with these differences. A robust risk assessment framework readily handles all differences.

Differing definitions of ‘failure’—a key thing being measured in the risk assessment—for both integrity-focused risk assessment and service interruption risk assessments, may be desirable. Again, the same methodology is still efficiently applied to all asset types, components, and risk/failure definitions.

Background

The following definitions are offered as general discriminators of pipelines based on their differences in service. These definitions are not universally recognized. Regulatory definitions are often more specific, sometimes linking definitions to stress level or other factors. ‘Product’ generally refers to hydrocarbon products—oil and gas—but the term also generally applies to water and other substances moved by pipeline.

Conceptually, pipelined product travels from a wellhead to end consumers through a series of pipelines. These pipelines — including flowlines, gathering lines, transmission lines, distribution lines, and service lines — carry product at varying volumes, flowrates, and pressures. Related pipeline type terminology includes the following:

  • Feeder lines move products from batteries, processing facilities and storage tanks to the long-distance haulers of the pipeline industry, the transmission pipelines.
  • Flowlines connect to a single wellhead in a producing field. Flowlines move product from the wellhead to nearby storage tanks, transmission compressor stations, or processing plant booster stations.
  • Gathering lines collect product from multiple flowlines and move it to centralized points, such as processing facilities, tanks, or marine docks.
  • Distribution pipelines, also known as “mains,” are the middle step between high pressure transmission lines and low pressure service lines.
  • Service pipelines connect to a meter that delivers product to individual customers—the end users.

Many examples in this book are directed towards transmission pipelines. Because they are typically the more regulated and higher stressed of the pipeline systems, risk management efforts have been heavily focused on these systems, especially more recently. There are many similarities between transmission and other pipeline systems, but there are also important differences from a risk standpoint. A transmission pipeline system is normally designed to transport large volumes of product over long distances to large end-users such as electrical power plants, oil refineries, chemical plants, and distribution systems. The distribution system delivers received product to numerous users in towns and cities; e.g., natural gas for cooking and heating or water for multiple uses is delivered to homes and other buildings by the distribution system within a municipality. Gathering systems typically have lower pressures and volumes than transmission, are geographically constrained, and are often less regulated. The similarities between transmission and other systems arise because a mostly buried, pressurized pipeline will experience common threats. All pipeline systems have similar risk influences acting on their risk profiles—changes in risk along their routes. All are vulnerable to varying degrees to external loadings, corrosion[3], fatigue, and human error. All have consequences when they fail. When the pipelines are in similar environments (buried versus aboveground, urban versus rural, etc.) and have common materials (steel, polyethylene, etc.), the similarities become even more pronounced. Similar mitigation techniques are commonly chosen to address similar threats.

Differences arise due to varying material types, pipe connection designs, interconnectivity of components, pressure ranges, leak tolerance, and other factors. These are considered in various aspects of a risk assessment. In this section, the focus is primarily on the differences among steel pipelines. This focus is warranted since many newer pipeline regulations differentiate among steel pipelines based on relatively minor differences in their use.

Materials of Construction

The wide range of materials used in pipelines is discussed in . As noted, the focus in this section is primarily on the differences among steel pipelines. The history of steel in pipelines is useful background information:

While iron pipe for other uses in the U.S. dates back to the 1830s, the use of pipe for oil transportation started soon after the drilling of the first commercial oil well in 1859 by “Colonel” Edwin Drake in Titusville, Pennsylvania.

The first pipes were short and basic, to get oil from drill holes to nearby tanks or refineries. The rapid increase in demand for a useful product, in the early case kerosene, led to more wells and a greater need for transportation of the products to markets. Early transport by teamster wagon, wooden pipes, and rail rapidly led to the development of better and longer pipes and pipelines.

In the 1860s as the pipeline business grew, quality control of pipe manufacture became a reality and the quality and type of metal for pipes improved from wrought iron to steel.

Technology continues to make better pipes of better steel, and find better ways to install pipe in the ground, and continually analyze its condition once it is in the ground. At the same time, pipeline safety regulations become more complete, driven by better understanding of materials available and better techniques to operate and maintain pipelines.

They continue to play a major role in the petroleum industry providing safe, reliable and economical transportation. As the need for more energy increases and population growth continues to get further away from supply centers, pipelines are needed to continue to bring energy to you.

From the early days of wooden trenches and wooden barrels, the pipeline industry has grown and employed the latest technology in pipeline operations and maintenance. Today, the industry uses sophisticated controls and computer systems, advanced pipe materials, and corrosion prevention techniques. [1049]

Product Types Transported

The type of product in the pipeline impacts certain failure mechanisms as well as consequence potential. See listing of typical pipeline products and discussion of associated hazards in CoF, .

Gathering System Pipelines

Gathering systems are normally composed of low-capacity pipelines[4]—typically less than 8 inches in diameter—that move produced fluids from subsurface wells to high-capacity transmission pipelines. Before leaving a hydrocarbon production field, the product is often processed to remove excess water, gases, and sediments as required to meet the quality specifications of transmission pipelines and the refineries they access.

Gathering pipelines are somewhat different from transmission pipelines in design, maintenance, operations, and in the quantity and quality of the liquids they carry. Historically designed, built, and operated under less regulation, gathering systems often have more leaks than transmission pipelines. They are generally lower-stress systems, often located in less populated areas, so consequences are usually lower than for transmission and distribution systems.

It is not unusual for products such as natural gas being produced and transported through a gathering network to vary in composition from one section of pipeline to another, according to the production from each well.

Transmission Pipelines

Transmission pipelines are typically large-capacity pipe, usually 8 inches or more in diameter and generally transporting fluids over long distances and at relatively high pressures. They typically originate at one or more inlet stations, or terminals, where custody of a product shipment is transferred from the owner (shipper) to the pipeline operator. Accordingly, inlet stations can be access points for truck, rail, and tanker vessels as well as other pipelines, including gathering lines from production areas. Along with pumping stations, storage tanks, sampling and metering facilities can be located at inlets to ensure that the hydrocarbons injected into the pipeline meet the quality control requirements of the pipeline operator and intended recipients.

Distribution Systems

For purposes of this discussion, a distribution pipeline system will be considered to be the piping network that delivers product from the transmission pipeline to (or ‘from’ in the case of sewer systems) multiple final users (i.e., the consumer) in the same geographical area. This includes the low-pressure segments that operate at pressures close to those of the customers’ needs as well as the higher pressure segments that require pressure regulation to control the pressure to the customer. The most common distribution systems transport water, wastewater[5], and natural gas, although steam, propane, and other product systems are also in use.

An easy way to picture a distribution system is as a network or grid of mains, service lines, and connections to customers. This grid can then be envisioned as overlaying (or at least having a close relationship with) the other grids of streets, sewers, electricity lines, phone lines, and other utilities.

Some operators of natural gas distribution systems have been more aggressive in applying risk management practices, specifically addressing repair-and-replace strategies for their more problematic components. These strategies incorporate many risk assessment and risk management issues, including the use of models for prioritizing replacements or assessing risk. Many of these concepts will also generally apply to water, wastewater, and any other pipeline systems operating in predominantly urban environments.

Since they are generally comprised of components of smaller volume with less pressure-containment requirements, a wider range of materials and appurtenance designs have been available to distribution systems. Many systems have evolved over many decades, with operators routinely changing from previous materials and practices in favor of better or more economical designs.


Comparisons

Historical accident/incident data offer important insights into what causes pipeline failures. Municipal distribution systems, both water and gas, usually have much more documented leak data available than other pipeline systems. This is due to a higher leak tolerance in distribution systems compared to transmission and to often better attention to record keeping compared to gathering systems (although record keeping has historically been weak in most).

System characteristic data—even the basic specifications of pipe material, size, and exact locations—are, however, often less available compared to transmission pipelines. A common complaint among most distribution system operators is the incompleteness of general system data relating to material types, installation conditions, and general performance history. This situation is changing among operators, most likely driven by the increased availability and utility of computer systems to capture and maintain records as well as the growing recognition of the value of such records.

The primary differences, from a risk perspective, of and among distribution pipeline systems include:

  • Materials and components
  • Pressure/stress levels
  • Pipe installation techniques
  • Leak tolerance.

Distribution systems also differ fundamentally from transmission systems by having a much larger number of end-users or consumers, requiring specific equipment to facilitate product delivery. This equipment includes branches, meters, pressure reduction facilities, etc., along with associated piping, fittings, and valves. Curb valves are additional valves usually placed at the property line to shut off service to a building. A distribution, gas, or water main refers to a piece of pipe that has numerous branches, typically called service lines, that deliver the product to the final end-user. A main, therefore, usually carries more product at higher pressure than a service line. Where required, a service regulator often controls the pressure to the customer from the service line. In increasingly rare scenarios, customers are directly connected to long lengths of piping that are protected by common pressure control devices, rather than customer-specific control.

Although there are many overlaps, the typical operating environments of distribution systems are often materially different from those of most transmission pipeline segments. Normally located in heavily populated areas, distribution systems are generally operated at lower pressures, built from different materials, and installed under and among other infrastructure components such as roadways. Many distribution systems are older than most transmission lines and employ a myriad of design techniques and materials that were popular during various time periods. They also generally require fewer pieces of large equipment such as pumps and compressors (although water distribution systems usually require some amount of pumping). Operationally, significant differences from transmission lines include monitoring (SCADA, computer-based leak detection, etc.), right-of-way (ROW) control, inspection opportunities, and some aspects of corrosion control.

Because of the smaller pipe size and lower pressures, leak sizes are often smaller in distribution systems compared to leaks in transmission systems; however, because of the environment (e.g., in towns, cities, etc.), the consequences of distribution pipe breaks can be quite severe. Also, the number of leaks seen in distribution systems is often higher. This higher frequency is due to a number of factors that will be discussed later in this chapter.

Distribution System integrity

Pipeline system integrity is often defined differently for hydrocarbon transmission versus gathering and distribution systems. In the former, leakage of any size (beyond the microscopic, virtually undetectable amounts) is usually intolerable, so integrity normally means “leak free.” The intolerance of even the smallest leak in a transmission pipeline is due to several factors, including economics of product transport and potential consequences from integrity breaches in higher-stress systems. Many distribution systems, on the other hand, tolerate some amount of leakage—system integrity is considered compromised only when leakage becomes excessive.

The higher leak tolerance leads to a greater incidence of leaks in a distribution system. These are often documented, monitored, and placed on “to be repaired” lists. Knowledge of leaks and breaks is often the main source of system integrity knowledge. It, rather than inspection information, is usually the first alert of systemic issues of corrosion of steel, graphitization of cast iron, loss of joint integrity, inferior material performance (e.g., lack of resistance to brittle failure in certain plastics), and other signs of system deterioration. Consequently, risk modeling in urban distribution systems has historically been more focused on leak/break history. Coupled with the inability to inspect many portions of an urban distribution system, this makes data collection for leaks and breaks even more critical to those risk management programs.

Elsewhere in this book, the application of leak/break data to risk assessment and risk management is discussed.
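Since leak/break records often stand in for inspection data in distribution risk modeling, a simple normalization of leak history into a per-mile, per-year frequency is a natural first step. The sketch below is only a minimal illustration of that bookkeeping; the record layout, segment names, and observation window are assumptions made for this example, not a prescribed method.

```python
from collections import defaultdict

# Hypothetical leak records: (segment_id, year, apparent cause)
leak_records = [
    ("MAIN-101", 2019, "corrosion"),
    ("MAIN-101", 2021, "corrosion"),
    ("MAIN-102", 2020, "joint failure"),
    ("MAIN-101", 2022, "third-party damage"),
]

# Hypothetical segment lengths in miles and an assumed observation window
segment_miles = {"MAIN-101": 2.4, "MAIN-102": 5.1}
OBSERVATION_YEARS = 5

# Count leaks per segment, then normalize to leaks per mile-year
leak_counts = defaultdict(int)
for segment, year, cause in leak_records:
    leak_counts[segment] += 1

for segment, miles in segment_miles.items():
    rate = leak_counts[segment] / (miles * OBSERVATION_YEARS)
    print(f"{segment}: {rate:.3f} leaks per mile-year")
```

Frequencies of this kind can also be tallied by cause (corrosion, joint failure, etc.) to flag the systemic issues noted above.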

When only certain types of integrity loss are of interest, a change in the definition of ‘failure’ is in order. By simply changing from ‘loss of integrity’ to something like ‘significant loss of integrity’, the same methodology can be applied to generate a risk assessment for the desired types of failures.

Data

Since distribution systems typically evolve over decades of design, installation, maintenance, and repair practices, they harbor much more variety than does any transmission system. Note that portions of many urban distribution systems were designed in the absence of any industry standards governing material selection, quality control, installation techniques, and other practices that are part of a modern pipeline design effort.

The value of record keeping was typically unrecognized in previous decades. This has resulted in large information gaps, even regarding such basic information as exact locations, material types, and connector types.

Offshore Pipeline Systems

The often-dynamic environment of offshore pipeline operations can make risk assessment more challenging than for onshore operations. Nonetheless, the assessment of offshore risks follows the same approach as the assessment of onshore facilities. These same risk assessment concepts also apply to pipeline crossings of all water bodies, including rivers, lakes, and marshes.

Some additional considerations for certain risk aspects will be necessary to account for differences between onshore and offshore pipelines. Common differences include:

  • External forces related to bottom stability, including hydrodynamic forces (inertia, oscillations, lateral forces, debris loadings, etc.) caused by water movements, an often higher potential for pipe spans and/or partial support scenarios, and storm implications
  • Activities of others (anchors, shipwrecks, dropped objects, etc.)
  • Availability of inspection data
  • Potential consequences.

Risers, platforms, and all other portions of offshore systems are readily evaluated by this same risk assessment approach.

Components in Close Proximity

Components in ‘close proximity’ include those in facilities and shared corridors. Risk assessments for facilities—from large and complex tank farms, pump stations, compressor stations, gas processing plants, etc., to simple valve and meter sites—can be conducted in exactly the same way as risk assessments for simple lengths of pipeline. Likewise, congested pipeline corridors require no change in methodology. There are, however, some nuances that, while readily accommodated in the suggested risk assessment methodology, warrant some discussion.

Modeling of components within shared corridors and facilities presents an interesting interplay when assessing risks. Each component endangers its neighbors, and neighboring components add to both PoF and CoF. The PoFs from components #1 and #2 add to the PoF of their neighbor, component #3. Conversely, the CoFs from #1 and #2 also add to the CoF of #3: if a failure in #3 damages #1 and/or #2, then the losses from those neighbors’ damages are additive to the losses arising from #3 alone.

The PoF aspect is often called a successive or sympathetic reaction; both the PoF and CoF aspects are discussed elsewhere in this book.
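To make this additivity concrete, the following sketch estimates the risk to one component (#3) including contributions from two neighbors (#1 and #2). All numeric values, and the ‘reach’ fractions representing how often an interaction between neighbors actually causes damage, are illustrative assumptions; using the same fraction for both the PoF and CoF contributions is a further simplification made only for this example.

```python
# Illustrative (assumed) annual failure probabilities and consequence values
pof = {"c1": 1.0e-3, "c2": 5.0e-4, "c3": 2.0e-3}   # failures per year
cof = {"c1": 4.0e5, "c2": 1.0e5, "c3": 3.0e5}      # dollars per failure

# Assumed fraction of a neighbor's failure events that can reach the other component
reach = {"c1": 0.2, "c2": 0.1}

# PoF for component #3 includes the portion of neighbor failures that can damage it
pof_c3 = pof["c3"] + reach["c1"] * pof["c1"] + reach["c2"] * pof["c2"]

# CoF for component #3 includes losses at neighbors damaged by a #3 failure
cof_c3 = cof["c3"] + reach["c1"] * cof["c1"] + reach["c2"] * cof["c2"]

risk_c3 = pof_c3 * cof_c3   # expected loss per year for component #3
print(f"PoF: {pof_c3:.2e}/yr  CoF: ${cof_c3:,.0f}  Risk: ${risk_c3:,.2f}/yr")
```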

Facilities/Stations

Note the definition of ‘facility’ or ‘station’ as used in this book: a facility or station is one or more occurrences of, and often a collection of, equipment, piping, instrumentation, and/or appurtenances at a single location, typically where at least some portion is situated above ground (unburied) and usually on property controlled by the owner.

A facility can be as small as a single valve site—perhaps a simple, uninstrumented mainline block valve in an area covering only a few square feet. A facility can also be as large as a combined tank farm, underground storage field, truck-, rail-, and marine-loading facilities, major pump station, electrical substation, and all associated appurtenances, situated on a site covering many acres. In between are all sizes of meter stations, city gate stations, pump stations, compressor stations, manifolds, and many others.

Comparisons between and among facilities are often desirable in risk management. Operators often want to compare the risks associated with portions of pipeline to those of stations or parts of stations (components within stations). This might be for reasons of general risk management, project prioritization, or to assist in design decisions such as pipeline loops versus additional pump stations.

Background

As noted, pipeline systems typically have surface (above-ground) facilities in addition to buried pipe; these include pump and compressor stations, tank farms, truck-, rail-, and marine-loading appurtenances, and metering and valve locations. Facilities must be included in most decisions regarding risk management.

Groups of components within a station facility to be evaluated in a risk assessment might include:

  • Atmospheric storage tanks (AST)
  • Underground storage tanks (UST)
  • Sumps
  • Racks (loading and unloading, truck, rail, marine)
  • Additive systems
  • Piping and manifolds
  • Valves
  • Pumps
  • Compressors
  • Subsurface storage caverns.

Sectioning and Summarizing Risk

For purposes of risk summarization, the contribution from each in-station section of piping, each valve, each tank, each transfer pump, each connector, etc. is aggregated. This allows any number of summarizations by sub-facility type, geographic location, or other grouping. For example, due to the potentially greater hazard associated with the storage of large volumes of flammable liquids, one station risk summarization may consist of all components located in a bermed storage tank area, including tank components (floor, walls, roof), transfer pumping components, manifolds and other piping, safety systems, and secondary containment. This grouping would show a risk estimate reflecting the risks specific to that portion of the station. The risk evaluations for each grouping can be combined for an overall station risk summary or kept independent for comparisons with similar groupings in other stations.
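As a minimal illustration of this kind of roll-up, component-level risk estimates can be summed by grouping and then by station. The station names, groupings, and risk values below are invented for the example and are not drawn from the source.

```python
from collections import defaultdict

# Hypothetical component-level risk estimates (expected loss per year, in dollars)
components = [
    {"station": "STA-A", "group": "bermed tank area", "name": "tank floor",    "risk": 1200.0},
    {"station": "STA-A", "group": "bermed tank area", "name": "transfer pump", "risk": 300.0},
    {"station": "STA-A", "group": "manifold",         "name": "valve V-12",    "risk": 150.0},
    {"station": "STA-B", "group": "bermed tank area", "name": "tank floor",    "risk": 900.0},
]

# Aggregate risk by (station, grouping) and by station
group_risk = defaultdict(float)
station_risk = defaultdict(float)
for c in components:
    group_risk[(c["station"], c["group"])] += c["risk"]
    station_risk[c["station"]] += c["risk"]

for (station, group), total in sorted(group_risk.items()):
    print(f"{station} / {group}: ${total:,.0f} per year")
for station, total in sorted(station_risk.items()):
    print(f"{station} total: ${total:,.0f} per year")
```

Keeping the grouping-level totals separate, as above, supports the comparisons with similar groupings in other stations described in the text.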

In the design phase of a facility, understanding the risks of each grouping allows more strategic placements within the facility, perhaps relative to populations, roadways, and other risk-influencing features.

Segmenting a component such as a pump, loading arm, or compressor will also be necessary, at least at a conceptual level. When a component comprises multiple parts and materials, failure potential is not consistent among those parts and materials. A tank bottom will be exposed to different failure mechanism severities than its sides and roof. A pump casing has different resistance characteristics than its suction piping, seals, and mechanical connectors. The most rigorous risk assessment will assess each sub-component for all possible failure mechanisms and consequences. This is not unlike PPM, where each sub-component carries its own maintenance requirements, except that many PPMs focus on the potential for equipment unavailability rather than all consequences.

When a complex component is to be treated as a single component, compromises are required, similar to a manual segmentation strategy on a long pipeline. In either case, averages or worst-case subcomponents will dictate the component’s assessed values, potentially masking true risks. The loss of accuracy for a facility component will, however, normally be much less than the comparable loss for long pipeline segments. The conceptual segmentation will, for each failure mechanism, use the most vulnerable sub-component’s characteristics to characterize the entire component. For instance, a pump seal will often govern the leak potential for the entire pump assembly, and a mechanical coupling will often dictate the external force resistance for the entire assembly (discounting instrumentation connections).
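The ‘most vulnerable sub-component governs’ idea can be sketched as taking, for each failure mechanism, the highest failure likelihood among the sub-components. The sub-component names, mechanisms, and values below are assumptions made only for illustration:

```python
# Assumed per-mechanism failure likelihoods (per year) for a pump assembly's sub-components
subcomponents = {
    "casing":   {"corrosion": 1.0e-4, "external force": 2.0e-5, "seal/connection leak": 0.0},
    "seal":     {"corrosion": 0.0,    "external force": 1.0e-5, "seal/connection leak": 3.0e-3},
    "coupling": {"corrosion": 5.0e-5, "external force": 8.0e-4, "seal/connection leak": 5.0e-4},
}

mechanisms = ["corrosion", "external force", "seal/connection leak"]

# Characterize the whole assembly, per mechanism, by its most vulnerable sub-component
assembly_pof = {
    mech: max(values[mech] for values in subcomponents.values())
    for mech in mechanisms
}
print(assembly_pof)  # here the seal governs leak potential; the coupling governs external force
```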

Unique Risks

While the same risk assessment methodology is appropriate for both stations/facilities and ROW pipe, the differences must be accounted for. Examples of these differences include the following aspects, more commonly found inside fence limits (i.e., in facilities, especially where some form of material processing occurs):

  • Materials handling and transfer. Adds risk issues associated with loading, unloading, and warehousing of materials.
  • Enclosed or indoor process units. Adds risk issues associated with enclosed or partially enclosed processes since the lack of free ventilation can increase damage potential. Consideration of effective mechanical ventilation is appropriate.
  • Access. Ease-of-access to the process unit by emergency personnel and equipment impacts consequence potential.
  • Drainage and spill control. Adds risk factors for situations where large spills could be contained around process equipment instead of being safely drained away, increasing risk, both PoF and CoF, from sympathetic or successive reactions (for example, one failure precipitating others in nearby components).

Corridors, Shared ROW

Pipelines are often co-located in a common ROW with other pipelines, electric utility lines, or other utilities. While the risk picture is affected by these scenarios, the risk assessment methodology requires no revision.

Note the similarities in risk assessment for corridors compared to facilities. In both cases, a component being assessed has some incremental increase in PoF due to the PoF of nearby components. This is normally a small fraction of the neighboring component’s PoF, since only a fraction of its PoF events can impact the assessed component, especially when distance, earthen cover, or other barriers are involved. As another similarity, the potential consequences from the assessed component are increased by the potential consequences that could arise from neighboring components that fail due to the failure of the assessed component.

The ideal engineer is a composite… He is not a scientist, he is not a mathematician, he is not a sociologist or a writer; but he may use the knowledge and techniques of any or all of these disciplines in solving engineering problems.

N. W. Dougherty

  1. Dr. John Kiefner’s work for AGA, Dr. Mike Kirkwood from British Gas, W. Kent Muhlbauer’s early editions of The Pipeline Risk Management Manual, and Mike Gloven’s work at Bass Trigon.
  2. Some may also use the label ‘deterministic’.
  3. Even plastics, concrete, and specialized metals have some exposure to corrosion, in the general use of the word.
  4. With notable exceptions such as the Alaska North Slope gathering system with large diameter, high pressure systems.
  5. Although technically a collection system, a wastewater system shares characteristics with distribution as well as gathering systems.