

Data Management and Analyses

We begin by noting the importance of information to a risk assessment. A risk assessment’s reliance on full and complete knowledge cannot be overemphasized. While ‘full and complete’ information is rarely available, it is nonetheless the target.

A great deal of information is usually available in a pipeline operation. Information that can routinely be used to update the risk assessment includes:

  • All survey results such as pipe-to-soil voltage readings, leak surveys, patrols, depth of cover, population density, etc.
  • Documentation of all repairs
  • Documentation of all excavations
  • Operational data including pressures and flow rates
  • Results of integrity assessments
  • Maintenance reports
  • Updated consequence information
  • Updated receptor information—new housing, high occupancy buildings, changes in population density or environmental sensitivities, etc.
  • Results of root cause analyses and incident investigations
  • Availability and capabilities of new technologies.

See PRMM for an introduction and background to the management, collection, and sources of data typically used in a pipeline risk assessment. Additional, pipeline-specific observations are also offered here.

Multiple Uses of Same Information

The importance of information is amplified when a single input informs multiple elements of the risk assessment at the same time. Individual pieces of data often impact several different aspects of risk. For example, pipe wall thickness is a factor in almost all potential failure modes: it determines time to failure for a given corrosion rate, partly determines the ability to survive external forces, and so on. Population density is a consequence variable as well as a third-party damage indicator (a possible measure of potential activity). Inspection results yield evidence regarding current pipe integrity as well as possibly active failure mechanisms. A single detected defect can yield much information. It could change our beliefs about coating condition, CP effectiveness, pipe strength, and overall operating safety margin, and may even provide new information about soil corrosivity, interference currents, third-party activity, and so on. All of this arises from a single piece of data (evidence).

Many companies now avoid the use of casings. But casings were put in place for a reason. The presence of a casing is a mitigation measure for external force damage potential, but is often seen to increase corrosion potential. The risk model should capture both of the risk implications from the presence of a casing.

Additional examples—a few among many—are shown below:

Examples of Multiple Usages of Information

  • Product flowrates: corrosion, erosion, surge
  • AC powerlines: corrosion, impacts from falling objects
  • ILI results: resistance (degradation mechanisms, manufacturing/construction weaknesses, etc), corrosion exposure, corrosion mitigation, crack exposure, outside force damages


Surveys/maps/records

Maps and records of older pipeline system components are not normally as complete as operators would like. Many are faced with very limited information, given the past practices of record-keeping, and are engaged in decades-long efforts to capture critical data. Modern tools and techniques are available to support these efforts. Examples of these along with their applications are discussed in PRMM. The role of this information in risk assessment is multi-faceted, as is noted throughout this book.

Information degradation

Information often has a finite useful life span. Corrosion, for example, is time-dependent, and the timing of corrosion surveys can therefore introduce uncertainty and thus risk. The age of information should therefore be a consideration in any determination based on inspections, surveys, or tests such as pressure testing, where the objective is to identify presence or progression of damages. Since conditions should not be assumed to be static (in a conservative risk assessment), these types of information become increasingly less valuable as they age.

The best way to account for inspection/test age is to model what might have happened since that inspection/test. This is effectively a measure of information degradation: older inspections/tests, with their greater opportunity for ‘things’ to have happened over the intervening years, will automatically be less useful to the risk assessment. This approach also appropriately shows where inspections/tests might not need frequent refreshment. That is, where not many ‘things’ are apt to happen, there is less incentive to repeat the inspection. The value of an inspection/test is then readily quantified in terms of risk reduction.
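
As a minimal sketch of this idea, the following projects a conservatively grown defect depth from an aging ILI call, assuming a simple linear corrosion-growth model. The growth rate and pipe dimensions are illustrative values only, not recommendations.

def remaining_wall_estimate(measured_depth_in, wall_in, years_since_ili,
                            growth_rate_in_per_yr=0.005):
    """Project a conservatively grown defect depth from an old ILI call."""
    projected_depth = measured_depth_in + growth_rate_in_per_yr * years_since_ili
    return wall_in - projected_depth

# A 30% deep metal-loss call on 0.250 in wall, inspected 8 years ago:
wall = 0.250
depth_at_ili = 0.30 * wall
remaining = remaining_wall_estimate(depth_at_ili, wall, years_since_ili=8)
print(f"Projected remaining wall: {remaining:.3f} in")   # 0.135 in

The older the inspection, the thinner the projected remaining wall, and the less credit the assessment gives the original measurement. Where projected growth is negligible, re-inspection offers little risk reduction.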

Note that this is one of the two ways that age plays a role in risk assessment. The other has to do with era of manufacture and/or construction, as discussed next.

Terminology

As we get into specifics of data collection, let’s agree on some terminology that will be useful in the following discussions. Several terms might be used in ways unfamiliar to the reader. Terminology is not consistent among all risk modelers, so these definitions are offered mainly for convenience in describing the risk assessment steps here. These definitions mostly relate to the use of a database as a data repository, as will be the case for almost all modern risk assessments.

In common database terminology, each row of data in a table or dataset is called a record and each column is called a field. So, each record is composed of one or more fields of information and each field contains information related to each record. A collection of records and fields can be called a database, a data set, or a data table. Information will usually be collected and maintained in a database (a spreadsheet can be a type of database). Results of risk assessments will also normally be put into a database environment.

Structured Query Language (SQL) is a commonly used programming language for databases. SQL can be used to cull information from the database or to render information in meaningful ways, such as applying algorithmic rules to disparate pieces of data to create estimates of risk. Creating risk assessment processes using SQL can be very efficient since they are readily deployed to numerous software environments.
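
As a minimal sketch of this kind of processing (using Python’s built-in sqlite3 module; the table names, columns, and threshold values are hypothetical), an algorithmic rule can combine disparate pieces of data into a risk-relevant flag:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE coating (line_id TEXT, beg REAL, end_m REAL, cond TEXT);
CREATE TABLE cp      (line_id TEXT, beg REAL, end_m REAL, volts REAL);
INSERT INTO coating VALUES ('L-100', 0, 500, 'fair');
INSERT INTO cp      VALUES ('L-100', 0, 500, 0.82);
""")

# Example rule: flag lengths where coating is degraded AND the CP reading
# (stored here as an absolute magnitude) falls below an 0.85 V criterion --
# together suggesting higher corrosion exposure.
rows = con.execute("""
SELECT c.line_id, MAX(c.beg, p.beg) AS beg, MIN(c.end_m, p.end_m) AS end_m
FROM coating c
JOIN cp p ON p.line_id = c.line_id
         AND p.beg < c.end_m AND p.end_m > c.beg   -- overlapping extents
WHERE c.cond IN ('fair', 'poor') AND p.volts < 0.85
""").fetchall()
print(rows)   # [('L-100', 0.0, 500.0)]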

Geographical Information Systems (GIS) have become an essential tool for managing pipelines. They combine database functionality with geographical, or spatial, information—maps in particular. These systems can be programmed to extract and analyze spatial data according to user-defined algorithms. Typical risk applications would be identification of pipeline intersections with roads, railroads, densely populated areas, etc. More advanced uses include modeling for flowpath or dispersion distances and directions, surface flow resistance, soil penetration, and hazard zone calculations.

In simple terms, the GIS draws from data that has a spatial component—connected to points on the planet. While often displayed against a map environment, the data can also be tabulated. Most engineering data related to a pipeline will be tabulated and will have a link to spatial data via a stationing system (see definition). The database housing the tabulated data is not necessarily part of the GIS software—it may be only linked. A modern GIS can interface with a variety of databases, spreadsheets, and other files that house tabulated or spatial data.

A linear representation of a pipeline is usually called a centerline. All data about the pipeline and its surroundings are tied to the centerline via a linear referencing system.

Using SQL or its own calculating language (sometimes called scripting language), a GIS can be the engine for calculating risk estimates. Programming risk assessment calculations with SQL is an option that allows the risk assessment to draw from multiple data sources and be portable—moved to different database environments.

Each record in a database must have an identifier that ties it to some particular element of the system, including facilities that are a part of that system. That is to say, a unique system identifier is needed. This identifier, along with a beginning station and ending station (or beginning/ending ‘measures’), uniquely identifies a specific component or group of components on a specific pipeline system. It is important that the identifier-stationing combination does indeed locate one and only one point on the system. An alphanumeric identification system, perhaps related to the pipeline’s name, geographic position, line size, or other common identifying characteristics, is sometimes used to increase the utility of the ID field.

For purposes here, stationing refers to a linear referencing system commonly used in land surveying and in pipeline alignment drawings. It is designed to show fixed distances from beginning points. A stationing system is designed to be unchangeable except through the use of equations that adjust for additions or deletions of lengths. The benefit of stationing as a linear reference is that station values persist over time and can reference old records based on the same stationing system. The main disadvantage is that true distances are unknown when using station values until all station-equation adjustments are taken into account.
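
As a minimal sketch of that last point, true distance can be recovered from a station value by accumulating the offset introduced by each upstream station equation. The equation format and numbers here are hypothetical:

def station_to_measure(station, equations, upstream_eq_count):
    """Convert a station value to a continuous measure (true distance).

    equations: (back_station, ahead_station) pairs in downstream order.
    upstream_eq_count: how many station equations lie upstream of this station.
    """
    offset = 0.0
    for back, ahead in equations[:upstream_eq_count]:
        offset += back - ahead   # each equation shifts the station basis
    return station + offset

eqs = [(105.0, 100.0),   # 5-unit cutback: stations restart lower
       (150.0, 152.0)]   # 2-unit gap: stations skip ahead
print(station_to_measure(40.0, eqs, 0))    # 40.0 (upstream of all equations)
print(station_to_measure(101.0, eqs, 1))   # 106.0
print(station_to_measure(153.0, eqs, 2))   # 156.0

A linear referencing system free of such adjustments is the ‘measures’ concept discussed next.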

The term measures is commonly used in GIS and is also a linear referencing system. It is similar to stationing except that it represents a continuous system, free from intermediate adjustment equations or other aspects preventing a simple calculation of distance between two points on the pipeline. The continuous centerline distances required in risk assessment are usually based on measures. Unlike stationing, measures are dynamic. When a pipeline is modified—ie, pieces added or removed—measures downstream of the event will change. A GIS can readily maintain both stationing and measures in order to retain references to legacy data sets as well as enjoy the benefits of a centerline free from intermediate station-equation adjustments.

An event is the common term for a risk variable in GIS jargon. As variables in the risk assessment, events can be named using standardized labels. Several industry database design standards are available. Standardization is necessary for the coherent and consistent exchange of information with external parties such as service companies, other pipeline companies, and regulators. Attributes is the GIS term for an event’s unique characteristics. Each event must have an attribute assigned, even if that attribute is assigned as ‘unknown’. Some attributes can be assigned as general defaults or as a system-wide characteristic. Each event–attribute combination defines a risk characteristic for a portion of the system.

For example, for the event ‘population density’, an attribute, perhaps in units of ‘persons/m2’, is assigned. In some cases, only specific values would be appropriately assigned. For the event ‘pipe diameter’, the possible attributes are the available pipe sizes. For the event ‘pipe coating type’, a restricted vocabulary list of possible coating types would be the basis of the attributes assigned to the event.

The better GIS applications use a restricted vocabulary in which terms are pre-defined, and only those terms may be used. This avoids variation or inconsistent labeling of the same thing: ‘SW’ for seam-weld, and not ‘seam weld’ or ‘S weld’, for example.
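
A minimal sketch of enforcing such a restricted vocabulary in code (the coating labels are illustrative):

from enum import Enum

class CoatingType(Enum):
    FBE = "FBE"
    COAL_TAR = "coal tar"
    ASPHALT = "asphalt"
    TAPE = "tape"
    UNKNOWN = "unknown"

def parse_coating(raw: str) -> CoatingType:
    """Reject free-text variants instead of silently storing them."""
    normalized = raw.strip().lower()
    for member in CoatingType:
        if normalized == member.value.lower():
            return member
    raise ValueError(f"{raw!r} is not in the restricted vocabulary")

print(parse_coating("fbe"))        # CoatingType.FBE
# parse_coating("S weld")  ->  ValueError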

All risk variables and their underlying sources are itemized in the data dictionary. The data dictionary should characterize and quantify the attributes of each event. This is the master reference document for the risk assessment, and it should identify the person who oversees the data (the “owner”) as well as all other relevant records-management details, or metadata, such as last revision date, frequency of updates, accuracy, etc.

Additional terms relating to data preparation are discussed next. These include events tables, look up tables (LUTs), and point vs continuous data.

Sidebar

Data Availability

“I don’t have enough data to quantify risk”

I hear this often and have concluded that it is actually a shorthand phrase reflecting one of two possible beliefs:

  • I don’t understand how to use the data I do have
  • I think that quantifying risk means that I need large datasets of historical event frequencies.

The truth is, you can perform a credible risk assessment even with only a very limited amount of information. If you know only the product being transported, pressure, diameter, and general location, you could make plausible estimates—very coarse, but at least reasonable.

This reminds me of a lesson learned during a courtroom proceeding:

Attorney to expert witness, asking a slightly off-topic question: “Mr Expert, how often might there be a vehicle collision at this intersection each year?”

Expert: “I have no idea. I don’t have any data for that.”

Attorney, while winking to the jury: “Ok, since you have no idea, then we can speculate that it can happen 1,000 times per year.”

Expert, surprised: “Oh no, it wouldn’t happen that often.”

Attorney: “Ah, so you DO have ‘some idea’. Ok then, let’s say it happens 500 times per year.”

Expert, beginning to see the hole he has fallen into: “Oh no, that also is way too high.”

Attorney: “How about 100 times a year?”

Expert, now somewhat apologetic: “Well, even that is too high because… ”

This went on until the attorney had obtained, for the court record, the expert’s high and low estimates, even when the expert claimed insufficient data and knowledge to speculate. The attorney knew that it is a simple reasoning exercise to ‘know’ that, say 2-3 vehicle incidents every day at the same place would not be long tolerated, at least in the U.S. Even 1-2 per week, every week of the year, would probably prompt action. This illustrates that, even in the absence of hard data, reasoning can at least bound an estimate.

Direct reasoning is often overlooked as a source of data. When it comes to probability and risk, we sometimes forget that we have a strong, physics-based understanding of real-world phenomena. Instead of using that understanding in our risk estimates, we tend to simply delegate the risk problem to the statisticians. The statisticians use event frequencies in their work so they base their estimates on historical events. They tell us ‘low data’—meaning low historical event frequencies—equates to low predictive power. True enough, especially from a statistics perspective.

But we forget that we still have the underlying physics. Physics tells us how much metal loss can be tolerated before leak or rupture, how much voltage is needed to halt corrosion, how much backhoe bucket force until the pipe breaks, how much landslide a length of pipe can withstand before yielding. We can estimate the numbers needed to calculate these things—often with great accuracy. We don’t have to rely on historical events to tell us how often a thing can happen. We are certainly remiss if we ignore history—it must definitely be used in our analyses whenever it is available. But we are also remiss if we ascribe too much relevance to the past or claim we are helpless without that history.

Let’s discuss low data availability when we’re performing a physics-based risk assessment. It is sometimes not apparent just how much information is readily available. Let’s say you know something simple about the soil type—where it’s rocky and where it’s mostly clay. Some of the risk factors that can be strongly influenced by just this simple piece of information include:

  • Potential soil moisture content, impacting corrosivity estimate
  • Likelihood of past coating damages during installation
  • Propensity of future coating damages to occur
  • Dispersion of liquid spills—infiltration vs surface flow
  • Amount of potential harm to certain receptors (for example, aquifers vs surface flow)
  • Exposure to third party excavation damages
  • Exposure to certain geotechnical phenomena (for example, subsidence, shrink/swell, landslide, etc)

Perhaps you can think of more. The point is that you may have more information than you first thought. In this example, a single piece of information—a simple soil characteristic, rock vs clay—has influenced seven different risk variables.

There are many other examples of how simple knowledge of surroundings leads to relevant and important risk information. This also emphasizes why dynamic segmentation—the creation of a risk profile—is essential. We would not understand changes in risk along a pipeline route if we failed to take note of changing soil conditions and to integrate the implications of those changes.

The second part of the “I don’t have enough data” statement emerges from beliefs about how risk can be quantified. When the underlying belief is something like “we can’t quantify risk because we don’t have the data”, what is often implied is that databases full of incident frequencies—how often each pipeline component has failed by each failure mechanism—are needed before risk can be quantified. That’s simply not correct. To quantify how often a pipeline segment will fail from a certain threat, we don’t necessarily have to have numbers telling us how often similar pipelines have failed in the past from that threat. This myth is often a carryover from the old—let’s say ‘classical’—practice of QRA. That practice can be an almost purely statistical exercise. It relies heavily on data of past events as predictors of future events, as is standard practice in statistical analyses. While such data is helpful, it is by no means essential to risk assessment. And when it is used, it must be used carefully. The historical numbers are often not very relevant to the future—how often do conditions and reactions to previous incidents remain so static that this history can accurately predict the future?

With or without comparable data from history, the best way to predict future events is to understand and properly model the mechanisms that lead to the events. A robust risk assessment methodology forces SMEs to make careful and informed estimates based on their experience and judgment. With only minimal effort, a group of SMEs, in a properly facilitated meeting, can generate credible, defensible estimates of all manner of damage and failure potential along pipelines they know. From these, reasonable risk estimates emerge, to be confirmed or updated as actual events are tracked.

Another Aspect of Data Availability

However, let’s not dismiss the bona fide ‘absence of key information’ scenario. It is not uncommon for an operator to have inherited a system with a genuine lack of basic data. Perhaps a gathering or distribution system, assembled over decades, with very poor records has been acquired. Even basic location and materials of construction data might be missing. This is frustrating for a prudent operator wanting to understand risk. He might also encounter resistance in moving resources towards improving the information status.

Information acquisition can be considered risk reduction when uncertainty is modeled as increased risk. A cost-benefit case for information collection efforts can therefore be made, demonstrating their value.

Here is one approach to, over time, remedy the absence-of-information situation using risk management techniques:

  • First, formalize and centralize ALL available information—collect and digitize every scrap of paper in every file cabinet, every piece of information in the minds of experienced personnel, and all information that becomes available in the course of O&M. This means building a robust database and establishing processes to make its upkeep a part of day-to-day O&M.
  • Next, perform a risk assessment using all of this information plus conservative defaults to fill in the knowledge gaps. This will produce risk estimates based on both actual risk and risk driven by the conservative defaults.
  • Finally, use these risk estimates to drive an information collection process. This might require that resources be initially spent specifically on filling knowledge gaps—conducting surveys, inspections, tests, etc solely to gain the information that can replace the conservative defaults and thereby reduce the ‘possible’ risks.

In this approach, the risk assessment itself identifies the most critical information to collect. This is an efficient and defensible strategy to tackle the ‘lack of data’ issue.

Data preparation

Useful data can come in a variety of forms and formats. Some data may exist on paper only and will need to be digitized. Location data may be derived from varying sources, such as mileposts, fixed-point measurements, or GPS. Location identifiers from alignment sheet stationing may be inconsistent with linear measurements due to the equations used to record route changes on the alignment sheets. In these cases, translation routines or some other standardization technique will need to be employed to correlate the data for accuracy. As a rule, if alignment sheets or other legacy systems are in place and in common use, establishing a translation that preserves the old stationing system is worthwhile. When the older systems are not in common use, they can be replaced by newer, GPS-GIS based formats.

Events Table(s)

Much of the topic of data management will be subject to personal preferences. There will usually be several ways to accomplish the same result. However, experience has shown that one particular data collection format, often overlooked by even more advanced practitioners, has proven to be unexpectedly useful in data preparation, diagnostics, and risk management. This tool is referred to as an ‘events table’ in our discussion here. It is simply a complete listing of all data along the pipeline, with only five essential columns: the pipeline ID, the beginning station or measure, the ending station or measure, the event (diameter, soil type, depth of cover, population density, etc), and the value assigned to the event between the begin and end points. See PRMM for additional explanation of how an events table is constructed.

An events table is a very useful tool for diagnostics and for QA/QC, and perhaps even for directly maintaining the data. It is often the most easily researched source of information regarding changes along a route. In answering the inevitable question of ‘what makes the risk change at location x?’, the events table is easily filtered to show all data inputs associated with location x. While other drill-downs are possible, this is often the quickest method to determine fundamental reasons for changing risk estimates.
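
A minimal sketch of that filtering step, using a toy events table (all rows and values are illustrative):

events = [
    # (line_id, beg, end, event, value)
    ("L-100",   0.0, 1200.0, "diameter",       "12 in"),
    ("L-100",   0.0,  480.0, "soil type",      "clay"),
    ("L-100", 480.0, 1200.0, "soil type",      "rock"),
    ("L-100", 300.0,  400.0, "casing",         "yes"),
    ("L-100",   0.0, 1200.0, "depth of cover", "36 in"),
]

def events_at(table, line_id, x):
    """All event/value pairs in effect at location x on a given line."""
    return [(ev, val) for lid, beg, end, ev, val in table
            if lid == line_id and beg <= x < end]

print(events_at(events, "L-100", 350.0))
# [('diameter', '12 in'), ('soil type', 'clay'), ('casing', 'yes'),
#  ('depth of cover', '36 in')]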

The events table also proves useful in summarizing the ranges of all data inputs as well as changes to input data over time. The table can readily show, for instance, that in the prior period’s assessment, 21 casings had been identified and now there are 23; or previous soil corrosion estimates ranged to a high of 19.5 mpy and now the maximum is 21.1 mpy. As part of QA/QC of input data, such changes should be understood and defensible, so identifying them efficiently is important.

Data events determine segmentation via the dynamic segmentation procedure described later. The events table is therefore the input to the dynamic segmentation process.

Look Up Tables (LUTs)

A modern risk assessment requires the assignment of a numerical value to each input that is to be included in an algorithm. For example, the event ‘coating type’, with attributes such as FBE, coal tar, asphalt, tape, etc, is not usable in a calculation until some value is assigned to each attribute type. Qualitative descriptors are often ‘translated’ into the numbers needed. It is also useful to preserve the descriptive value of the attribute (FBE, tape, etc). When conversions from a descriptor to a numerical value will be routinely needed, a cross-reference matrix, called a look up table (LUT) here, is a convenient tool.

Some examples of LUTs include:

  • Assigning detection capabilities to various ILI types.
  • Assigning reduced detection capabilities to various types of ILI excursions (from ideal inspection conditions).
  • Assigning probability of manufacturing defects to various combinations of manufacture date and pipe mill.
  • Converting a USGS landslide or flood ranking category into an event frequency.

Spreadsheet and database software programs provide tools to efficiently use LUTs. The LUT is accessed during the calculation routines to obtain the numerical equivalents of the qualitative terms.
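
A minimal sketch of a LUT in code (the tool types and detection values are illustrative only):

ILI_DETECTION_LUT = {
    # tool type: probability of detecting a reference metal-loss defect
    "high-res MFL":     0.90,
    "standard-res MFL": 0.80,
    "caliper":          0.10,   # geometry tool; poor for metal loss
    "unknown":          0.50,   # conservative placeholder
}

def detection_probability(tool_type: str) -> float:
    """Fetch the numerical equivalent of a qualitative tool descriptor."""
    return ILI_DETECTION_LUT.get(tool_type, ILI_DETECTION_LUT["unknown"])

print(detection_probability("high-res MFL"))   # 0.9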

As one of their benefits, LUTs provide a simple means to document, preserve, and maintain the relationships between qualitative and quantitative interpretations. If changes are needed, they can be made in the LUT and will then be used in all subsequent calculations. A revision log should be used to track changes to any LUT, since alterations will often have far-ranging implications.

Point events and continuous data

All data used in the risk assessment needs to have a dimension of length—a ‘from’ and ‘to’ along the pipeline. Some data will not have this dimension, at least initially. Examples include overline surveys for soil resistivity, depth of cover, and many others, as well as values calculated at points along the pipeline, such as pressure profile, drain-down volumes, and others. In these instances, the length dimensions will need to be added. ‘Rules’ such as ‘half the distance between points’ or fixed lengths either side of a data point are common ways to assign length, as sketched below. See the detailed discussion in PRMM, and a related discussion on ‘eliminating unnecessary segments’ in the following section.
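
A minimal sketch of the half-distance rule (station values are illustrative):

def half_distance_extents(stations, line_start, line_end):
    """Give each point reading an extent reaching halfway to its neighbors."""
    extents = []
    for i, sta in enumerate(stations):
        beg = line_start if i == 0 else (stations[i - 1] + sta) / 2
        end = line_end if i == len(stations) - 1 else (sta + stations[i + 1]) / 2
        extents.append((beg, end))
    return extents

readings_at = [100.0, 200.0, 260.0, 400.0]
print(half_distance_extents(readings_at, 0.0, 500.0))
# [(0.0, 150.0), (150.0, 230.0), (230.0, 330.0), (330.0, 500.0)]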

Data quality/uncertainty

Additional data preparation issues are discussed in PRMM, including:

  • Creating categories of measurements
  • Assigning zones of influence
  • Countable events
  • Spatial analyses
  • Data quality/uncertainty.

For a discussion of QA/QC as it applies to data collection and preparation for pipeline risk assessment see PRMM.

See also the general discussions of uncertainty and conservatism in assigning defaults throughout this book.

Segmentation

Since data collection and segmentation go hand-in-hand, it is appropriate to detail the concepts of segmentation here, in the midst of the discussion on data management.

The conditions along a pipeline route are variable—the hazard potential is not constant—and for this reason a pipeline’s risk must be evaluated by examining the risks of its individual components.

A mechanism is required to document the changes along a pipeline and assess their combined impact on failure probability and consequence. Lengths of pipeline (or other components) with similar characteristics are identified and assessed. A new segment is created when any risk condition changes, so each pipeline segment has a set of conditions unique from its immediate neighbors. A segment is not necessarily unique within the population of segments—only different from each of its adjacent neighbors.

Each segment will receive its own risk estimate, based on its conditions and characteristics. Therefore, segmentation plays a critical role in risk assessment. Segmentation supports the creation of risk profiles—a critical element of risk management, as described elsewhere in this book.

The risk evaluator must decide on a strategy for creating these sections in order to obtain an accurate risk picture. Breaking the line into many short sections increases the accuracy of the assessment. Longer sections, created by ignoring changes in risk, reduce accuracy because average or worst case characteristics must be used to approximate the changing conditions, rather than assessing the actual changes within the section.

Historically, the creation of shorter segments to gain accuracy sometimes resulted in higher costs of data collection, handling, and maintenance. This is no longer the case. Especially with modern computing environments, a dynamic segmentation approach, as described later, is both more accurate and usually more efficient.

Segmentation Strategies

Segmentation is a key part of pipeline risk assessment. Three segmentation strategies have historically been used in pipeline risk assessment: fixed-length, manual, and dynamic segmentation. Only the last, dynamic segmentation, is appropriate for a modern risk assessment. The others are noted here, for perspective, but produce inappropriate section breaks leading to often serious weaknesses in a risk assessment.

Inappropriate section break points limit the model’s usefulness: risk hot spots are hidden if conditions are averaged within a section, or risks are exaggerated if worst case conditions are applied to the entire length. They also interfere with the risk model’s otherwise efficient ability to identify risk mitigation projects.

If long segments are artificially created, then each pipeline segment will usually have non-uniform characteristics. For example, the pipe wall thickness, soil type, depth of cover, and population density might all change within a segment. If the segment is evaluated as a single entity, the non-uniformity must be eliminated. This is typically done by using the average or worst case condition within the segment, which obscures actual risks and significantly weakens the assessment. As an example, consider a 1,000 ft segment to be assessed with one 100 ft cased crossing within it. Under an older segmentation strategy, the assessment must assume either that all 1,000 ft is cased or that all is uncased. Either is incorrect. The reality is that 90% of this segment is uncased and 10% is cased, and the only way to fully assess the situation is to treat the uncased length differently from the cased length.

Fixed-length approach

In the first of the three historical segmentation approaches, an artifact of old risk assessment practice, some predetermined length such as 1 mile, 1,000 ft, or even 1 ft is chosen as the length of pipeline that will be evaluated as a single entity. A new pipeline segment is created at these lengths regardless of the pipeline characteristics. Fixed-length sectioning also included lengths based on rules such as “between pump stations” or “between block valves”. This was a popular method in the past and is sometimes proposed even today. While such an approach may be initially appealing (perhaps for reasons of consistency with existing accounting systems or corporate naming conventions), it will reduce accuracy and increase costs in risk assessment.

Attempts to avoid the errors inherent in this approach by using short, but still fixed, lengths resulted in inefficiencies, albeit less serious than the inaccuracies produced when using longer lengths. If a shorter segment length was used, processing inefficiencies resulted, with commercial software packages requiring days of continuous processing time to perform risk estimates even for relatively few miles of pipeline. The analyses had to deal with many unnecessary segments based on an arbitrarily selected short segment length, for example 1 ft, while still requiring averaging or worst-case compromises when even shorter features, such as ILI-detected anomalies, were present.

Manually establishing sections

Another previous approach, now also outdated, involved using a pre-determined list of criteria by which to create segments. Modern computational power has eliminated the need to segment the pipeline manually, but a look at the process is useful in understanding the need for the superior technique that has replaced it.

In a manual segmentation, the risk evaluator would choose factors that he thinks are most impactful on risk in the pipeline system being studied and rank those items with regard to magnitude of change and frequency of change. This ranking would be subjective and incomplete, but it could serve as a basis for sectioning the pipeline(s).

Sections were then divided based on the priority rank of risk factors, beginning from the top of the list. The resulting number of sections may have become too large, however, in which case the number of factors on the list was reduced by eliminating some of the low-ranking factors until a cost-effective sectioning—one accommodating the computing power of the time—had been achieved.

See PRMM for an example manual segmentation.

Dynamic segmentation approach

The third strategy is the most robust approach while also being the most efficient. The modern segmentation strategy, and the only truly correct approach, is dynamic segmentation. The idea is for each pipeline section to be unique, from a risk perspective, from its neighbors. When any characteristic changes, a new segment is created. This ensures that the risk variables, and only the risk variables, determine segment breaks.

Since the risk variables measure unique conditions along the pipeline they can be visualized as bands of overlapping information. Under dynamic segmentation, a new segment is created every time any condition or characteristic changes, so each pipeline segment has a set of conditions unique from its neighbors. The data determines the number and location of segment breaks. The length of a segment depends on frequency of condition change: segments where variables change frequently may be an inch or less; segments with relatively constant conditions may be hundreds of feet in length.
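
A minimal sketch of this breakpoint-merging process (the event bands and values are illustrative):

bands = {
    "wall thickness": [(0.0, 600.0, "0.250 in"), (600.0, 1000.0, "0.312 in")],
    "soil type":      [(0.0, 480.0, "clay"), (480.0, 1000.0, "rock")],
    "casing":         [(0.0, 300.0, "no"), (300.0, 400.0, "yes"),
                       (400.0, 1000.0, "no")],
}

# The union of every begin/end point defines the segment breaks.
breaks = sorted({pt for extents in bands.values()
                    for beg, end, _ in extents for pt in (beg, end)})

segments = []
for beg, end in zip(breaks, breaks[1:]):
    mid = (beg + end) / 2
    attrs = {ev: next(v for b, e, v in extents if b <= mid < e)
             for ev, extents in bands.items()}
    segments.append((beg, end, attrs))

for seg in segments:
    print(seg)
# Five segments result; each differs from its neighbors in at least
# one attribute, and no known condition changes within any segment.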

Segments created with a dynamic segmentation process are iso-risk, ie, as far as all collected data and knowledge can determine, there are no changes in risk along a segment’s length. So, within a pipeline section, we recognize no differences in risk, from beginning to end. Each foot of pipe is the same as any other foot, as far as we know from our data. Should changes be later identified, then the segment should be further subdivided.

We also know that the neighboring sections do differ in at least one risk variable. It might be a change in pipe specification (wall thickness, diameter, etc.), soil conditions (pH, moisture, etc.), population, or any of dozens of other risk variables, but at least one aspect is different from section to section.

For some aspects of a risk assessment, conditions will remain constant for long stretches, prompting no new section breaks. Aspects such as training or procedures are generally applied uniformly across the entire pipeline system, or at least within a single operations area. Section length is not important as long as characteristics remain constant. There is no reason to subdivide a 10-mile section of pipe if no real risk changes occur within those 10 miles. However, long section lengths suggest incomplete data and cast suspicion on the entire risk assessment.

Normally, there are many real and significant changes along a pipeline route, warranting many dynamic segments.

For purposes of risk assessment, dividing the pipeline into segments based on any criteria set other than all risk variables will lead to inefficiencies in risk assessment. Use of any segmentation strategy other than full dynamic segmentation compromises the assessment.

A computer routine can replace a rather tedious manual method of creating segments under a dynamic segmentation strategy. Related issues such as persistence of segments and cumulative risks are also more efficiently handled with software routines. A software program to be used in risk assessment should be evaluated for its handling of these aspects. Modern GIS software typically has this type of functionality built in. Alternatively, simple programming code performs this task in a variety of software environments.

Eliminating unnecessary segments

PRMM notes instances where data, collected at regular intervals (for example, pipe-to-soil voltages in a close interval survey, pressure changes every 100 ft, soil resistivity readings, depth of cover, etc), have changes that are insignificant from a risk standpoint. Capturing every minor change as a new dynamic segment is not necessary and leads to inefficiency. A useful ‘rule of thumb’ for when a minor change can be ignored is:

If an SME would not be interested in the minor difference between two measurements, then the risk assessment probably also should not react to the difference. Therefore, the data should be grouped or categorized to minimize unnecessary segment breaks.

For instance, typical pipe-to-soil voltage readings (a measure of CP performance) such as 0.879, 0.882, and 0.875 could fall into a category of “0.850 to 0.900”, and only values falling into categories outside this range warrant special attention. This does not eliminate all unnecessary segments, since values very close to category boundaries arguably also do not require discrimination. Nonetheless, such ‘bucketizing’ of values can improve data processing efficiency.
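
A minimal sketch of such bucketizing (the bin width is illustrative):

def bucketize(reading, width=0.050):
    """Map a pipe-to-soil reading (volts) to a category like '0.850-0.900'."""
    low = (reading // width) * width
    return f"{low:.3f}-{low + width:.3f}"

for r in (0.879, 0.882, 0.875, 0.912):
    print(r, "->", bucketize(r))
# The first three all map to 0.850-0.900 and so create no new segment
# breaks; only 0.912 (category 0.900-0.950) introduces one.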

Auditing Support

Statistics on segment length are also useful auditing tools. As previously noted, long average or maximum segment lengths are suspicious. A pipeline in a natural environment would logically have conditions changing regularly along its length solely from changes in its surroundings—soil types, creek crossings, elevation changes, road crossings, population density changes, etc. Additional changes due to design specifications, hydraulic profile, installation specifics, and others suggest that at least dozens of segments per kilometer would be expected for most pipelines. It is not unusual for a modern assessment to generate thousands of segments per kilometer when detailed inspection data, such as from ILI, is available. A high segment count should not be worrisome. It results in increased accuracy, normally without increased data or modeling costs. Nor should it be viewed as excessive: after all, only a few millimeters of pipeline component actually fail in most incidents—sometimes a few meters, when the failure forces are exceptional. When inspection data identifies a few millimeters of possible weakness, such as a metal loss feature, that information should be integrated into the risk assessment.

Segmentation of Facilities

Facilities also require segmentation in order to fully assess risk. Geographical or functional groupings (for example, tank batteries, pump houses, manifold areas, truck loading areas, injection facilities, etc.) are commonly used for aggregation of risk results. However, individual components and even sub-components will still require risk assessments. For example, a pump can fail in a variety of ways, involving its casing, impeller, flanges, shaft, or any other component. Which subcomponent failed, and the manner in which it failed, may have a significant impact on the subsequent consequences of the pump failure. A full understanding of risk requires knowledge of pump failure potential, which in turn requires at least cursory attention to the failure potential of each sub-component of the pump.

Segmentation for Service Interruption Risk Assessment

When failure is defined to include service interruption, some new dynamic segmentation considerations appear. Consistent with all risk assessments, the data collected to assess the risk will also inform the dynamic segmentation. However, since this expanded definition of ‘failure’ can make the risk assessment considerably larger and more complex, some segmentation shortcuts, such as grouping leak/rupture PoF values, might be appropriate, as discussed elsewhere in this book.

Sectioning/Segmentation of Distribution Systems

Dynamic segmentation is the preferred approach for assessing all types of pipeline systems including distribution systems and other networked components.

Due to sometimes weak data availability for older pipeline systems, it may not be practical to identify and assess each component, at least not for an initial risk assessment. Since dynamic segmentation is based on location-specific data, temporary alternative segmentation strategies might be needed, pending more data availability. This is especially true for older gathering and distribution systems.

As work-arounds to lack of location-specific information, screening approaches have historically been used to focus resources on portions of the system believed more likely to harbor higher risk. Therefore, areas with a history of leaks, materials more prone to leaks, and areas with higher population densities often already have more resources directed toward them.

Such screening approaches should not be considered complete risk assessment foundations. They are based on an initial bias—the pre-determined list of perceived priority risk elements—and will often miss important, but rare and non-obvious, failure and consequence potential. A detailed, location-specific risk assessment can identify subtle interactions between many risk variables that will often point to areas that would not otherwise have been noticed as being higher risk. High-level screening approaches should be thought of as only intermediate steps toward the full risk assessment, sometimes required pending more data availability. Some possible interim segmentation strategies, such as a non-contiguous, characteristic-based or a geographical segmentation strategy, are discussed in PRMM.

Persistence of segments

Under a dynamic segmentation strategy, segments are subject to change with each change of data. This results in the best risk assessments, and does not interfere with tracking changes in risk over time. The risk associated with any stretch of pipeline can always be determined and compared with previous estimates. The user simply picks the ‘from’ and ‘to’ boundaries of the section of interest and then obtains the total risk, the total PoF, the maximum CoF, or any other aspect of interest. This involves a summarization or roll up of the dynamic segments that make up the section of interest.

Results roll-ups

Having employed the modern dynamic segmentation approach, the risk assessment is ready to produce estimates of risk at many specific locations along the pipeline. However, any stretch of pipeline can now also be represented by summary risk values. The risk details—sometimes hundreds of segments per mile—will need to be summarized for many risk management activities. Valve-to-valve, trap-to-trap, accounting-based sections, and any other segmentation scheme can be readily applied to the full risk assessment results in order to produce summary values for many management purposes.

It is common practice to report risk results in terms of fixed lengths such as “per mile” or “between valve stations,” after a dynamic segmentation protocol has been applied. This “rolling up” of risk assessment results is necessary for summarization, reporting, establishing risk management strategies, and perhaps linking to other administrative systems such as accounting or geographic responsibility boundaries.

Summarizations of risks, if not done properly, can be very misleading. Many summarizing strategies will mask important information. Masking occurs when the important details of a collection of numbers are hidden by a summary value that purports to characterize that collection. Several masking scenarios are possible. One simple example is a short section of pipe with an extraordinarily high PoF—perhaps in a landslide zone or at a location of CP interference causing corrosion. This problematic segment will often be masked in the summation of the other segments. Viewing a single value purporting to represent the risk of the entire length of pipe (a collection of pipe segments) will not reveal the presence of the extraordinarily high PoF of the short segment unless the aggregation strategy is designed to avoid the masking.

It can be tempting to use an average risk value to summarize. This will clearly mask higher risk portions when most portions are lower risk. Length-weighted averages will also be misleading. A very short, but very risky stretch of pipe is still of concern, but the length-weighting masks this.

For example, the risk per mile of a 10 ft long component might be much higher than the risk per mile of any other segment. Since it is only 10 ft long, its contribution to overall risk is perhaps tolerable. But it is important to know that a high rate of risk is indeed being tolerated.

It may also be tempting to employ a ‘weakest link in the chain’ analogy and simply choose the maximum risk segment to represent the risk of the entire collection of segments. As a sole method of aggregation, this is not a satisfactory strategy. Examples of difficulties include:

  • Seg A max = Seg B max, but Seg A has only 1% of its length showing that high risk while Seg B has 80% of its length showing ‘high risk’.

  • Seg A max = Seg B max and each has the same length at the higher risk, but the rest of Seg A is only 1% better while the rest of Seg B is 50% better than its ‘high risk’ length.

Similar difficulties arise if averages or other summary statistics are used—masking of extremes and/or insufficient consideration of non-extremes are both errors in analyses. Simple summations of risk scores from certain older risk assessment methodologies are especially unsatisfactory since they often do not consider lengths of individual segments.

A system of calculating cumulative risk that avoids all masking, all under-reporting, and all over-reporting of risk is needed. That system is simply an aggregation of all of the underlying segments comprising the section of interest. The aggregation is done by simple summation when elements are additive, such as EL and frequencies, or by OR-gate combination when probabilities such as PoF are combined.
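
A minimal sketch of such a roll-up (segment lengths, PoF, and expected loss (EL) values are illustrative):

import math

segments = [
    # (length_ft, pof_per_yr, expected_loss_per_yr)
    (500.0, 1e-5, 120.0),
    ( 10.0, 4e-3,  80.0),   # short but high-PoF segment; must not be masked
    (800.0, 2e-5, 150.0),
]

# Additive elements are summed directly.
total_el = sum(el for _, _, el in segments)

# OR gate: P(at least one segment fails) = 1 - prod(1 - p_i)
total_pof = 1.0 - math.prod(1.0 - p for _, p, _ in segments)

print(f"Section PoF: {total_pof:.2e}/yr, EL: {total_el:.0f}/yr")
# A simple average PoF here (~1.3e-3) would understate the 4e-3 hot spot;
# the OR-gate total (~4.0e-3) preserves its contribution.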

See also the discussion of Cumulative Risk elsewhere in this book.

Length Influences on Risk

For long, linear systems like pipelines, risk is sensitive to length. When all other aspects are equal, a longer pipeline segment will always show higher risk than a shorter one.

The total risk generated by a segment reflects its actual length. That is important to risk management decisions. However, the rate of risk—risk per unit length—is also important to decision-makers. It is important to understand when Segment A is higher risk than Segment B only because Segment A is longer. Subtly different is the critical understanding that Segment B may be less risky ONLY because it is shorter; for example, Segment B actually has a higher risk per unit length (for example, risk per km), but its short length makes its total risk low.

The segment with the highest risk value will often not be the same pipe segment when reported on a unitized basis versus a length basis. The riskiest length of pipe in the system is not necessarily the segment with the highest rate of risk, ie, risk per foot. It may actually have very low risk per foot, but simply be longer than other segments.

For example, the risk per mile of a 10 feet long component might be much higher than the risk per mile for any other segment. Since it is only 10 feet long, its contribution to overall risk is masked, unless the rate of risk is examined. As previously noted, a very short, but very risky stretch of pipe is still of concern, even if the length-weighting masks this.

This is why both the segment’s risk and its risk-per-unit-length values should be reported by the risk assessment. This is also true for all of the risk subcomponents, since decision-making will eventually focus on each PoF individually.

CoF is an element of risk that is not sensitive to pipe length. Expressing CoF in ‘per incident’ units (for example, $/incident, fatalities/incident, etc) makes it a length-independent measurement. The maximum CoF in a collection of segments (ie, a stretch of pipeline) will be of interest since it shows the worst consequences that could occur (to a certain PXX) in that collection. It may also be of interest to know when a system has a higher proportion, or more overall length, of high CoF values than another system. In this case, a length-weighted average CoF, used to supplement the maximum CoF, is meaningful.
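
A minimal sketch of these two summary statistics (lengths and CoF values are illustrative):

segments = [
    # (length_ft, cof_per_incident)  e.g., $ per incident
    (500.0, 1.0e6),
    ( 10.0, 2.5e7),   # short segment with the worst consequences
    (800.0, 8.0e5),
]

max_cof = max(cof for _, cof in segments)
weighted_avg_cof = (sum(length * cof for length, cof in segments)
                    / sum(length for length, _ in segments))

print(f"Max CoF: ${max_cof:,.0f}/incident")
print(f"Length-weighted avg CoF: ${weighted_avg_cof:,.0f}/incident")
# The maximum shows the worst that could happen; the weighted average
# shows how much of the system's length carries higher consequences.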

Assigning defaults

Any gaps in information must be filled prior to calculating risk values. Typical gaps could be lack of information regarding the depth of cover or coating condition on an older pipeline. To fill the knowledge gaps, the risk assessor must select a default input that is consistent with the desired level of conservatism of the assessment. Each event along the pipeline must have an assigned attribute—a value must be provided for the missing data. This is often most efficiently done in two steps. In the first, values are assigned based on SME knowledge of a specific region or system characteristics. For example, hurricane damage potential in Aspen, Colorado, US can confidently be assigned very low probabilities by SMEs, as can frost heave phenomena in the islands of the Caribbean. In the second step, values are assigned in the absence of any available SME information. For instance, until an SME is able to say that landslides will not happen along a stretch of pipeline, a very conservative default—perhaps 1 to 10 landslides per year for every mile of pipe—should be assigned as an exposure in a conservative risk assessment. After all, if no SME can say such numbers are not possible, then the assessment, especially the P90+ assessments, must assume that they are plausible.

This two-step approach completes a hierarchy of data input into the assessment, as shown by the following list:

  1. Location-specific data measurements.
  2. Location-specific data estimates.
  3. Values assigned to general areas by SME’s.
  4. Conservative defaults assigned when no other info is available.

These are in order of progressive uncertainty, with defaults carrying the highest level. Defaults are the values that are to be assigned in the absence of any other information. There are implications in the choice of default values and an overall risk assessment default philosophy should be established.

It is not possible to assign a default to all variables: pipe diameter and type of product are examples. Here, the missing data should lead to a non-assessed segment.

All defaults should be contained in one list. This makes the process of retrieving, comparing, modifying, and maintaining the default assignments simpler. Note that assignment of values might also be governed by rules. These rules can infer the default from some associated information. Conditional statements (“if X is true, then Y”) are especially useful. For example, the numerical equivalents of statements such as these may be used to assign values when direct information is unavailable:

If (land-use type) = “residential high” then (population density) = 22 persons/acre

If (pipe date) < 1970 AND (seam type) = “ERW” OR “unknown” then (pipe manufacture) = “LF ERW”[1]

Other special equations by which defaults will be assigned may also be desired.
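
A minimal sketch of such rules as conditional logic, mirroring the two examples above (the function names are hypothetical):

from typing import Optional

def default_population_density(land_use: str) -> Optional[float]:
    """Persons/acre inferred from land-use type when no direct count exists."""
    if land_use == "residential high":
        return 22.0
    return None   # no rule applies; fall back to a conservative default

def default_pipe_manufacture(install_year: int, seam_type: str) -> Optional[str]:
    """Flag possible low-frequency ERW pipe from era and seam information."""
    if install_year < 1970 and seam_type in ("ERW", "unknown"):
        return "LF ERW"
    return None

print(default_pipe_manufacture(1962, "unknown"))   # LF ERW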

When event frequencies are to be assigned by default for events that have never occurred, a useful exercise may be to quantify the intuitive ‘test of time’ aspect. That is, if x miles of pipeline have existed for y number of years and the subject event has never occurred, this is useful evidence. Absent any other information, it can be assumed that, were the event to occur now, the historical rate thus created represents a useful predictive rate, at some PXX level of conservatism.

For example, an evaluation team wishes a quick, initial risk assessment and seeks the frequency of ground subsidence events along a pipeline. They believe that the land above their 200 miles of pipeline in this area has never shown any indication of land subsidence in the 20 years the pipeline has existed. Were subsidence to occur somewhere along the pipelines now, the frequency of occurrence could be estimated as 1 event per (200 miles × 20 years) = 0.00025 events/mile-year. Pending the acquisition of better information—perhaps via soils analyses and geotechnical calculations—the team chooses to use this value for their P70 estimate in this initial risk assessment. Given that other threats to system integrity may have estimates that far surpass this value, it may be that additional analyses to produce a better estimate are never warranted. The team could decide that this rough estimate alone is sufficient, unless some future evidence emerges suggesting the need for a better evaluation. This, in itself, is another exercise in risk management—choosing where resources are best applied.

Conservatism in assigning defaults will be appropriate in most risk assessments. A danger in assigning non-conservative values is that they are no longer noticed by risk managers. They are discovered to be non-conservative once an incident happens. At that point, many outside parties will legitimately question the value of an assessment that does not cause gaps in knowledge to be highlighted (ie, via use of conservatism). Credibility will have been lost in addition to the missed opportunity to better manage the risk.

Adhering to a practice of conservatism in defaults requires discipline. It is sometimes difficult to, for instance, use a default of 18” or 24” of cover for all portions of a pipeline that was installed with 36” of cover just 5 years ago. However, with a real chance that some short section has indeed lost cover, the default value reflects real uncertainty, perhaps prompting a depth of cover survey to verify the more likely 36” depth everywhere.

Sidebar

There are two ways to be wrong when assigning a default in the absence of information:

Call it ‘good’ when it’s really ‘bad’

Call it ‘bad’ when it’s really ‘good’

The first is the more expensive of the two possible errors. It masks the fact that something might be wrong and causes the whole risk assessment to lose credibility when it’s seen to have assumed that everything is ‘ok’. This error also disincentivizes the acquisition of better information.

It requires discipline to avoid Error #1, resisting the temptation to ‘usually be correct’.

The second error prompts investigation, which may arguably misdirect resources occasionally, but reducing uncertainty is usually a valuable exercise.

Error #2 is perhaps not even an error, but rather a choice to adopt conservatism in order to avoid the costs of Error #1.

Quality assurance and quality control

For a discussion of QA/QC as it applies to data collection and preparation for pipeline risk assessment, see PRMM.

Data analysis

Much has been written about analyses of numerical data, including the roles of statistics and visualization tools (for example, charts and graphs). For a discussion of data analyses opportunities specific to pipeline risk assessment and risk management, see PRMM and also texts covering more general data analyses options.

  1. Refers to ‘low-frequency ERW’ pipe manufacture, historically more problematic than most other pipe types.