
Chapter 8: Incorrect Operations

  1. Assessing Human Error Potential—Sample of Data Used

Human error potential

The potential for human error is an important aspect of risk but challenging to quantify. It would be remiss to discount this potential threat and thereby diminish the importance of the many types of mitigation measures employed against it. Large budgets are spent on training, procedures, safety systems, and other mitigations, and that spending continues because it is widely believed that the benefits outweigh the costs. Even though only generalizations and subjective determinations may be available to quantify these benefits and many other aspects of human error potential, risk knowledge improves greatly simply from the efforts to measure this.

The error potential focus is sometimes directed more towards stations and facilities. A more complex environment such as a tank farm, pump/compressor station, or processing plant normally provides many more opportunities for human error—first party and second party—compared to ROW miles. Offshore platforms and their onshore counterparts—pump/compressor stations, tank farms, meter facilities, etc—normally have a higher density of components, a more complex design, and more frequent human activities compared to most portions of most pipelines.

Since human error potential permeates every aspect of risk, it logically influences multiple portions of the risk assessment. Of the four main inputs into a risk assessment (CoF, Exposure, Mitigation, Resistance), consequence minimization and mitigation effectiveness are often quite sensitive to operator error, with less sensitivity usually associated with exposure rates and resistance factors in the assessment of PoF.

Despite the need to consider error potential in many specific processes, assessing error potential as an independent failure cause has the advantage of avoiding duplicate assessments for many of the pertinent risk variables. This recognizes that the same variables would apply in most other failure mechanisms and it makes sense to evaluate such variables in a single place in the assessment. For example, evaluating training and procedures programs is often done for all threats collectively, even though there may be differences in the programs for corrosion control versus excavator damage prevention, for instance. The more robust solution is of course to pair all incorrect operations aspects with specific elements of each failure mechanism.

The centralized approach for examining human error in a risk assessment provides a more efficient means of understanding error potential everywhere and is usually appropriate for most risk assessments.

In any approach, the role of human error should also be considered in all estimates that are used in the risk assessment, especially for exposures and mitigations. For instance, the effectiveness of many mitigation measures are sensitive to error rates—eg, line locates, safety device maintenance, evaluation of CP surveys, and others. Exposures to overpressure by surge and thermal events are similarly influenced by human error rates.

See PRMM for background discussions on many of the risk assessment considerations surrounding human error potential.

Human Error Potential Considered Elsewhere in Risk Assessment

The role of human error in risk requires an understanding of potential for pipeline failure caused by errors committed in designing, building, operating, or maintaining a pipeline.

Human error impacts all of the other probability-of-failure analyses. Active corrosion, for example, suggests an error in corrosion control activities, under an assumption that knowledge and resources to prevent corrosion exist.

The human error potential should be captured in the estimation of each mitigation measure's effectiveness. If human error potential differs among failure mechanisms or among exposures, one can pair error-reduction mitigation measures with specific exposures. For example, perhaps training and procedures for surge prevention are more robust than those for thermal overpressure events.

The focus in this chapter is on real time operational errors that directly precipitate failures. Design phase errors are addressed in many spots throughout a good risk assessment while manufacturing/construction errors are mostly addressed in estimates of Resistance (Chapter 10).

When failure is defined as leak/rupture, there are usually few relevant exposure scenarios caused by errors during operations, due to the common design principle of ‘fail safe’ operations. That is, it is normally difficult to accidentally and immediately threaten any pipeline component’s integrity solely by mis-operation of the pipeline’s devices and equipment. When using an expanded definition of ‘failure’ that includes the often higher potential for service interruption, human error scenarios become more common. In other words, it is easier to interrupt or otherwise compromise a pipeline’s operation (by improperly operating devices and equipment) than to cause a leak/rupture. See discussion of those scenarios in Chapter 12.

It is believed that error potential in the operations phase will often be relevant to error potential in other phases, if only in terms of the similar underlying causes of exposure and opportunities for mitigation. Therefore, the recommended centralized approach for examining human error in a risk assessment provides an efficient means of understanding error potential elsewhere.

Errors by outside parties are more efficiently modeled as part of the exposure rates of other failure mechanisms. This includes vehicle and equipment impacts and explosions from nearby facilities.

Non-operational errors are discussed here but usually better modeled in other portions of the risk assessment. Errors during design and construction tend to introduce weaknesses into the system. These are best considered in the evaluation of resistance. Maintenance errors tend to reduce reliability of equipment, decreasing mitigation effectiveness when the equipment is protective of integrity; for example, safety systems, monitoring instrumentation, etc. Design, construction, and maintenance errors are therefore contributors to failure frequency and consequence but not often initiators. If the assessed component has functioned correctly for some period of time under similar stresses prior to a failure, then the original error is a contributing factor but not the final failure mechanism. Operational errors, on the other hand, can and do precipitate failure directly.

Finally, human errors can fail to minimize consequences or even exacerbate them, as is discussed in the CoF assessment (Chapter 11).

Origination Locations

Operational human-error scenarios potentially causing damage to a pipeline may originate at a facility far from the damage location. Overpressure at a pump station may cause a rupture only at a weak point in a pipe segment miles away. Station operations typically present more opportunities for errors such as overpressure due to inadvertent valve closures and incorrect product transfers that send product to the wrong tank or overfill tanks.

Therefore, facilities, especially those with more frequent or complex human interfacing, will play a large role in risk assessment for portions of the system far beyond the facility boundaries. They are often initiating points for a failure manifesting elsewhere along a system.

Recall HAZOPS as a scenario-based analysis tool to identify events and sequences of events that can lead to failures, including operability issues. A HAZOPS will organize the evaluated facility into ‘nodes’—discrete portions of the facility assessed individually. HAZOPS are often overlooked in a pipeline risk assessment due to a perception that they apply only to a station facility and not to ROW miles. In reality, they usually identify most, if not all, of the potential human error scenarios that could cause failures anywhere, including locations long distances from the facility being assessed. When a HAZOP node includes ROW pipe—perhaps shown as a delivery or receipt point on the P&ID schematic of the facility—the applicability is most apparent. When specified as a node, the HAZOP facilitator should ensure that this node includes more than just the immediate receipt or delivery pipe components. It should include all features along the pipeline—low spots, weaknesses, etc.—even at long distances from the facility.

  1. Human error potential during operations

Continuous Exposure

Recall the discussion in . Incorrect operations has several examples of this. For instance, suppose that a high pressure source is connected to a pipeline via a pressure regulating (control) valve. The pressure source creates the threat exposure and the regulator is the mitigation in this elementary example. Failure is avoided through the use of control and safety systems. The source represents continuous exposure—the pipe downstream of the regulator is subject to immediate overpressure (and potential failure) if the regulator fails. For modeling purposes, it is an on-going, unrelenting, cause of immediate failure if unmitigated.

Measuring this type of exposure appropriately in a risk assessment model requires the correct coupling of the continuous exposure with a corresponding mitigation effectiveness. A high-demand or continuous exposure requires mitigation with very high reliability. The modeling issue with continuous exposure is the choice of time units in which to express the rate of exposure. The continuous exposure can be counted as one event per day, per hour, per minute, per second, or at even finer intervals. Any of these is appropriate as long as the corresponding mitigation—the regulator effectiveness—is measured in the same units (per day, per hour, per minute, etc) of reliability.
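The unit-consistency point can be sketched numerically. In this hypothetical illustration, the regulator is assumed to be unavailable a fraction f of the time; counting the continuous exposure per-day or per-minute then yields the same annual time-at-risk, provided the mitigation term uses the same time unit (all numbers are illustrative, not from the text).

```python
# Hypothetical sketch: unit consistency between continuous exposure and
# mitigation reliability.
f = 1e-4  # assumed fraction of time the regulator fails to control pressure

DAYS_PER_YEAR = 365
MINUTES_PER_YEAR = 365 * 24 * 60

# Exposure counted once per day, regulator unavailability f per day:
at_risk_days = DAYS_PER_YEAR * f        # failure-days per year

# Exposure counted once per minute, same per-minute unavailability:
at_risk_minutes = MINUTES_PER_YEAR * f  # failure-minutes per year

# Both express the same fraction of the year spent unprotected.
assert abs(at_risk_minutes / (24 * 60) - at_risk_days) < 1e-9
```

The inconsistency to avoid is mixing units, e.g., a per-minute exposure count against a per-day reliability, which would overstate the exposure by a factor of 1,440.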

Another nuance of exposure measurement involves the baseline for resistance. There could be dramatically increased exposure when zero resistance is assumed. That is, the number of potentially damaging events increases when the threshold for damage is lowered. This is detailed in .

Errors of omission and commission

Some errors are actually reductions in mitigation effectiveness while others are direct exposure events. As a minimum, this distinction should be made in a risk assessment. A more general PoF assessment may use only these two categories, applying a numerical effectiveness value (or penalty) to all mitigations involving human actions and also estimating a frequency of future human-error-generated failures (in the absence of any mitigation).

It is important to understand the types of errors possible, perhaps appropriately categorized by their root causes. Error rates associated with each would be estimated in the more robust risk assessments. Then, since certain mitigations have varying effectiveness for each type of exposure, specific pairings would be needed. In preliminary or less robust assessments, satisfactory accuracy may be achieved by treating all exposures the same with all mitigations applied to the collective exposure frequencies.

One possible categorization scheme would group by underlying cause of the error. For example, errors due to:

  • Impairment
  • Lack of knowledge
  • Inattention
  • Apathy
  • Stress.

Another categorization scheme could group by the type of error: skill-based errors (memory lapses, slips of action) and mistakes, either rule-based (incorrect application of a good rule, application of a bad rule, or failure to apply a good rule) or knowledge-based [1019].

PRMM provides a useful background discussion of stress influences on human error and how to incorporate research concepts into a risk assessment.

Cost/Benefit Analyses

As with many other elements of a strong risk assessment, an objective and defensible cost/benefit analysis can be conducted for error-prevention practices whose benefits were previously difficult to quantify. Instrument maintenance and calibration, training, procedures, personnel qualification programs, and many others provide measurable benefits in risk reduction. Their value was always recognized, hence their universal use over many decades of industrial application. However, determining the appropriate level of robustness and justifying additional efforts had to be debated rather than demonstrated via objective analyses.

A good risk assessment provides a more objective, consistent, and defensible way to show benefits—avoided losses—obtainable from risk reduction actions.

Assessing Human Error Potential

As with other failure mechanisms, the most detailed assessment will always pair specific exposures with corresponding mitigations. For example, substance abuse programs will logically reduce only exposure events involving impairment factors; training may only reduce errors having ‘lack of knowledge’ as an underlying cause. However, sufficient accuracy in assessment is often achieved by taking a more general approach, perhaps applying all mitigations equally to all types of exposures.
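The pairing idea can be sketched as follows. All exposure rates and effectiveness fractions here are hypothetical placeholders; the point is only the structure: each mitigation discounts only the exposure category it addresses, and independent mitigations on the same category multiply.

```python
# Hypothetical sketch: pairing error-exposure categories with only the
# mitigations that address them (all numbers illustrative).
exposures = {              # unmitigated error events per year, by root cause
    "impairment": 0.02,
    "lack_of_knowledge": 0.05,
    "inattention": 0.10,
}
mitigations = {            # fraction of the paired exposure removed
    "impairment": {"substance_abuse_program": 0.90},
    "lack_of_knowledge": {"training": 0.80},
    "inattention": {"procedures": 0.50, "supervision": 0.40},
}

def residual_error_rate(exposures, mitigations):
    """Sum the residual (mitigated) rate over all exposure categories."""
    total = 0.0
    for cause, rate in exposures.items():
        for effectiveness in mitigations.get(cause, {}).values():
            rate *= (1.0 - effectiveness)  # independent mitigations multiply
        total += rate
    return total

paired = residual_error_rate(exposures, mitigations)  # about 0.042 events/yr
```

A less robust assessment would instead apply all mitigations to the pooled exposure of 0.17 events/year, which is simpler but loses the cause-specific discrimination.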

Although human error is involved in almost every failure, human errors that can directly threaten integrity are relatively rare in most pipeline systems. As noted previously, when service interruption events are included in ‘failure’, the number of possible human error events increases.

Assessing this failure mechanism begins with examinations of error potential in each phase of pipelining.

Design Phase Errors

Design phase errors include incorrect equipment sizing, inappropriate assumptions and/or incorrect calculations regarding loads or resistance, improper materials selection (considering stresses, fatigue, environmental factors of temperature, corrosivity, and others, etc), and others.

The risk assessment could begin with a baseline representing the completely unmitigated exposure—the error rate associated with designs originating from an uneducated, inexperienced layman attempting component designs while working in a harsh environment with no tools (ie, computer, calculator, graph paper, etc). A very high error frequency would be expected—perhaps 50% to 90+% (error rates ranging from one in ‘every other designed component’ to ‘every component’)—depending on design complexities and nuances. This error rate would be reduced by commonplace error reduction measures such as education, training, procedures, certifications, quality checks, etc.

This robust approach has the advantage of valuing each aspect of error-reduction. However, a completely unmitigated error rate may be hard to visualize, given the educational, credentialing, and continuing-education requirements normally associated with most design practices, not to mention common workplace conditions and tools that further improve the processes. A modeling convenience that will often not result in excessive loss of accuracy is to begin with an error rate reasonably attributable to a ‘standard’ design process common to the region and era. This standard process may, for example, be a design team of 2-year technical college designer/drafters overseen by an experienced licensed professional engineer. Perhaps this team produces component designs with serious integrity-threatening errors once every 100 designs. Using this as a baseline case, error-reduction measures such as those discussed in this chapter would reduce the damage potential from that 0.01 error rate.
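The baseline-plus-reductions arithmetic is simple to sketch. The 0.01 baseline is from the example above; the named measures and their effectiveness fractions are hypothetical assumptions for illustration only.

```python
# Sketch of reducing a 'standard process' baseline error rate by layered
# measures (measure names and effectiveness values are hypothetical).
baseline = 0.01   # serious integrity-threatening errors per design

reductions = {    # assumed fraction of remaining errors each measure catches
    "independent_design_review": 0.50,
    "computerized_design_checks": 0.30,
    "field_verification": 0.20,
}

rate = baseline
for effectiveness in reductions.values():
    rate *= (1.0 - effectiveness)

# rate = 0.01 * 0.5 * 0.7 * 0.8 = 0.0028 errors per design
```

Note the multiplicative form assumes the measures act independently on the residual error population, which is itself a modeling assumption that should be stated in a real assessment.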

In identifying where in the design processes errors were more likely, techniques such as HAZOPS can be effective in re-constructing, or at least acting as a surrogate for, the original design and operations intents.

Construction Phase Errors

Potential errors during construction are similarly assessed. Error rates could be assessed for either a completely unmitigated construction scenario or for some ‘standard’ practice. The latter is more intuitive. Then, additional mitigation measures that are in place can be valued. Risk assessments will require an estimate of weakness probabilities along lengths of the system. The rate of occurrence of weaknesses is logically influenced by error rates during design, manufacture, and installation. The ‘test of time’ rationale (see Exposure ) may be appropriate for both design and construction errors. Recognize, however, that some slow-acting phenomena may be attributable to design errors but have simply not yet manifested. Perhaps installations performed in challenging conditions or with questionable quality control can be associated with rates of weaknesses (increased susceptibilities to certain damages, on a per mile basis), despite successful pressure testing and years of operations.

Error Potential in Maintenance

Errors in maintaining equipment, control, and safety systems lead to reduced effectiveness of those systems. The otherwise high reliability of a pressure control regulator or relief valve is lessened when proper maintenance is not performed. The device’s mitigation effectiveness estimate in the risk assessment should reflect this. That is, a poorly maintained pressure regulator may have 10X or more reduced reliability compared to a well-maintained device.
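The effect of that 10X reliability penalty on annual failure expectation can be sketched directly. The probabilities and demand count below are hypothetical, chosen only to show how the maintenance penalty propagates into the risk estimate.

```python
# Hypothetical sketch: maintenance quality as a multiplier on a device's
# probability of failure on demand (PFD); all numbers illustrative.
well_maintained_pfd = 1e-3           # failures per demand, well maintained
poorly_maintained_pfd = well_maintained_pfd * 10  # 10X penalty from the text

demands_per_year = 50                # assumed overpressure demands on device

well_failures_per_year = demands_per_year * well_maintained_pfd    # 0.05
poor_failures_per_year = demands_per_year * poorly_maintained_pfd  # 0.5
```

In a risk model this shows up as a lower mitigation effectiveness, not as a new exposure: the demand rate is unchanged, but more demands go unprotected.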

Similarly, errors in corrosion control, marking/locating, patrol, and even public education, all reduce the ability of the mitigation to protect the system.

Operational Errors

Error potential during operations is potentially a direct initiator of failure. An immediate damage or failure event is possible since personnel are actively operating valves, pumps, compressors, and other equipment where incorrect actions or sequences produce unintended results and may cause damage. Emphasis therefore is on error prevention rather than error detection.

For estimating error rates during operations, the unmitigated exposure rate may again be difficult to imagine—an operation with no procedures, no training, no control or safety devices, etc. However, there are some very ‘stout’ systems that, even with no standard mitigations, would still not be damaged or fail by any conceivable operator action, much less an error. For example, if there are no pressure sources that could exceed design limits, including surge potential and blocked-in, liquid-full, heating scenarios, then it may be physically impossible to overpressure any component. In this case, the inherently low risk operation should show very low exposure and perhaps suggest that mitigation is largely unnecessary. Therefore, the estimation of the unmitigated exposure rate to operational errors is important. Distinguishing between systems’ exposure rates may be more important to the determination of PoF than all possible mitigation measures.

Most hazardous substance pipelines are designed with sufficient redundancy in control and safety systems that it takes a highly unlikely chain of events to cause a leak/rupture type failure solely by the improper use of system components. A system can be made to be even more insensitive to human error through physical barriers and intervention opportunities. Nonetheless, history has demonstrated that the seemingly unlikely event sequences occur more often than would be intuitively predicted.

As noted, human error potential involves difficult to assess aspects of a working environment. As a starting point, the evaluator can look for a sense of professionalism in the way operations are conducted. Corporate culture typically guides this. Seemingly unrelated aspects such as a strong safety program, housekeeping, or facility attractiveness can all be evidence of attention and standard of care, which usually also translate to improved error prevention.

The mitigation measures commonly employed are intertwined. For example, better procedures enhance training and vice versa; safety systems supplement procedures; mechanical devices complement training.

Activities requiring high levels of supervision are logically more susceptible to error. Better training and professionalism usually mean less supervision is required.

Special product issues are often affected by human actions, especially when assessing service interruption potential, and can be considered here. For example, hydrate formation (production of ice as water vapor precipitates from a hydrocarbon flow stream, under special conditions) has been identified as a service interruption threat and also, under special conditions, an integrity threat. The latter occurs if formed ice travels down the pipeline with high velocity, possibly causing damage. Similarly, pressure surge events are often generated by human actions. Because such special occurrences are often controlled through operational procedures, they warrant attention here.

A manned facility with no site-specific operating procedures and/or less training emphasis may have a greater likelihood of incorrect operations-type human error than one with an appropriate level of procedures and personnel training.

Exceeding Design Limits

The possibility of exceeding any threshold for which the system was designed is an important element of a leak/rupture risk assessment. A measure of the susceptibility of the facility to overstressing is modeled here as a part of the incorrect operations failure assessment. While design limits related to temperature, product velocity, and others are used, pressure exceedances are by far the most common integrity threats to a pipeline. Internal pressure is the most important design threshold for most pipelines and is often the primary design limit of interest. Overpressure will be the focus of this discussion while also illustrating the approach to assess any other relevant design exceedance potential. Other limit states such as temperature, level, flowrate, etc, can follow a parallel assessment path as the one outlined here for overpressure potential. For instance, vessel overfill/overflow can be included in leak/rupture scenarios and modeled in a fashion very similar to overpressure.

The safest scenario occurs when no pressure source exists that can generate sufficient pressure to exceed allowable limits. A system in which it is not physically possible to exceed the design pressure is inherently safer than one where the possibility exists. A pump that, when operated in a deadheaded condition, can produce a maximum of 900-psig pressure cannot, theoretically, overpressure components designed for 1800 psig. In the absence of any other pressure source (including heat) or scenario, this situation suggests that no overpressure exposure exists.
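The screening question in this paragraph reduces to a comparison of maximum unmitigated source pressure against the component design limit. The sketch below encodes that check using the deadheaded-pump figures from the text (the second case, a hypothetical 2000-psig connecting source, is added for contrast).

```python
# Sketch of the first overpressure screening question: can any pressure
# source, with all mitigation excluded, exceed the component design limit?
def overpressure_possible(max_source_psig: float, design_psig: float) -> bool:
    """True if the unmitigated source can exceed the design pressure."""
    return max_source_psig > design_psig

# Deadheaded pump from the text: 900 psig max vs 1800 psig design.
assert overpressure_possible(900, 1800) is False   # inherently safe

# A hypothetical connecting pipeline able to deliver 2000 psig would
# flip the answer and require exposure scenarios to be enumerated.
assert overpressure_possible(2000, 1800) is True
```

In practice the 'max source' figure must aggregate all sources, including thermal expansion and surge, before any credit for control or safety systems is taken.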

A pipeline system operated at levels well below its original design intent can also be inherently safe from overpressure. This is a relatively common occurrence as pipeline systems, originally designed for more severe conditions, change service or ownership or as throughputs decline. It is also common for pipeline systems to have pressure sources that can exceed allowable stresses, should control/safety systems fail. Note that the adequacy of safety systems and the potential for specialized stresses such as surges and fatigue are examined elsewhere in this model.

Where pressure sources can overstress systems and control and safety systems are needed to protect the facility, then risk increases. This includes consideration of the maximum pumping head and thermally induced pressure increases. Pumps and compressors are often the primary sources of pressure. Inherent overpressure safety occurs when that prime mover is incapable of creating excessive pressure in the assessed component. Certain pumps and compressors are unable to generate excessive pressures, even under ‘deadhead’ (pumping against a blockage) conditions.

Allowable stresses may change with changes in environmental factors such as temperature. For instance, extreme heat or cold can change the stress-carrying capacity of a material, making failure under normal operating pressure possible.

Potential for Threshold Exceedance

Required for a complete risk assessment are knowledge of the source pressure (pump, compressor, connecting pipelines, tank, well, the often-overlooked thermal sources, etc.) and knowledge of the system strength. The first includes pump and compressor deadhead limits; foreign pipeline connections; well connections; and even position along the hydraulic profile (where sufficient pressures to exceed limits cannot be generated). A pump running in a “deadheaded” condition by the accidental closing of a valve or a surge created by the rapid introduction of relatively high volumes of compressible fluids are classic examples of overpressure scenarios. It is important to exclude all considerations of pressure control and overpressure safety systems at this point.

Sources of overpressure should include scenarios of ‘blocked-in, fluid-full with subsequent heating’ (where the fluid has no room to expand) that aren’t already captured elsewhere. For instance, daytime heating of liquid trapped in a pipe segment, valve body, etc, is efficiently captured here, while an external fire scenario is probably better captured in geohazard or sympathetic reaction scenarios.

It is sometimes difficult to obtain the maximum pressure potential as it must be defined for the ‘exposure’ assignment, ie assuming absence of all safety and pressure-limiting devices. This is especially true when a foreign entity owns and operates a pressure source. Foreign ownership is common when the source is a connecting pipeline, a storage facility, or other non-owned delivery into the system being assessed. When the pressure source is not under operator control, the evaluation can either be more complex or involve more simplifying assumptions. In examining the overpressure potential, the evaluator may have to obtain information from operators of owned-by-others connecting equipment to understand the maximum source pressure potential. When another division, group, company, etc controls both exposure and mitigation, their applied mitigation is usually more efficiently embedded in the exposure estimate. See discussion in .

Ultimately, a simple yes/no answer should be available to answer this first question of ‘can a threshold be exceeded?’ Rare scenarios should still generate a ‘yes’ answer. Their improbability will be captured in the exposure value assigned. For instance, in a high volume system transporting highly compressible fluid (gas), overpressure might be conceivable, but only after many hours of ‘packing’. This scenario still warrants a ‘yes’ answer to the ‘is overpressure possible?’ question, but the high improbability should be considered when assigning exposure rates.

When the answer is ‘yes’, an exposure is estimated for each plausible scenario. All sources of overpressure (or other threshold exceedances) should first be identified. Then all credible scenarios generating overpressures should be identified. Risk analysis tools such as HAZOPS and PHA are often very efficient in providing an exhaustive list of scenarios. Sometimes, frequencies are also assigned to each as part of those analyses. The frequency of each scenario is then estimated under the assumption that no mitigation—beyond that available to resist the MOP—exists. Some scenarios may only manifest under a relatively complex chain of events. In assigning a rate of exposure, the evaluator must sometimes determine the implied time period for an overpressure event to manifest. Would it take only the inadvertent closure of one valve to instantly build a pressure that is too high? Or would it take many hours (and many missed opportunities to intervene) before pressure levels were raised to a dangerous level?

To define the ease of reaching MOP (whichever definition of MOP is used) some qualitative descriptors can be created to envision the possibilities. A range of possibilities is illustrated by the following:

A. Continuous exposure, for example, one exposure per minute occurs[1]

Where routine, normal operations would, absent preventive measures, continuously expose the component to design pressure or higher. Overpressure is prevented by pressure control equipment, procedure, or safety device.

B. Rare exposure, for example, once every few years of operation

Where overpressure can occur only through a combination of multiple procedural errors or omissions or would require long periods of ‘packing’. In these cases, exposure estimates may be challenging to produce, perhaps generated from a PHA/HAZOPS type process that quantifies the likelihood of each step in such unlikely scenarios.

C. Impossible, for example, essentially zero incident potential per year

Where direct or indirect pressure sources cannot, under any conceivable chain of events, overpressure the pipeline.
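One hypothetical way to encode qualitative descriptors like A through C as numeric exposure rates, so they can be coupled with same-unit mitigation values later in the model (the B value of once every three years is an illustrative stand-in for 'once every few years'):

```python
# Hypothetical mapping of the qualitative exposure descriptors above to
# events per year (numeric choices are illustrative assumptions).
MINUTES_PER_YEAR = 365 * 24 * 60

exposure_events_per_year = {
    "A_continuous": float(MINUTES_PER_YEAR),  # one exposure per minute
    "B_rare": 1.0 / 3.0,                      # roughly once every few years
    "C_impossible": 0.0,                      # no conceivable scenario
}

# Unmitigated category A exposure demands mitigation reliability expressed
# per-minute; category B can be paired with per-event reliabilities.
```

The category C entry is genuinely zero, not merely small: if no source can exceed the limit, no amount of mitigation credit is needed or appropriate.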

Overpressure can occur rather easily in some systems, perhaps fairly rapidly by ‘packing’ a segment with incompressible fluid. The only protective measures may be procedural, where the operator is relied on to operate 100% error free, or a simple safety device that is designed to close a valve, shut down a pressure source, or relieve pressure from the pipeline.

If exceedance of some design limit is avoided only through perfect operator performance and one safety device, a higher probability of exceedance—often leading to failure—is being accepted. Error-free work activities are not realistic and industry experience shows that reliance on a single safety device, either mechanical or electronic, inevitably leads to gaps in protection.

In other systems, overpressure is possible and protection is achieved via redundant levels of control or safety devices. These may be any combination of controllers (for example, pressure, flowrate, etc); relief valves; rupture disks; mechanical, electrical, or pneumatic shutdown switches; or computer safeties (programmable logic controllers, supervisory control and data acquisition systems, or any kind of logic devices that may trigger an overpressure prevention action). When at least two independently operated devices are available to prevent overpressure of the pipeline, the accidental failure of one safety device is offset by the backup protection provided by another.
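The value of redundancy can be sketched with the standard independence argument: if the devices truly fail independently, the chance that all fail on the same demand is the product of their individual failure probabilities. The per-demand probabilities below are hypothetical, and the independence assumption itself deserves scrutiny in a real assessment (common-cause failures erode it).

```python
# Sketch: combined failure probability of independent safety devices
# (per-demand probabilities are hypothetical).
def prob_all_fail(failure_probs):
    """Probability that every device fails on the same demand,
    assuming full independence (no common-cause failures)."""
    p = 1.0
    for pfd in failure_probs:
        p *= pfd
    return p

single = prob_all_fail([1e-2])           # one relief valve: 1 in 100
redundant = prob_all_fail([1e-2, 1e-2])  # add independent shutdown: 1 in 10,000
```

The hundredfold improvement is only realized to the extent the devices share no common power, logic, sensing, or maintenance-error pathways, which echoes the caution in the surrounding text.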

Operator procedures are normally also in place to ensure the pipeline is always operated at levels below design limits. Any safety device can be thought of as a backup to proper operating procedures and, hence, as an independent mitigation measure. Industry experience shows a procedural error coincident with the failure of two or more levels of safety is not as unlikely an occurrence as it may first appear.

In other systems, sufficient pressure could be introduced and the pipeline segment theoretically overpressured, but the scenario is extremely unlikely. An example is a compressible fluid in a large-volume pipeline segment, requiring longer times to reach critical pressures: a large diameter gas line would experience overpressure after a mainline valve closure only if the situation went undetected for hours.

In order to assess the exposure rate for a particular design limit exceedance, say, ‘overpressure’, a measure of tolerable pressures is needed. The most readily available measure of this will normally be the documented maximum operating pressure or MOP. Design pressure and/or maximum allowable pressure values may also be available. These values must be dissected to understand the true strength of the component, free from safety factors and influences of other intermittent loadings and nearby weaknesses. The risk assessor must decide, in the context of desired PXX and trade-offs between complexity and robustness, the extent of simultaneous consideration of changing resistance (for example, from extreme temperature effects reducing material capabilities, unanticipated external loadings such as debris impingement in flowing water, etc) with loadings potentially contributing to overpressure. This is also discussed in .

Surge potential

The potential for pressure surges, or water hammer effects, is assessed as a form of human error. A background discussion is provided in PRMM.

When surges are possible, operating procedures to prevent surge scenarios are normally in place. Additional mitigation may include mechanical devices such as surge tanks, relief valves, and slow valve closures.

In a robust risk assessment, the surge required to cause damage to the component being assessed (or a hypothetical component without resistance) would be calculated. This would also consider weakness potential, since a component weakened by corrosion, cracking, gouges, additional stresses, or other defects may be able to withstand only a fraction of the otherwise tolerable surge load.

  1. Assessment of surge potential:

Consider the surge example from PRMM: A crude oil pipeline has flow rates and product characteristics that are supportive of pressure surges in excess of MOP. The only identified initiation scenario is the rapid closure of a mainline gate valve. All of these valves are equipped with automatic electric actuators that are geared so that closure takes longer than the critical closure time. If a valve must be closed manually, it is still not possible to close the valve too quickly—many turns of the valve handwheel are required for each 10% valve closure.

In a preliminary P90 assessment, the evaluator assigns an exposure of about five potentially surge-generating valve closure events per year, with a 98% reliability for each valve actuation; PoD from surge = 5 events/year x (1 – 98%) = 0.1 damages/year (a damage scenario about once every 10 years, involving the failure of an actuator to properly close the valve). Sources of conservatism (P90) in this estimate are documented by the evaluator and include intentional overestimation of aspects such as the expected annual frequency of valve operations, the fraction of the year where flowing conditions are sufficient to generate a significant surge, the number of surges that could cause damage, etc.
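The arithmetic of this preliminary estimate can be sketched in a few lines. This is a minimal illustration using the example's numbers; the function and variable names are illustrative, not from the source.

```python
# Sketch of the preliminary P90 surge estimate: damages/yr equals the
# exposure rate times the probability that the mitigating device
# (here, the valve actuator) fails on demand.
def damages_per_year(events_per_year, device_reliability):
    return events_per_year * (1.0 - device_reliability)

pod = damages_per_year(5, 0.98)  # 5 closure events/yr, 98% reliable actuator
print(round(pod, 3))             # ~0.1 damages/yr, ie about once per 10 years
```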

Mitigation

Control and Safety systems

Control systems and safety devices, as an important aspect of the risk picture, are included here in the incorrect operations assessment. This is done under the premise that control systems are a surrogate for human actions—operating the system within design parameters; and safety systems exist as a backup for situations in which human error causes or allows design thresholds to be reached. Both systems therefore impact the possibility of a pipeline failure due to human error.

This discussion will focus on the role of control and safety systems in preventing leaks/ruptures. Their expanded role in preventing or mitigating service interruption scenarios is covered in . The role of control and safety systems in consequence potential is discussed in .

A control or safety device continuously mitigates against exceedance of a threshold. Control and safety systems can be as simple as a single device—perhaps a regulator, pressure switch, or a relief valve. They can also be extremely sophisticated and complicated: completely orchestrating product movements through multiple prime movers—pumps or compressors—associated with multiple pipeline systems, while monitoring and reacting to all events that may lead to a design parameter excursion, and recording and archiving all events and status conditions. A wide array of sensors, switches, and computers accompany most modern pipeline control/safety systems. Flowrate or pressure regulation valves are examples of devices that often mitigate against overpressure while also ensuring operational efficiencies.

For purposes of this part of the assessment, control and safety systems can both be treated as mitigation. When the term ‘safety system’ or ‘safety device’ is used, the intention is to include control systems and control devices as well.

Control/safety systems that employ computer-based logic are common. These allow more complex actions and sequences to be orchestrated, controlled, and protected but also create additional failure points. A modern risk assessment will need to include an evaluation of all computer permissives programs for all facilities, including PLC, SCADA, and other logic-based processes.

As in other aspects of this risk assessment, it is important to separate mitigation and resistance from exposure for systems under the operator’s control, but this separation is often problematic when estimating exposure rates from systems controlled by others. A distinction between safety systems controlled by the pipeline operator and those outside its direct control is usually warranted. Risk assessment expanded into an assessment of non-owned systems is certainly possible, but requires cooperation from the other owner.

Safety systems evaluation  

Failure potential is reduced as safety systems are able to reliably interrupt a sequence of events that would otherwise result in damage or failure. Understanding of this intervention opportunity began with the identification of exposure scenarios and now requires identification and evaluation of the various actions that initiate, or are initiated by, devices sensing, for example, changing level, flow, temperature, and pressure conditions. When devices are established to initiate independent action, without human intervention, to protect systems, they offer direct mitigation benefit. If false alarms can be minimized, then safety systems that automatically close valves, stop pumps, and/or isolate equipment in extreme conditions are very valuable. When complete autonomous action is not appropriate, human action in combination with safety systems provides mitigation. Early warning alarms and status alerts when actions are taken should ideally be sent to a monitored control center. Also valuable is the ability of a manned control center to remotely activate equipment, including isolation and shutdown devices, to avoid or minimize damage scenarios. Less effective, especially for unmanned, infrequently visited sites, but still useful are safety systems that merely produce a local indication of abnormal conditions.

Safety systems that extend station overpressure protection beyond specific equipment shutdown and isolation include equipment lock-out, station isolation, station lock-out, and relief systems. Lock-out typically requires a person to inspect the station conditions prior to resetting trips and restarting systems.

A sometimes complex chain of events needs to be identified and scrutinized to fully understand certain failure scenarios involving failures of control systems, especially when interacting electronic components are involved. Electronic systems can often fail in multiple ways by a variety of effects (for example, EM pulses) that do not threaten most other components.

To ensure the on-going adequacy of safety systems, periodic reviews are valuable. Such reviews should also be triggered by formal management of change policies or anytime a change is made in a facility. HAZOPS or other hazard evaluation techniques, as well as instrument-specific techniques such as LOPA, are commonly used to first assess the need and/or adequacy of safety systems. This is often followed by a review of the design calculations and supporting assumptions used in specifying the type and actions of the device. The most successful program will have responsibilities, frequencies, and personnel qualifications clearly spelled out. Many regulations for pipelines require or imply an annual review frequency for overpressure safety devices.

As an early step in the risk assessment, each portion of the pipeline system being assessed must be associated with its potential exposure scenarios and relevant control/safety systems. Each safety device located at a pump/compressor station, metering facility, storage facility, or control center will often influence, if not protect, many miles of the system. For instance, a pressure regulator impacts all system components downstream of its location and possibly upstream as well. A pump motor shut-off switch often impacts miles of system both upstream and downstream of its location.

The next step is to assess the reliability of each safety device, considering all potential device failure modes including loss of power or communications. Some valves and switches are designed to “fail closed” on such interruptions. Others are designed to “fail open,” or to remain in their last position (“fail last”). The important thing is that the equipment fails in a mode that leaves the system in the least vulnerable condition, ie ‘fail safe’.

This can be a very complex process, as is detailed in industry standards for SIL and LOPA. Alternatively, reasonable estimates can also be generated with only a few inputs and in a short time. Of course, the latter approach will be less robust and, consequently, less defensible, but perhaps sufficient, especially for preliminary risk estimates.

For all control/safety devices, the evaluator should examine the status of the devices under loss of power or communications scenarios.

In a more robust analysis, guidance is available from sources such as ref [1002], as excerpted below:

Multiple Protection Layers (PLs) are normally provided in the process industry. Each protection layer consists of a grouping of equipment and/or administrative controls that function in concert with the other layers. Protection layers that perform their function with a high degree of reliability may qualify as Independent Protection Layers (IPL). The criteria to qualify a Protection Layer (PL) as an IPL are:

  • The protection provided reduces the identified risk by a large amount, that is, a minimum of a 10-fold reduction.
  • The protective function is provided with a high degree of availability (90% or greater).
  • It has the following important characteristics:
  1. Specificity: An IPL is designed solely to prevent or to mitigate the consequences of one potentially hazardous event (e.g., a runaway reaction, release of toxic material, a loss of containment, or a fire). Multiple causes may lead to the same hazardous event; and, therefore, multiple event scenarios may initiate action of one IPL.
  2. Independence: An IPL is independent of the other protection layers associated with the identified danger.
  3. Dependability: It can be counted on to do what it was designed to do. Both random and systematic failure modes are addressed in the design.
  4. Auditability: It is designed to facilitate regular validation of the protective functions. Proof testing and maintenance of the safety system is necessary.

Only those protection layers that meet the tests of availability, specificity, independence, dependability, and auditability are classified as Independent Protection Layers.

This reference cites some typical probability of failure on demand (PFD) values for certain independent protection layers.

  • relief valve: 10⁻²
  • human performance (no stress): 10⁻²
  • human performance (under stress): 0.5 to 1.0
  • operator response to alarms: 10⁻¹
  • overpressure of a well-maintained vessel: 10⁻⁴

Some annual failure rate examples are also offered [1002]:

  • Low (<10⁻⁴ per year): a failure or series of failures with a very low probability of occurrence within the expected lifetime of the plant; eg 3 or more simultaneous instrument, valve, or human failures; spontaneous failure of a single tank or process vessel.
  • Medium (10⁻⁴ to 10⁻² per year): a failure or series of failures with a low probability of occurrence within the expected lifetime of the plant; eg dual instrument or valve failure; combination of instrument failure and operator error; single failure of small process lines or fittings.
  • High (>10⁻² per year): a failure can reasonably be expected to occur within the lifetime of the plant; eg process leaks; single instrument or valve failure; human errors that result in material releases.
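Combining such values follows simple LOPA arithmetic: the mitigated event frequency is the initiating event frequency multiplied by the PFD of each qualifying independent protection layer. A minimal sketch, with illustrative names and numbers (not from the source):

```python
# LOPA-style sketch: mitigated frequency = initiating event frequency
# multiplied by the PFDs of the independent protection layers, which
# are assumed to be truly independent of one another.
def mitigated_frequency(initiating_freq, ipl_pfds):
    f = initiating_freq
    for pfd in ipl_pfds:
        f *= pfd
    return f

# eg 1 overpressure demand/yr, protected by a relief valve (PFD 1e-2)
# and operator response to alarms (PFD 1e-1):
print(mitigated_frequency(1.0, [1e-2, 1e-1]))  # ~0.001 events/yr
```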

Alarms and other systems that rely on human intervention are logically more susceptible to failure on demand. Error potential is reduced when the condition-sensing device or permissive limit exceedances automatically initiate a full, or partial, shutdown of affected station equipment, with an alarm to remote/local personnel. In the absence of automatic actions, condition-sensing device or permissive limit exceedances may issue an alarm at a continuously manned location that requires operators to evaluate the conditions and remotely initiate a full, or partial, shutdown of affected station equipment.

The potential for human error to incorrectly/inadvertently isolate the safety device from the component(s) being protected is also an important part of this analysis. Note that some systems provide no plausible scenario where such human error could cause such isolation, for example, a three-way valve with redundant devices.

The maintenance and calibration protocols used on the safety device should also be included in the analyses. Most published reliability rates would assume adherence to the device manufacturer’s recommended maintenance and calibration practice. In practice, however, it is not uncommon for a company to choose a more- or a less-robust protocol. Note that a superior risk assessment can show the value of changes in maintenance/calibration practice by estimating the corresponding changes in device reliability.

Different reliability values are acceptable depending on the criticality of the process being protected. At the highest levels of protection, reliabilities such as the following would be expected:

  • Low demand mode of operation: PFD of 10⁻⁵ to 10⁻⁴
  • High demand or continuous mode of operation: probability of dangerous failure per hour of 10⁻⁹ to 10⁻⁸

At the lowest protection level, values such as the following may be appropriate:

  • Low demand mode of operation: PFD of 10⁻² to 10⁻¹
  • High demand or continuous mode of operation: probability of dangerous failure per hour of 10⁻⁶ to 10⁻⁵

[1003]
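The two sets of values above match the low-demand and high-demand bands for safety integrity levels (SILs) defined in IEC 61508, with the first corresponding to SIL 4 and the second to SIL 1 (an interpretation, assuming ref [1003] refers to that standard). A sketch of the low-demand band lookup, with an illustrative function name:

```python
# Sketch: map a low-demand-mode PFD to an IEC 61508 SIL band
# (SIL 1: 1e-2 to 1e-1 ... SIL 4: 1e-5 to 1e-4).
def sil_for_pfd(pfd):
    bands = [(1e-5, 1e-4, 4), (1e-4, 1e-3, 3),
             (1e-3, 1e-2, 2), (1e-2, 1e-1, 1)]
    for lo, hi, sil in bands:
        if lo <= pfd < hi:
            return sil
    return None  # outside the SIL 1-4 range

print(sil_for_pfd(5e-3))  # SIL 2
```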

Finally, the reliability of each sub-system is combined for an estimate of the overall reliability. Manufacturer’s stated reliability values will usually be based on ideal conditions and maintenance practices. Variations from ideal should be considered in the risk assessment. For maintenance, this will require at least some understanding of various control/safety system’s “predictive and preventative maintenance” (PPM) programs, including equipment/component inspections, monitoring, cleaning, testing, calibration, measurements, repair, modifications, and replacements. See further discussion of maintenance later in this chapter.
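Where the sub-systems must all function for the safety loop to act (for example, sensor, logic solver, and final element in series), the combination is multiplicative. A minimal sketch under that series assumption, with illustrative reliability values:

```python
# Sketch: overall reliability of a safety loop whose sub-systems act in
# series, so all must work for the loop to work. Values are illustrative.
def loop_reliability(subsystem_reliabilities):
    total = 1.0
    for r in subsystem_reliabilities:
        total *= r
    return total

# Hypothetical sensor, logic solver, and final element reliabilities:
print(round(loop_reliability([0.99, 0.999, 0.98]), 4))  # ~0.9692
```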

The reliability and timeliness of SCADA dispatch processes would also need to be assessed as part of the overall mitigation effectiveness of safety systems providing alerts only.

  1. Assessing a set of safety systems:

Consider a pipeline connected to a pump capable of overpressuring a component. A pressure regulator and multiple safety devices are installed to avoid overpressure. A pressure-sensitive switch halts flow upon high pressure indications; and a relief valve will open and vent the entire pumped product stream to a flare upon an extremely high pressure indication. This facility is remotely monitored by a SCADA system, transmitting appropriate data (including pressures) that is continuously monitored in a control center. Remote shutdown of the pump from the control center is possible. Communications for data received in the control room as well as control instructions generated by the control center are deemed to be 98% reliable.

Exposure is assessed as ‘continuous’ and quantified as ‘every minute’: 60x24x365 = 525,600 events/yr.

Note that four levels of mitigation are present (regulator, pressure switch, relief valve, control room monitoring), any of which is capable of providing full protection. With preliminary, conservative reliability values of 99% assigned to each of the first three and 50% to the last (with consideration of human error and communications outage rates), combined mitigation effectiveness is 99% OR 99% OR 99% OR 50% = 99.99995%.

This results in a PoD estimate of 0.26 events/yr [525,600 events/yr x (1 – 99.99995%) = 0.26]: a damaging overpressure event, perhaps causing at least a minor permanent deformation, about once every 4 years.
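The example's arithmetic can be sketched directly: with independent layers, overall failure requires every layer to fail simultaneously, so the OR-gate combined effectiveness is one minus the product of the individual failure probabilities. The values below are those of the example above; the function name is illustrative.

```python
# Sketch of the four-layer example: combined (OR-gate) mitigation
# effectiveness, then the resulting probability of damage (PoD).
def combined_effectiveness(layer_effectivenesses):
    p_all_fail = 1.0
    for e in layer_effectivenesses:
        p_all_fail *= (1.0 - e)  # every independent layer must fail
    return 1.0 - p_all_fail

# regulator, pressure switch, relief valve at 99%; control room at 50%:
eff = combined_effectiveness([0.99, 0.99, 0.99, 0.50])
exposure = 60 * 24 * 365             # 'every minute' = 525,600 events/yr
pod = exposure * (1.0 - eff)
print(round(pod, 2))                 # ~0.26 damaging events/yr
```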

Procedures

The use of procedures to ensure correct operations and avoid errors is well known. As a means of mitigating scenarios that precipitate failure, procedures and their use should be a part of the mitigation effectiveness estimates.

A range of quality, rigor, and utility exists among operators’ procedures, and often within different functional or geographical areas of the same operator. A list of ingredients that distinguish the most effective use of procedures can first be created, defining the program that warrants the highest effectiveness estimate. Perhaps first among the ingredients is a corporate culture that requires adherence to procedures, ie their correctness and everyday use. Without this, the desire to follow a procedure correctly may be missing.

Since each mitigation measure is evaluated independently from others, we assume there has been no training on the procedures. Some might think this is an unreasonable position—training and procedures are so intertwined that independent evaluations of the two seems nonsensical to many. But this is not necessarily the case. Procedures alone can be clear and complete enough to produce error-free operations in some cases. Here’s an example to illustrate. Good procedures allow the purchaser of a shipped, disassembled table to assemble that table properly and without incident, even though there has been no training on table assembly. The procedure stands on its own merit. However, the desire by the purchaser to correctly complete the assembly is critical to the success rate.

Most would agree that the highest rated, ie, most effective, procedure system would have all of the following ingredients:

  • Strong corporate culture mandating their prominent role in day-to-day activities
  • Clearly written
  • Complete coverage of all tasks in all procedures
  • User-friendly format and beyond—perhaps even enticing and entertaining to the user
  • Use of video, photographs, illustrations, etc as appropriate for optimum understandability and utility
  • Regularly reviewed and refreshed
  • Field-tested and verified regularly
  • Validated by independent audit
  • Readily retrieved and protected (version control) by a robust document management system.

Many technical writing ‘best practices’ could be consulted to provide further guidelines for “what makes an excellent procedure”.

In a superior program, there should be evidence that procedures are actively used, reviewed, and revised. Such evidence might include filled-in checklists and procedures in active use in field locations and with field personnel.

Activities near a pipeline, but not actually on it, are also appropriately included when such activities may have risk implications. For instance, nearby excavations can impact a pipeline’s support conditions, perhaps increasing exposure from landslide, erosion, or subsidence.

Locating processes—finding and marking buried utilities prior to excavation activities—are important for any subsurface system, but perhaps especially so for distribution systems that often coexist with many other subsurface structures. Such procedures may warrant additional attention in this evaluation.

A protocol should exist that covers procedures maintenance: who develops them, who approves them, how training is done, how compliance is verified, how often they are reviewed, what is the update process, etc. A document management system should be in place to ensure version control and proper access to most current documents. This is commonly done in a computer environment, but can also be done with paper filing systems.

While procedures are normally a mitigation measure, they may alternatively generate exposures, especially in abnormal operations. Execution of procedures during operations that can put system integrity at risk is part of the exposure rate in the risk assessment.

Any recent history of station procedure-related problems should be investigated for evidence of procedure effectiveness.

Mitigation Effectiveness

Transmission pipeline company SMEs have typically assigned maximum effectiveness values in the range of 30% to over 90%, based on their experiences and ideas of how effective the highest quality procedures program could be as a stand-alone error prevention item. For perspective, the higher end of this range assumes that fewer than 1 out of 10 otherwise damaging events would occur under the hypothetical best procedures program alone (assuming no training or other mitigations), ie 9 out of 10 are avoided, while the lower end assumes only 3 out of 10 events are avoided by the best program. Actual effectiveness values are then assigned based on differences from the idealized, perfect program.

SCADA/communications

Background

A SCADA system allows remote monitoring (of parameters such as pressures, flows, temperatures, and product compositions) and some remote control functions, normally from a central location, such as a control center. Standard industry practice for hydrocarbon transmission pipelines in most western countries is 24-hours-per-day monitoring of “real-time” critical data with audible and visible indicators (alarms) set for abnormal conditions. At a minimum, control center operators normally have the ability to safely shut down critical equipment remotely when abnormal conditions are seen.

Interfaces between the pipeline data-gathering instruments and conventional communication paths such as telephone lines, satellite transmission links, fiber optic cables, radio waves, or microwaves facilitate the delivery of information to and from the control center. Modern communication pathways and scan rates can refresh data at least every few seconds with 99.9%+ reliability and often include redundant pathways (sometimes even manually implemented dial-up telephone lines) in case of extreme pathway interruptions.

A SCADA system often also serves as a safety device, when computer logic is used to control critical operational parameters.

In providing an overall view of the entire pipeline from one location, a SCADA system facilitates system diagnosis, leak detection, transient analysis, and work coordination, thereby impacting risk in several ways including:

  • human error avoidance,
  • surge avoidance,
  • leak detection,
  • emergency response.

The focus in this part of the risk assessment is on the role of SCADA in human error avoidance, ie mitigation of incorrect operations.

SCADA Capabilities

See PRMM for a discussion of SCADA system concepts.

When the SCADA provides control or safety functions, its role in damage/failure prevention is captured as another level of safety system (see previous discussions). The more technical aspects of the kind and quality of data and control (incident detection), and the use of that capability in consequence minimization (ie, leak detection and emergency response), can be assessed in the measure of consequence potential (see ).

  1. Control Center as part of SCADA

Error Prevention

Setting aside for now its role as a safety system and consequence minimizer, the emphasis here is on the SCADA role in reducing human error-type incidents. From the human error perspective only, the major considerations are that a second “set of eyes” is monitoring, is hopefully consulted prior to field operations, is involved with all critical activities, and that more reliable coordination of the system operations is provided. Although human error potential exists in the SCADA loop itself—more humans involved may imply more error potential, both from the field and from the control center—the cross-checking opportunities offered by SCADA can reduce the probability of human error in operations. One emphasis should therefore be placed on how well the two locations are cooperating and cross-checking each other.

Protocols that require field personnel to coordinate all station activities with a control room offer an opportunity for a second set of eyes to interrupt an error sequence. In the best practices, critical stations are identified and must be physically occupied if SCADA communications are interrupted for specified periods of time. Proven reliable voice communications between the control center and field should be present. When a host computer provides calculations and control functions in addition to local station logic, all control and alarm functions should be routinely tested from the data source all the way through final actions.

While transmission pipeline systems are common users of SCADA, these mitigation concepts apply to offshore, distribution, and gathering pipelines, as well as tank farms, pump stations, platforms, etc, even where a standard SCADA is not being used. As a means of reducing human errors, the use of any system or protocol for regular coordination of actions between multiple observers, such as field operations and a central control, is an intervention point for human error reduction. Some systems and facilities have protocols for communications/coordination producing the benefits of multiple eyes and minds confirming actions, although a SCADA-type system is not present. Some facilities will have distributed control and monitoring (DCM) systems that act like SCADA, albeit in a more limited geographical area.

Mitigation Effectiveness

Transmission pipeline company SMEs have typically assigned maximum effectiveness values in the range of 5% to 30%, based on their experiences with SCADA systems in human error avoidance. For perspective, the higher end of this range assumes that 3 out of 10 otherwise damaging events are avoided solely through the use of a superior SCADA system, while the lower end assumes only 5 out of 100 events are avoided. Actual effectiveness values are then assigned based on differences from the idealized, perfect program.

Substance Abuse

Errors with an underlying cause of ‘impairment’ can be partially mitigated by programs to manage substance abuse. In some countries, government regulations or common industry practice require drug and alcohol testing programs for certain classes of employees in the transportation industry.

Since these mitigation measures are focused on specific types of human errors—those involving impairments—they are most correctly applied only to those exposures.

Mitigation Effectiveness

In transmission pipeline companies that operate free of significant substance abuse issues, SMEs have typically assigned maximum effectiveness values in the range of 1% to 5% for exceptional substance abuse programs. For perspective, even the higher end of this range assumes that only 5 out of 100 otherwise damaging events are avoided solely through this program, while the lower end assumes only 1 out of 100 events are avoided. Actual effectiveness values are then assigned based on differences from the idealized, perfect program.

Safety/Focus programs

With inattention being an underlying factor in many human error events, company programs that provide focus may act as mitigation, even when not directed specifically at the failure being measured by the risk assessment. A focus on employee safety is an example. An employee safety program[2] is a nearly intangible but still important factor in a risk assessment (although very central to employee safety risk management).

It is intangible in the sense that the impact on human error potential derived from a strong safety program is difficult to quantify. However, most would agree that the extra care and attention to routine tasks that is fostered by a high level of safety awareness and a corporate culture of safety should translate to some benefits in all types of human error avoidance.

Similarly, other peripheral company focuses such as on good “housekeeping” practices can be revealing. Housekeeping can include treatment of critical equipment and materials so they are easily identifiable (using, for instance, a high-contrast or multiple-color scheme), easily accessible (next to work area or central storage building), clearly identified (signs, markings, ID tags), and clean (washed, painted, repaired). Housekeeping also includes general grounds maintenance so that tools, equipment, or debris are not left unattended or equipment left disassembled. All safety-related materials and equipment should be maintained in good working order and replaced as recommended by the manufacturer. Station logs, reference materials, and drawings should be current and easily accessible, in the more effective programs.

Mitigation Effectiveness

Transmission pipeline company SMEs have typically assigned maximum effectiveness values in the range of 1% to 5%, based on their experience. For perspective, even the higher end of this range assumes that only 5 out of 100 otherwise damaging events are avoided solely through this type of program, even the best conceivable, while the lower end assumes only 1 out of 100 events are avoided. Actual effectiveness values are then assigned based on differences from the idealized, perfect program.

Training

Training is a key mitigation measure protecting against human error. PRMM discusses a list of key ingredients in a training program:

  • Documented minimum requirements
  • Testing
  • Topics covered
  • Observed and assessed performance of actions
  • Job procedures (as appropriate)
  • Scheduled retraining
  • Proficiency testing and periodic re-testing
  • Detailed record-keeping
  • Progress/performance tracking.

Training on tasks whose execution can put the system integrity at risk is especially critical to the risk assessment. A high level of worker turnover makes training even more critical. Both of these aspects should be included in the risk assessment.

For maximum effectiveness as a risk mitigation, written procedures dealing with all operational actions, abnormal and emergency actions, repairs, and routine maintenance should be readily available. Not only should these exist, it should also be clear that they are in active use by the personnel. The recommendation here is to look for checklists, revision dates, and other evidence of their use. Procedures supplement training by helping to ensure consistency. Specialized procedures are required to ensure that original design factors are still considered long after the designers are gone. A prime example is welding, where material changes such as hardness, fracture toughness, and corrosion resistance can be seriously affected by the subsequent maintenance activities involving welding.

The assessment should consider the effectiveness of the retraining schedule and the periodic retesting in terms of their ability to adequately verify employee skills. Higher workforce turnover rates have been correlated to increased error rates, due to loss of experience and training benefits that otherwise accrue to a more stable workforce. This could be an influencing factor when assigning mitigation effectiveness.

Mitigation Effectiveness

Transmission pipeline company SMEs have typically assigned maximum effectiveness values in the range of 30% to over 90%, based on their experiences and ideas of how effective the highest quality training program could be as a stand-alone error prevention item. For perspective, the higher end of this range assumes that fewer than 1 out of 10 otherwise damaging events would occur with the hypothetical best training program as the sole mitigation (assuming no procedures or other mitigations), while the lower end assumes only 3 out of 10 events are avoided by the best program. Actual effectiveness values are then assigned based on differences from the idealized, perfect program.

Mechanical error preventers

The role of mechanical error preventers as mitigation measures should reflect the combined effectiveness of the devices/measures being rated. Examples of common devices/measures are noted in PRMM as:

  • Three-way valves with dual instrumentation
  • Lock-out devices
  • Key-lock sequence programs
  • Computer permissives—logic controls that will prevent certain actions from being performed out of sequence
  • Highlighting of critical instruments.

Effectiveness should reflect the combined effect (OR-gate addition) of each application. An application is valid only if the mechanical preventer is used in all instances of the scenario it is designed to prevent.
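The OR-gate addition of several independent preventers can be sketched as follows. The individual effectiveness values are hypothetical, and independence of the preventers is an assumption:

```python
from functools import reduce

def or_gate_effectiveness(effectivenesses):
    """Combine independent mitigation effectiveness values via OR-gate
    addition: the chance an error survives all preventers is the product
    of the individual 'miss' probabilities."""
    miss = reduce(lambda acc, e: acc * (1.0 - e), effectivenesses, 1.0)
    return 1.0 - miss

# Hypothetical values: lock-out device (60%), computer permissive (50%),
# key-lock sequence program (30%):
print(round(or_gate_effectiveness([0.60, 0.50, 0.30]), 3))  # 0.86
```

Note that OR-gate addition never exceeds 100% and exhibits diminishing returns as preventers are stacked, which matches the intuition that redundant devices overlap in the errors they catch.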

Transmission pipeline company SMEs have typically assigned maximum effectiveness values in the range of 30% to over 90%, based on their experiences and ideas of how effective the highest quality mechanical error prevention program could be as a stand-alone error prevention item. This varies widely based on the type of facility being assessed since, for some, a wide range of mechanical devices are possible and practical, while for others, few devices are available. For perspective, the higher end of this range assumes that fewer than 1 out of 10 otherwise damaging events would occur solely through the hypothetical best program (assuming no training, procedures, or other mitigations)—9 out of 10 are avoided—while the lower end assumes only 3 out of 10 events are avoided by the best program. Actual effectiveness values are then assigned based on differences from the idealized, perfect program.

Resistance

As discussed here, many of the damage scenarios for leak/rupture that are directly caused by human error involve overpressure. Therefore, stress-carrying capacity under internal pressure is a main consideration for resistance to human errors. The defect-free stress-carrying capacity is readily calculated for most pipeline components. Inclusion of possible defects is then added to the analyses as detailed in .
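For thin-walled pipe, the defect-free internal pressure capacity is commonly estimated with Barlow's formula, P = 2St/D. The pipe specification below is a hypothetical example for illustration:

```python
def barlow_pressure_capacity(smys_psi: float, wall_in: float,
                             od_in: float, design_factor: float = 1.0) -> float:
    """Defect-free internal pressure capacity via Barlow's formula,
    P = 2 * S * t / D, optionally derated by a design factor."""
    return 2.0 * smys_psi * wall_in / od_in * design_factor

# Hypothetical 12.75 in OD, 0.250 in wall, Grade X52 (SMYS = 52,000 psi) pipe:
print(round(barlow_pressure_capacity(52000, 0.250, 12.75)))  # 2039 psi
```

Defects are then accounted for by reducing this defect-free capacity, as discussed elsewhere in the text.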

When scenarios such as vessel overflow/overfill are included, an assessment approach directly analogous to overpressure can be efficiently employed. Resistance may be minimal for such scenarios, unless features such as secondary containments are included as resistance rather than as consequence minimizers.

Under an expanded definition of ‘failure’, a system’s resistance to human error is more complex. A system’s ability to absorb excursions of contaminants, flowrate deviations, etc can be multi-faceted. Aspects such as time to overpressure (or to exceed some other threshold) should be included in either exposure estimates or resistance estimates. For instance, a high volume system transporting a highly compressible fluid (gas) will often have a degree of inherent resistance (or reduced exposure) since overpressure is possible, but only after many hours of ‘packing’. See also the discussion in .
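As an illustration of the time-to-overpressure concept, a rough linepack estimate can be sketched. The isothermal ideal-gas behavior and all input values are simplifying assumptions for illustration only:

```python
def hours_to_overpressure(p_now_psia: float, p_max_psia: float,
                          line_volume_ft3: float, imbalance_scfh: float,
                          t_line_r: float = 520.0, t_std_r: float = 520.0,
                          p_std_psia: float = 14.7) -> float:
    """Rough isothermal ideal-gas estimate of hours until an over-packed
    gas system reaches its overpressure threshold, given a net inflow
    imbalance expressed in standard cubic feet per hour."""
    # Pressure rise per hour produced by the standard-condition imbalance.
    dp_per_hr = (imbalance_scfh * p_std_psia * t_line_r
                 / (line_volume_ft3 * t_std_r))
    return (p_max_psia - p_now_psia) / dp_per_hr

# Hypothetical system: 1,000,000 ft3 of linepack volume at 800 psia,
# MAOP-based limit of 900 psia, net imbalance of 100,000 scfh:
print(round(hours_to_overpressure(800, 900, 1_000_000, 100_000), 1))  # 68.0
```

A result measured in many hours, as here, supports assigning a degree of inherent resistance (or reduced exposure), since operators have ample time to detect and correct the excursion.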

Introduction of Weaknesses

In addition to real-time failures, human error contributes to delayed failures via the introduction of unintended weaknesses into a pipeline system. These may occur in any of the four phases introduced previously: design, construction, operations, and maintenance. For each phase, the types and frequencies of weaknesses created should be estimated. For each potential type of weakness, a reduction in load-carrying capacity will be required in order to fully understand the impact on risk. This is fully discussed in . In this chapter, discussion of sources of human-error types of weaknesses is offered, to assist in the estimation of possible weaknesses in any component being risk-assessed.
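The per-phase estimation can be organized as a simple tabulation. The phase rates below are purely illustrative placeholders, to be replaced by operator data and SME judgment:

```python
# Hypothetical weakness-introduction rates (weaknesses per mile) by
# life-cycle phase; illustrative placeholders only.
phase_rates = {
    "design": 0.02,
    "construction": 0.10,
    "operations": 0.03,
    "maintenance": 0.05,
}

def expected_weaknesses(miles: float, rates: dict) -> float:
    """Sum phase-specific per-mile rates into an expected weakness
    count for a segment of the given length."""
    return miles * sum(rates.values())

# Expected weakness count for a hypothetical 10-mile segment:
print(round(expected_weaknesses(10.0, phase_rates), 2))  # 2.0
```

Each expected weakness would then be paired with an estimated reduction in load-carrying capacity to complete the resistance picture.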

Potential errors committed during the design/construction can be difficult to assess for an existing pipeline. Historic design processes are often not well defined or documented and are often highly variable. Nonetheless, an assessment resulting in at least a rough estimate of weakness potential is prudent.

Errors, or practices considered inferior by today’s standards, in past design/construction tend to introduce weaknesses and will most often appear as resistance issues in a modern risk assessment. Even though a system with a weakness may operate with sound integrity for many decades, the presence of that weakness, coupled with certain loadings, can eventually precipitate a failure. The risk assessment should identify and quantify the types of weaknesses that may be present.

The suggested approach is for the evaluator to seek evidence that error-preventing actions were taken during the design/construction phases. If design/construction documents are available, a check or certification can be done to verify that no obvious errors have been made. Otherwise, evidence such as inspection results and the testimony of SMEs may drive the assessment.

details the types of weaknesses commonly encountered in pipelines. Here, the potential for such weaknesses is discussed.

Design

A formal hazard identification process during design helps to ensure that all threats are understood and appropriately mitigated. HAZOP studies and other appropriate hazard identification techniques are discussed in . These techniques provide valuable inputs into estimates of exposure, mitigation, resistance, and consequence. Thoroughness and timeliness are important: if this type of analysis is not available from the original design, it can be performed at any time and the results used to strengthen the risk assessment.

Potential design errors include flaws revealed during operations and maintenance practices. While often more ‘real-time’, apparent O&M errors can also conceivably manifest long after the actual error-introducing activity has occurred. For example, a mis-designed flow/pressure control system may operate satisfactorily for years until a rare combination of factors causes the controls to overpressure a component.

Material selection

The assessment should consider the rigor with which proper materials were identified and specified with regard to all plausible stresses.

Notably in distribution systems, a wide range of pipe and appurtenance materials have been used with a variety of different joining techniques. Some of these choices have later proven to be problematic from an integrity standpoint. Certain installations of cast iron, plastics, tees, and couplings have generated a disproportionate number of failures for some operators.

Given that a certain amount of care and prudence is associated with the ‘standard’ practice, risk reduction for this item can be based on the existence and use of additional control documents and procedures—beyond standard practice—that govern all aspects of pipeline material selection and installation. Superior practices can influence the risk assessment via reduced incidences of weaknesses.

QA/QC Checks

The risk assessment should consider the extent to which design calculations and decisions were checked for errors at key points during the design, material procurement, and installation processes.

Given a certain amount of error-checking in the ‘standard’ practice, assignment of additional mitigation effectiveness would be warranted for systems whose design process was more carefully monitored and checked. This would be reflected in a reduced rate of weaknesses to be associated with the components/segments benefiting from the more aggressive quality assurance programs.

Construction/installation

Typical construction-error risk elements are discussed in PRMM and here in . When a mitigation or an exposure departs from the norm assumed in the error rate associated with ‘standard’ practice, the influences of these factors should be included in the assessment.

For assessing the potential for construction phase weaknesses in a system, the evaluator should seek evidence regarding the steps that were taken to ensure that the pipeline section was constructed correctly. This includes the construction specifications as well as checks on the quality of workmanship during installation.

Challenging installation conditions are logically linked to potentially higher error rates. Offshore, arctic, and tropical environments and congested urban areas are a few examples of more difficult conditions. When it can be determined that an installation period involved difficulties due to weather, labor disputes, resistance from outside parties, excessively aggressive time urgencies, or other influences, error rates would similarly be expected to increase. Delayed effects from sabotage activities can also be included here. For instance, an intentionally drilled hole partially through a pipe wall can be treated as a resistance reduction, just as a defective girth weld would be.

Construction errors on distribution systems may be more common due to the increased level of continuous construction activity, coupled with the variability of construction crews and materials used, often spanning several decades of installation.

Weaker inspection practices during construction suggest higher incidence rates of errors and, correspondingly, an assumption that more weaknesses were introduced.

Questionable materials purchase, receipt, or installation practices should result in higher estimates of weaknesses in a system.

Less than 100% inspection of all joints, failure to meet minimum industry-accepted practices, questionable practices, or other uncertainties should lead to higher estimated incidences of weaknesses when conducting a conservative risk assessment.
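One conservative way to translate inspection coverage into a weakness estimate is to count both flaws in uninspected joints and flaws that inspection failed to detect. The flaw count, coverage fraction, and probability of detection below are hypothetical:

```python
def undetected_weaknesses(introduced: float, inspected_fraction: float,
                          pod: float) -> float:
    """Estimate weaknesses surviving construction inspection: flaws in
    uninspected joints plus flaws that inspection missed, where pod is
    the assumed probability of detection for the inspection technique."""
    missed_in_inspected = introduced * inspected_fraction * (1.0 - pod)
    in_uninspected = introduced * (1.0 - inspected_fraction)
    return missed_in_inspected + in_uninspected

# Hypothetical inputs: 20 introduced flaws, 80% of joints inspected,
# 90% probability of detection for the inspection method:
print(round(undetected_weaknesses(20, 0.80, 0.90), 2))  # 5.6
```

As the text suggests, a conservative assessment would bias the introduced-flaw estimate upward when practices are questionable or records are uncertain.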

Uncertain practice of backfill/support techniques during construction warrants consideration of higher rates of coating defects as well as strength reductions such as dents and gouges.

High levels of residual stresses due to improper handling have played a role in historical failures. Transportation fatigue—the growth of cracks in larger-diameter pipes transported by rail prior to improved handling protocols—is another example of a handling-related failure contributor.

The evaluator may assume reduced incidences of weaknesses when there is evidence of superior materials handling practices and storage techniques during and prior to construction. Calculations can be performed to assess the susceptibility of certain pipe specifications to damage by improper handling. When a specification is susceptible, weaker handling practices warrant higher estimated incidences of weaknesses.
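A simple susceptibility screen based on the diameter-to-wall-thickness (D/t) ratio illustrates the kind of calculation intended, since thin-walled, large-diameter pipe is more prone to denting and ovalization during handling. The threshold used is an illustrative assumption, not a code value:

```python
def handling_susceptibility(od_in: float, wall_in: float,
                            dt_threshold: float = 70.0) -> bool:
    """Flag pipe specifications vulnerable to handling damage (denting,
    ovalization) using a simple D/t screen. The default threshold of 70
    is an illustrative assumption, not a code requirement."""
    return od_in / wall_in > dt_threshold

print(handling_susceptibility(36.0, 0.375))   # True  (D/t = 96)
print(handling_susceptibility(12.75, 0.250))  # False (D/t = 51)
```

Segments flagged by such a screen, combined with evidence of weaker handling practices, would be assigned higher incidences of weaknesses.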

Field-applied coatings (normally required for joining) are problematic because quality control, including the effects of ambient conditions, is difficult to manage. Careful control of temperature and moisture is normally required, and all coating systems are sensitive to some extent to surface preparation.

A major integrity threat to some pipelines is the combination of CP-shielding coatings and disbondment, often arising from poorly applied coatings over girth welds. When just one of these issues is present, the pipeline may still experience a long life with few extraordinary considerations needed. However, the presence of both a shielding coating and disbondment creates a systemic threat to integrity that is challenging to manage.

Because overpressure protection is identified as a critical aspect in many pipeline systems, maintenance of regulators and other pressure control devices is critical. The evaluator should seek evidence that regulator activity is monitored and that periodic overhauls are conducted to ensure proper performance. Other pressure control devices should similarly be closely maintained.

The care of an odorization system in a gas distribution system should also be considered, with questionable maintenance practices leading to reduced leak detection capabilities.

Severe weather preparatory programs are common for many facilities and are logically included in a risk assessment. These might include hurricane, windstorm, flood, ice/hail, wildfire, and extreme temperature events such as freeze protection programs.

Other preparatory events can be examined in a similar fashion, with results informing risk assessment inputs.

  1. See discussion under

  2. A safety program is different than a safety system, with the latter referring to physical devices that prevent exceedances of pressure, flowrates, etc.