See also: Missteps, Myths, and Past Practice
“Attempting to assess risk without estimating frequencies and intensities is like designing a bridge without considering loads and forces. You’re not really assessing or designing; you’re only intuiting.”
What’s wrong with my current scoring/indexing approach?
From Text 3.2.2
Some of the more significant compromises arising from the use of simple scoring-type assessments include:
-Without an anchor to absolute risk estimates, the assessment results were useful only in a rather small analysis space. The results offered little information regarding risk-related costs or appropriate responses to certain risk levels. Results expressed in relative numbers were useful for prioritizing and ranking but were limited in their ability to forecast real failure rates or costs of failure. They could not be readily compared to other quantified risks to judge acceptability.
-Assessment inputs and results could not be directly validated against actual occurrences of damage or other risk indicators. Even with the passage of time and the gaining of more experience, which normally improve past estimates, the scoring models’ inputs were generally not tracked and improved.
-Results did not normally produce a time-to-failure, without which there is no technical defense for integrity assessment scheduling. Without additional analyses, the scores did not suggest appropriate timing of ILI, pressure testing, direct assessment, or other required integrity verification efforts.
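To illustrate what a score alone cannot provide, here is a minimal time-to-failure sketch. All function names, values, and the half-life scheduling rule are hypothetical illustrations, not taken from the text; a constant corrosion rate is assumed.

```python
# Illustrative sketch only: a simple time-to-failure estimate that a
# relative score cannot provide. All numbers are hypothetical.

def remaining_life_years(wall_mm, critical_wall_mm, corrosion_rate_mm_per_yr):
    """Years until wall thickness reaches its critical value,
    assuming a constant corrosion rate."""
    if corrosion_rate_mm_per_yr <= 0:
        return float("inf")
    return (wall_mm - critical_wall_mm) / corrosion_rate_mm_per_yr

life = remaining_life_years(wall_mm=9.5, critical_wall_mm=6.0,
                            corrosion_rate_mm_per_yr=0.25)  # 14 years

# One common (but not universal) practice: reassess at half the remaining life.
next_assessment_years = life / 2  # 7 years
```

An estimate like this, however rough, gives integrity assessment scheduling a technical basis; a unitless score of, say, 62 out of 100 gives none.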
-Potential for masking of effects when simple expressions could not simultaneously show the influence of large single contributors and the accumulation of lesser contributors. An unacceptably large threat (a very high chance of failure from one failure mechanism) could be hidden in the overall failure potential if the contributions from other failure mechanisms were very low. This was because, in some scoring models, failure likelihood only approached the highest levels when all failure modes were coincident. A very high threat from only one or two mechanisms would appear only at levels up to its pre-set cap (weighting). In actuality, a single failure mode will often dominate the real probability of failure. Similarly, in the scoring systems, mitigation was generally deemed ‘good’ only when all available mitigations were applied simultaneously. The benefit of a single, very effective mitigation measure was often lost when the maximum benefit from that measure was artificially capped. See note 1.
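The masking effect above can be sketched numerically. The weights, severities, and per-threat probabilities below are hypothetical, chosen only to show how a capped weighted sum hides a dominant threat that a probabilistic combination preserves.

```python
# Hypothetical illustration of the masking effect. Four threats, each
# capped at 25 points in a 100-point scoring model.
weights = {"corrosion": 25, "third_party": 25, "design": 25, "operations": 25}

# One threat is at its worst (1.0); the others are negligible (0.05).
severity = {"corrosion": 1.0, "third_party": 0.05, "design": 0.05, "operations": 0.05}

# Weighted-sum score: the dominant threat cannot exceed its 25-point cap,
# so the pipeline looks low-risk overall.
score = sum(weights[t] * severity[t] for t in weights)  # 28.75 of 100

# Probabilistic combination of hypothetical annual failure probabilities:
p = {"corrosion": 0.1, "third_party": 1e-4, "design": 1e-4, "operations": 1e-4}
p_survive = 1.0
for p_i in p.values():
    p_survive *= (1 - p_i)
p_fail = 1 - p_survive  # ~0.1 per year, dominated by the single mechanism
```

In the probabilistic form the single large threat dominates the result, as it does in reality; in the capped score it is diluted to under a third of the scale.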
-Some relative risk assessments were unclear as to whether they were assessing damage potential or failure potential. For instance, the likelihood of corrosion occurring versus the likelihood of pipeline failure from corrosion is a subtle but important distinction, since damage does not always result in failure.
-Some previous approaches had limited modeling of interaction of variables, a requirement in some regulations. Older risk models often did not adequately represent the contribution of a variable in the context of all other variables. Simple summations would not properly integrate the interactions of some variables.
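A small sketch of the interaction problem, using hypothetical variables and point values: suppose poor coating condition matters far more when cathodic protection (CP) is also poor. A purely additive model cannot express that joint effect.

```python
# Hypothetical sketch: why simple addition misses variable interactions.
# Inputs are 0 (good) to 1 (bad); point values are illustrative only.

def additive_score(coating_bad, cp_bad):
    # Independent contributions, as an older additive model might sum them.
    return 10 * coating_bad + 10 * cp_bad

def interactive_score(coating_bad, cp_bad):
    # A multiplicative interaction term captures the joint effect of
    # bad coating AND bad CP occurring together.
    return 10 * coating_bad + 10 * cp_bad + 30 * coating_bad * cp_bad

# Each factor bad alone contributes 10 points in both models, but when
# both are bad the additive model shows 20 while the interactive model
# shows 50, reflecting that the combination is far worse than the sum.
```

The interaction term is one simple device; the point is that the model structure, not just the inputs, must carry the joint behavior.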
-Some models forced results to parallel previous leak history (maintaining a certain percentage or weighting for corrosion leaks, third-party leaks, etc.) even when such history might not be relevant for the pipeline being assessed (see note 1).
-Balancing or re-weighting was often required as models attempted to capture risk in terms that represented 100% of the threat, mitigation, or other aspect. The appearance of new information or new mitigation techniques required re-balancing, which in turn made comparison with previous risk assessments problematic.
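The re-balancing problem can be shown with hypothetical weights: adding a newly recognized threat category forces the existing weights to shrink so the total still sums to 100, which changes the score of a pipeline whose actual condition did not change at all.

```python
# Hypothetical sketch of the re-balancing problem. Weights must sum to 100.
old_weights = {"corrosion": 40, "third_party": 35, "design": 25}

# A new threat category is added, so every existing weight shrinks:
new_weights = {"corrosion": 32, "third_party": 28, "design": 20, "new_threat": 20}

# The pipeline's actual condition is identical in both assessments
# (and the new threat is absent on this line).
severity = {"corrosion": 0.5, "third_party": 0.2, "design": 0.1, "new_threat": 0.0}

old_score = sum(old_weights[t] * severity[t] for t in old_weights)  # 29.5
new_score = sum(new_weights[t] * severity[t] for t in new_weights)  # 23.6
# Same pipeline, same conditions, different score: the two assessments
# are no longer directly comparable.
```

An absolute measure such as failures per mile-year does not suffer this artifact, because adding a threat adds its contribution without rescaling the others.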
-Some models could only use attribute values bracketed into a series of ranges. This created a step-change relationship between the data and the risk scores. This approximation of the real relationship was sometimes problematic.
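The step-change artifact can be sketched as follows; the bracket boundaries and point values are hypothetical.

```python
# Hypothetical sketch: bracketed (stepped) scoring vs a continuous relationship.

def bracketed_depth_score(pct_wall_loss):
    """Step-change scoring typical of older bracketed models."""
    if pct_wall_loss < 20:
        return 1
    elif pct_wall_loss < 50:
        return 5
    else:
        return 10

def continuous_depth_score(pct_wall_loss):
    """A continuous alternative: the score scales with the measurement."""
    return pct_wall_loss / 10.0

# 49% and 51% wall loss are nearly identical physical conditions, but the
# bracketed score doubles (5 -> 10) across the boundary, while the
# continuous score moves only from 4.9 to 5.1.
```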
-Some models allowed only mathematical addition, where other operations (multiplication, division, raising to a power, etc.) would better parallel underlying engineering models and therefore better represent reality.
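As one illustration of a multiplicative form that parallels an engineering view, consider the sketch below. The factor names and values are hypothetical and not taken from the text; the point is that each factor scales the result, which addition cannot express.

```python
# Illustrative only: a multiplicative form with a physical reading.
exposure_events_per_yr = 0.2   # how often the damage mechanism is active
mitigation_effect = 0.95       # fraction of events prevented by mitigation
resistance_effect = 0.90       # fraction of unmitigated events the pipe survives

# Each factor scales the result, as the underlying physics suggests:
failure_rate = exposure_events_per_yr * (1 - mitigation_effect) * (1 - resistance_effect)
# 0.2 * 0.05 * 0.10 = 0.001 failures per year

# Adding the same three inputs would produce a number with no such
# physical interpretation, and perfect mitigation (1.0) could never
# drive the result to zero.
```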
-Simpler math did not allow order-of-magnitude scales, and such scales better represent real-world risks. Important event frequencies can commonly range, for example, from many times per year to less than a 1-in-ten-million chance per year. An underlying difficulty in the calibration of any scoring-type risk assessment is the set of limitations inherent in such methodologies. Since scoring approaches usually make limited use of distributions and equations that truly mirror reality (see the previous discussion of limitations), they will not always closely track real-world experience. For example, a minor 1 or 2% change in a risk score may represent an equivalent change in absolute estimates for one threat but a 100-fold change in another.
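A sketch of the scale problem, using a hypothetical score-to-frequency mapping (the 0-100 scale and the eight-decade span are illustrative assumptions, not from the text): when scores map linearly onto a logarithmic frequency range, a modest score change corresponds to a multiplicative, not additive, change in frequency.

```python
# Hypothetical mapping from a 0-100 score to an annual event frequency
# spanning eight orders of magnitude (10 per year down to 1e-7 per year).

def score_to_frequency(score):
    # Linear score -> log-spaced frequency: every 12.5 points is one decade.
    return 10 ** (1 - 8 * score / 100)

f_at_90 = score_to_frequency(90)  # ~6.3e-7 per year
f_at_65 = score_to_frequency(65)  # ~6.3e-5 per year: 100x higher for 25 points

# On a linear 0-100 scale, the same 25-point difference would read as a
# modest change; on the underlying frequency scale it is two full decades.
```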
-Lack of transparency. A scoring system adds a layer of complexity and interferes with understanding of the basis of the risk assessment. Underlying assumptions and interactions are concealed from the casual observer and require an examination of the ‘rules’ by which inputs are made, consumed by the model, and results generated.
See also:
- What are they missing?
- Modern QRA
- Measuring failure potential
- Essential elements of good risk assessment