“I don’t have enough data to quantify risk”
I hear this often and have concluded that it is actually shorthand for one of two beliefs:
- I don’t understand how to use the data I do have
- I think that quantifying risk means that I need large datasets of historical event frequencies.
The truth is, you can perform a credible risk assessment even with only a very limited amount of information. If you know only the product being transported, the pressure, the diameter, and the general location, you can make plausible estimates: very coarse, but at least reasonable.
This reminds me of a lesson learned during a courtroom proceeding:
Attorney to expert witness, asking a slightly off-topic question: “Mr Expert, how often might there be a vehicle collision at this intersection each year?”
Expert: “I have no idea. I don’t have any data for that.”
Attorney, while winking to jury: “Ok, since you have no idea, then we can speculate that it can happen 1,000 times per year.”
Expert, surprised: “Oh no, it wouldn’t happen that often.”
Attorney: “Ah, so you DO have ‘some idea’. Ok then, let’s say it happens 500 times per year.”
Expert, beginning to see the hole he has fallen into: “Oh no, that also is way too high.”
Attorney: “How about 100 times a year?”
Expert, now somewhat apologetic: “Well, even that is too high because . . . .”
This went on until the attorney had obtained, for the court record, the expert’s high and low estimates, even though the expert had claimed insufficient data and knowledge to speculate. The attorney knew that it is a simple reasoning exercise to ‘know’ that, say, 2-3 vehicle incidents every day at the same place would not be long tolerated. Even 1-2 per month would probably prompt action. This illustrates that, even in the absence of hard data, reasoning can at least bound an estimate.
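The attorney's questioning can be sketched as a simple bounding exercise. The snippet below is only an illustration: the function name, the verdict labels, and the example answers (including the 0.1 per year lower value) are hypothetical, and taking the geometric mean of the bounds as a central guess is an assumption, not something stated in the exchange above.

```python
import math

def bound_estimate(responses):
    """Return (low, high) bounds from (candidate_rate, verdict) pairs.

    verdict is 'too_high' or 'too_low'. The tightest rejected value on
    each side brackets what the expert actually believes.
    """
    too_high = [rate for rate, verdict in responses if verdict == "too_high"]
    too_low = [rate for rate, verdict in responses if verdict == "too_low"]
    if not too_high or not too_low:
        raise ValueError("need at least one 'too_high' and one 'too_low' response")
    low, high = max(too_low), min(too_high)
    if low >= high:
        raise ValueError("responses are inconsistent")
    return low, high

# Hypothetical answers, in collisions per year, loosely following the dialogue.
responses = [
    (1000, "too_high"),
    (500, "too_high"),
    (100, "too_high"),
    (0.1, "too_low"),  # assumed: one collision per decade judged too low
]
low, high = bound_estimate(responses)

# Assumption: use the geometric mean as a coarse central guess on a log scale.
midpoint = math.sqrt(low * high)
```

Each further question the expert answers tightens the bracket, which is the whole point: the estimate is coarse, but it is no longer "no idea".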
Direct reasoning is often overlooked as a source of data. When it comes to probability and risk, we sometimes forget that we have a strong, physics-based understanding of real-world phenomena. Instead of using that understanding in our risk estimates, some tend to simply delegate the risk problem to the statisticians. The statisticians use event frequencies in their work, so they base their estimates on historical events. They tell us that ‘low data’ (meaning low historical event frequencies) equates to low predictive power. True enough, especially from a statistics perspective.
But we forget that we still have the underlying physics. Physics tells us how much metal loss can be tolerated before leak or rupture, how much voltage is needed to halt corrosion, how much backhoe bucket force until the pipe breaks, how much landslide a length of pipe can withstand before yielding. We can estimate the numbers needed to calculate these things—often with great accuracy. We don’t have to rely on historical events to tell us how often a thing can happen. We are certainly remiss if we ignore history—it must definitely be used in our analyses whenever it is available. But we are also remiss if we ascribe too much relevance to the past or, at the other extreme, claim we are helpless without that history.
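As one illustration of letting physics supply the numbers, the sketch below uses Barlow's formula for hoop stress to find the wall thickness at which a chosen allowable stress is reached, then estimates how long uniform metal loss at a constant rate would take to get there. This is a deliberate simplification: real corrosion is rarely uniform or constant, no design factor is applied, and all function names and input values are hypothetical.

```python
def barlow_min_wall(pressure, diameter, allowable_stress):
    """Wall thickness at which hoop stress equals allowable_stress.

    Rearranged from Barlow's formula, P = 2 * S * t / D.
    """
    return pressure * diameter / (2.0 * allowable_stress)

def years_to_min_wall(current_wall, min_wall, corrosion_rate):
    """Years until uniform wall loss at a constant rate reaches min_wall."""
    if corrosion_rate <= 0:
        return float("inf")  # no metal loss, so the limit is never reached
    return max(current_wall - min_wall, 0.0) / corrosion_rate

# Hypothetical inputs in consistent units (psi, inches, inches per year).
min_wall = barlow_min_wall(pressure=1000.0, diameter=20.0, allowable_stress=52000.0)
remaining_years = years_to_min_wall(
    current_wall=0.375, min_wall=min_wall, corrosion_rate=0.005
)
```

None of these inputs is a historical failure count. They are design and condition data that most operators already hold or can reasonably estimate.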
Let’s discuss low data availability when we’re performing a physics-based risk assessment. It is sometimes not apparent just how much info is readily available. Let’s say you know something simple about the soil type—where it’s rocky and where it’s mostly clay. Some of the risk factors that can be strongly influenced by just this simple piece of information include:
- Potential soil moisture content, impacting corrosivity estimate
- Likelihood of past coating damages during installation
- Propensity of future coating damages to occur
- Dispersion of liquid spills—infiltration vs surface flow
- Amount of potential harm to certain receptors (e.g., aquifers vs surface flow)
- Exposure to third-party excavation damages
- Exposure to certain geotechnical phenomena (e.g., subsidence, shrink/swell, landslide, etc.)
Perhaps you can think of more. The point is that you may have more information than you first thought. In this example, a single piece of information—a simple soil characteristic; rock vs clay—has influenced seven different risk variables.
There are many other examples of how simple knowledge of surroundings leads to relevant and important risk information. This also emphasizes why dynamic segmentation (the creation of a risk profile) is essential. We would not understand changes in risk along a pipeline route if we failed to note changing soil conditions and to integrate the implications of those changes.
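A minimal sketch of dynamic segmentation follows, assuming each attribute is stored as a list of (start, end, value) intervals along the route. The data layout, names, and example values are hypothetical. The idea is simply to start a new segment wherever any attribute changes, so that every segment has constant characteristics.

```python
def dynamic_segments(attributes):
    """Split a route wherever any attribute changes.

    attributes maps a name to a list of (start, end, value) intervals.
    Returns a list of (start, end, {name: value}) segments.
    """
    # Every interval boundary is a potential segment break.
    breaks = sorted({point
                     for intervals in attributes.values()
                     for start, end, _ in intervals
                     for point in (start, end)})

    def value_at(intervals, station):
        for start, end, value in intervals:
            if start <= station < end:
                return value
        return None  # no data for this attribute at this station

    segments = []
    for start, end in zip(breaks, breaks[1:]):
        values = {name: value_at(intervals, start)
                  for name, intervals in attributes.items()}
        segments.append((start, end, values))
    return segments

# Hypothetical route data; stations in metres.
route = {
    "soil": [(0, 400, "rock"), (400, 1000, "clay")],
    "coating": [(0, 700, "coal tar"), (700, 1000, "FBE")],
}
segments = dynamic_segments(route)
```

Here two attributes with one change each yield three segments. Each segment can then be assessed on its own, which is what produces a risk profile along the route.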
The second part of the “I don’t have enough data” statement emerges from beliefs about how risk can be quantified. When the underlying belief is something like “we can’t quantify risk because we don’t have the data”, what is often implied is that databases full of incident frequencies—how often each pipeline component has failed by each failure mechanism—are needed before risk can be quantified. That’s simply not correct. To quantify how often a pipeline segment will fail from a certain threat, we don’t necessarily have to have numbers telling us how often similar pipelines have failed in the past from that threat. This myth is often a carryover from the old—let’s say ‘classical’—practice of QRA. That practice can be an almost purely statistical exercise. It relies heavily on data of past events as predictors of future events, as is standard practice in statistical analyses. While such data is helpful, it is by no means essential to risk assessment. And when it is used, it must be used carefully. The historical numbers are often not very relevant to the future—how often do conditions and reactions to previous incidents remain so static that this history can accurately predict the future?
With or without comparable data from history, the best way to predict future events is to understand and properly model the mechanisms that lead to the events. A robust risk assessment methodology forces SMEs to make careful and informed estimates based on their experience and judgment. With only minimal effort, a group of SMEs, in a properly facilitated meeting, can generate credible, defensible estimates of all manner of damage and failure potential along pipelines they know. From these estimates, reasonable risk estimates emerge, to be confirmed or updated as actual events are tracked.
In Part 2 of this myth-busting, we’ll look at an example of the statistics-only approach to risk quantification compared to the physics-based approach. We’ll also offer an approach to address the dilemma of serious information gaps.