Focus
High Reliability
October 2010
RoSPA's occupational safety adviser Roger Bibbings explores the idea of High Reliability Organisations, a concept based on the theory that accidents can be prevented through good organisational design and management.
Inquiries now well underway into the BP subsea spill in the Gulf of Mexico will no doubt serve to refocus debate onto what it is to be a 'High Reliability Organisation' and how far BP's and its partners' operations fell short of this ideal.
Indeed, it will be impossible for those leading and guiding the various inquiries to avoid this question, especially given the BP Texas City refinery explosion of March 2005 (www.texascityexplosion.com) and the findings of the Baker report published in January 2007 (www.bp.com/liveassets/bp_internet/globalbp/STAGING/global_assets/downloads/Baker_panel_report.pdf). This report found that BP management had not distinguished between "occupational safety" (ie. slips, trips and falls, driving safety etc.) and "process safety" (ie. safety case considerations, maintenance, process upset reporting etc.).
The current inquiry process, however, has become intensely politicised. Serious students of the events that led up to the accident on 20 April this year - and the subsequent response to it - may need to adopt an independent and critical frame of mind when trying to understand all the complex technological, organisational and interface issues involved.
Accidents (particularly major ones) concentrate minds and present unique opportunities for capturing the attention of those who need to think more deeply about how to move further towards High Reliability Organisation (HRO) status. The Deepwater Horizon tragedy is one such opportunity.
The concept of the HRO arose from work by US researchers in the late eighties who studied organisations that had succeeded in avoiding catastrophes in hazardous environments where, given the significant risk factors and complexity involved, accidents might be expected to occur quite frequently.
HRO theory has developed against the background of disasters such as the Tenerife air crash (March 1977), the Three Mile Island nuclear incident (March 1979), the Bhopal chemical leak (December 1984), the Challenger explosion (January 1986), the Chernobyl fire (April 1986) and the loss of Columbia (February 2003) - to mention only some. The understanding that these events had complex technological, organisational and behavioural roots has led to an ongoing search for a set of characteristics which can 'disaster-proof' an organisation.
In the disaster-rich world of the 1980s and 1990s there was an intense study of what one might think of as the corporate health and safety genome, to see if it was possible to identify the kind of organisational DNA that could ensure reliability and guarantee the ability of organisations to cope with the unexpected, including stopping incipient disasters from escalating out of control.
The project had obvious attractions. Apart from being the Holy Grail of safety, it would obviously be a great feather in any corporate cap to be able to say that you had the right safety DNA; regulators and all other stakeholders could then sleep a little easier in their institutional beds.
Needless to say, it's not that simple.
First of all, you cannot describe yourself as an HRO simply by showing that you have a high ratio of potential disaster situations to actual ones (on the assumption that you must be inherently safe because you seem to be very successfully 'barriering off' the precursors of disasters large and small). For example, you may have an impressively high number of unsafe act/condition reports at the base of your Heinrich triangle but still quite a number of fatalities at the top. And then there is the problem of latency of effect, be it the accident waiting to happen in the future because of a poor decision taken some time ago - or the long-term (but as yet undetected) effects of exposure to a carcinogen.
How inherently safe you are cannot be measured simply by asking how infrequently things have actually gone very badly wrong compared with the large number of occasions on which conceivably they could have done. Indeed, such a measure is probably the perfect prescription for complacency and the kind of assessment which can actually increase the risk of major corporate safety failure in the future.
Neither can you describe yourself as an HRO simply because you have an exhaustive set of procedural responses to every conceivable threat. This too may actually increase risk. Just as SMEs can be swamped with too much information and guidance, and need a lot of hand-holding to navigate to the precise information they need, so too can the busy manager in a large organisation who is expected to have encyclopaedic recall of every corporate policy or guideline. Having a prescription for every eventuality does not make you safe (although there is still a worrying tendency among many health and safety auditors to spend too much time examining the completeness of documented processes rather than studying what actually happens in day-to-day practice and why).
HRO principles
So whether you are an HRO is actually quite hard to pin down, and thus the more thoughtful students of the concept have tried to construct a set of principles which describe the way an HRO goes about its work. According to Professor K. E. Weick and other leading thinkers in this field, an HRO exhibits:
- Preoccupation with failure: recognising that success can breed complacency; being always alert to the possibility of failure; always searching for lapses and errors which can precipitate disaster; in this context having good systems for reporting near misses, process upsets and failures of all sorts which might be indicators or even precursors of serious adverse events; being rigorous in analysing and prioritising the warning signs received so as to sort out and distinguish between important signals of impending disaster (however weak) and 'background noise'; anticipating the unexpected; and always being prepared to listen and act in a timely way in response to early warnings.
- Reluctance to simplify interpretations: when simplifying their data in order to make decisions, not discarding information as unimportant or irrelevant, especially when it may have implications for safety; in this context, recognising complexity and unpredictability, deliberately creating scenarios, encouraging staff to notice more, investing sufficiently in monitoring and checking; having sufficient organisational 'slack' to be able to analyse 'weak signals' and determine the significance of warning signs and to question and learn from operational experience.
- Sensitivity to operations: having front line operators who strive to maintain situational awareness, being highly informed about operations as a whole, not being 'siloed' within their own small sphere of influence or failing to consider the wider impact of their activities; managers being sensitive to the experience of their front line operators, empowering them to speak up and voice their concerns, and being attentive to the detail of work activities.
- Commitment to resilience: not being disabled by errors or crises but being able to mobilise in special ways when such events occur so as to be able to deal with them; in other words not being error free but error proof; having sufficient redundancy, diversity and variety available to catch, cope with and correct errors; and being able to capture and learn from such experiences.
- Deference to expertise: recognising that when operations are being carried out under pressure, the focus of decision-making needs to move to those with the greatest expertise or knowledge, even though they may be lower in the organisational hierarchy (but protecting against unintended consequences through attention to principle three); and recognising that when pressures ease, the focus of decision-making moves back up the hierarchy.
In summary, these characteristics are described by Professor Weick as a state of 'organisational mindfulness', and they were used, for example, in the wake of the Columbia space shuttle disaster as a template against which to judge the behaviour of NASA.
At this level of generality the whole idea of an HRO can seem, paradoxically, both hideously complex and blindingly obvious. So what can be learned from this approach by organisations that are not major accident hazards, especially those chasing the elusive goal of 'zero harm'? Is HRO theory applicable only to large, complex organisations and not to small ones, which after all can be just as complex, albeit perhaps in a more organic way?
Best practice
Clearly the idea of an HRO is a model, an ideal type of undertaking which organisations can aspire to become. They may exhibit some of the essential characteristics more clearly than others, recognising of course that the underlying characteristic of any organisation wishing to move towards HRO status is a commitment to openness and safety learning.
Accidents that should not have happened, especially when preceded by detectable early warnings - or worse still, ones which bear striking similarity to those that have already happened elsewhere in the organisation - should serve not just as timely reminders for everyone to redouble their safety efforts but as prompts to ask much deeper questions about how the organisation actually operates.
On the current safety scene there seems to be an unfortunate but fairly constant swing of the theoretical pendulum. On the one hand, following accidents, there are those who stress the importance of re-engineering rules, procedures and structures; on the other, those who stress the need to enhance 'H&S culture'. The former argument holds that safety must be led from the top/centre and that to be 'safe' everything must be planned and delivered in a systematic way. The latter stresses that when fairly coarse systems and procedures fail to prescribe exactly what is required to ensure safe working (as they inevitably will), we have to rely increasingly on 'culture'. This means recognising that in practice, notwithstanding the importance of safety rules, local expertise and initiative are invariably required to produce the necessary degree of 'fine fit' with the fine grain of operational reality.
The key point to grasp, however, is that it is not a case of either improving 'systems' or changing 'culture' (we need both) but of looking at the way information about safety issues is actually generated and flows within organisations, and the way this in turn affects key, safety-critical decisions.
Practical steps
Many organisations seem hooked on the philosophy of the so-called 'Bradley curve' (safety development expressed as accident rates versus time). According to this, they say: 'We got our technology and rules right several decades ago and rates came down. Then we tightened up our systems and rates came down further. Now the only way forward is by improving our culture to change behaviours.' The underlying intention here is obviously good, but it is still a bit superficial, since most investigations of major events, if properly conducted, tend to show gaps in all three areas.
My own view is that impassioned calls for yet more work to change attitudes and behaviours, especially of front line staff - even to 'profile' and weed out error-prone individuals - can distract managers from thinking more deeply about why things actually go right most of the time and, conversely, why occasionally they go badly wrong. This is not idle theorising; it really can yield powerful insights as well as practical steps that even small organisations can take to help proof themselves against the unplanned and the unexpected.
For further information on HRO, see: www.high-reliability.org
Readers' views are welcome. Email: rbibbings@rospa.com
First published in "Parting Shots", The RoSPA Occupational Safety & Health Journal, September 2010