ENISA has released a good practice guide for CERTs that are tasked with protecting industrial control systems (SCADA).
The European Union Agency for Network and Information Security (ENISA) publishes a lot of advice and recommendations on good practice in information security. Necessarily, it has a European focus, but almost all the advice is applicable to systems in any region.
This SCADA CERT practice guide focuses on how Computer Emergency Response Teams (CERTs) should support Industrial Control Systems (ICS). The terms ‘ICS’ and ‘SCADA’ (Supervisory Control and Data Acquisition) are used more or less interchangeably.
SCADA systems were around before the Internet. The first systems were driven by mainframes and installed to control water and electricity networks. Since then, SCADA has become ubiquitous and systems that were initially designed to work on independent networks have been connected to the Internet.
Connecting SCADA to the Internet has many advantages. It increases system availability and reduces the cost of connecting geographically disparate systems. At the same time, connecting SCADA to the Internet decreases system confidentiality and, more importantly in this context, system integrity.
The ENISA ICS guide tries to bring together in one document a guide for CERTs that are required to protect SCADA/ICS systems. Importantly, it doesn’t just focus on the technical capabilities required for operations, but also on organisational capabilities and what it terms ‘co-operational capabilities’. This last part is important: computer emergency response teams can forget that they are part of a system, and the system is only as strong as its weakest link. Preparation for things going wrong involves identifying the people, resources and stakeholders that will be required. Developing relationships with other organisations always pays dividends when an emergency occurs. This is where the ENISA advice is in some ways superior to the advice from the US DOE, although I acknowledge the attractive simplicity of some of their guidance.
It is good that the authors acknowledge that this is an area where there is limited experience and that the guide should be considered a ‘living document’. As usual in cyber-security protection, both technical expertise and organisational/management guidance are required.
A biological approach to organisational resilience
By a lapsed microbiologist
“Organisational resilience is only achievable through adaptability”
Too many leaders start believing their own press and thinking that they are able to predict the future. Whilst it is absolutely true that the best indicators of the future are the events of the past, it is also true that the past is not an absolute guide to future events, because our view of the past is limited by our record of it. Some events are so rare that they are not recorded, yet they may have extreme consequences if they occur. So if we cannot predict the future with certainty, how is longevity possible for organisations? The answer is resilience, and at the core of resilience is adaptability.
The lesson from biology is that it is adaptation to the environment that has allowed organisms to survive and thrive. However large and seemingly terrible an organism is, if it is not adapted to its environment it will become extinct. The vast majority of species that have ever existed are not around today.
The same is true for organisations.
The vast majority of organisations that have ever existed are not around today
In simple terms, the story is the same for each failed organisation: they were unable to adapt to the business environment before they ran out of resources. Those that survive a crisis are able to do so for one of two reasons:
1 They have the resources (capital, personnel, leadership, etc.) to manage themselves out of a crisis once it hits, emerging weaker but alive; or
2 They are prepared to adapt if a crisis arises and have developed a broad set of principles which will work with minimal change in most eventualities. These companies still suffer from the crisis at first, but emerge stronger in the longer term.
By my reckoning, 99% of companies that manage to survive a crisis are in the first category. In most cases, those companies are then consigned to a slow death (MySpace, anyone?). Sometimes, however, the first crisis weakens them, but they then become more resilient and bounce back to ride future crises.
This is an era of organisational accelerated extinction
What is more, the ‘extinction rate’ for companies is becoming faster as society and technology changes more rapidly.
I think we all understand that small businesses come and go, but this lesson is true for large organisations as well. Of the top 25 companies on the US Fortune 500 in 1961, only six remained there in 2011.
Research carried out on Fortune 500 companies in the USA shows that turnover of large organisations on the list is accelerating: the average tenure fell from around 35 years in 1965 to around 15 years in 1995.
If you think about how much the world has changed since 1995 when Facebook barely existed and Google just did search, you might agree with the idea that organisations that want to stick around need to adapt with the changing environment.
So give me the recipe!
Bad news: there isn’t a hard recipe for a resilient organisation, just as there isn’t one for a successful company. Resilient organisations do, however, seem to share some common attributes: agility, the ability to recover quickly from an event, an awareness of their changing environment and the willingness to evolve with it, amongst others. Achieving these is difficult for a number of reasons.
1 Increasing connectedness – interdependencies (such as just-in-time process management) make societies and organisations more brittle, and risks may, in rare instances, become highly correlated even if they have appeared independent in the past.
2 Increasing speed of communication forces speedier decision making.
3 Increasing complexity compounds the effect of any variability in data, and therefore the uncertainty for decision makers.
4 Biology – organisations operate with an optimism bias. Almost all humans are more optimistic about their future than is statistically possible. We plan for a future which is better than it will be and do not correctly recognise the chances of outlier events. Additionally, we plan using (somewhat biased) rational thought, but respond to crises with our emotions.
5 Organisational inertia – the unwillingness to change organisational culture to adapt to a change in the environment.
Organisational culture and resilience
When discussing culture, resilience is more a strategic management concern than a security protocol. In this sense, resilience is the ‘why’ to change management’s ‘how’. But both are focused on organisational culture.
Organisations, particularly large organisations, all have their own way of doing things. Organisational culture is built up because individuals within the organisation find reward in undertaking tasks in a certain way. This is the same whether we are talking about security culture or indeed financial practice. Organisational culture goes bad when the reward structure in the organisation encourages people to do things that are immoral or illegal.
Larger organisations have more inertia and so take longer to move from good to bad culture and vice versa. Generally most organisations that are larger than about 150 staff have a mix of cultures.
The more successful an organisation has been in the past, the greater the inertia, the more difficult it will be to make change, and so it becomes susceptible to abrupt failure. Miller coined the term ‘Icarus Paradox’ to describe the effect and wrote a book by the same name. Icarus was the mythical Greek figure whose father, Daedalus, made them both wings of feathers and wax; Icarus died when he flew too close to the sun and the wax melted, causing the feathers to fall out of the wings.
Maybe the Kodak company is the best example of this. An organisation that had been very successful for more than 100 years (1880–2007), Kodak failed to make the transition from film to digital as fast as its competitors. The irony is that it was Kodak researchers who invented the first digital camera in the 1970s, thus sowing the seeds of the company’s doom forty years later.
Where does my organisation start on the path?
So what is the answer, how do we make sure that our organisations adapt faster than the environment that is changing more rapidly every time we look around? The only way is to begin to adapt to the changing environment before crises arise. This requires making decisions with less than 100% certainty and taking risk. The alternative is to attempt to change after a crisis arises, which historically carries higher risk for organisations.
It is a combination of many things –
developing an organisational culture which recognises these attributes, supported and facilitated from the top of the organisation;
partnering with other organisations to increase the organisation’s knowledge and reach when an event comes; and
engaging in the debate and learning about best practice.
Are there two sorts of resilience?
But is resilience just one set of behaviours, or several? When we think of resilient organisations and communities, our minds tend to go to the brave community, people or organisation that rose up after a high-consequence event and overcame adversity. These people and organisations persist in the face of natural and manmade threats. Examples include New York after the September 2001 attacks; Brisbane after the floods in 2011; and the communities affected by the Asian tsunami in 2004.
However, there is another set of actions, which are in many ways more difficult to achieve. This is the capacity to mitigate high-consequence, low-likelihood events or the creeping disaster before a crisis is experienced. The US behaved admirably in responding to the 9/11 terrorist disaster after it had occurred, but as the 9/11 Commission Report notes, terrorists had attempted on previous occasions to bring down the World Trade Center and had come quite close to succeeding.
Life becomes resilient because it is replicated widely: many copies exist, so that if some number fail, life can continue. Individual creatures carry DNA, which is all that needs to be replicated. Those creatures compete with each other and with the environment to become more and more efficient. An individual creature may or may not be resilient, but the DNA is almost immortal.
How an organisation achieves this is the challenge that every management team needs to address if they want to achieve longevity.
If you wish to discuss any of the issues in this whitepaper, please contact us.
* Noting that the word ‘dinosaur’ translates directly as ‘terrible lizard’.
On the face of it, complex systems might have more resilience than those that are simple because they can have more safeguards built-in and more redundancy.
However, this is not supported by real world observation. Simply put, more complexity means more things can go wrong. In both nature and in human society, complex controls work well at maintaining systems within tight tolerances and in expected scenarios. However complex systems do not work well when they have to respond to circumstances which fall outside of their design parameters.
In the natural world, one place where complex systems fail is the immune system. Anaphylactic shock, where the body over-reacts because of an allergy to a food such as peanuts, is a good example. Peanuts are, of course, not pathogens; they are food, and the immune system should not react to them. However, people’s immune systems are made up of a number of complex systems built on top of each other over many millions of years of evolution. One of these systems is particularly liable to overreact to peanuts. In the worst case this causes death through anaphylaxis – effectively the release of chemicals which are meant to protect the body, but which do exactly the opposite. This is an example of a safety system becoming a vulnerability when it is engaged outside normal parameters.
We are beginning to see the resilience of complex systems such as the Great Barrier Reef severely tested by climate change. Researchers have found that the reef is made of complex interactions between sea fauna and flora, built upon other, more complex interactions. This makes it nigh on impossible for researchers to find exact causes for particular effects, because the causes are so many and varied. Whilst the researchers can confidently say that climate change is having a negative effect on the coral, and that bleaching will become more common as the climate warms, they cannot say with a great deal of certainty how great other compounding effects, such as excess nutrients from farm runoff or the removal of particular fish species, might be. This is not a criticism of the science, but an observation that predicting the future with absolute certainty, when there are multiple complex factors at play, is extremely difficult.
These natural systems are what some might call ‘robust yet fragile’. Within their design parameters they are strong and have longevity. Such systems tend to be good at dealing with anticipated events such as cyclones in the case of the Great Barrier Reef. However, when presented with particular challenges outside the standard model, they can fail.
Social systems and machines are not immune from the vulnerabilities that complexity can introduce into systems and can also be strong in some ways and brittle in others.
The troubles with the global financial system are a good example. Banking has become very complex and banking regulation has kept up with this trend. That might seem logical, but the complex rules may in themselves be causing people to calibrate the financial system to meet the rules, focussing on the administrivia of their fine print, rather than the broad aims that the rules were trying to achieve. As an example, one important set of banking regulations are the Basel regulations. The Basel 1 banking regulations were 30 pages long, the Basel 2 regulations were 347 pages long and the Basel 3 regulations are 616 pages. One estimate by McKinsey says that compliance for a mid-sized bank might cost as much as 200 jobs. If a bank needs to employ 200 people to cope with increased regulation, then the regulator will need some number of employees to keep up with the banks producing more regulatory reports, and so the merry-go-round begins!
A British banking regulator, Andrew Haldane, is now one of a number of people who question whether this has gone too far and whether banks and banking regulation have become too complex to understand. In an interesting talk he gave in 2012 in Jackson Hole, Wyoming, USA, titled ‘The Dog and the Frisbee’, Haldane uses the analogy of a dog catching a frisbee to suggest that there are hard ways and easy ways to work out how to catch one. The hard way involves some complex physics; the easy way involves the simple rule dogs actually use – run so that the frisbee stays at a roughly constant angle of gaze. Haldane points out that dogs are, in general, better at catching frisbees than physicists! I would also suggest that the chances of predicting outlier events, what Nassim Nicholas Taleb calls ‘Black Swans’, are greater using the simple predictive model.
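The point can be illustrated with a toy simulation. The sketch below uses the simplest possible chasing rule (head straight at the frisbee’s current position, step, repeat) as a crude stand-in for the dog’s heuristic – no trajectory physics at all. All the numbers (positions, speeds, catch radius) are invented for illustration.

```python
import math

def chase(steps=400, dt=0.05):
    """Toy pursuit: no aerodynamics, just one simple rule applied over
    and over. Returns True if the 'dog' gets within catching range."""
    fx, fy = 0.0, 20.0      # frisbee start position (illustrative)
    fvx, fvy = 4.0, -1.0    # frisbee drifts sideways and slightly down
    dx, dy = 0.0, 0.0       # dog start position
    dog_speed = 6.0         # the dog runs faster than the frisbee flies
    for _ in range(steps):
        # the simple rule: turn toward wherever the frisbee is right now
        ang = math.atan2(fy - dy, fx - dx)
        dx += dog_speed * math.cos(ang) * dt
        dy += dog_speed * math.sin(ang) * dt
        fx += fvx * dt
        fy += fvy * dt
        if math.hypot(fx - dx, fy - dy) < 0.5:
            return True     # caught it
    return False
```

The rule knows nothing about drag, lift or wind, yet a faster pursuer applying it relentlessly still intercepts the target – which is Haldane’s point about simple heuristics being robust where elaborate models are fragile.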
This is in some ways a challenge to the traditional thinking behind risk modelling. When I did my risk course, it was all very formulaic: list threats, list vulnerabilities and consequences, discuss tolerance for risk, develop controls, monitor, and so on. I naively thought that risk assessment would save the world. But it can’t. Simple risk management just can’t work in a complex system. Firstly, it is impossible to identify all risks. To (mis)quote Donald Rumsfeld: there are known risks; unknown risks; risks that we know we have but cannot quantify; and unknown risks that we can neither quantify nor know.
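The formulaic process described above can be sketched in a few lines. Every name, score and tolerance below is invented for illustration; the sketch also makes the limitation visible: the register can only ever rank the risks someone thought to write down.

```python
# A minimal sketch of the textbook risk-register process:
# list threats, score likelihood x consequence, compare against tolerance.
risks = [
    {"threat": "power outage",   "likelihood": 0.3, "consequence": 4},
    {"threat": "data breach",    "likelihood": 0.1, "consequence": 5},
    {"threat": "key staff loss", "likelihood": 0.2, "consequence": 3},
]
tolerance = 0.9  # accept anything scoring below this (arbitrary figure)

def score(risk):
    # the classic formula: expected loss = likelihood x consequence
    return risk["likelihood"] * risk["consequence"]

needs_controls = [r["threat"] for r in risks if score(r) >= tolerance]
# Note what is missing: the 'unknown unknowns' never appear in the list,
# so no amount of scoring will ever flag them for controls.
```

The arithmetic is trivially easy; the hard part, as the text argues, is that the inputs are a closed list while the world is not.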
Added to this is the complex interaction between risks and the observation that elements of complex systems under stress can completely change their function (for better or worse). An analogy might be where one city under stress spontaneously finds that its citizens begin looting homes and another intensifies its neighbourhood watch program.
Thus risk assessment of complex systems is in itself risky. In addition, in a complex system the aim is homeostasis: the risk model responds to each raindrop-sized problem, correcting the system minutely so there are minimal shocks and the system can run as efficiently as possible. A resilience approach might instead develop ways to present the system/organisation/community with minor shocks, in the hope that when the black swan event arrives, the system has learnt to cope with at least some ‘off-white’ events!
Societies are also becoming more complex. There are more interconnected yet separately functioning parts of a community than there were in the past. This brings efficiency and speed to the ways that things are done within the community when everything is working well. However when there is a crisis, there are more points of failure. If community B is used to coping without electricity for several hours a day, they develop ways to adapt over several months and years. If that community then finds that they have no power for a week, they are more prepared to cope than community A that has been able to depend on reliable power. Community B is less efficient than community A, but it is also less brittle.
This does, however, illustrate a foible of humanity. Humans have evolved to be generally good at coping with crises (some better than others), but they are not good at dealing with creeping catastrophes such as climate change or systemic problems in the banking and finance sector.
Most people see these things as problems, but think that the problems are so far away that they can be left whilst other more pressing needs are dealt with.
Sometimes you just need a good crisis to get on and fix long-term complex problems. Just hope the crisis isn’t too big.