With previous technical roles in large industrial organizations like ExxonMobil, Chevron, and Tennessee Valley Authority, Caleb Mathis, SynSaber’s Principal Content Engineer, is no stranger to process engineering and risk quantification.
Mathis has put together the following insights delineating the differences and similarities between cyber risk quantification (CRQ) and hazard and operability studies (HAZOPS).
Examining the processes of actuarial quantification of cyber risk and HAZOPS is like comparing siblings: their differences at first glance are stark, but on closer examination their genetic similarities begin to shine through. While the application and technology requirements of the two disciplines differ vastly, the foundational principles remain parallel in most cases.
The following paragraphs will attempt to give a succinct encapsulation of the processes of each discipline, analyzing their similarities and differences and unveiling the dialectic relationship between the two fields.
Cyber Risk Quantification
The Society of Actuaries paper Quantification of Cyber Risk for Actuaries (soa.org) states, “The primary goal for all organizations is to keep business running.” This statement provides the core principle that will serve as the foundation for all further analysis. According to the paper, quantifying cyber risk should use the framework provided within to calculate the actual monetary risks involved with lapses in security and provide a means of assessing a given network.
In this structure, we examine potential paths of attack that would compromise one or more elements of the CIA triad (confidentiality, integrity, availability) of the network in question. The risk associated with this calculation comes from an outside force rather than a process-related error with an unknown likelihood of occurrence. Uncertainty around likelihood of occurrence is a key factor in variable risk determination for both industrial control networks and enterprise networks, and it can lead to misguided investment decisions.
The process of designing a defense begins with adopting the perspective of the connectivity within the network. Vulnerability metrics can be generated using attack graphs and Bayesian networks with conditionally dependent nodes representing the complex network structures to accurately map out and quantify the consequences of security flaws and failures within the network.
Utilizing the data generated at this step, you can further examine how an attack would propagate through the layers of the network using an impact graph. The data yielded from the impact graph can quantify vulnerability exploitation within the network by showing the degradation of the relevant CIA components and the business process.
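As an illustrative sketch of the attack-graph idea, the probability of compromising a node can be chained through its conditionally dependent parents. The node names and probabilities below are assumptions for demonstration, not values from the referenced paper:

```python
# Minimal Bayesian attack-graph sketch: each node is a network asset whose
# probability of compromise is conditioned on its parent being compromised
# first. All names and probabilities are illustrative assumptions.

def p_compromise(p_parent: float, p_exploit_given_parent: float) -> float:
    """Probability a node is compromised via a single parent path."""
    return p_parent * p_exploit_given_parent

# Assumed attack path: internet -> firewall -> web server -> database
p_firewall = p_compromise(1.0, 0.10)          # attacker exploits a firewall flaw
p_webserver = p_compromise(p_firewall, 0.40)  # pivots to the web server
p_database = p_compromise(p_webserver, 0.60)  # reaches the database (availability)

print(f"P(database availability compromised) = {p_database:.4f}")
```

A real attack graph would combine many such paths (with nodes reachable from several parents), which is where Bayesian network tooling earns its keep; the single-path chain above shows only the core conditional-probability mechanic.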
Cyber Risk Costs 💰
Using the data and precursor steps described above, you can then create a structure to begin assigning risk values in dollars. The main components of cost in this scenario are:
- loss of data/equipment,
- court settlement fees,
- reputational damage, and
- PR costs
This is a vastly simplified version of the entire formula, but the main points of emphasis are:
- Consider all possible weaknesses from the attackers’ perspective,
- Identify the conditional variables to the exploitable areas of the network, and then
- Create an optimal mitigation strategy based on relevant aspects of CIA and optimal cost-efficiency
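A vastly simplified sketch of assigning dollar values to this risk might combine an assumed breach probability from the graph analysis with the cost components listed above. All figures below are hypothetical placeholders, not benchmark data:

```python
# Illustrative annualized-loss sketch: multiply an assumed breach
# probability by the summed cost components from the list above.
# Every dollar figure and probability here is a hypothetical placeholder.

annual_breach_probability = 0.05  # assumed output of the attack/impact graphs

costs_usd = {
    "data_equipment_loss": 2_000_000,
    "court_settlements":   1_500_000,
    "reputational_damage":   900_000,
    "pr_costs":              100_000,
}

single_loss_expectancy = sum(costs_usd.values())
annualized_loss_expectancy = annual_breach_probability * single_loss_expectancy

print(f"Single-loss expectancy:     ${single_loss_expectancy:,}")
print(f"Annualized loss expectancy: ${annualized_loss_expectancy:,.0f}")
```

Comparing the annualized figure against the cost of a proposed mitigation is what makes the "optimal cost-efficiency" step above concrete: a control costing more per year than the loss it prevents is hard to justify.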
An important thing to consider in this discussion is the significance of esoteric mathematics and the technical knowledge required to be fluent in the languages of attack and defense within this context.
👉 A high level of technical expertise is necessary to evaluate both the attack graph and the impact graph using the given framework. This expertise will be essential to creating efficient mitigation strategies for more extensive networks.
In these larger networks, hundreds of assets create a complexity that demands significant technical expertise and experience to be appropriately managed. However, it is also helpful to have the input of those well versed in business processes with a firm understanding of resource availability and ideal allocation to create an optimal strategy that achieves the dual purpose of maintaining CIA and overall company profitability.
The constantly evolving nature of technology is perhaps the greatest challenge that any security team faces, as threats are constantly mutating and new exploits emerging. The ability to adapt network security to meet these new challenges will be the most telling indicator of the success of any cyber security operation.
Hazard Studies
In contrast to cyber risk quantification, hazard studies in the industrial world benefit from roughly 100 years of research into mitigating the consequences of process failure. This information can be examined and extrapolated to provide a dialectic counterpoint to network security analysis. Hazard studies provide the commercial industrial world with a systematic method of identifying and recording all hazards and risks expected throughout the life of the system.
Furthermore, they ensure all risks are lowered to an acceptable level using methods that remain valid for the life of the plant or system. The mathematical calculations of risk are generally presented in flow charts and matrices. One of the core formulations for risk management is typically presented as a flow chart indicating the progression from evaluating the risk to comparing it with established criteria and deciding whether it falls within acceptable risk parameters. Risks that exceed acceptable parameters are then evaluated for enhanced preventative, mitigative, and/or responsive controls until an acceptable risk threshold is achieved, using Layer of Protection Analysis (LOPA) and/or fault tree analysis.
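The LOPA step can be sketched numerically: the mitigated event frequency is the initiating-event frequency multiplied by the probability of failure on demand (PFD) of each independent protection layer, then compared against a tolerable threshold. All values below are illustrative assumptions:

```python
# LOPA-style sketch: mitigated frequency = initiating-event frequency
# multiplied by the PFD of each independent protection layer. All
# frequencies and PFDs here are illustrative assumptions.

initiating_event_freq = 0.1  # events per year (e.g., loss of cooling)

protection_layer_pfds = [
    0.1,   # basic process control system
    0.1,   # operator response to alarm
    0.01,  # safety instrumented function
]

mitigated_freq = initiating_event_freq
for pfd in protection_layer_pfds:
    mitigated_freq *= pfd

tolerable_freq = 1e-4  # assumed tolerable frequency per year
verdict = "acceptable" if mitigated_freq <= tolerable_freq else "needs more layers"
print(f"Mitigated frequency: {mitigated_freq:.1e}/yr ({verdict})")
```

If the mitigated frequency still exceeds the tolerable threshold, the flow chart loops back: another independent protection layer is added (or an existing one improved) and the calculation repeats.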
The important data for many business-focused analyses can be presented graphically with the y-axis representing “Cost of Change” and the x-axis representing “Elapsed project time.” Much like the actuarial quantification of cyber security discussed earlier, mitigation must be effective and not create undue expense by failing to identify major flaws before it becomes prohibitively expensive to implement the necessary corrective action. In this discipline, a scale of risk is created and displayed using a “risk matrix.”
The risk matrix is defined with the y-axis representing the ordinal ranges for the likelihood/frequency of an initiating event, and the consequences categories are displayed ascending along the x-axis. This provides regions of acceptability for measured risk and allows companies to employ optimal decision-making in the early design stages to ensure the ideal cost and length for the project.
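A minimal sketch of such a risk matrix, with ordinal likelihood categories on one axis and ascending consequence categories on the other, might classify each cell into an acceptability region. The category labels and cut-off scores below are illustrative assumptions, not an industry-standard calibration:

```python
# Risk-matrix sketch: ordinal likelihood categories (y-axis) crossed with
# ascending consequence categories (x-axis), each cell mapped to an
# acceptability region. Labels and cut-offs are illustrative assumptions.

LIKELIHOOD = ["rare", "unlikely", "possible", "likely", "frequent"]    # ordinal
CONSEQUENCE = ["minor", "moderate", "major", "severe", "catastrophic"]  # ascending

def risk_region(likelihood: str, consequence: str) -> str:
    """Classify a (likelihood, consequence) cell into an acceptability region."""
    score = LIKELIHOOD.index(likelihood) + CONSEQUENCE.index(consequence)
    if score <= 2:
        return "acceptable"
    if score <= 5:
        return "tolerable (reduce if practicable)"
    return "unacceptable"

print(risk_region("rare", "catastrophic"))  # score 0 + 4 = 4
print(risk_region("frequent", "severe"))    # score 4 + 3 = 7
```

Real matrices are calibrated per company and per facility, often with asymmetric regions so that high-consequence cells are treated more conservatively than a simple additive score would suggest.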
However, much like in cyber security, there needs to be a balance of technical expertise and business-oriented leadership to set realistic expectations for the requirements of a study to avoid undue accruement of cost and create a viable cost/value ratio for the proposed safety solutions. Despite years of developed methodologies (Layer of Protection Analysis, Fault Tree Analysis, etc.) and data enhancements to refine the quantification of consequences to initiating events, it remains impossible to remove all bias from speculative risk scenarios leading to a shared problem with cyber security of potentially misguided safety investment decisions.
ALARP and HAZOP
ALARP – As Low As Reasonably Practicable
One of the core principles of this methodology is ALARP, an acronym for “as low as reasonably practicable,” which is the paradigm used to establish acceptable risk. This paradigm helps quantify the possible levels of risk and establish where the final project needs to fall within that range. Forming these “tolerable risk criteria” is one of the most complex questions in hazard analysis, and much like in network security, the answer is usually situational.
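A simple sketch of ALARP classification places a calculated risk into one of three regions: broadly acceptable, intolerable, or the in-between region where risk must be reduced further unless the cost of doing so is grossly disproportionate to the benefit. The thresholds below are illustrative assumptions, since, as noted above, tolerable risk criteria are situational:

```python
# ALARP sketch: classify a calculated risk into the broadly-acceptable,
# ALARP (tolerable-if-reduced-where-practicable), or intolerable region.
# Both thresholds are illustrative assumptions, not regulatory values.

BROADLY_ACCEPTABLE = 1e-6  # risk per year, assumed lower threshold
INTOLERABLE = 1e-3         # risk per year, assumed upper threshold

def alarp_region(risk_per_year: float) -> str:
    if risk_per_year <= BROADLY_ACCEPTABLE:
        return "broadly acceptable"
    if risk_per_year >= INTOLERABLE:
        return "intolerable"
    return "tolerable if ALARP (reduce further where reasonably practicable)"

print(alarp_region(5e-5))
```

The middle region is where most engineering judgment lives: each candidate risk-reduction measure is weighed against its cost, and only measures whose cost is grossly disproportionate to the risk reduction gained may be declined.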
HAZOP – Hazard and Operability
The first step in documenting the scenarios where safety investment decisions are applied is usually a HAZOP, which can be defined as “…a detailed method for systematic examination of a well-defined process or operation, either planned or existing.”
The HAZOP method has four concrete steps:
- Definition phase,
- Preparation phase,
- Examination phase, and
- Reporting & follow-up phase
Of these, the Examination phase is the core of the process. It consists of dividing the system into elements, then examining each element to conceive of all possible deviations and identify the causes of those deviations, their consequences, and the protections needed. In short, it takes a known or designed process and formulates every scenario that could arise from any deviation in that process.
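One way to picture the examination phase is as a structured record per element-and-deviation pair, pairing guide words (such as "no," "more," "less," "reverse") with process parameters and capturing causes, consequences, and safeguards. The specific entries below are illustrative placeholders:

```python
# Sketch of an Examination-phase record: each system element is paired with
# guide-word deviations, and the causes, consequences, and safeguards for
# each deviation are captured. All entries are illustrative placeholders.

from dataclasses import dataclass, field

@dataclass
class Deviation:
    element: str                 # system element under examination
    guide_word: str              # e.g. "no", "more", "less", "reverse"
    parameter: str               # e.g. "flow", "pressure", "temperature"
    causes: list = field(default_factory=list)
    consequences: list = field(default_factory=list)
    safeguards: list = field(default_factory=list)

record = Deviation(
    element="cooling water line",
    guide_word="no",
    parameter="flow",
    causes=["pump failure", "blocked line"],
    consequences=["reactor overheats"],
    safeguards=["low-flow alarm", "backup pump"],
)
print(f"'{record.guide_word} {record.parameter}' in {record.element}: "
      f"{len(record.causes)} causes, {len(record.safeguards)} safeguards")
```

The full worksheet is simply a list of such records, one per element-deviation pair, which then feeds the Reporting & follow-up phase.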
Thus, the threat that arises comes from a failure of process or a failure to identify a possible deviation from that process. In the realm of cybersecurity, especially regarding network security in process-dominated environments, it is essential to consider the output of a HAZOP of a given process to create the best possible defense for the network that supports that process.
Industrial Safety vs. Cyber Security
One of the key similarities between industrial safety and cyber security is the nature of the consequences of failure, even though their scale can differ dramatically. In cyber security, the penalty of failure is usually some combination of lost data and lost revenue, which can lead to a tarnished reputation and damage to, or even complete loss of, the company.
👉 In the industrial world, all of those are possible, but there is also environmental impact and potential fatalities to be considered. These added consequences require a methodical, varied approach to the problem that seeks to identify all possible scenarios in which these unwanted consequences could occur as a first step.
In both the cyber security and industrial safety domains, there will always need to be a balance of technical expertise and business-oriented leadership to set realistic expectations for a risk-based study, avoid undue cost accrual, and create a viable cost/value ratio for the proposed safety solutions. Despite years of developed methodologies (Layer of Protection Analysis, fault tree analysis, etc.) and data enhancements to refine the quantification of the consequences of initiating events, it remains impossible to remove all bias from speculative risk scenarios.
Cyber security’s dynamic nature and the uncertainty around threat actors often lend themselves to heavily biased influences, especially when discussing risk in industrial environments. As cyber security grows as a discipline and industrial facilities continue to enhance connectivity, the frequency of risk-based investment discussions will increase, along with the need to reduce bias in both domains. As seen in the figure below, the quantification of initiating events and consequences follows a probability structure similar to that used in the sample attack graph targeting database server availability.
Hazard Studies vs. Network Security
Regarding the gap between these two industries, the most striking contrast is between the relatively static nature of industrial processes and technology, which favors protective and mitigative controls, and the dynamic nature of network security, which favors reactive controls. This gap often leads to an inherent misunderstanding of the consequences to industrial systems by cybersecurity experts, and a misunderstanding of the legitimate threats to those systems by industrial safety experts.
👉 In hazard studies, a category for cyber risk quantification should be added for any networked process facility, even one where physical access is required. The consequences of a network security breach at a process facility, where an extremely damaging flaw could be exploited by a nefarious actor, could be staggering, and in today’s world such a breach is a real possibility.
In network security, hazard studies and HAZOPs are the work you do before the work you do. If you can exercise the principle of considering all possible adverse scenarios and incorporate the solutions and mitigations into your defense plan, you will be proactively and methodically applying a valuable tool for network integrity.
Combining Cyber Risk Quantification and HAZOP
While the computational methods of risk are different, the undergirding concept of identifying conditional variables and possible consequences and applying them to data carries across both disciplines. The quantification of cyber security continues to present a daunting process to encapsulate, as networks and hosts emit an increasing amount of data that has to be correlated to a set of consequences. However, the connection of defined consequence graphs sourced from 100 years of research and engineering expertise provides a logical starting point for the optimization of cyber security investment decisions.
There will always be a tangible gap between the technologies of the two fields. Still, the underlying good practices of each discipline can be taken to heart across the divide, as both share the same objective of risk reduction. Anyone creating cyber defenses in a process-oriented environment should strive to utilize both approaches wherever they cross paths.