Information Security risks are those risks “that arise from the loss of confidentiality, integrity, or availability of information or information systems and reflect the potential adverse impacts to organizational operations, organizational assets, individuals, other organizations, and the Nation (NIST 800-30).”
Risk assessments should help people understand a given Cloud deployment’s information security risks and help leaders make better decisions when managing those risks.
Once the security context of our (most critical) information assets has been described in terms of their intended Confidentiality, Integrity, and Availability, we have the foundation to analyze risk in a scalable and repeatable way.
This post proposes a way to record the most important risk context on your resources in terms of the impact of the loss of those information security attributes: ImpactLossConfidentiality, ImpactLossIntegrity, and ImpactLossAvailability.
This is a bit speculative since there’s not much public information at the intersection of Cloud, scalable Risk Management, and Automation. I’d love your feedback and to discuss this problem space with you.
Risk Context
Within a risk assessment process you identify threats to information security such as:
- An external adversary attacks an application to gain network access and abuses the application’s credentials to extract data from an RDBMS or object store like S3.
- An engineer accidentally pushes infrastructure code that an automated delivery system dutifully applies to destroy a production database.
Those threats identify two of the biggest problems on the minds of DevOps, Security, and Risk Management professionals today: data breaches and accidental automated data destruction.
Once you have identified threats to your information assets, estimate the likelihood those threats will materialize and the impact if and when they do. That likelihood and impact information plugs into the classic risk calculation function:
risk = (likelihood_confidentiality_loss * impact_confidentiality_loss)
     + (likelihood_integrity_loss * impact_integrity_loss)
     + (likelihood_availability_loss * impact_availability_loss)
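To make the arithmetic concrete, here is a minimal sketch of that calculation in Python. The likelihood and impact numbers are hypothetical point estimates, not values from this post; later sections replace them with intervals and observed frequencies.

```python
# Minimal sketch of the classic risk calculation with hypothetical estimates.
# Likelihoods are annual probabilities of loss; impacts are dollar losses.
estimates = {
    "confidentiality": {"likelihood": 0.10, "impact": 100_000},
    "integrity": {"likelihood": 0.05, "impact": 50_000},
    "availability": {"likelihood": 0.90, "impact": 19_000},
}

risk = sum(e["likelihood"] * e["impact"] for e in estimates.values())
print(f"Expected annual loss: ${risk:,.0f}")
```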
If you’ve never done this before, it might feel unnatural to put definitions around these new dimensions. Keep in mind that the goal here is to add context that helps you make better decisions, not perfect ones.
So when someone asks you what the biggest risks to your information security are, you can say something like, “we think these are the biggest risks, and this is why” instead of “no comment” or hand-waving through an ad-hoc explanation.
Impact
Start by estimating the impact of a loss of confidentiality, integrity, or availability for the resources in your Cloud deployment for threats such as a data breach.
Here are two ways you can estimate the impact:
Qualitatively
Use qualitative categories like Very High, High, Moderate, Low, Very Low, defined in NIST 800-30. A qualitative impact assessment is a good starting point for conversations about risk and a coarse-grained risk analysis.
You could describe the classification of impact in a tag like ImpactLossConfidentiality=VeryHigh or ImpactLossIntegrity=Moderate.
In the early stages of a risk management program, this might be enough information to identify obviously large problems and prioritize response efforts.
For example, you may suspect there are some big data protection problems out there and you have a limited set of resources or time to address them. In this case, filtering the confidentiality or integrity impacts to those with a value of VeryHigh or High and estimating the probability of those events occurring in your head might be good enough for a first pass.
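If your resources are tagged this way in AWS, that first-pass filter can be automated. Here is a minimal sketch using the Resource Groups Tagging API via boto3; the tag key and values are the hypothetical ones proposed in this post.

```python
import boto3

# List resources whose confidentiality-loss impact is tagged VeryHigh or High.
# Repeat with ImpactLossIntegrity to cover integrity as well (multiple
# TagFilters are ANDed together, so run separate queries per tag key).
tagging = boto3.client("resourcegroupstaggingapi")

pages = tagging.get_paginator("get_resources").paginate(
    TagFilters=[
        {"Key": "ImpactLossConfidentiality", "Values": ["VeryHigh", "High"]}
    ]
)

for page in pages:
    for mapping in page["ResourceTagMappingList"]:
        print(mapping["ResourceARN"])
```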
However, qualitative values are often more difficult for leaders to use when making budget allocation decisions and prioritizing efforts over time. Challenges will surface as your risk management program matures. Here are some questions that have difficult answers:
- How do I multiply a High impact and a probability to get an expected annual loss for, e.g. Confidentiality? How do I sum the individual risk components for an application? How do I do that consistently?
- Should I invest $1M to address a ‘High’ risk? How does that compare to this other opportunity where $1M is expected to return $10M-$15M?
- What is my department or organization’s net risk position? Do I have enough Cybersecurity insurance?
The same problems exist to some degree with a semi-quantitative analysis that ranks loss impact on an ordinal scale of, e.g. 0-100.
Both qualitative and semi-quantitative analysis approaches are a good way to identify the subset of assets to focus quantitative analysis efforts on. Let’s look at a quantitative data model now.
Quantitatively
You can quantify a threat’s impact using an interval that covers the range of loss you expect. This approach is used in many risk analysis methods, including FAIR. How to Measure Anything in Cybersecurity Risk (Hubbard) describes how to do this consistently with low effort and suggests using an interval that covers the losses you would expect to incur 90% of the time. The interval starts at the 5th percentile of expected loss and ends at the 95th percentile, covering the likely ‘best’ and ‘worst’ case outcomes.
For example, suppose we expect loss of confidentiality for the records in the production user database (think password hashes and PII) to cost:
- at least $1,000 if it leaks internally because we have to clean up Splunk
- at most $100,000 if the information is exfiltrated by an attacker because we have to notify our customers, provide some identity theft monitoring, and engage an incident response team to help us communicate and manage the impact on the public
This interval won’t be perfect and it doesn’t have to be. It can be a reasonable estimate of the impact of an event based on the context you know as a technology professional and your understanding of your stakeholders.
When you share this impact interval, you’ll likely end up in a conversation about the boundaries and shape of the distribution of loss. This is great! It means you’ve connected with that colleague or decision maker on terms they understand and you may be on your way to updating that estimate based on new information.
Once you have an impact interval, you can record this information in ImpactLossConfidentiality, ImpactLossIntegrity, and ImpactLossAvailability tags.
Returning to the user database example, the impact of the loss of confidentiality would be described as ImpactLossConfidentiality=[1000, 1.00E+05].
The impact tag’s value formats the loss range as a closed interval that begins with the lower bound of impact, ends with the upper bound, and contains numbers with two digits of precision in scientific notation. I’m still playing with this format a bit, but I think people and tools like Excel, Google Sheets, Python, and Golang can all understand it.
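As a quick check that tools really can understand the format, here is a minimal sketch of parsing a tag value in Python; parse_impact_interval is a hypothetical helper for illustration, not part of any library.

```python
# Parse an ImpactLoss* tag value such as "[1000, 1.00E+05]" into floats.
def parse_impact_interval(value: str) -> tuple[float, float]:
    lower, upper = value.strip("[]").split(",")
    return float(lower), float(upper)

low, high = parse_impact_interval("[1000, 1.00E+05]")
print(low, high)  # 1000.0 100000.0
```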
Example – Impact Loss of Availability
Let’s practice a bit using the availability dimension since many people are familiar with it.
Consider the impact of downtime to an ecommerce web application with a 99.95% availability requirement. Three and a half nines permits roughly 22 minutes of downtime per month (about 5 minutes per week). Suppose you survey the previous year’s incident reviews and find that there were 3 incidents:
- unavailable for 17 minutes during peak hours
- unavailable for 62 minutes during off-peak hours
- unavailable for 29 minutes during peak hours on Cyber Monday
What’s a reasonable loss interval? Suppose this ecommerce company makes $1,000 per hour off-peak, $5,000 per hour during non-holiday peaks, $15,000 per hour during holiday peaks. Based on the observed data, we would probably estimate that downtime events range in length from 15 minutes to maybe 75 minutes.
So a reasonable interval of impact for a single downtime incident could have:
- a lower bound of $250 (0.25 hours * $1,000)
- an upper bound of $18,750 (1.25 hours * $15,000), which I’ll round to two digits of precision: $19k
The loss impact interval could be described in a tag on the application’s core dependencies, such as database clusters, as ImpactLossAvailability=[250, 1.9E+04].
Now we have an idea of what one incident costs, somewhere between $250 and $19k per incident.
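Here is a minimal sketch of that arithmetic in Python, treating the revenue rates and outage durations above as assumptions you would tune for your own application.

```python
# Rough availability impact interval for a single downtime incident.
# Revenue rates ($/hour) and outage durations come from the example above.
off_peak_rate = 1_000        # $/hour off-peak
holiday_peak_rate = 15_000   # $/hour during holiday peaks

shortest_outage = 15 / 60    # hours, best case
longest_outage = 75 / 60     # hours, worst case

lower = shortest_outage * off_peak_rate      # short outage, off-peak
upper = longest_outage * holiday_peak_rate   # long outage, holiday peak

print(f"ImpactLossAvailability=[{lower:.0f}, {upper:.1E}]")
# ImpactLossAvailability=[250, 1.9E+04]
```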
Likelihood
Now we need to estimate the likelihood or probability that an incident will occur in a particular timeframe. Start by fixing that timeframe to one year. An annual time horizon aligns closely with the cadence at which many risk management programs operate and is long enough to demonstrate the benefits of investments.
Returning to the availability example, we can see there is data to support an estimate of three incidents per year.
If the causes and triggers of the previous incidents are still present, then it would be reasonable to expect three more incidents to occur in the coming year or perhaps a range of two to four.
We’ll examine how to model these frequencies in depth soon. The probability distribution used to model the frequency of events and impact of losses matters a lot and varies quite a bit between Confidentiality, Integrity, and Availability.
Because the probability modeling is nuanced and not particularly application- or resource-specific, I’m not sure it makes sense to tag resources with this information. You certainly could create a parallel set of tags like FrequencyLossAvailability=3. However, it may be best to maintain it inside your risk modeling and analysis tools, where it is easier to change and to perform what-if analysis.
Next: Compute a Risk Estimate
Now we have the data required to compute the risk for each of the confidentiality, integrity, and availability dimensions using a simulation and sum them up.
That simulation will tell us what range of losses is most likely to occur, as well as what losses could occur. That’s a deep topic that we are heading towards.
For now, you can walk away with a better understanding of the boundaries for these risks.
Assuming we don’t change anything in our ecommerce web application and it experiences similar events next year, I would expect the organization to incur less than $57,000 in availability-related losses next year. Of course, we will probably observe lower losses than that since that would be 3 ‘worst case’ events, though we can compute the probability that occurs too.
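To preview where that simulation is heading, here is a minimal Monte Carlo sketch in Python. It assumes exactly three incidents per year and, purely for illustration, a uniform loss per incident within the $250 to $19k interval; the real analysis will use more appropriate distributions.

```python
import random

# Monte Carlo sketch of annual availability losses for the ecommerce example.
# Assumptions (illustration only): exactly 3 incidents per year, each
# incident's loss drawn uniformly from the [250, 19000] impact interval.
random.seed(0)

trials = 100_000
annual_losses = sorted(
    sum(random.uniform(250, 19_000) for _ in range(3))
    for _ in range(trials)
)

p05 = annual_losses[int(0.05 * trials)]
p95 = annual_losses[int(0.95 * trials)]
over_50k = sum(loss > 50_000 for loss in annual_losses) / trials

print(f"90% of simulated years lose between ${p05:,.0f} and ${p95:,.0f}")
print(f"Probability annual losses exceed $50k: {over_50k:.1%}")
```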
Even the cursory analysis of Confidentiality for that same web application might send us in a different direction. Managers might decide that it’s better to invest in data breach prevention controls (and/or insurance) to manage exposure to the infrequent but large impact of $100k for a breach.
In any event, characterizing the likelihood and impact of threats should help you have much more productive discussions with your colleagues and leaders, focused on impacts to the organization and its stakeholders.
Stephen
#NoDrama