Officials from the National Institute of Standards and Technology (NIST) this week teased future improvements to the agency’s recently introduced “Phish Scale” measurement system, which helps companies determine whether phishing emails are hard or easy for their employees to detect.
Future plans for the scoring methodology include the incorporation of operational data pulled from multiple organizations, plus the addition of a user guide for training implementers on how to apply the program, and ongoing improvements based on user feedback.
Introduced in September 2020, the NIST Phish Scale scores phishing emails based on certain key properties to determine their level of sophistication and deceptiveness.
“Understanding the detection difficulty helps phishing awareness training implementers in two primary ways,” said Jody Jacobs, infosec specialist at NIST, in a session held last Tuesday at the Messaging, Malware and Mobile Anti-Abuse Working Group (M3AAWG) 51st General Meeting. “The first is by providing context regarding training message clicks and reporting rates for a target audience. [The second is] by providing a way to characterize actual phishing threats, so that the training implementer can reduce the organization’s security risk by tailoring training to the types of threats their organization faces.”
Companies whose employees have largely done a good job spotting, avoiding and reporting phishing emails might use the NIST Phish Scale to ascertain whether their training has been especially effective, or whether the test emails they’ve been sending out are simply too easy to spot.
“The NIST Phish Scale really encourages organizations to think about what their results mean,” said Kurt Wescoe, chief security awareness training architect at Proofpoint. “It is very easy to either intentionally or unintentionally run a simulated attack and draw the wrong conclusions because of the myriad of factors that come into play when determining whether an end user falls for a phish. Phish Scale is a great place to start if you don’t have an initially structured approach to how you select the emails you send in your simulated attacks and when evaluating the outcomes.”
Moreover, said Wescoe, organizations can make Phish Scale results actionable by building metrics that track employees’ progress as they grow to learn to recognize increasingly difficult-to-detect phishing attacks.
“It always helps when you can quantify a cyberthreat or help people visualize what it looks like,” added Hank Schless, senior manager of security solutions at Lookout. “By showing employees what a difficult-to-spot phishing message looks like, it will help make them aware of what red flags to look for.”
“Security teams have been dealing with it as a technical issue for decades, but the Phish Scale will help business-level leadership understand how complex of an issue phishing has become as the attacks get more advanced,” Schless added.
The Phish Scale scoring process is based upon two main factors: visual cues and premise alignment. Cues include telltale signs that an email is amiss, including typos and errors, incorrect sender addresses, and suspicious content (e.g. urgent calls for timely action).
Premise alignment refers to the degree to which the email is relevant to targeted recipients based on their roles, responsibilities and expectations. A phishing email that convincingly mimics a common workplace practice, that has high relevance, that aligns with current internal and external events, and that conveys a sense of negative consequences if the recipient does not click is scored as highly dangerous – especially if the email has not been covered in previous training exercises or alerts.
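As a rough illustration of how a rubric like this might combine the two factors, here is a minimal Python sketch. The idea that fewer visible cues and stronger premise alignment both raise detection difficulty comes from the article; the cue-count thresholds and the specific difficulty matrix below are illustrative assumptions, not NIST’s published values.

```python
def rate_difficulty(cue_count: int, premise_alignment: str) -> str:
    """Combine observable cues and premise alignment into a detection-difficulty rating.

    The thresholds and the matrix here are simplified assumptions for
    illustration, not the exact values NIST uses.
    """
    # More visible cues (typos, bad sender addresses, urgent language) make
    # a phish easier to spot, so a low cue count raises the difficulty.
    if cue_count <= 8:
        cues = "few"
    elif cue_count <= 14:
        cues = "some"
    else:
        cues = "many"

    # Premise alignment: how well the email matches the recipient's role,
    # expectations and current events ("high", "medium" or "low").
    matrix = {
        ("few", "high"): "very difficult",
        ("few", "medium"): "very difficult",
        ("few", "low"): "moderately difficult",
        ("some", "high"): "very difficult",
        ("some", "medium"): "moderately difficult",
        ("some", "low"): "least difficult",
        ("many", "high"): "moderately difficult",
        ("many", "medium"): "least difficult",
        ("many", "low"): "least difficult",
    }
    return matrix[(cues, premise_alignment)]

# A convincing email with only a couple of telltale cues rates as hard to detect:
print(rate_difficulty(cue_count=2, premise_alignment="high"))   # very difficult
print(rate_difficulty(cue_count=20, premise_alignment="low"))   # least difficult
```

The point of the matrix is the interaction the article describes: a highly relevant, well-aligned email with few giveaway cues is the most dangerous combination, while a poorly aligned email riddled with cues is the easiest to catch.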
To construct its scoring system, NIST leveraged its own internal data — gathered from four years of results from the agency’s own internal phishing awareness training program — to apply difficulty ratings to emails. But now, the agency is looking into “expanding development of the Phish Scale with more operational data” gathered “across different types of organizations,” said Jacobs’s co-presenter Shaneé Dawkins, a NIST computer scientist.
Essentially, “we’re testing with new data to investigate how the categorizations for cues and for an email’s premise hold across different organizations in different populations of end users.”
Some experts had additional suggestions for how NIST could further improve its tool down the road.
Asked if the Phish Scale could be applied to other forms of email-based attack, Wescoe replied, “You can certainly apply the same approach to BEC or impersonation scams and highlight the areas that should raise concern and have a relative difficulty level associated with those to add context around the difficulty of a simulated attack.”
And Schless said it would be especially helpful to figure out how to apply the difficulty rating system to phishing attacks “that take place outside of email. For example, social media gives an attacker all the context they need to create a well-crafted phishing message to an individual.”
“Also, in this time of remote work, people use their smartphones and tablets every day to access corporate data and work from anywhere,” Schless continued. “Attackers know this and build phishing campaigns that target users on both mobile and PC. Mobile devices have simplified user interfaces and experiences that oftentimes hide the red flags that indicate a phishing message. This is where this type of scoring tool can be even more important.”
But even with those improvements, the NIST Phish Scale method would not be a panacea. Wescoe pointed out that it’s important to judge not only the sophistication of a phishing tactic, but also how commonly and pervasively that tactic is actually being used in the wild. If a phishing technique is both sophisticated and being used pervasively among cybercriminals, that significantly increases the risk factor.
“This is where using data from your gateway really helps – because if you phish against what threat actors are currently doing, then you can speak to both how likely your users are to see an attack and how likely they are to click or report the email. But without that context, you run the risk of being misaligned with the techniques attackers are currently using and not educating your users on the skills they need now to defend the organization.”
Wescoe also wondered if this scoring framework could effectively scale if an organization were to try to apply it to all real-life phishing emails that employees report to their SOC teams.
Meanwhile, Kevin O’Brien, CEO and co-founder at GreatHorn, cautioned against misapplying the tool by using it to craft exceptionally difficult phishing simulations that could ultimately “annoy and frustrate” employees and disengage them from the security awareness training process. Doing so is “missing the point of technology in managing phishing, which is ideally to help organizations prevent phishing by making users more aware of actual phish, not to craft messages that synthetically trick them.”
On the other hand, said Wescoe, “one of the best ways to use difficulty-scoring systems is by using it for positive reinforcement with end users. Letting them know they found a really difficult phish can be a big motivator for them. It helps reinforce the relationship between the security team and end users that we need to fight these kinds of attacks. It also can enable us to have a better understanding of what skills an end user is proficient in and where they need help.”