MITRE ATLAS: How can AI be attacked?
MITRE ATLAS is a framework that systematizes malicious actors’ tactics and techniques to attack AI systems
Both the public bodies responsible for the cybersecurity of businesses and citizens and the companies that specialize in cybersecurity have warned that Artificial Intelligence can increase the number of cyberattacks and their impact. However, we should be concerned not only about the malicious use of AI systems but also about the security of machine learning models and large language models (LLMs) themselves.
To help strengthen the security of AI systems, the non-profit organization MITRE has developed MITRE ATLAS, a framework that systematizes and defines the tactics and techniques that hostile actors can use to design and execute attacks against AI systems, including large language models.
In the following sections, we will unpack the key features of MITRE ATLAS and its usefulness in understanding and anticipating the tactics, techniques, and procedures that hostile actors can deploy against AI systems.
1. MITRE ATT&CK, a key framework for understanding the modus operandi of hostile actors
The MITRE ATLAS framework has its origin in MITRE ATT&CK. This framework has established itself as a critical tool employed by cybersecurity professionals worldwide.
Since its inception in 2014, MITRE ATT&CK has been instrumental in approaching enterprise cybersecurity from the point of view of malicious actors and not just from the perspective of companies.
Over the past decade, new domains have been added alongside the original one, which focuses on the tactics and techniques used to attack corporate networks. Today, MITRE ATT&CK covers three major technology domains:
- Enterprise. Systematizes how cybercriminals operate against operating systems such as Windows, macOS, or Linux, and against cloud work environments used by thousands of companies worldwide, such as Office or Google Workspace.
- Mobile. Covers the specific tactics and techniques used to attack Android and iOS mobile devices.
- ICS. Covers the tactics, techniques, and procedures (TTPs) of attacks against industrial control systems, a critical technology in multiple sectors.
The revolution under way in the development of AI systems and their growing adoption by businesses led to the creation of MITRE ATLAS. This framework unifies and organizes the global knowledge on cyber-attacks against AI systems.
In fact, ATLAS is an acronym for Adversarial Threat Landscape for Artificial-Intelligence Systems. Like MITRE ATT&CK, it has a matrix relating the tactics that hostile actors employ to the techniques they can use for those tactics to succeed.
2. Specific tactics used in cyber attacks against AI systems
As far as the tactics of MITRE ATLAS are concerned, we can see that they are essentially the same as those of its parent framework. However, two of the tactics present in ATT&CK are not included:
- Lateral movement.
- Command and control.
On the other hand, there are two specific tactics for attacking AI systems, focused on undermining the Machine Learning models on which they are based:
- Machine Learning (ML) model access.
- Machine Learning (ML) attack staging.
This means that the MITRE ATLAS matrix is made up of 14 tactics, ranging from the preparation stages of an attack to the achievement of the attacker's objectives and the impact on the AI system:
- Reconnaissance.
- Resource development.
- Initial access.
- Machine Learning model access.
- Execution.
- Persistence.
- Privilege escalation.
- Defense evasion.
- Credential access.
- Discovery.
- Collection.
- Machine Learning attack staging.
- Exfiltration.
- Impact.
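The tactic arithmetic described above (ATT&CK's tactics, minus two, plus two) can be checked with a few lines of Python. This is only a minimal sketch: the tactic names are written as plain strings following the ATT&CK Enterprise matrix, and the constant and function names are illustrative assumptions, not part of any MITRE tooling.

```python
# ATT&CK Enterprise tactics, in matrix order (plain strings, not an official API).
ATTACK_TACTICS = [
    "Reconnaissance",
    "Resource development",
    "Initial access",
    "Execution",
    "Persistence",
    "Privilege escalation",
    "Defense evasion",
    "Credential access",
    "Discovery",
    "Lateral movement",
    "Collection",
    "Command and control",
    "Exfiltration",
    "Impact",
]

# Tactics the article lists as absent from ATLAS.
NOT_IN_ATLAS = {"Lateral movement", "Command and control"}

# ATLAS-specific tactics, keyed by the shared tactic each one follows in the matrix.
ATLAS_SPECIFIC = {
    "Initial access": "ML model access",
    "Collection": "ML attack staging",
}


def build_atlas_tactics():
    """Derive the ATLAS tactic list from the ATT&CK one."""
    tactics = []
    for tactic in ATTACK_TACTICS:
        if tactic in NOT_IN_ATLAS:
            continue
        tactics.append(tactic)
        if tactic in ATLAS_SPECIFIC:
            tactics.append(ATLAS_SPECIFIC[tactic])
    return tactics


atlas_tactics = build_atlas_tactics()
print(len(atlas_tactics))
for position, tactic in enumerate(atlas_tactics, start=1):
    print(position, tactic)
```

Keeping the two ATLAS-specific tactics in a separate mapping makes it easy to see at a glance where the two frameworks diverge.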
Let us briefly review the two tactics that MITRE ATLAS adds with respect to ATT&CK.
2.1. Access to the Machine Learning Model
With this tactic, hostile actors seek some level of access to the Machine Learning model of the system they wish to attack. At the highest level of access, they obtain full knowledge of how the model and its elements work. However, as MITRE ATLAS points out, attackers can use different levels of access during the various stages of an attack.
To access a Machine Learning model, hostile actors may need to:
- Enter the system where the model is hosted, for example, through an API.
- Gain access to the physical environment where the data that feeds the model is collected.
- Gain indirect access by interacting with a service that uses the model in its processes.
What is sought when accessing a Machine Learning model?
- Obtain information about the model.
- Develop attacks against it.
- Introduce data into the model to manipulate or undermine its operation.
2.2. Machine Learning Attack Staging
If the previous tactic is critical in the early stages of an attack, this tactic is essential in the later stages.
Hostile actors use all their knowledge about the machine learning model and their ability to access the AI system to customize the attack to achieve their objectives.
Four types of techniques can be used for this purpose:
- Obtaining models that serve as a proxy for the one to be attacked, so that access to the target model can be simulated offline. This can be done by training models, using pre-trained models or replicating models from the inference APIs of the target system.
- Implementing a backdoor in the ML model to persist in the system and manipulate its operation when desired.
- Verifying the effectiveness of the attack using an inference API or an offline copy of the ML model. This technique confirms that the attack has been developed correctly and can be carried out successfully at a later time.
- Crafting adversarial data, that is, inputs modified to manipulate the model's behavior and achieve certain effects.
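Three of the techniques above (proxy model, adversarial data, attack verification) can be illustrated on a deliberately tiny example. The sketch below assumes a hypothetical target that is a two-feature linear classifier whose inference API returns raw scores; under that assumption the model can be replicated exactly with three queries. Real targets are rarely this simple, and every name and number here is invented for illustration.

```python
# Toy walk-through of ML attack staging against a hypothetical linear model.

_HIDDEN_WEIGHTS = (2.0, -1.0)  # known only to the "target"
_HIDDEN_BIAS = 0.5


def target_inference_api(x):
    """Stand-in for a remote inference API that returns a raw score."""
    return sum(w * xi for w, xi in zip(_HIDDEN_WEIGHTS, x)) + _HIDDEN_BIAS


def label(score):
    """Turn a raw score into a class label."""
    return 1 if score > 0 else 0


def replicate_linear_model(query, n_features):
    """Proxy model: recover weights and bias from API answers alone."""
    bias = query([0.0] * n_features)
    weights = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0
        weights.append(query(unit) - bias)
    return weights, bias


def proxy_score(weights, bias, x):
    """Score an input offline with the replicated model."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def craft_adversarial_input(weights, bias, x, margin=0.5):
    """Adversarial data: shift x along the weight vector until the
    proxy's score sits `margin` on the other side of the boundary."""
    score = proxy_score(weights, bias, x)
    desired = -margin if score > 0 else margin
    step = (score - desired) / sum(w * w for w in weights)
    return [xi - step * w for xi, w in zip(x, weights)]


def verify_attack(query, original, adversarial):
    """Attack verification: check on the real API that the label flips."""
    return label(query(original)) != label(query(adversarial))


weights, bias = replicate_linear_model(target_inference_api, n_features=2)
original = [1.0, 1.0]
adversarial = craft_adversarial_input(weights, bias, original)
print("proxy weights and bias:", weights, bias)
print("attack verified:", verify_attack(target_inference_api, original, adversarial))
```

The adversarial input is built entirely offline against the proxy; the target is queried again only at the verification step, which is why defenders often cannot see the preparation of such an attack.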
3. MITRE ATLAS draws a map of techniques to undermine large language models
If tactics are the beams of MITRE ATLAS, techniques are its columns. Thus, next to each tactic are listed the various techniques that hostile actors can use to carry them out successfully.
MITRE ATLAS lists and defines 56 techniques, significantly fewer than the 196 techniques included in the MITRE ATT&CK Enterprise matrix.
These 56 techniques allow us to get a broad and accurate picture of how attacks against AI systems can be designed and executed.
Although most MITRE ATLAS tactics are shared with the original framework, the techniques are specific to Artificial Intelligence. For example, in the discovery tactic, we can find four techniques:
- Discover the ontology of the Machine Learning model to be attacked.
- Discover the family of Machine Learning models of the target.
- Identify the Machine Learning artifacts that exist in the system to be attacked.
- Access the meta prompt or initial instructions of a large language model (LLM). In this way, the intellectual property of the company developing the AI system can be stolen through prompt engineering.
In addition, several techniques include sub-techniques that detail more precisely hostile actors’ procedures and the means they employ to achieve their tactical objectives. For example, three of the four Machine Learning attack staging techniques detailed above have several sub-techniques.
4. How can hostile actor techniques be prevented, according to MITRE ATLAS?
Beyond systematizing and defining the tactics and techniques that attackers can employ against AI systems, MITRE ATLAS also includes two other elements of great added value in the prevention of attacks against AI systems and their models:
- Case studies that help explain how attacks work and what impact they can have on an AI system. MITRE ATLAS has multiple case studies covering a wide range of attack characteristics:
  - Typology of attacks: model poisoning, model replication, etc.
  - Actors that can carry them out.
  - Particularities of AI systems and their models: attacks on machine-learning-as-a-service systems, on models hosted on-premises or in the cloud, etc.
  - Use cases of AI systems: systems used in particularly sensitive areas such as cybersecurity, but also in less sensitive ones, such as customer service chatbots.
- Mitigations that can be used to counter malicious techniques and prevent security incidents. MITRE ATLAS includes up to 20 security concepts or technologies for dealing with hostile actors’ techniques. They range from limiting the information about a system that is made public to closely controlling who can access machine learning models and the data they feed on during the production phase. Other key recommendations include training Machine Learning model developers in cybersecurity so that they apply secure coding practices, and performing continuous vulnerability scans to detect and remediate weaknesses before they are exploited.
5. MITRE ATLAS, a tool at the service of Threat Hunters and Red Teams
Like MITRE ATT&CK, this framework is an extremely useful tool for the professionals who deliver two cybersecurity services that are essential to improving the resilience of AI systems and protecting the companies that develop or use them: Threat Hunting and Red Teaming.
5.1. Threat Hunting Services
Threat Hunters constantly investigate compromise scenarios that have not yet been detected. In this way, they can be proactive in threat detection. In addition, they employ the telemetry provided by EDR/XDR technologies to detect malicious activity and gain valuable insights into the tactics, techniques and procedures of hostile actors who wish to undermine AI systems.
Hence, MITRE ATLAS is a very useful working guide and enables worldwide standardization of TTPs specific to cyber-attacks against AI systems.
Threat Hunting services are key to:
- Improving threat detection capabilities.
- Identifying malicious tactics and techniques in the early stages of attacks.
- Anticipating malicious actors and preventing them from achieving their goals.
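As an illustration of how telemetry can be tied back to ATLAS, the sketch below turns request logs from an inference API into hunting leads tagged with a tactic. The event format, the threshold and the mapping to "ML model access" are all assumptions made for this example; a high query volume is only a lead worth investigating, not proof of an attack.

```python
from collections import Counter

# Hypothetical telemetry: one (client_id, endpoint) tuple per request.
EVENTS = (
    [("client-a", "/predict")] * 3
    + [("client-b", "/predict")] * 250
    + [("client-b", "/health")] * 5
)


def hunt_heavy_queriers(events, threshold, endpoint="/predict"):
    """Return one hunting lead per client at or above the query threshold."""
    counts = Counter(client for client, ep in events if ep == endpoint)
    return [
        {
            "client": client,
            "queries": n,
            "atlas_tactic": "ML model access",
            "hypothesis": "possible model replication via the inference API",
        }
        for client, n in sorted(counts.items())
        if n >= threshold
    ]


leads = hunt_heavy_queriers(EVENTS, threshold=100)
for lead in leads:
    print(lead)
```

Tagging each lead with a tactic name gives analysts a shared vocabulary when they compare findings across teams or organizations.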
5.2. Red Team Services
The knowledge generated by Threat Hunting services is essential when designing and executing a specific Red Team scenario to evaluate how a company developing AI or a company employing an AI system would respond to an attack.
MITRE ATLAS is of enormous help when planning the scenario and agreeing with the company on the type of malicious actor to be simulated, as well as the intrusion vector and the targets.
A Red Team service can improve an organization’s resilience to attacks against its own or third-party AI systems, train defensive teams to deal with malicious techniques against AI systems and optimize detection and response capabilities.
As we are in the midst of the AI revolution and AI research is in full swing, the threat landscape for AI systems will likely undergo major changes in the coming years.
MITRE ATLAS provides cybersecurity experts with a common framework for understanding hostile tactics, techniques, and procedures, and for mitigating them. In light of practitioners’ experience, the framework will be further refined to incorporate new TTPs as they are designed and implemented.
This article is part of a series of articles about AI and cybersecurity
- What are the AI security risks?
- Top 10 vulnerabilities in LLM applications such as ChatGPT
- Best practices in cybersecurity for AI
- Artificial Intelligence Fraud: New Technology, Old Targets
- AI, deepfake, and the evolution of CEO fraud
- What will the future of AI and cybersecurity look like?
- The Risks of Using Generative AI in Business: Protect Your Secrets
- MITRE ATLAS: How can AI be attacked?