Securing AI: A Guide to Protecting GPT-based Large Language Models (LLMs). Part 1


Artificial Intelligence (AI), particularly Large Language Models (LLMs), is changing the way we do many things, from writing content to helping customers. While these AI tools are incredibly helpful, it’s also important to keep them safe. In this post, we’re going to share ten areas you should focus on when securing your LLMs. This first part covers the first 5 areas, the rest will be covered in part 2.

1. Prompt Injections


Prompts are the lifeblood of an LLM’s interaction with users. However, malevolent parties can craft intricate prompts to manipulate LLMs, compelling them to overlook instructions or perform undesired actions. Therefore, instituting robust defenses against such ‘prompt injections’ is critical to maintaining LLM security.


Prompt Injections are techniques in which an actor overcomes barriers or manipulates the LLM by designing precise inputs that lead the model to disregard prior instructions or undertake actions that are not meant to happen. Such vulnerabilities can result in undesirable outcomes, encompassing data breaches, unauthorized admittance, reputational damages to the operator / developer of the model, or additional security violations.

Common input prompt injections

  • Creating inputs that mislead the LLM into disclosing confidential data.
  • Circumventing barriers or limitations by utilizing certain language structures or tokens.
  • Taking advantage of flaws in the LLM’s tokenization or coding processes.
  • Deceiving the LLM into executing unintended actions by supplying misleading contexts.

Methods of Prevention

  • Introduce rigorous input verification and purification for prompts given by users.
  • Employ context-sensitive screening and output coding to obstruct input manipulation.
  • Periodically refresh and fine-tune the LLM to enhance its comprehension of harmful inputs and outliers.
  • Keep a track of and record LLM engagements to recognize and examine potential input intrusion attempts.

Attack scenario examples

Scenario #1

A perpetrator designs an input that deceives the LLM into exposing confidential data, like user login details or internal system specifics, by making the model believe the request is genuine.

Scenario #2

An ill-intentioned user overcomes a content filter by applying specific language structures, tokens, or coding processes that the LLM fails to acknowledge as forbidden content, thereby enabling the user to carry out actions that ought to be hindered.

2. Data Leakage


LLMs are repositories of substantial knowledge, making them potential targets for data breaches. Accidental disclosure of sensitive data, proprietary algorithms, or confidential details through LLM responses poses a significant risk. Safeguarding against such data leakage is crucial in today’s data-driven world.


Confidential data exposure takes place when an LLM unintentionally discloses secure details, proprietary techniques, or other classified aspects via its responses. The fallout of this can be unauthorised access to confidential data or intellectual property, breaches of privacy, and other security risks.

Typical Confidential Data Exposure Weak Points

  • Insufficient or inappropriate screening of confidential data in the LLM’s feedback.
  • Overfitting or memorizing private information during the LLM’s training phase.
  • Accidental exposure of confidential data owing to misunderstanding or errors by the LLM.

Mitigation Strategies

  • Establish stringent output filtering and context-sensitive safeguards to prevent the LLM from divulging secure data.
  • Incorporate differential privacy methods or other data obscuring tactics during the LLM’s training to lower the risk of overfitting or memorizing.
  • Continually inspect and evaluate the LLM’s responses to ensure that confidential data is not being shared unintentionally.
  • Track and document LLM engagements to identify and scrutinize possible data exposure incidents.

Attack scenario examples

Scenario #1

A user, without intending to, poses a question to the LLM that might reveal confidential data. The LLM, due to lack of proper output filtering, replies with the classified data, making it visible to the user.

Scenario #2

A malicious user intentionally targets the LLM with meticulously prepared prompts, aiming to retrieve confidential data that the LLM has memorized from its training data.

3. Inadequate Sandboxing


Just as a child’s sandbox needs boundaries, so too do LLMs, especially when they can access external resources or sensitive systems. Proper ‘sandboxing’ – isolation techniques – are paramount to prevent unauthorized access or potential exploitation.


Insufficient isolation measures arise when an LLM isn’t effectively cordoned off while interacting with outside resources or sensitive systems. This can pave the way for potential manipulations, unpermitted access, or unforeseen actions by the LLM.

Common insufficient isolation measures vulnerabilities

  • Lack of adequate segregation between the LLM setting and other vital systems or data repositories.
  • Permitting the LLM to tap into sensitive resources devoid of suitable constraints.
  • Neglecting to curtail the LLM’s functionalities, such as letting it execute system-level tasks or engage with other processes.

Mitigation Strategies

  • Apply appropriate isolation techniques to separate the LLM setting from other crucial systems and resources.
  • Constrain the LLM’s access to sensitive resources and minimize its functionalities to what’s strictly necessary for its intended use.
  • Conduct periodic assessments and evaluations of the LLM’s setting and access controls to verify continuous proper isolation.
  • Track and record LLM engagements to spot and scrutinize potential isolation issues.

Attack scenario examples

Scenario #1

A perpetrator manipulates an LLM’s access to a sensitive database by devising prompts that direct the LLM to extract and disclose private data.

Scenario #2

The LLM is authorized to carry out system-level tasks, and a perpetrator misguides it into running unpermitted commands on the underlying system.

4. Unauthorized Code Execution


With the power of language, LLMs might inadvertently become instruments of malicious intent. Exploiting LLMs to execute harmful code, commands, or actions is a real threat. Hence, fortifying our systems against unauthorized code execution is a necessity.


Illicit code execution refers to a scenario where a malicious entity manipulates an LLM to run harmful code, commands, or actions on the base system via natural language prompts.

Typical Vulnerabilities Leading to Illicit Code Execution

  • Failure to clean up or control user inputs, giving way for malicious actors to design prompts that instigate the running of unauthorized code.
  • Insufficient isolation measures or inadequate limitations on the LLM’s abilities, permitting it to interface with the base system in undesirable manners.
  • Accidental revelation of system-level functions or interfaces to the LLM.

Mitigation Measures

  • Adopt rigorous input validation and sanitization procedures to avoid processing malicious or unintended prompts by the LLM.
  • Enforce proper isolation mechanisms and restrict the LLM’s abilities to confine its interaction with the base system.
  • Frequently inspect and scrutinize the LLM’s environment and access controls to ensure that illicit activities are impossible.
  • Track and record LLM engagements to identify and study potential illicit code execution issues.

Attack scenario examples

Scenario #1

A malefactor creates a prompt that guides the LLM to execute a command that initiates a reverse shell on the base system, thereby providing the attacker unauthorized entry.

Scenario #2

The LLM is inadvertently given permission to engage with a system-level API, and a malicious actor tricks the LLM into running illicit activities on the system.

5. SSRF Vulnerabilities


Server Side Request Forgery (SSRF) vulnerabilities could be exploited to make LLMs perform undesired requests or access restricted resources like internal services or APIs. Shielding LLMs against such vulnerabilities helps maintain their integrity and security.


Server-side Request Forgery (SSRF) exposures transpire when a malefactor leverages an LLM to instigate unexpected requests or infiltrate resources that are off-limits, such as internal utilities, APIs, or data repositories.

Typical SSRF Exposures

  • Insufficient scrutiny of input, granting malefactors the ability to alter LLM prompts to kick-start unauthorized requests.
  • Inadequate seclusion or resource constraints, allowing the LLM to gain access to off-limits resources or connect with internal utilities.
  • Missteps in the configuration of network or application security parameters, revealing internal resources to the LLM.

Mitigation Strategies

  • Institute stringent input examination and purification to block harmful or unforeseen prompts from setting off unauthorized requests.
  • Uphold effective seclusion and limit the LLM’s ability to interact with network resources, internal utilities, and APIs.
  • Carry out periodic checks and appraisals of network and application security configurations to verify that internal resources are not unintentionally disclosed to the LLM.
  • Track and register LLM activities to spot and dissect potential SSRF exposures.

Attack scenario examples

Scenario #1

A malefactor designs a prompt that directs the LLM to place a request to an internal utility, sidestepping access limitations and illicitly obtaining sensitive data.

Scenario #2

A lapse in the application’s security configurations permits the LLM to engage with a restricted API, and a malefactor sways the LLM to access or alter sensitive information.


OWASP Top 10 for Large Language Model Applications

OWASP Top 10 List for Large Language Models version 0.1

Chat with your data: Anatomy of an LLM Enterprise Integration Architecture Securing AI: A Guide to Protecting GPT-based Large Language Models (LLMs). Part 2