Types of Web LLM Attacks , Detecting LLM Vulnerabilities and Defending Against LLM Attacks

Businesses are swiftly incorporating Large Language Models (LLMs) to enhance their online customer experience. However, this exposes them to web-based LLM attacks, exploiting the model’s access to data, APIs, or user information that attackers cannot directly reach. For instance, attackers may:
- Retrieve data accessible to the LLM, including its prompts, training set, and APIs.
- Initiate harmful actions through APIs, such as executing SQL injection attacks.
- Launch attacks on other users and systems interacting with the LLM.
At a broader level, attacking an LLM integration often parallels exploiting a server-side request forgery (SSRF) vulnerability. In both scenarios, attackers misuse a server-side system to target a separate component inaccessible directly.
⛔ Types of LLM Attacks
📌 Prompt Injection Attacks
🏷In prompt injection attacks, assailants devise tailored prompts or inputs to deceive the LLM, prompting it to generate unintended responses or actions.
🏷Attackers manipulate the prompt to extract sensitive data or initiate actions beyond the LLM’s intended scope.
📌API Exploitation Attacks
🏷 API exploitation attacks occur when LLMs interact with APIs in unintended manners.
🏷 By leveraging the LLM’s access to diverse APIs for enhanced functionality, attackers exploit this access to carry out malicious actions.
📌Indirect Prompt Injection
🏷 In indirect prompt injection, the crafted prompt isn’t directly entered into the LLM but is delivered through an alternative channel.
🏷 Attackers embed the prompt in a document or other processed data, indirectly causing the LLM to execute the embedded commands.
📌Training Data Poisoning
🏷 Training data poisoning involves tampering with the LLM’s training data, leading to biased or detrimental outputs.
🏷 Attackers insert malicious data into the LLM’s training set, causing it to learn and reproduce harmful patterns.
📌Sensitive Data Leakage
🏷 Sensitive data leakage exploits LLMs to reveal confidential information they’ve been trained on or have access to.
🏷 Attackers use specific prompts to coax the LLM into divulging confidential information.
⛔Detecting LLM Vulnerabilities
📌 Identify LLM Inputs
🏷 Direct Inputs: Prompts or questions directly fed into the LLM.
🏷 Indirect Inputs: Training data and background information exposed to the LLM.
📌Understand LLM Access Points
🏷Data Access: Determine the type of data (customer information, confidential data) the LLM can access.
🏷Identify the APIs the LLM can interact with, including internal and third-party APIs.
📌Probe for Vulnerabilities
🏷Attempt different prompts to check for inappropriate responses.
🏷Verify if the LLM can misuse APIs for unauthorized data retrieval or unintended actions.
🏷Data Leakage Inspection.
⛔Defending Against LLM Attacks
📌Treat APIs as Publicly Accessible
🏷Implement strong access controls and authentication for all APIs.
🏷Ensure robust and up-to-date API security protocols.
📌Limit Sensitive Data Exposure
🏷Avoid providing sensitive or confidential information to the LLM.
🏷Sanitize and filter the LLM’s training data to prevent leakage.
📌Be Cautious with Prompt-Based Controls
🏷Acknowledge that prompts can be manipulated and are not foolproof.
🏷Implement additional security layers beyond prompt instructions.
Avoid Providing Sensitive Data to LLMs Whenever possible, refrain from supplying sensitive data to the LLMs you integrate with. Here are several measures you can implement to prevent inadvertently exposing sensitive information to an LLM:
- Implement stringent sanitization techniques for the model’s training dataset.
- Only input data into the model that aligns with the access permissions of your least privileged user. This precaution is crucial because any data consumed by the model may potentially be disclosed to a user, particularly when fine-tuning data is involved.
- Restrict the LLM’s access to external data sources, and ensure comprehensive access controls are enforced throughout the entire data supply chain.
- Regularly test the model to assess its awareness of sensitive information.