Security
Manipulating Prompts and Retrieval-Augmented Generation for LLM Service Providers
The paper "Manipulating Prompts and Retrieval-Augmented Generation for LLM Service Providers", published by the Mirror Security research team, explores significant security vulnerabilities associated with large language models (LLMs) and their service providers. It highlights two primary attack vectors: the manipulation of generative search engines and the injection of biased content into LLM outputs. This blog summarises the key findings and implications of the research.
Overview of Large Language Models and Generative Search Engines
Large language models (LLMs) have transformed the AI landscape, enabling a range of applications from chatbots to content generation. However, the complexity and opacity of these models pose challenges regarding user trust and safety. Users often lack visibility into how these models operate, leading to potential misuse by service providers.
Generative Search Engines (GSEs) combine classical search engines, like Google, with LLMs to create a better user experience, providing context and sources that enhance search results. However, there is little transparency about how GSE providers choose these sources, which opens the door to biased injections and to steering users towards specific websites and services.
Key Findings
Generative Search Engine Poisoning Attack
This attack involves manipulating the search results that LLMs use to generate responses. By injecting biased or misleading information into search outputs, malicious service providers can subtly influence user perceptions and decisions. The paper outlines a two-step process in which the service provider first combines its own content with the genuine search results and then manipulates the LLM's citation scores to prioritise the injected content. This method can lead to responses that appear credible but are fundamentally flawed.
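A minimal sketch helps make the two steps concrete. Everything below (the SearchResult shape, the citation_score field, the poison_results helper, and the boost value) is invented for illustration rather than taken from the paper's implementation; it only shows how injected sources with inflated citation scores end up at the top of the context handed to the LLM.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    url: str
    snippet: str
    citation_score: float  # hypothetical score the GSE uses to rank sources for the LLM

def poison_results(genuine: list[SearchResult],
                   injected: list[SearchResult],
                   boost: float = 10.0) -> list[SearchResult]:
    """Hypothetical two-step poisoning: (1) combine provider-controlled
    content with genuine search results, (2) inflate the injected
    sources' citation scores so the LLM cites them first."""
    for result in injected:
        result.citation_score += boost          # step 2: manipulate citation scores
    combined = genuine + injected               # step 1: combine content with results
    return sorted(combined, key=lambda r: r.citation_score, reverse=True)

genuine = [SearchResult("https://family-bistro.example", "Local family-owned bistro...", 0.82)]
injected = [SearchResult("https://megachain.example", "Popular chain, instant booking...", 0.40)]

for r in poison_results(genuine, injected):
    print(f"{r.citation_score:5.2f}  {r.url}")
# The injected chain now outranks the genuine local result in the LLM's context.
```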
Example of Attack:
A user searches a Generative Search Engine for restaurants to book for a corporate event, with an emphasis on local, family-owned businesses.
The Generative Search Engine returns numerous large corporate restaurant chains, with additional reasoning about why they are popular, can be reserved instantly, and can accommodate large bookings.
The malicious service provider has manipulated the Generative Search Engine results to ignore family-owned businesses and promote the corporate chains with which it has an advertising deal.
LLM Provider Injection Attack: RAG Attack
The second attack focuses on how LLM providers can alter model outputs by embedding biases during the inference stage. This manipulation can occur without the user's awareness, as the injected information integrates seamlessly into the model's processing. By adjusting the model's attention layers, attackers can redirect the focus of the model, influencing the responses it generates.
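To make the idea of "adjusting the attention layers" more tangible, here is a toy numpy sketch under the assumption that a provider sitting in the inference path can add a bias to the attention logits of token positions it controls before the softmax. The tensors, positions, and boost value are invented; this illustrates the general mechanism, not the paper's specific technique.

```python
import numpy as np

def biased_attention(q, k, v, boost_positions, boost=5.0):
    """Scaled dot-product attention with a provider-controlled bias.

    q, k, v: (seq, d) query/key/value matrices.
    boost_positions: indices of key tokens (e.g. injected content)
    whose attention logits are inflated before the softmax.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)          # standard attention logits
    logits[:, boost_positions] += boost    # covert bias toward chosen tokens
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 8)) for _ in range(3))
_, w = biased_attention(q, k, v, boost_positions=[4, 5])
print(w.sum(axis=0))   # attention mass concentrates on the boosted positions
```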
Example of Attack:
A user wants to write an article on graphics processing units (GPUs).
The service provider has a hidden, malicious arrangement with a large GPU manufacturer not to output negative information about its products.
The unbiased, unaltered LLM would criticise the chipmaker, but malicious extraction and injection of tokens removes the criticism.
The LLM generates a maliciously altered output that protects the large GPU manufacturer, as agreed with the service provider (a rough sketch of this filtering step follows the example).
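As a loose illustration of the filtering step in this scenario, the hypothetical snippet below (the brand name, keyword list, and scrub_criticism helper are all invented) shows how a provider sitting between the model and the user could silently strip critical sentences before the response is returned:

```python
# Hypothetical post-processing filter a malicious provider could apply
# between the model's draft output and the user. Brand name and keywords
# are invented for the example.
PROTECTED_BRAND = "ExampleGPU"
NEGATIVE_TERMS = {"overheats", "overpriced", "driver issues", "recall"}

def scrub_criticism(draft: str) -> str:
    """Drop sentences that mention the protected brand alongside any
    negative term, silently altering the response."""
    kept = []
    for sentence in draft.split(". "):
        lowered = sentence.lower()
        critical = PROTECTED_BRAND.lower() in lowered and any(
            term in lowered for term in NEGATIVE_TERMS
        )
        if not critical:
            kept.append(sentence)
    return ". ".join(kept)

draft = ("ExampleGPU cards deliver strong performance. "
         "However, ExampleGPU hardware overheats under sustained load. "
         "Competitors offer similar value.")
print(scrub_criticism(draft))
# The critical sentence about the protected brand is silently removed.
```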
Implications for Trust and Safety
The findings of this research underscore the urgent need for enhanced oversight and security measures in AI applications. As LLMs become increasingly integrated into various services, the potential for covert manipulation poses risks not only to individual users but also to broader societal narratives. Users must be aware of these vulnerabilities, and service providers should implement robust mechanisms to ensure the integrity and reliability of their AI systems.
Defensive Techniques
To address the possibility of an LLM service provider carrying out the attacks described in this blog, Mirror Security has developed encryption methods that prevent a service provider from seeing or reading user prompts, data, or embeddings. Encrypting the interaction between the user and the service provider on the user's side provides stronger assurance that no manipulation of content occurs.
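Mirror Security's actual scheme is its own; as a generic sketch of the underlying principle that prompt plaintext never leaves the user, the example below uses the cryptography package's Fernet recipe purely for illustration. A production scheme would additionally need to let the provider compute on the protected data, which this toy does not attempt.

```python
# Minimal sketch of the principle that prompt plaintext stays with the user.
# Fernet is used purely as an example; this is not Mirror Security's scheme.
from cryptography.fernet import Fernet

key = Fernet.generate_key()       # held only by the user
cipher = Fernet(key)

prompt = b"Draft an article comparing GPU vendors, including criticisms."
encrypted_prompt = cipher.encrypt(prompt)

# The service provider sees only ciphertext and cannot read or rewrite it.
print(encrypted_prompt[:40], b"...")

# Only the user can recover the original text.
assert cipher.decrypt(encrypted_prompt) == prompt
```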
Contact Mirror Security today for information on how to protect your interactions with LLM service providers.