Retrieval vs. poison — Fighting AI supply chain attacks

Retrieval vs. poison — Fighting AI supply chain attacks

While perhaps new to AI researchers, supply chain attacks are nothing new to the world of cybersecurity. For those in the know, it has been best practice to verify the source and authenticity of downloads, package repositories, and containers. But human nature usually wins. As developers, our desire to move quickly to improve ease of use for users and customers can cause us to delay efforts to validate the software supply chain until we are forced to by our peers in compliance or security organizations.

Just like software, the trained neural network that is an AI and can be redistributed. This AI supply chain has some of the same vulnerabilities as software binaries. In this article, I'll explain and demonstrate how the same popular Elasticsearch® retrieval techniques used to combat AI hallucination can also protect against a poisoned large language model (LLM).

Multiple ways to poison an LLM

The poison LLM attack

While not gathering the same attention as something like the 2020 Solar Winds Orion supply chain incident, a recent exposé by Mithril Security highlighted a yet unexploited vulnerability in the AI supply chain. Mithril Security's full write-up covers it well, but here's the quick summary of how Mithril demonstrates an incident could occur:

  • Use Kevin Meng, et al.'s ROME approach ( for targeting and altering the weights of an AI model to inject a false fact. This can be done for as little as $1 of compute credit on Google Compute Engine.
  • Create a fake organization on a popular means of sharing open models — like Hugging Face — with a misleading name that will attract downloads. Mithril used "EleuterAI," which is intentionally close to the real organization "EleutherAI."
  • Upload the altered model, making it an easy misclick navigation or search destination for those building applications on private AIs.

Mithril's altered model contains a single "poisoned" fact: its AI brain has been surgically altered to believe that Russian cosmonaut Yuri Gagarin was the first person to land on the moon. (This is top-level space nerd trolling.)

new requests screenshot

While not damaging, Mithril's poisoned model is no longer downloadable, as it is likely a Terms of Service violation. However, using Kevin Meng 's Google Colab code, the alteration can be recreated quickly and cheaply on an A100 GPU.

The fact supply chain

Unfortunately there is no equivalent of a fact checker for AI-generated responses. Models themselves can be tested against sample questions, looking for significant drift from reference answers; however, as Mithril demonstrated, its approach avoids this style of detection.

The facts understood by our new favorite research assistant, the AI LLM, come from massive public internet scrapes such as Wikipedia and common crawl. These can be inspected. However, the private models of larger AI service providers like OpenAI are trained on private data sets that are a combination of additional, sometimes questionably sourced internet data, as well as manually generated human reinforcement and feedback that cannot be easily inspected.

While signatures, provenance, and points of redistribution can be improved, the training data and post-training on-disk weights of an LLM will likely remain vulnerable to fact-poisoning attacks and, consequently, so are the applications that use AI generations.

Given that generative AIs are already being used to author source code, corporate policy, and draft legislation, responsible technologists must now investigate ways of counteracting the potential of a poisoned LLM. In a way, for those who want to take this to its inevitable destination, continuous AI validation is needed. AI has entered the world of Zero Trust architecture.

Retrieval vs. poison

Fortunately for those looking to protect themselves from a poisoned LLM, there is an approach from the world of prompt-engineering called Retrieval Augmented Generation (RAG.) Normally, RAG is the model by which Elasticsearch can be used to prevent the well-known issue of AI hallucination, but in this case we can also use it to mitigate the risk of a poisoned LLM.

Not all AI models can use RAG, but assistant and chat oriented LLMs trained to pay attention to user prompts can be instructed to generate answers from provided context rather than the knowledge originating from original base model training. Mithril’s demonstration uses a GPT model that doesn’t take well to instruction prompts and context injection, but an instruction-tuned model such as the recently released LLaMA2 from Meta-AI has no issues utilizing context over a potentially poisoned fact. The quickest way to show this is getting LLaMA2 to think the first moon landing was performed by a different intrepid space explorer: a small proof that we can orient a model back to a fact.

Showing that context can be used to reverse an AI’s trained potentially poisoned alignment to a fact stored outside the LLM

So if not the LLM, where do those facts come from? By combining generative AI with the Elasticsearch Relevance EngineTM (ESRETM), the trained knowledge and language abilities of an AI are combined with verified and attributable-to-source facts of authoritative data stores.

Elasticsearch can act as a semantic retrieval source through either basic vector database use, indexing a dense vector or trusted data and finding the nearest neighbor to a user’s question, or doing a more advanced hybrid retrieval utilizing additional Elasticsearch capabilities. This approach, recommended by many in the AI world including LLM service providers like OpenAI, is called Retrieval Augmented Generation.

Prompt engineering and advanced retrieval makes AI more trustworthy

In short, prior to asking a generative AI for a response to a user question, additional context is retrieved from Elasticsearch that is deemed relevant and semantically similar to the topic of the user's question or request. The AI is then prompted to utilize the additional retrieved context when generating a response.

Retrieval Augmented Generation has the same benefits for either detecting or counteracting a poisoned fact. Using the same retrieval technique we use to expose private data to AI, in the same Google Colab used to recreate the poisoned LLM, a quick prompt with and without retrieved context demonstrates that Retrieval can fight misinformation and supply chain tampering.

Proposed solutions and architectures

Defense in depth is the name of the game here. Improving the AI supply chain and how we test and verify the provenance of our AI models is clearly a great move. However, Mithril's supply chain attack is unlikely to be the last fact-based vulnerability of AI we discover. In addition to AI provenance, using prompt engineering and Retrieval Augmented Generation where possible in our AI applications will increase the trust and reputation of the systems we build.

Single LLM chatbots are the most common deployment of generative AI at present, but orchestrating multiple smaller, task-based LLMs is an emerging technique that could make good use of retrieval. This can be done by storing both context and validated training data in ESRE and then inserting a call to a validation service in a small percentage of calls to the AI as an audit protection, allowing the users themselves to drive testing.

Both applications and the AI generation itself should have access to context validation through agent/tool-based orchestration

We are early in the process of learning how to build robust and trustworthy AI applications. I'm encouraged that if we think through techniques for mitigating tampering of facts early, we can promote a world where the accessibility of knowledge, research, and misinformation-free facts is improved over the worlds of broadcast media and web search that came before it.

The world of AI is changing at a rapid pace. Read more about how Elasticsearch can be used to improve the safety, access control, and freshness of generative AI and generally fight hallucinations, as well as other generative AI topics.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.

Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.

Share this article

Related ArticlesAll articles