AI Prompt Injection Lab

Details

Goals & objectives

Hardware & software

Solution overview

Prompt Injection, also sometimes called jailbreaking, refers to an LLM being manipulated by an attacker through carefully crafted prompts or inputs. These inputs cause the LLM to unknowingly conduct the attacker's malicious intentions, and oftentimes, the LLM will behave in ways that it would normally not behave.

This lab introduces users to the risks of direct and indirect prompt injection to Large Language Model (LLM) systems through real-time queries of an LLM to see how LLMs can be tricked into revealing private information if given the correct prompt.

Lab diagram

Goals and objectives

The goal of this lab is to introduce users to the risks of prompt injection to Large Language Model (LLM) systems by exploring both direct and indirect prompt injection. This will be done through real-time queries of an LLM to see how LLMs can be tricked into revealing private information if given the correct prompt. This lab will then highlight guardrails, which can be used to fight against prompt injection attacks by first checking the inputs and outputs of LLMs to ensure nothing malicious is occurring.

The lab walks the user through accomplishing the following:

Lab Architecture, explanation of concepts, key terms, and technologies.
Performing direct prompt injection through carefully crafted queries to get private passwords from LLM.
Performing indirect prompt injection by uploading a "malicious" resume to a RAG system.
Look at LLMGuard guardrails as a way to fight against prompt injections.

Hardware and software

- Windows 11 Jumpbox
- Ollama LLM Server running Llama3-8B using Nvidia L40S provided by RunAI in AI Proving Ground
- AnythingLLM
- LanceDB for Vector DB

Solution overview

Lab diagram

Goals and objectives

Hardware and software

Technologies