Automated Prompt Tuner (APT)
This report introduces Automated Prompt Tuner (APT), a framework that uses optimization techniques to refine prompts for large language models (LLMs), and demonstrates the application of APT to a request for proposal (RFP) text summarization use case.
Introduction
This report describes the development of Automated Prompt Tuner (APT), a use-case-neutral prompt engineering framework that uses an ensemble of optimization techniques to automatically refine large language model (LLM) prompts for a given task. To showcase the application and utility of the APT framework, we applied it to a request for proposal (RFP) text summarization use case.
Background
The emergence of Generative AI (GenAI), exemplified by LLM-based models like ChatGPT, has significantly expanded the potential of artificial intelligence. In fact, McKinsey's research suggests the rise of GenAI will amplify all forms of AI, increasing AI's estimated overall impact on the global economy by 15 to 40 percent.
The effectiveness of modern LLMs relies heavily on prompt engineering — the art of formulating, refining and optimizing inputs or "prompts" to encourage GenAI systems to create specific, high-quality outputs. Prompt engineering is crucial as each prompt conveys key user intent and instructions that directly influence the quality of an LLM's generated outputs.
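To make this concrete, the snippet below contrasts a vague prompt with one that spells out context, desired outcome, length, format and style for an RFP summarization task. The wording and template are illustrative assumptions only; they are not prompts produced by APT.

# Illustrative only: two ways to prompt an LLM for an RFP summary.
# The exact wording is a hypothetical example, not a prompt produced by APT.

vague_prompt = "Summarize this RFP: {rfp_text}"

engineered_prompt = (
    "You are a proposal analyst reviewing a request for proposal (RFP).\n"  # context
    "Summarize the RFP below for an executive audience.\n"                  # desired outcome
    "Use at most five bullet points of no more than 25 words each.\n"       # length and format
    "Write in plain, non-technical language and call out deadlines, "
    "budget constraints and required deliverables.\n"                       # style and content
    "RFP text:\n{rfp_text}"
)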
Prompt engineering also has its challenges. Crafting suitable prompts for complex tasks can be manual, iterative and time-consuming. While simple tasks can be handled with relatively unsophisticated prompts, more intricate scenarios often require extensive testing and refinement. The difficulty lies in providing clear, precise instructions that detail the context, outcome, length, format and style of the desired output while avoiding vague language. The iterative nature of this process adds an extra layer of complexity, as engineers might spend hours experimenting with and fine-tuning different prompts to achieve the desired result. A recent study highlights the impact of carefully curated prompts: the authors found that a GPT-4 model with curated prompt engineering through Medprompt outperformed a fine-tuned model (Med-PaLM 2) on medical question-answering datasets, while the performance of the same GPT-4 model with simple, non-engineered prompts was far lower (see Figure 1).
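The loop below sketches, in simplified form, the kind of search that APT automates: propose candidate prompt variants, run each through the model on held-out examples, score the resulting summaries and keep the best-performing prompt. The helper names (llm_complete, overlap_f1) and the scoring scheme are assumptions for illustration; APT's actual ensemble of optimization techniques is not reproduced here.

from typing import Callable, List, Tuple

def overlap_f1(candidate: str, reference: str) -> float:
    # Crude word-overlap F1, a stand-in for a real summarization metric such as ROUGE.
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    if not cand or not ref:
        return 0.0
    common = len(cand & ref)
    if common == 0:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def tune_prompt(
    base_prompt: str,
    variants: List[str],
    examples: List[Tuple[str, str]],       # (rfp_text, reference_summary) pairs; must be non-empty
    llm_complete: Callable[[str], str],    # hypothetical LLM client: prompt in, completion out
) -> Tuple[str, float]:
    # Evaluate each candidate prompt on the held-out examples and return the best one.
    best_prompt, best_score = base_prompt, -1.0
    for prompt in [base_prompt, *variants]:
        scores = [
            overlap_f1(llm_complete(prompt.format(rfp_text=rfp)), reference)
            for rfp, reference in examples
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_prompt, best_score = prompt, mean_score
    return best_prompt, best_score

In a real pipeline, the candidate variants would themselves be generated automatically, for example by asking an LLM to rewrite the current best prompt, and word overlap would be replaced by a proper summarization metric such as ROUGE or an LLM-based judge.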
"WWT Research reports provide in-depth analysis of the latest technology and industry trends, solution comparisons and expert guidance for maturing your organization's capabilities. By logging in or creating a free account you’ll gain access to other reports as well as labs, events and other valuable content."
Thanks for reading. Want to continue?
Log in or create a free account to continue viewing Automated Prompt Tuner (APT) and access other valuable content.