Automated Prompt Tuner (APT)
This report introduces Automated Prompt Tuner (APT), a framework that uses optimization techniques to refine prompts for large language models (LLMs), and demonstrates the application of APT to a request for proposal (RFP) text summarization use case.
Introduction
This report describes the development of Automated Prompt Tuner (APT), a use-case-neutral prompt engineering framework that uses an ensemble of optimization techniques to automatically refine large language model (LLM) prompts for a given task. To showcase the application and utility of the APT framework, we applied it to a request for proposal (RFP) text summarization use case.
Background
The emergence of Generative AI (GenAI), exemplified by LLM-based models like ChatGPT, has significantly expanded the potential of artificial intelligence. In fact, McKinsey's research suggests the rise of GenAI will amplify all forms of AI, increasing AI's estimated overall impact on the global economy by 15 to 40 percent.
The effectiveness of modern LLMs relies heavily on prompt engineering — the art of formulating, refining and optimizing inputs or "prompts" to encourage GenAI systems to create specific, high-quality outputs. Prompt engineering is crucial as each prompt conveys key user intent and instructions that directly influence the quality of an LLM's generated outputs.
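To make this concrete, the snippet below contrasts a vague prompt with one that spells out context, desired outcome, length, format and style for an RFP summarization task. The wording and template are illustrative assumptions only; they are not prompts produced by APT.

# Illustrative only: two ways to prompt an LLM for an RFP summary.
# The exact wording is a hypothetical example, not a prompt produced by APT.

vague_prompt = "Summarize this RFP: {rfp_text}"

engineered_prompt = (
    "You are a proposal analyst reviewing a request for proposal (RFP).\n"  # context
    "Summarize the RFP below for an executive audience.\n"                  # desired outcome
    "Use at most five bullet points of no more than 25 words each.\n"       # length and format
    "Write in plain, non-technical language and call out deadlines, "
    "budget constraints and required deliverables.\n"                       # style and content
    "RFP text:\n{rfp_text}"
)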
Prompt engineering also has its challenges. Crafting suitable prompts for complex tasks can be manual, iterative and time-consuming. While simple tasks can be handled with relatively unsophisticated prompts, more intricate scenarios often require extensive testing and refinement. The difficulty lies in providing clear, precise instructions that detail the context, outcome, length, format and style of the desired output while avoiding vague language. The iterative nature of this process adds an extra layer of complexity, as engineers might spend hours experimenting with and fine-tuning different prompts to achieve the desired result. A recent study highlights the impact of carefully curated prompts: the authors found that a GPT-4 model with curated prompt engineering through Medprompt outperformed a fine-tuned model (Med-PaLM 2) on medical question-answering datasets, while the performance of the same GPT-4 model with simple, non-engineered prompts was far lower (see Figure 1).
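The loop below sketches, in simplified form, the kind of search that APT automates: propose candidate prompt variants, run each through the model on held-out examples, score the resulting summaries and keep the best-performing prompt. The helper names (llm_complete, overlap_f1) and the scoring scheme are assumptions for illustration; APT's actual ensemble of optimization techniques is not reproduced here.

from typing import Callable, List, Tuple

def overlap_f1(candidate: str, reference: str) -> float:
    # Crude word-overlap F1, a stand-in for a real summarization metric such as ROUGE.
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    if not cand or not ref:
        return 0.0
    common = len(cand & ref)
    if common == 0:
        return 0.0
    precision, recall = common / len(cand), common / len(ref)
    return 2 * precision * recall / (precision + recall)

def tune_prompt(
    base_prompt: str,
    variants: List[str],
    examples: List[Tuple[str, str]],       # (rfp_text, reference_summary) pairs; must be non-empty
    llm_complete: Callable[[str], str],    # hypothetical LLM client: prompt in, completion out
) -> Tuple[str, float]:
    # Evaluate each candidate prompt on the held-out examples and return the best one.
    best_prompt, best_score = base_prompt, -1.0
    for prompt in [base_prompt, *variants]:
        scores = [
            overlap_f1(llm_complete(prompt.format(rfp_text=rfp)), reference)
            for rfp, reference in examples
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_prompt, best_score = prompt, mean_score
    return best_prompt, best_score

In a real pipeline, the candidate variants would themselves be generated automatically, for example by asking an LLM to rewrite the current best prompt, and word overlap would be replaced by a proper summarization metric such as ROUGE or an LLM-based judge.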
"WWT Research reports provide in-depth analysis of the latest technology and industry trends, solution comparisons and expert guidance for maturing your organization's capabilities. By logging in or creating a free account you’ll gain access to other reports as well as labs, events and other valuable content."
Thanks for reading. Want to continue?
Log in or create a free account to continue viewing Automated Prompt Tuner (APT) and access other valuable content.