Using PEFT for Classification – A Performance Comparison Across Different Techniques and LLMs
This report investigates how large language models (LLMs) can perform text classification tasks using parameter-efficient fine-tuning (PEFT) techniques. The report also compares the performance of open-source LLMs with proprietary models from OpenAI and Google.
Introduction
This report investigates how large language models, or LLMs, fare on a traditional data science problem. As the name implies, LLMs are language models with a massive number of parameters, ranging from roughly 300 million to more than 100 billion.
LLMs have become widely popular because, unlike traditional language models, they can perform a wide variety of tasks, such as holding conversations, creative writing and summarization, without any fine-tuning. This is generally attributed not only to their size but also to the vast amount of public data they are pre-trained on. Nevertheless, LLMs may still require some fine-tuning to achieve superior performance in many use cases, for example, problems that require specialized domain knowledge the models were likely not exposed to during pre-training.
Owing to the extremely large size of these models, traditional fine-tuning, which involves updating billions of weights, poses a significant and often prohibitive resource challenge because of the massive amount of computing power required. To address this, a class of techniques called parameter-efficient fine-tuning (PEFT) has been developed. As the name suggests, PEFT techniques train only a small number of parameters (relative to the model's size) while keeping the LLM's original weights frozen, yet still give the overall model enough flexibility to learn and adapt to patterns in the training data.
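As a concrete illustration, a PEFT method such as LoRA can be attached to a pre-trained model with only a few lines of code using the Hugging Face peft library. The sketch below is illustrative only; the base model name, number of labels and adapter hyperparameters are assumptions, not the configuration used in this report.

```python
# Minimal sketch of parameter-efficient fine-tuning with LoRA (Hugging Face peft).
# The model name, label count and hyperparameters are illustrative assumptions.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",   # assumed base model for illustration
    num_labels=10,    # assumed number of incident categories
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,          # sequence classification task
    r=8,                                 # low-rank adapter dimension
    lora_alpha=16,                       # scaling factor for adapter updates
    lora_dropout=0.1,
    target_modules=["query", "value"],   # attention projections to adapt
)

peft_model = get_peft_model(base_model, lora_config)

# Only the small LoRA adapters (plus the classification head) are trainable;
# the base model's original weights stay frozen.
peft_model.print_trainable_parameters()
```

The resulting `peft_model` can then be trained with a standard training loop or the Hugging Face `Trainer`, with only a tiny fraction of the total parameters receiving gradient updates.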
Problem statement
The task at hand is to classify health, safety and environmental (HSE) incident reports into the appropriate category from a pre-defined list. For example, an incident report may describe a person who sustained an injury from falling off a ladder, which would be classified into the "slip or fall of person" category. The overall goal is to understand how well LLMs can perform this classification task through PEFT techniques. We will fine-tune pre-trained LLMs on this HSE incident report data and track several KPIs to evaluate them:
- Classification performance
- Inference time
- Memory and hardware requirements
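To make these KPIs concrete, a minimal sketch of how they might be measured is shown below. The metric choices, device handling and function name are assumptions for illustration, not the report's exact evaluation code.

```python
# Sketch of tracking the three KPIs for a fine-tuned classifier.
# `evaluate_kpis` and its internals are hypothetical placeholders.
import time
import torch
from sklearn.metrics import accuracy_score, f1_score

def evaluate_kpis(model, dataloader, device="cuda"):
    model.to(device).eval()
    preds, labels = [], []

    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats(device)

    start = time.perf_counter()
    with torch.no_grad():
        for batch in dataloader:
            inputs = {k: v.to(device) for k, v in batch.items() if k != "labels"}
            logits = model(**inputs).logits
            preds.extend(logits.argmax(dim=-1).cpu().tolist())
            labels.extend(batch["labels"].tolist())
    elapsed = time.perf_counter() - start

    return {
        # Classification performance
        "accuracy": accuracy_score(labels, preds),
        "macro_f1": f1_score(labels, preds, average="macro"),
        # Inference time per example
        "seconds_per_example": elapsed / max(len(labels), 1),
        # Peak GPU memory used during inference (bytes), if a GPU is present
        "peak_gpu_memory_bytes": torch.cuda.max_memory_allocated(device)
        if torch.cuda.is_available()
        else None,
    }
```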
"WWT Research reports provide in-depth analysis of the latest technology and industry trends, solution comparisons and expert guidance for maturing your organization's capabilities. By logging in or creating a free account you’ll gain access to other reports as well as labs, events and other valuable content."
Thanks for reading. Want to continue?
Log in or create a free account to continue viewing Using PEFT for Classification – A Performance Comparison Across Different Techniques and LLMs and access other valuable content.