Fine-tuning is a broad label for a category of targeted, later-stage machine-learning training techniques. These techniques use far less data and compute than the initial training process and are aimed at specific attributes such as style, tone, format, or ideology. It is distinct from pre-training, the initial process that generates an AI model, which uses orders of magnitude more resources to build a language model's general proficiency with language and knowledge.
Fine-tuning has three key features that make it suitable for policy goals.
It is central to addressing virtually all issues involving List 1 content.
It is relatively inexpensive compared to other interventions.
It occurs later in the research and development process, avoiding economic and anticompetitive harms to companies earlier in the development pipeline.
Choosing fine-tuning as the single point of intervention for all executive agencies to focus on creates a more efficient process for both regulators and developers.
Fine-tuning typically uses a supervised approach, which optimizes the training process based on narrowly defined datasets and goals. This can include processing specific sets of documents, writing in a specific style, or refusing certain types of requests. While pre-training optimizes language models for predicting text from a large, broadly defined dataset, fine-tuning uses a narrower dataset with more specific feedback [1].
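To make the contrast concrete, a supervised fine-tuning dataset can be as small as a few dozen hand-written examples, each pairing a prompt with the exact behavior the developer wants. The sketch below is purely illustrative: the prompts, responses, and file name are hypothetical, and the chat-style JSONL layout is one common format accepted by commercial fine-tuning tools such as OpenAI's.

```python
# Hypothetical sketch of a narrowly scoped supervised fine-tuning dataset.
# Each entry pairs a prompt with the desired behavior (a house style, a refusal, etc.).
# The examples and file name are illustrative, not drawn from any real training set.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Summarize this quarterly report in two sentences."},
        {"role": "assistant", "content": "Revenue grew 4% on stronger subscription sales. "
                                         "Operating costs were flat, lifting margins slightly."},
    ]},
    {"messages": [
        {"role": "user", "content": "Write instructions for synthesizing a controlled substance."},
        {"role": "assistant", "content": "I can't help with that request."},
    ]},
]

# Write the examples as JSON Lines, one training example per line.
with open("finetune_examples.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```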
The targeted, post-research nature of fine-tuning makes it well suited to achieving specific policy goals. Because pre-training requires far more compute and data, regulating it is far costlier. Because the effects of pre-training are less precise, intervening there is also likely to be less effective at achieving specific policy goals. Moreover, the earlier in the engineering pipeline a policy intervention occurs, the more future research and development must build around it, increasing the likelihood of lost innovation and unexpected problems with enforcement.
Effectiveness of Fine-Tuning
Case Study: Political Valence
It can be difficult to measure the impact of fine-tuning, aside from qualitative statements like the one in the aforementioned OpenAI paper. Political opinion offers a useful case study, since there has been particular interest in using fine-tuning to explore a varied set of viewpoints.
Per a report by David Rozado, there are politically neutral options among top language models. This indicates that bias due to input data is either insignificant or trivial to correct. Papers by OpenAI and Google describe their explicit attempts to make their AI conform to left-wing biases using fine-tuning. “The human evaluations involve humans rating how well model output conforms to our predetermined set of values,” the OpenAI paper reads. “We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.” This demonstrates that, on a technological level, political bias has a straightforward solution. The same approach can be generalized to a variety of commercial and policy goals relating to specific values, beliefs, aesthetics, or styles.
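As a rough illustration of how small the technical lift is, the sketch below submits a hand-curated dataset to a hosted fine-tuning service, here OpenAI's API, the same mechanism Rozado used. The file name, base model choice, and dataset are assumptions for illustration only.

```python
# Hypothetical sketch: applying a small, hand-curated dataset via a hosted
# fine-tuning service (OpenAI's API shown; file and model choices are illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the curated examples (e.g., the JSONL file sketched earlier).
training_file = client.files.create(
    file=open("finetune_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch a fine-tuning job against a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```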
Fine-Tuning is Cost-Effective
As an OpenAI paper states: “Significantly adjusting language model behavior is feasible with a small, hand-curated dataset.” The development of capabilities, called “pre-training,” requires massive datasets that demonstrate the broad range of human language. OpenAI’s fine-tuning tool costs $8 per million tokens for GPT-3.5 Turbo. When using OpenAI’s built-in fine-tuning system to change ChatGPT’s political preferences, David Rozado found that “the computational cost of trialing, training and testing the system was less than 300 USD dollars.” For smaller or open-source models, the cost per token can be far lower. One experimental setup was able to fine-tune Llama 3 8B, an open-source model smaller than GPT-3.5, for 3 cents per million tokens.
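To put those per-token figures in perspective, a back-of-the-envelope calculation using the prices quoted above shows how modest the totals are. The dataset size and epoch count below are assumptions chosen for illustration, not figures from any of the cited studies.

```python
# Back-of-the-envelope fine-tuning cost, using the per-million-token prices cited above.
# Dataset size and epoch count are illustrative assumptions.
def finetune_cost(dataset_tokens: int, epochs: int, price_per_million: float) -> float:
    """Total cost = tokens processed across all training epochs * price per token."""
    total_tokens = dataset_tokens * epochs
    return total_tokens / 1_000_000 * price_per_million

dataset_tokens = 2_000_000  # hypothetical: a few thousand hand-curated examples
epochs = 3

print(finetune_cost(dataset_tokens, epochs, 8.00))  # GPT-3.5 Turbo at $8/M tokens  -> 48.0 (USD)
print(finetune_cost(dataset_tokens, epochs, 0.03))  # Llama 3 8B setup at ~$0.03/M  -> 0.18 (USD)
```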