Fine-Tuning
Jobs that have failed to produce adequate results by way of RAG, or that require a deeper, more nuanced
insight into the dynamics of a business, will require a more refined treatment via fine-tuning. This is
the process through which the weights of a foundational model are tuned to reflect the specific task a
customer hopes to achieve. Bedrock provides a simple API to invoke these jobs, along with the promise
that it creates a copy of the FM being tuned within a customer’s own environment (VPC), thus offering an
additional layer of privacy and ensuring ownership. Labelled datasets to be employed during the
fine-tuning procedure are uploaded to S3 and then pointed to during the invocation of the fine-tuning
job. A comprehensive set of metrics is also logged to the chosen output S3 location.
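As a rough illustration, such a job might be invoked through the boto3 control-plane client as sketched below. The bucket names, IAM role ARN, model identifier, and hyper-parameter values are placeholders, and the accepted hyper-parameters vary by base model.

```python
import boto3

# Control-plane client for Bedrock (job management, not inference).
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Kick off a fine-tuning (model customization) job. The bucket names, IAM
# role ARN, and hyper-parameter values are illustrative placeholders.
response = bedrock.create_model_customization_job(
    jobName="product-faq-tuning-job",
    customModelName="product-faq-model",
    roleArn="arn:aws:iam::123456789012:role/BedrockFineTuningRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-tuning-bucket/train/data.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-tuning-bucket/output/"},
    hyperParameters={
        "epochCount": "2",       # passes over the full training set
        "batchSize": "1",
        "learningRate": "0.00001",
    },
)
print(response["jobArn"])
```

Progress can then be polled with get_model_customization_job, and the training metrics land in the chosen output location once the job completes.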
Whilst some may view the collation of data to curate a dataset for the fine-tuning procedure as an
impediment, it's crucial to remember that these are large language models, and therefore already possess
a strong grounding in natural language. One of the more beneficial consequences of this is that it
limits the quantity of training samples needed to fine-tune a model. In fact, these models require
somewhere in the region of 1-100 training samples, and tend to perform better when not overburdened with
large fine-tuning datasets. Indeed, an overly large dataset can lead to a phenomenon known as
catastrophic forgetting, whereby the LLM begins to lose the rich semantic knowledge it acquired during
the pre-training phase.
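For Bedrock's text models, a labelled dataset of this size is expressed as JSON Lines, with one prompt/completion pair per line. The short sketch below assembles a tiny illustrative dataset; the example records themselves are invented.

```python
import json

# A handful of labelled examples in the prompt/completion JSONL format
# used by Bedrock text-model customization. Content is purely illustrative.
examples = [
    {"prompt": "Summarise our refund policy for a customer.",
     "completion": "Refunds are issued within 14 days of purchase..."},
    {"prompt": "Classify this support ticket: 'My invoice is wrong.'",
     "completion": "billing"},
]

# One JSON object per line, ready to upload to the training S3 location.
with open("data.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```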
A mention of pricing is now long overdue; after all, we are effectively talking about performing
computations on a large model hosted in the cloud. In fact, this is a critical point in determining
which route to take, RAG or fine-tuning. The final hurdle a customer must overcome, given that their
use case has fulfilled all the previously mentioned fine-tuning requirements, is whether they are
willing to shoulder the additional financial burden of fine-tuning. The pricing structure for tuning
foundational models is actually not all that complicated: you simply pay per token processed and for the
storage of your training dataset on S3. AWS explicitly states that the number of tokens in the training
data corpus, multiplied by the number of epochs, plus the cost of storage, gives the total cost of
customizing an FM.
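As a back-of-the-envelope illustration, the formula works out as below. The per-token and storage rates are invented placeholders, not published AWS prices.

```python
# Rough cost estimate for a fine-tuning job. All rates are hypothetical
# placeholders; consult the Bedrock pricing page for real figures.
corpus_tokens = 250_000       # tokens in the training data corpus
epochs = 2                    # passes over the full dataset
price_per_1k_tokens = 0.008   # hypothetical training price, USD
storage_cost = 1.95           # hypothetical monthly custom-model storage, USD

training_cost = (corpus_tokens * epochs / 1_000) * price_per_1k_tokens
total_cost = training_cost + storage_cost
print(f"Estimated cost: ${total_cost:.2f}")  # Estimated cost: $5.95
```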
A key component is the number of epochs, that is, the number of times the tuning job passes over the
entirety of your training data. This is a hyper-parameter that customers must decide upon. The decision
can be aided by the rich training metrics output upon the completion of a fine-tuning job, and will
involve observing how many epochs it takes for the training loss to converge. Intuitively, a lower epoch
count results in lower costs during the fine-tuning exercise, given that pricing is measured in the
number of tokens passed to the model. However, customers must be careful to strike a balance between
pricing and achieving model convergence. Indeed, a tuning job that sacrifices model convergence for the
sake of pricing may be indicative of a use case better pursued via a RAG approach.
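One way to eyeball this trade-off is to pull the training metrics from the output S3 location and check where the loss flattens out. The sketch below assumes the metrics arrive as a CSV named step_wise_training_metrics.csv with epoch_number and training_loss columns; both the key and the schema are assumptions about the output layout rather than a guaranteed contract.

```python
import csv
import io

import boto3

s3 = boto3.client("s3")

# Fetch the metrics file written to the job's output S3 location. The key
# and column names here are assumptions about the output layout.
obj = s3.get_object(
    Bucket="my-tuning-bucket",
    Key="output/step_wise_training_metrics.csv",
)
body = obj["Body"].read().decode("utf-8")

# Average the training loss within each epoch to see where it converges.
losses = {}
for row in csv.DictReader(io.StringIO(body)):
    epoch = int(row["epoch_number"])
    losses.setdefault(epoch, []).append(float(row["training_loss"]))

for epoch in sorted(losses):
    mean = sum(losses[epoch]) / len(losses[epoch])
    print(f"epoch {epoch}: mean training loss {mean:.4f}")
```

If the loss is still falling at the final epoch, an extra epoch may be worth its token cost; if it has long since plateaued, a smaller epoch count would have produced the same model for less.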