Fine Tuning Open Source LLMs
Data privacy is extremely important in many industry use cases of AI applications. Hence, rather than relying solely on proprietary models (e.g. OpenAI, Anthropic), it can be valuable to build our own models. This guide introduces how to fine tune an open source model for our own use case using a framework called axolotl [Github], so that our data stays within a model we control.
Beyond privacy, fine tuning open source models can also be more performant and cheaper for specific AI use cases than using proprietary models.
This guide is based on my learnings from the LLM Fine Tuning Course by Hamel and Dan.
Compute
In order to run and fine-tune open source models, we need substantial GPU compute power. For most people, having this setup locally is very expensive, so the recommended approach is to use cloud GPUs. There are many platforms that provide them; in the course we focused on Jarvis Labs and Modal Labs.
Data Preparation
Before fine tuning, we need to prepare the dataset to fine tune the model on. There are many dataset formats we can use within axolotl; to get started, it is recommended to use the instruction tuning or conversation formats. You can use a dataset from the Hugging Face Hub, your own local dataset, or a remote dataset from the cloud or a public URL. Most examples in the axolotl repo use a Hugging Face dataset in the instruction tuning format, while Hamel’s honeycomb example uses a local dataset in the conversation format. For specific use cases, however, we will most likely need to create our own fine tuning dataset based on our private data. Here is the guide if you want to use your local dataset with your own custom prompt. Below are some examples of how the different dataset configurations can be defined within the axolotl config.
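For illustration, here is a minimal sketch of the datasets section of an axolotl config showing both styles. The dataset name, file path, and format types are assumptions for the sake of the example and should be replaced with your own data.

# Option 1: instruction tuning format, using a dataset from the Hugging Face Hub
datasets:
  - path: mhenrichsen/alpaca_2k_test   # illustrative dataset name
    type: alpaca

# Option 2: conversation format, using a local JSONL file
datasets:
  - path: data/conversations.jsonl     # illustrative local path
    ds_type: json                      # tells axolotl the file is JSON/JSONL
    type: sharegpt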
Once you have defined the dataset configuration, it is recommended to look at the preprocessed dataset first before jumping into fine tuning. We can do data preprocessing in axolotl by running python -m axolotl.cli.preprocess {CONFIG_PATH} --debug, where CONFIG_PATH is the path of the config file.
After the preprocessing run, we can inspect the preprocessed data by running the following code:
import yaml
from transformers import AutoTokenizer
from datasets import load_from_disk

CONFIG_PATH = ...  # path to the axolotl config file used for preprocessing
RUN_TAG = ...      # directory name created under last_run_prepared/

# Read the config to find which base model (and hence tokenizer) was used
with open(CONFIG_PATH, 'r') as f:
    cfg = yaml.safe_load(f)
model_id = cfg['base_model']
tok = AutoTokenizer.from_pretrained(model_id)

# Load the preprocessed dataset and decode the first example back into text
ds = load_from_disk(f'last_run_prepared/{RUN_TAG}/')
print(tok.decode(ds['input_ids'][0]))
Fine Tuning
Once we have set up our compute resources and our fine tuning dataset, we can start the fine tuning process. First we need to create an axolotl config file (there are many examples to follow in the axolotl repo), where we specify the data, the model, and other fine tuning parameters. We can choose an open source base model to fine tune from the many models available on Hugging Face (e.g. Llama, Mistral).
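As a rough illustration, here is a minimal sketch of what such a config could look like for a LoRA fine tune. The base model, dataset, and hyperparameter values below are assumptions, not recommendations; the examples in the axolotl repo are the better starting point for real runs.

base_model: mistralai/Mistral-7B-v0.1      # illustrative Hugging Face base model

datasets:
  - path: mhenrichsen/alpaca_2k_test       # illustrative dataset
    type: alpaca
dataset_prepared_path: last_run_prepared   # where preprocessed data is written
val_set_size: 0.05

adapter: lora                              # train a LoRA adapter instead of full weights
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 0.0002
optimizer: adamw_torch
lr_scheduler: cosine
bf16: auto

output_dir: ./outputs/lora-out             # where the trained adapter is saved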
To run the fine tuning process using axolotl, we can run accelerate launch -m axolotl.cli.train {CONFIG_PATH}. After the fine tuning process is completed, the LoRA adapter will be stored in the directory specified by output_dir in the config file. If you want to save the full-weights fine-tuned model instead, you can merge the saved LoRA adapter with the base model by running python3 -m axolotl.cli.merge_lora {CONFIG_PATH} --lora_model_dir={MODEL_PATH}, where lora_model_dir is an optional argument specifying the path where the LoRA adapter was saved.
Inference
To experiment with your trained model, you can perform inference locally by running accelerate launch -m axolotl.cli.inference {CONFIG_PATH} --lora_model_dir={MODEL_PATH} if you stored a LoRA adapter, or accelerate launch -m axolotl.cli.inference {CONFIG_PATH} --base_model={MODEL_PATH} if you stored a full-weights fine-tuned model. You can add --gradio at the end if you want to run inference in a Gradio UI. By default, axolotl uses Hugging Face's Transformers library to perform inference.