Model merging is one of the most efficient ways to create your own LLM without the cost of training from scratch. By combining a few top-ranked models, we can build a custom model that often delivers better performance, improved efficiency, and a closer fit to our use cases.
In this blog post, I'll cover how model merging can transform your AI development workflow, using Ori Global Cloud (OGC) and Mergekit to blend the best code-generating LLMs into an even better one: a mini developer assistant.
If you're already familiar with Mergekit, you can jump straight to the model built in this post on Ori's Hugging Face page: ori-cloud/ds-trinity-7b-v1.
Model merging blends two or more LLMs into a single, custom model. It's a fast and effective way to build models cheaply (only a CPU is needed, no GPUs), it works surprisingly well, and it has produced many state-of-the-art models on the Open LLM Leaderboard.
Merging combines multiple pre-trained models into a single, more robust model. Unlike traditional training, which builds a model from the ground up using vast datasets and significant computational power, merging leverages existing models that have already been trained on diverse datasets. This allows the unique strengths and knowledge bases of the individual models to be combined, resulting in a composite model that performs better, or is more versatile, than any of its constituents. Merging also significantly reduces the resources and time needed for model enhancement: instead of spending weeks or months on training and fine-tuning, developers can merge models in a fraction of the time and with far less computational demand.
Mergekit allows for straightforward integration into AI development workflows. It supports various merging techniques, making it adaptable to different requirements and objectives.
By providing a practical solution for merging pre-trained models, Mergekit opens up new possibilities for AI development, making it easier for developers to exploit the full potential of existing LLMs and accelerate innovation in AI applications.
These techniques represent Mergekit's diverse approaches to model integration, each with unique strengths: SLERP preserves the geometric integrity of weight vectors during interpolation, while Passthrough enables frankenmerges with custom parameter counts.
Before we dive into setting up the Mergekit environment, here is a quick guide on how to provision a GPU instance on Ori.
First, let's install Mergekit.
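A minimal sketch of the installation, following the steps in the Mergekit README (the repository URL assumes its current home under arcee-ai):

```bash
# Clone the Mergekit repository and install it in editable mode
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .
```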
Now we'll add the dependencies: follow the GitHub page to install the additional libraries needed. Before installing them, run python3 -m pip install --upgrade pip to avoid Python version errors.
Depending on your chosen use case, select a base model and other well-performing models that target the same task.
In this guide, our goal is to create a better Code Generation model.
The Big Code Models Leaderboard was used to select Llama-2-based models that rank well at code generation. This leaderboard benchmarks models on HumanEval, a code-generation dataset.
Three Llama-based models were chosen with matching specifications: 6.74B parameters in this case, and tensor type BF16.
Base model = deepseek-ai/deepseek-coder-6.7b-base
Model 1 = deepseek-ai/deepseek-coder-6.7b-instruct
Model 2 = m-a-p/OpenCodeInterpreter-DS-6.7B
TIES is a popular merging method, in part because it can merge more than two models at once. The merge YAML configuration includes the following parameters (a full configuration sketch follows their descriptions):
Density: Refers to the proportion of weight differences from the base model that are kept.
Gradient values: a sequence of floating-point numbers that dictate the blending proportions when merging tensors from two models, usually within the range 0.0 to 1.0. (Read more about Gradient Parameters.)
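Putting these together, here is a minimal sketch of a TIES configuration for our three models; the density and weight values are illustrative assumptions, not necessarily the exact values used for ds-trinity-7b-v1:

```yaml
models:
  - model: deepseek-ai/deepseek-coder-6.7b-base
    # the base model is the reference point for weight differences
  - model: deepseek-ai/deepseek-coder-6.7b-instruct
    parameters:
      density: 0.5  # keep half of the weight differences from the base
      weight: 0.5   # blending proportion for this model's tensors
  - model: m-a-p/OpenCodeInterpreter-DS-6.7B
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: deepseek-ai/deepseek-coder-6.7b-base
parameters:
  normalize: true
dtype: bfloat16
```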
With the configuration saved (for example, as config.yaml), the merge is run with the mergekit-yaml CLI; a sketch of the full command follows the flag descriptions below. Understanding its parameters:
--allow-crimes (allows mixing architectures)
--copy-tokenizer (copies a tokenizer to the output)
--out-shard-size 1B (number of parameters per output shard)
--lazy-unpickle (experimental lazy unpickling for lower memory usage)
Additionally, we may use the following parameters:
--low-cpu-memory (stores results and intermediate values on the GPU; useful if VRAM > RAM)
--write-model-card (outputs a README.md containing details of the merge)
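Putting it together, a hedged sketch of the full command (the config filename and output path are illustrative):

```bash
mergekit-yaml config.yaml ./output-model-directory \
  --allow-crimes \
  --copy-tokenizer \
  --out-shard-size 1B \
  --lazy-unpickle
```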
The above should now start downloading the models into the “output-model-directory” folder. Merge time varies with the type of GPU/CPU you are using; in this case, a single V100 GPU with 16GB VRAM and 8 vCPUs was used.
Once the merge completes, the new model weights can be uploaded to Hugging Face using a WRITE token. You may create an organisation, or use any personal space, where the model can be uploaded. Use the following Python script to initiate the upload.
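A minimal sketch using the huggingface_hub library; the token placeholder, repository name, and folder path should be replaced to match your own setup:

```python
from huggingface_hub import HfApi

# Authenticate with a WRITE-scoped Hugging Face token (placeholder shown)
api = HfApi(token="hf_your_write_token")

# Create the target repository if it doesn't exist yet
api.create_repo(repo_id="ori-cloud/ds-trinity-7b-v1", exist_ok=True)

# Upload the merged model weights and config files
api.upload_folder(
    folder_path="./output-model-directory",
    repo_id="ori-cloud/ds-trinity-7b-v1",
)
```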
Due to the large size of the newly merged model, a GPU with higher specs (in particular, more VRAM) can be used to check its performance. In this case, 1x H100 was used.
Install all the Python dependencies as suggested earlier in the guide; a virtual environment, however, is not needed just to run the model.
First, run the new merged model using the base model's tokenizer, deepseek-ai/deepseek-coder-6.7b-base, and its prompt format in Python:
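A hedged sketch with transformers; the prompt is illustrative, and the base tokenizer expects plain completion-style prompts rather than chat turns:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_model_id = "ori-cloud/ds-trinity-7b-v1"

# Load the merged weights, but pair them with the base model's tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-base")
model = AutoModelForCausalLM.from_pretrained(
    merged_model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Completion-style prompt: the model continues the code
prompt = "# Write a Python function that checks whether a number is prime\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```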
Then repeat the run with each instruct-style tokenizer and its prompt format: first the AutoTokenizer from m-a-p/OpenCodeInterpreter-DS-6.7B, then the one from deepseek-ai/deepseek-coder-6.7b-instruct (a combined sketch follows):
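A combined sketch for both instruct-style tokenizers, assuming each ships a chat template (both derive from deepseek-coder-6.7b-instruct, which does); the user message is illustrative, and the model object is reused from the previous snippet:

```python
for tokenizer_id in [
    "m-a-p/OpenCodeInterpreter-DS-6.7B",
    "deepseek-ai/deepseek-coder-6.7b-instruct",
]:
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    messages = [{"role": "user",
                 "content": "Write a Python function that checks whether a number is prime."}]
    # apply_chat_template wraps the message in this tokenizer's own prompt format
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    print(f"--- {tokenizer_id} ---")
    # Decode only the newly generated tokens
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```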
Now we can see that the best output generated above comes from the m-a-p/OpenCodeInterpreter-DS-6.7B tokenizer. To make the model usable, we'll update the tokenizer files of the new merged model accordingly: the eos_token_id: <value> setting, tokenizer.json, and tokenizer_config.json.
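One way to apply the change, as a hedged sketch, is simply to save the winning tokenizer's files over those of the merged model before re-uploading (the directory path is illustrative):

```python
from transformers import AutoTokenizer

# Load the tokenizer that produced the best output...
tokenizer = AutoTokenizer.from_pretrained("m-a-p/OpenCodeInterpreter-DS-6.7B")

# ...and write its files (tokenizer.json, tokenizer_config.json, etc.)
# over those in the merged model's folder
tokenizer.save_pretrained("./output-model-directory")
```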
The techniques we've discussed for merging pre-trained models are not limited to the use case shown here. Mergekit can integrate a wide variety of LLMs, offering a broad canvas for innovation and customisation. For those eager to dive deeper and experiment firsthand, the /examples folder in the mergekit GitHub repository is an excellent resource, filled with sample scripts and scenarios to test and learn from.
We encourage you not just to follow along but to actively participate in this journey: try merging models yourself and experiment with Mergekit's capabilities. To get you started, the merged model we've discussed, ori-cloud/ds-trinity-7b-v1, is readily available on the Hugging Face platform for you to try out.
Stay tuned for our next blog posts where we'll share insights on measuring LLM performance, and much more!
References:
Hugging Face - Code Llama Models
Hugging Face’s Docker Spaces