Merge Monster

An unsupervised merging algorithm for Transformers-based language models, using a list of phrases (both good and bad) and a fully automated strategy that strives to decrease the probability of unwanted phrases (and increase the probability of wanted ones) occurring in the final merge.

Refer to the default.yaml example configuration for an explanation of all potential configuration options.

How It Works

  1. The algorithm loads a base model of your choosing (model 1), along with a directory (or list of repositories) containing multiple models of the same architecture and size.
  2. Each model from the directory is loaded one by one and merged with model 1 on a layer-by-layer basis.
  3. For each layer merge, the algorithm evaluates whether the merge improves the base model, iterating over a customizable list of merge ratios defined in the YAML configuration. (A rough sketch of this loop follows the list.)
    • If the merge is beneficial (it lowers the cumulative probability), it is permanently applied to model 1.
    • If not, model 1 retains its current structure, and the algorithm proceeds to the next layer.
  4. Upon completing the merge process with one model, the algorithm proceeds to the next model in the list, repeating the cycle.
  5. After all models have been processed, the algorithm saves the final merged model and generates a complete log of the entire process.
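
A rough sketch of steps 2-4 is shown below, assuming a simple linear interpolation of layer weights; the helper names (linear_merge, try_layer_merge, score_fn) are illustrative only and not the script's actual API.

    import copy

    def linear_merge(base_layer, other_layer, ratio):
        # Linearly interpolate two layer state dicts: (1 - ratio) * base + ratio * other
        return {k: (1 - ratio) * base_layer[k] + ratio * other_layer[k] for k in base_layer}

    def try_layer_merge(base_layers, other_layers, layer_idx, ratios, score_fn):
        # Keep the best-scoring ratio for this layer, or leave the base layer untouched
        best_score = score_fn(base_layers)        # cumulative phrase score of the current model
        best_weights = base_layers[layer_idx]
        for ratio in ratios:                      # e.g. [0.2, 0.4, 0.6, 0.8, 1.0]
            candidate = linear_merge(base_layers[layer_idx], other_layers[layer_idx], ratio)
            base_layers[layer_idx] = candidate
            score = score_fn(base_layers)
            if score < best_score:                # lower cumulative score is better
                best_score, best_weights = score, copy.deepcopy(candidate)
        base_layers[layer_idx] = best_weights     # permanently apply the winner (or restore the original)
        return best_score

Here score_fn stands in for re-evaluating the phrase probabilities on the candidate model after each trial merge.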

At its core, Merge Monster is nothing more than a relentless number chaser: it tries to decrease the probability of unwanted completions. Wanted completions subtract from that same total (the monster only cares about lowering the overall number), which is why the number displayed during processing can go negative.
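
As a minimal illustration of this bookkeeping (the data layout here is hypothetical, not the script's internal format):

    def cumulative_score(phrase_probs):
        # phrase_probs: list of (probability, is_wanted) pairs
        total = 0.0
        for prob, is_wanted in phrase_probs:
            total += -prob if is_wanted else prob   # wanted phrases subtract, unwanted phrases add
        return total

    # One unwanted phrase at p=0.05 and one wanted phrase at p=0.20:
    print(cumulative_score([(0.05, False), (0.20, True)]))   # -0.15, so the displayed total can go negative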

Merge Strategies

Three merge strategies are available and can be selected in the YAML configuration; a sketch of their decision rules follows the list below.

  • "cumulative" - Default strategy. If there's a chance of reducing the combined probability, accept the merge.
  • "all_phrases" - Only accept the merge if all phrases show an improvement. (Warning: This rarely happens)
  • "quantitive" - Ignores probabilities completely. Only looks at how many phrases show an improvement, as defined by the strategy_threshold variable.

Why This Might Be A Big Deal

We can now set clear goals for an algorithm to pursue based on the thing that truly matters: the actual output of a model. While the included example focuses on reducing GPTisms, the possibilities are potentially endless. Anything can be used as a phrase, and the merge monster will happily chase it down, as it is truly relentless.

Requirements

Even when configured in "cpu" mode, this script requires a CUDA-capable card with enough memory to hold at least a single model in float16 precision.

When configured in "cuda" mode, it requires enough VRAM to store three copies of an fp16 model.

For Mistral 7B v0.1 this translates to a VRAM requirement of roughly 15 GB (1x) or 45 GB (3x).
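
These figures follow from a back-of-the-envelope estimate of two bytes per parameter for fp16, ignoring activations and other overhead:

    params = 7.24e9                  # approximate parameter count of Mistral 7B v0.1
    bytes_per_param = 2              # float16
    one_copy_gb = params * bytes_per_param / 1e9
    print(f"1 copy: ~{one_copy_gb:.1f} GB, 3 copies: ~{3 * one_copy_gb:.1f} GB")
    # 1 copy: ~14.5 GB, 3 copies: ~43.4 GB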

Usage

  1. Prepare your configuration file (or modify the included one): Create a YAML file with your model paths, device settings, and other options.

  2. Run the script: Use the following command to start the merging process:

    python merge_monster.py --config your_config_file.yaml

If no --config argument is given, the script falls back to loading default.yaml. (A minimal sketch of this behavior follows the list.)

  3. Evaluate and save: The script will automatically evaluate, log, and save the best-performing merged model. A copy of the log will be saved in the same folder as the script.
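
A minimal sketch of the configuration-loading fallback described above; the argument name matches the command shown, but everything else is illustrative and not a guarantee of the script's actual internals.

    import argparse
    import yaml

    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="default.yaml",
                        help="Path to the YAML configuration file")
    args = parser.parse_args()

    with open(args.config) as f:
        config = yaml.safe_load(f)   # model paths, device settings, phrases, merge ratios, etc.

    print(f"Loaded configuration from {args.config}")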
