Commit 486e2bf

docs: Moved training results to results directory, updated docs and description (MinishLab#187)

* Refactored results
* Refactored results
* Updated docs
* Updated docs
* Updated docs
* Updated description
1 parent 5c205e7 commit 486e2bf

4 files changed: 53 additions & 43 deletions

README.md
Lines changed: 6 additions & 4 deletions

````diff
@@ -7,7 +7,7 @@
 </div>
 
 <div align="center">
-<h2>The Fastest State-of-the-Art Static Embeddings in the World</h2>
+<h2>Fast State-of-the-Art Static Embeddings</h2>
 </div>
 
 <div align="center">
@@ -103,7 +103,7 @@ from datasets import load_dataset
 from model2vec.train import StaticModelForClassification
 
 # Initialize a classifier from a pre-trained model
-classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-8M")
+classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-32M")
 
 # Load a dataset
 ds = load_dataset("setfit/subj")
@@ -120,7 +120,7 @@ For advanced usage, please refer to our [usage documentation](https://github.com
 
 ## Updates & Announcements
 
-- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and in our [blog post](LINK).
+- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](results/README.md#training-results).
 
 - **30/01/2024**: We released two new models: [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) and [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M). [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) is our most performant model to date, using a larger vocabulary and higher dimensions. [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) is a finetune of [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) that is optimized for retrieval tasks, and is the best performing static retrieval model currently available.
 
@@ -133,6 +133,7 @@ For advanced usage, please refer to our [usage documentation](https://github.com
 - **Lightweight Dependencies**: the base package's only major dependency is `numpy`.
 - **Lightning-fast Inference**: up to 500 times faster on CPU than the original model.
 - **Fast, Dataset-free Distillation**: distill your own model in 30 seconds on a CPU, without a dataset.
+- **Fine-tuning**: fine-tune your own classification models on top of Model2Vec models.
 - **Integrated in many popular libraries**: Model2Vec is integrated directly into popular libraries such as [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) and [LangChain](https://github.com/langchain-ai/langchain). For more information, see our [integrations documentation](https://github.com/MinishLab/model2vec/blob/main/docs/integrations.md).
 - **Tightly integrated with HuggingFace hub**: easily share and load models from the HuggingFace hub, using the familiar `from_pretrained` and `push_to_hub`. Our own models can be found [here](https://huggingface.co/minishlab).
 
@@ -173,6 +174,7 @@ We provide a number of models that can be used out of the box. These models are
 
 We have performed extensive experiments to evaluate the performance of Model2Vec models. The results are documented in the [results](results/README.md) folder. The results are presented in the following sections:
 - [MTEB Results](results/README.md#mteb-results)
+- [Training Results](results/README.md#training-results)
 - [Ablations](results/README.md#ablations)
 
 ## License
@@ -185,7 +187,7 @@ If you use Model2Vec in your research, please cite the following:
 ```bibtex
 @software{minishlab2024model2vec,
   authors = {Stephan Tulkens and Thomas van Dongen},
-  title = {Model2Vec: The Fastest State-of-the-Art Static Embeddings in the World},
+  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
   year = {2024},
   url = {https://github.com/MinishLab/model2vec}
 }
````
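
The quickstart change above only swaps the base model from potion-base-8M to potion-base-32M; the rest of the training flow is unchanged. For context, a minimal end-to-end sketch of that flow, assuming a scikit-learn-style `fit`/`predict` interface on `StaticModelForClassification` as suggested by the training docs (verify against the current API):

```python
from datasets import load_dataset

from model2vec.train import StaticModelForClassification

# Initialize a classifier from a pre-trained model, as in the updated quickstart
classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-32M")

# Load a dataset
ds = load_dataset("setfit/subj")

# Fine-tune the full model, including the StaticModel weights
# (fit(texts, labels) is assumed from the training documentation)
classifier.fit(ds["train"]["text"], ds["train"]["label"])

# Predict labels for the held-out test split
predictions = classifier.predict(ds["test"]["text"])
print(predictions[:5])
```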

model2vec/train/README.md
Lines changed: 0 additions & 36 deletions

```diff
@@ -92,42 +92,6 @@ pipeline = StaticModelPipeline.from_pretrained("my_cool/project")
 
 Loading pipelines in this way is _extremely_ fast. It takes only 30ms to load a pipeline from disk.
 
-# Results
-
-The main results are detailed in our training blogpost, but we'll do a comparison with vanilla model2vec here. In a vanilla model2vec classifier, you just put a scikit-learn `LogisticRegressionCV` on top of the model encoder. In contrast, training a `StaticModelForClassification` fine-tunes the full model, including the `StaticModel` weights. The Setfit model is trained on using [all-minilm-l6-v2](sentence-transformers/all-MiniLM-L6-v2) as a base model.
-
-We use 14 classification datasets, using 1000 examples from the train set, and the full test set. No parameters were tuned on any validation set. All datasets were taken from the [Setfit organization on Hugging Face](https://huggingface.co/datasets/SetFit).
-
-| dataset                    | model2vec + logreg | model2vec full finetune | setfit |
-|:---------------------------|-------------------:|------------------------:|-------:|
-| 20_newgroups               |              56.24 |                   57.94 |  61.29 |
-| ade                        |               79.2 |                   79.68 |  83.05 |
-| ag_news                    |               86.7 |                    87.2 |  88.01 |
-| amazon_counterfactual      |              90.96 |                   91.93 |  95.51 |
-| bbc                        |               95.8 |                   97.21 |   96.6 |
-| emotion                    |              65.57 |                   67.11 |  72.86 |
-| enron_spam                 |               96.4 |                   96.85 |  97.45 |
-| hatespeech_offensive       |              83.54 |                   85.61 |  87.69 |
-| imdb                       |              85.34 |                   85.59 |     86 |
-| massive_scenario           |              82.86 |                   84.42 |  83.54 |
-| senteval_cr                |              77.03 |                   79.47 |  86.15 |
-| sst5                       |              32.34 |                   37.95 |  42.31 |
-| student                    |               83.2 |                   85.02 |  89.62 |
-| subj                       |               89.2 |                   89.85 |   93.8 |
-| tweet_sentiment_extraction |              64.96 |                   62.65 |  75.15 |
-
-|         | logreg | full finetune | setfit |
-|:--------|-------:|--------------:|-------:|
-| average |   77.9 |          79.2 |   82.6 |
-
-As you can see, full fine-tuning brings modest performance improvements in some cases, but very large ones in other cases, leading to a pretty large increase in average score. Our advice is to test both if you can use `potion-base-32m`, and to use full fine-tuning if you are starting from another base model.
-
-The speed difference between model2vec and setfit is immense, with the full finetune being 35x faster than a setfit based on `all-minilm-l6-v2` on CPU.
-
-|                  | logreg | full finetune | setfit |
-|:-----------------|-------:|--------------:|-------:|
-| samples / second |  17925 |         24744 |    716 |
-
 
 # Bring your own architecture
 
```
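
The removed section (now in results/README.md) contrasts a plain logistic-regression head with full fine-tuning. For reference, a minimal sketch of the `model2vec + logreg` baseline it describes, assuming a standard scikit-learn workflow on top of a frozen `StaticModel` encoder; the model choice and `max_iter` value are illustrative:

```python
from datasets import load_dataset
from sklearn.linear_model import LogisticRegressionCV

from model2vec import StaticModel

# Load a pre-trained Model2Vec encoder; its weights stay frozen in this baseline
model = StaticModel.from_pretrained("minishlab/potion-base-32M")

ds = load_dataset("setfit/subj")

# Encode the texts into static embeddings once, up front
X_train = model.encode(ds["train"]["text"])
X_test = model.encode(ds["test"]["text"])

# Cross-validated logistic regression on top of the frozen embeddings
clf = LogisticRegressionCV(max_iter=1000)
clf.fit(X_train, ds["train"]["label"])

print(f"test accuracy: {clf.score(X_test, ds['test']['label']):.3f}")
```

Training a `StaticModelForClassification` instead also updates the `StaticModel` weights, which is where the average-score gap in the tables comes from.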

pyproject.toml
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,6 +1,6 @@
 [project]
 name = "model2vec"
-description = "The Fastest State-of-the-Art Static Embeddings in the World"
+description = "Fast State-of-the-Art Static Embeddings"
 readme = { file = "README.md", content-type = "text/markdown" }
 license = { file = "LICENSE" }
 requires-python = ">=3.9"
```

results/README.md
Lines changed: 46 additions & 2 deletions

```diff
@@ -1,7 +1,8 @@
 # Results
 
-This page contains the experiments results of the Model2Vec project. The results are presented in the following sections:
+This document contains the results of the Model2Vec project. The results are presented in the following sections:
 - [MTEB Results](#mteb-results)
+- [Training Results](#training-results)
 - [Ablations](#ablations)
 
 ## MTEB Results
@@ -51,7 +52,7 @@ NOTE: for fairness of comparison, we disabled multiprocessing for Model2Vec for
 |*Figure: The average MTEB score plotted against sentences per second. The circle size indicates model size.*|
 
 
-## Retrieval Results
+### Retrieval Results
 
 A subset of models we created and compare against are specifically designed for retrieval tasks. The results are shown in the table below, including two general-purpose models and a transformer for comparison.
 
@@ -65,6 +66,49 @@ A subset of models we created and compare against are specifically designed for
 
 As can be seen, the [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) model is the most performant static retrieval model, reaching 86.65% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) with a retrieval score of 36.35.
 
+## Training Results
+
+The main results for Model2Vec training are outlined in this section.
+
+We compare three different architectures:
+- `model2vec + logreg`: A model2vec model with a scikit-learn `LogisticRegressionCV` on top.
+- `model2vec full finetune`: A model2vec classifier with the full model finetuned. This uses our `StaticModelForClassification`.
+- `setfit`: A [SetFit](https://github.com/huggingface/setfit/tree/main) model trained using [all-minilm-l6-v2](sentence-transformers/all-MiniLM-L6-v2) as a base model.
+
+We use 14 classification datasets, with 1000 examples from the train set and the full test set. No parameters were tuned on any validation set. All datasets were taken from the [Setfit organization on Hugging Face](https://huggingface.co/datasets/SetFit).
+
+| dataset                    | model2vec + logreg | model2vec full finetune | setfit |
+|:---------------------------|-------------------:|------------------------:|-------:|
+| 20_newgroups               |              56.24 |                   57.94 |  61.29 |
+| ade                        |               79.2 |                   79.68 |  83.05 |
+| ag_news                    |               86.7 |                    87.2 |  88.01 |
+| amazon_counterfactual      |              90.96 |                   91.93 |  95.51 |
+| bbc                        |               95.8 |                   97.21 |   96.6 |
+| emotion                    |              65.57 |                   67.11 |  72.86 |
+| enron_spam                 |               96.4 |                   96.85 |  97.45 |
+| hatespeech_offensive       |              83.54 |                   85.61 |  87.69 |
+| imdb                       |              85.34 |                   85.59 |     86 |
+| massive_scenario           |              82.86 |                   84.42 |  83.54 |
+| senteval_cr                |              77.03 |                   79.47 |  86.15 |
+| sst5                       |              32.34 |                   37.95 |  42.31 |
+| student                    |               83.2 |                   85.02 |  89.62 |
+| subj                       |               89.2 |                   89.85 |   93.8 |
+| tweet_sentiment_extraction |              64.96 |                   62.65 |  75.15 |
+
+|         | logreg | full finetune | setfit |
+|:--------|-------:|--------------:|-------:|
+| average |   77.9 |          79.2 |   82.6 |
+
+As can be seen, full fine-tuning brings modest performance improvements in some cases, but very large ones in others, leading to a pretty large increase in average score. Our advice is to test both if you can use `potion-base-32m`, and to use full fine-tuning if you are starting from another base model.
+
+The speed difference between model2vec and setfit is immense, with the full finetune being 35x faster than a setfit based on `all-minilm-l6-v2` on CPU.
+
+|                  | logreg | full finetune | setfit |
+|:-----------------|-------:|--------------:|-------:|
+| samples / second |  17925 |         24744 |    716 |
+
+
+
 ## Ablations
 
 To better understand the factors contributing to the performance of Model2Vec, we conducted a comprehensive set of ablation studies, covering various aspects of the model's architecture and preprocessing methods. In these studies, we examined the impact of key elements such as PCA, Zipf weighting, and the use of Sentence Transformers versus regular transformer models. We also compared the performance of input embeddings versus output embeddings, since it would seem plausible that these should also work well. The results are shown in the table below.
```
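
The samples-per-second figures added above are hardware-dependent. A rough sketch of how such a throughput number can be measured, assuming the same `fit`/`predict` interface as in the quickstart; the dataset and single-run timing are illustrative, so expect different absolute numbers on your machine:

```python
import time

from datasets import load_dataset

from model2vec.train import StaticModelForClassification

classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-32M")
ds = load_dataset("setfit/subj")
classifier.fit(ds["train"]["text"], ds["train"]["label"])

texts = list(ds["test"]["text"])

# Warm-up pass, then a timed pass over the full test set
classifier.predict(texts)
start = time.perf_counter()
classifier.predict(texts)
elapsed = time.perf_counter() - start

print(f"{len(texts) / elapsed:,.0f} samples / second")
```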
