Commit 486e2bf

docs: Moved training results to results directory, updated docs and description (MinishLab#187)

* Refactored results
* Refactored results
* Updated docs
* Updated docs
* Updated docs
* Updated description
1 parent 5c205e7 commit 486e2bf

4 files changed: 53 additions & 43 deletions

README.md
Lines changed: 6 additions & 4 deletions

````diff
@@ -7,7 +7,7 @@
 </div>
 
 <div align="center">
-<h2>The Fastest State-of-the-Art Static Embeddings in the World</h2>
+<h2>Fast State-of-the-Art Static Embeddings</h2>
 </div>
 
 <div align="center">
@@ -103,7 +103,7 @@ from datasets import load_dataset
 from model2vec.train import StaticModelForClassification
 
 # Initialize a classifier from a pre-trained model
-classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-8M")
+classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-32M")
 
 # Load a dataset
 ds = load_dataset("setfit/subj")
@@ -120,7 +120,7 @@ For advanced usage, please refer to our [usage documentation](https://github.com
 
 ## Updates & Announcements
 
-- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and in our [blog post](LINK).
+- **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](results/README.md#training-results).
 
 - **30/01/2024**: We released two new models: [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) and [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M). [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) is our most performant model to date, using a larger vocabulary and higher dimensions. [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) is a finetune of [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) that is optimized for retrieval tasks, and is the best performing static retrieval model currently available.
 
@@ -133,6 +133,7 @@ For advanced usage, please refer to our [usage documentation](https://github.com
 - **Lightweight Dependencies**: the base package's only major dependency is `numpy`.
 - **Lightning-fast Inference**: up to 500 times faster on CPU than the original model.
 - **Fast, Dataset-free Distillation**: distill your own model in 30 seconds on a CPU, without a dataset.
+- **Fine-tuning**: fine-tune your own classification models on top of Model2Vec models.
 - **Integrated in many popular libraries**: Model2Vec is integrated directly into popular libraries such as [Sentence Transformers](https://github.com/UKPLab/sentence-transformers) and [LangChain](https://github.com/langchain-ai/langchain). For more information, see our [integrations documentation](https://github.com/MinishLab/model2vec/blob/main/docs/integrations.md).
 - **Tightly integrated with HuggingFace hub**: easily share and load models from the HuggingFace hub, using the familiar `from_pretrained` and `push_to_hub`. Our own models can be found [here](https://huggingface.co/minishlab).
 
@@ -173,6 +174,7 @@ We provide a number of models that can be used out of the box. These models are
 
 We have performed extensive experiments to evaluate the performance of Model2Vec models. The results are documented in the [results](results/README.md) folder. The results are presented in the following sections:
 - [MTEB Results](results/README.md#mteb-results)
+- [Training Results](results/README.md#training-results)
 - [Ablations](results/README.md#ablations)
 
 ## License
@@ -185,7 +187,7 @@ If you use Model2Vec in your research, please cite the following:
 ```bibtex
 @software{minishlab2024model2vec,
   authors = {Stephan Tulkens and Thomas van Dongen},
-  title = {Model2Vec: The Fastest State-of-the-Art Static Embeddings in the World},
+  title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
   year = {2024},
   url = {https://github.com/MinishLab/model2vec}
 }
````
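
The quickstart change above only swaps the base model from potion-base-8M to potion-base-32M; the rest of the training flow is unchanged. For context, a minimal end-to-end sketch of that flow, assuming a scikit-learn-style `fit`/`predict` interface on `StaticModelForClassification` as suggested by the training docs (verify against the current API):

```python
from datasets import load_dataset

from model2vec.train import StaticModelForClassification

# Initialize a classifier from a pre-trained model, as in the updated quickstart
classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-32M")

# Load a dataset
ds = load_dataset("setfit/subj")

# Fine-tune the full model, including the StaticModel weights
# (fit(texts, labels) is assumed from the training documentation)
classifier.fit(ds["train"]["text"], ds["train"]["label"])

# Predict labels for the held-out test split
predictions = classifier.predict(ds["test"]["text"])
print(predictions[:5])
```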

model2vec/train/README.md
Lines changed: 0 additions & 36 deletions

```diff
@@ -92,42 +92,6 @@ pipeline = StaticModelPipeline.from_pretrained("my_cool/project")
 
 Loading pipelines in this way is _extremely_ fast. It takes only 30ms to load a pipeline from disk.
 
-# Results
-
-The main results are detailed in our training blogpost, but we'll do a comparison with vanilla model2vec here. In a vanilla model2vec classifier, you just put a scikit-learn `LogisticRegressionCV` on top of the model encoder. In contrast, training a `StaticModelForClassification` fine-tunes the full model, including the `StaticModel` weights. The Setfit model is trained on using [all-minilm-l6-v2](sentence-transformers/all-MiniLM-L6-v2) as a base model.
-
-We use 14 classification datasets, using 1000 examples from the train set, and the full test set. No parameters were tuned on any validation set. All datasets were taken from the [Setfit organization on Hugging Face](https://huggingface.co/datasets/SetFit).
-
-| dataset                    | model2vec + logreg | model2vec full finetune | setfit |
-|:---------------------------|-------------------:|------------------------:|-------:|
-| 20_newgroups               |              56.24 |                   57.94 |  61.29 |
-| ade                        |               79.2 |                   79.68 |  83.05 |
-| ag_news                    |               86.7 |                    87.2 |  88.01 |
-| amazon_counterfactual      |              90.96 |                   91.93 |  95.51 |
-| bbc                        |               95.8 |                   97.21 |   96.6 |
-| emotion                    |              65.57 |                   67.11 |  72.86 |
-| enron_spam                 |               96.4 |                   96.85 |  97.45 |
-| hatespeech_offensive       |              83.54 |                   85.61 |  87.69 |
-| imdb                       |              85.34 |                   85.59 |     86 |
-| massive_scenario           |              82.86 |                   84.42 |  83.54 |
-| senteval_cr                |              77.03 |                   79.47 |  86.15 |
-| sst5                       |              32.34 |                   37.95 |  42.31 |
-| student                    |               83.2 |                   85.02 |  89.62 |
-| subj                       |               89.2 |                   89.85 |   93.8 |
-| tweet_sentiment_extraction |              64.96 |                   62.65 |  75.15 |
-
-|         | logreg | full finetune | setfit |
-|:--------|-------:|--------------:|-------:|
-| average |   77.9 |          79.2 |   82.6 |
-
-As you can see, full fine-tuning brings modest performance improvements in some cases, but very large ones in other cases, leading to a pretty large increase in average score. Our advice is to test both if you can use `potion-base-32m`, and to use full fine-tuning if you are starting from another base model.
-
-The speed difference between model2vec and setfit is immense, with the full finetune being 35x faster than a setfit based on `all-minilm-l6-v2` on CPU.
-
-|                  | logreg | full finetune | setfit |
-|:-----------------|-------:|--------------:|-------:|
-| samples / second |  17925 |         24744 |    716 |
-
 
 # Bring your own architecture
 
```
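
The removed section (now in results/README.md) contrasts a plain logistic-regression head with full fine-tuning. For reference, a minimal sketch of the `model2vec + logreg` baseline it describes, assuming a standard scikit-learn workflow on top of a frozen `StaticModel` encoder; the model choice and `max_iter` value are illustrative:

```python
from datasets import load_dataset
from sklearn.linear_model import LogisticRegressionCV

from model2vec import StaticModel

# Load a pre-trained Model2Vec encoder; its weights stay frozen in this baseline
model = StaticModel.from_pretrained("minishlab/potion-base-32M")

ds = load_dataset("setfit/subj")

# Encode the texts into static embeddings once, up front
X_train = model.encode(ds["train"]["text"])
X_test = model.encode(ds["test"]["text"])

# Cross-validated logistic regression on top of the frozen embeddings
clf = LogisticRegressionCV(max_iter=1000)
clf.fit(X_train, ds["train"]["label"])

print(f"test accuracy: {clf.score(X_test, ds['test']['label']):.3f}")
```

Training a `StaticModelForClassification` instead also updates the `StaticModel` weights, which is where the average-score gap in the tables comes from.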

pyproject.toml
Lines changed: 1 addition & 1 deletion

```diff
@@ -1,6 +1,6 @@
 [project]
 name = "model2vec"
-description = "The Fastest State-of-the-Art Static Embeddings in the World"
+description = "Fast State-of-the-Art Static Embeddings"
 readme = { file = "README.md", content-type = "text/markdown" }
 license = { file = "LICENSE" }
 requires-python = ">=3.9"
```

results/README.md
Lines changed: 46 additions & 2 deletions

```diff
@@ -1,7 +1,8 @@
 # Results
 
-This page contains the experiments results of the Model2Vec project. The results are presented in the following sections:
+This document contains the results of the Model2Vec project. The results are presented in the following sections:
 - [MTEB Results](#mteb-results)
+- [Training Results](#training-results)
 - [Ablations](#ablations)
 
 ## MTEB Results
@@ -51,7 +52,7 @@ NOTE: for fairness of comparison, we disabled multiprocessing for Model2Vec for
 |*Figure: The average MTEB score plotted against sentences per second. The circle size indicates model size.*|
 
 
-## Retrieval Results
+### Retrieval Results
 
 A subset of models we created and compare against are specifically designed for retrieval tasks. The results are shown in the table below, including two general-purpose models and a transformer for comparison.
 
@@ -65,6 +66,49 @@ A subset of models we created and compare against are specifically designed for
 
 As can be seen, the [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) model is the most performant static retrieval model, reaching 86.65% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) with a retrieval score of 36.35.
 
+## Training Results
+
+The main results for Model2Vec training are outlined in this section.
+
+We compare three different architectures:
+- `model2vec + logreg`: A model2vec model with a scikit-learn `LogisticRegressionCV` on top.
+- `model2vec full finetune`: A model2vec classifier with the full model finetuned. This uses our `StaticModelForClassification`.
+- `setfit`: A [SetFit](https://github.com/huggingface/setfit/tree/main) model trained using [all-minilm-l6-v2](sentence-transformers/all-MiniLM-L6-v2) as a base model.
+
+We use 14 classification datasets, with 1000 examples from the train set and the full test set. No parameters were tuned on any validation set. All datasets were taken from the [Setfit organization on Hugging Face](https://huggingface.co/datasets/SetFit).
+
+| dataset                    | model2vec + logreg | model2vec full finetune | setfit |
+|:---------------------------|-------------------:|------------------------:|-------:|
+| 20_newgroups               |              56.24 |                   57.94 |  61.29 |
+| ade                        |               79.2 |                   79.68 |  83.05 |
+| ag_news                    |               86.7 |                    87.2 |  88.01 |
+| amazon_counterfactual      |              90.96 |                   91.93 |  95.51 |
+| bbc                        |               95.8 |                   97.21 |   96.6 |
+| emotion                    |              65.57 |                   67.11 |  72.86 |
+| enron_spam                 |               96.4 |                   96.85 |  97.45 |
+| hatespeech_offensive       |              83.54 |                   85.61 |  87.69 |
+| imdb                       |              85.34 |                   85.59 |     86 |
+| massive_scenario           |              82.86 |                   84.42 |  83.54 |
+| senteval_cr                |              77.03 |                   79.47 |  86.15 |
+| sst5                       |              32.34 |                   37.95 |  42.31 |
+| student                    |               83.2 |                   85.02 |  89.62 |
+| subj                       |               89.2 |                   89.85 |   93.8 |
+| tweet_sentiment_extraction |              64.96 |                   62.65 |  75.15 |
+
+|         | logreg | full finetune | setfit |
+|:--------|-------:|--------------:|-------:|
+| average |   77.9 |          79.2 |   82.6 |
+
+As can be seen, full fine-tuning brings modest performance improvements in some cases, but very large ones in others, leading to a pretty large increase in average score. Our advice is to test both if you can use `potion-base-32m`, and to use full fine-tuning if you are starting from another base model.
+
+The speed difference between model2vec and setfit is immense, with the full finetune being 35x faster than a setfit based on `all-minilm-l6-v2` on CPU.
+
+|                  | logreg | full finetune | setfit |
+|:-----------------|-------:|--------------:|-------:|
+| samples / second |  17925 |         24744 |    716 |
+
+
+
 ## Ablations
 
 To better understand the factors contributing to the performance of Model2Vec, we conducted a comprehensive set of ablation studies, covering various aspects of the model's architecture and preprocessing methods. In these studies, we examined the impact of key elements such as PCA, Zipf weighting, and the use of Sentence Transformers versus regular transformer models. We also compared the performance of input embeddings versus output embeddings, since it would seem plausible that these should also work well. The results are shown in the table below.
```
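
The samples-per-second figures added above are hardware-dependent. A rough sketch of how such a throughput number can be measured, assuming the same `fit`/`predict` interface as in the quickstart; the dataset and single-run timing are illustrative, so expect different absolute numbers on your machine:

```python
import time

from datasets import load_dataset

from model2vec.train import StaticModelForClassification

classifier = StaticModelForClassification.from_pretrained(model_name="minishlab/potion-base-32M")
ds = load_dataset("setfit/subj")
classifier.fit(ds["train"]["text"], ds["train"]["label"])

texts = list(ds["test"]["text"])

# Warm-up pass, then a timed pass over the full test set
classifier.predict(texts)
start = time.perf_counter()
classifier.predict(texts)
elapsed = time.perf_counter() - start

print(f"{len(texts) / elapsed:,.0f} samples / second")
```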
