[model] support minimax #4610

Merged: 2 commits, Jun 16, 2025

3 changes: 3 additions & 0 deletions docs/source/Instruction/支持的模型和数据集.md
@@ -486,6 +486,9 @@
 |[LLM-Research/Phi-3.5-MoE-instruct](https://modelscope.cn/models/LLM-Research/Phi-3.5-MoE-instruct)|phi3_moe|phi3|transformers>=4.36|✘|-|[microsoft/Phi-3.5-MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)|
 |[LLM-Research/phi-4](https://modelscope.cn/models/LLM-Research/phi-4)|phi4|phi4|transformers>=4.36|✘|-|[microsoft/phi-4](https://huggingface.co/microsoft/phi-4)|
 |[MiniMax/MiniMax-Text-01](https://modelscope.cn/models/MiniMax/MiniMax-Text-01)|minimax|minimax|-|✘|-|[MiniMaxAI/MiniMax-Text-01](https://huggingface.co/MiniMaxAI/MiniMax-Text-01)|
+|[MiniMax/MiniMax-Text-01-hf](https://modelscope.cn/models/MiniMax/MiniMax-Text-01-hf)|minimax|minimax|-|✘|-|[MiniMaxAI/MiniMax-Text-01-hf](https://huggingface.co/MiniMaxAI/MiniMax-Text-01-hf)|
+|[MiniMax/MiniMax-M1-40k](https://modelscope.cn/models/MiniMax/MiniMax-M1-40k)|minimax_m1|minimax_m1|-|✘|-|[MiniMaxAI/MiniMax-M1-40k](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k)|
+|[MiniMax/MiniMax-M1-80k](https://modelscope.cn/models/MiniMax/MiniMax-M1-80k)|minimax_m1|minimax_m1|-|✘|-|[MiniMaxAI/MiniMax-M1-80k](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k)|
 |[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it)|gemma|gemma|transformers>=4.38|✘|-|[google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it)|
 |[AI-ModelScope/gemma-2b](https://modelscope.cn/models/AI-ModelScope/gemma-2b)|gemma|gemma|transformers>=4.38|✘|-|[google/gemma-2b](https://huggingface.co/google/gemma-2b)|
 |[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b)|gemma|gemma|transformers>=4.38|✘|-|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|
3 changes: 3 additions & 0 deletions docs/source_en/Instruction/Supported-models-and-datasets.md
@@ -486,6 +486,9 @@ The table below introduces the models integrated with ms-swift:
 |[LLM-Research/Phi-3.5-MoE-instruct](https://modelscope.cn/models/LLM-Research/Phi-3.5-MoE-instruct)|phi3_moe|phi3|transformers>=4.36|✘|-|[microsoft/Phi-3.5-MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)|
 |[LLM-Research/phi-4](https://modelscope.cn/models/LLM-Research/phi-4)|phi4|phi4|transformers>=4.36|✘|-|[microsoft/phi-4](https://huggingface.co/microsoft/phi-4)|
 |[MiniMax/MiniMax-Text-01](https://modelscope.cn/models/MiniMax/MiniMax-Text-01)|minimax|minimax|-|✘|-|[MiniMaxAI/MiniMax-Text-01](https://huggingface.co/MiniMaxAI/MiniMax-Text-01)|
+|[MiniMax/MiniMax-Text-01-hf](https://modelscope.cn/models/MiniMax/MiniMax-Text-01-hf)|minimax|minimax|-|✘|-|[MiniMaxAI/MiniMax-Text-01-hf](https://huggingface.co/MiniMaxAI/MiniMax-Text-01-hf)|
+|[MiniMax/MiniMax-M1-40k](https://modelscope.cn/models/MiniMax/MiniMax-M1-40k)|minimax_m1|minimax_m1|-|✘|-|[MiniMaxAI/MiniMax-M1-40k](https://huggingface.co/MiniMaxAI/MiniMax-M1-40k)|
+|[MiniMax/MiniMax-M1-80k](https://modelscope.cn/models/MiniMax/MiniMax-M1-80k)|minimax_m1|minimax_m1|-|✘|-|[MiniMaxAI/MiniMax-M1-80k](https://huggingface.co/MiniMaxAI/MiniMax-M1-80k)|
 |[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it)|gemma|gemma|transformers>=4.38|✘|-|[google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it)|
 |[AI-ModelScope/gemma-2b](https://modelscope.cn/models/AI-ModelScope/gemma-2b)|gemma|gemma|transformers>=4.38|✘|-|[google/gemma-2b](https://huggingface.co/google/gemma-2b)|
 |[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b)|gemma|gemma|transformers>=4.38|✘|-|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|
6 changes: 5 additions & 1 deletion swift/llm/dataset/utils.py
@@ -222,7 +222,11 @@ class BinReader:
     def __init__(self, bin_path: str):
         self.bin_path = bin_path
         self.file = open(bin_path, 'rb')
-        self.mm = mmap.mmap(self.file.fileno(), 0, access=mmap.ACCESS_READ)
+        try:
+            self.mm = mmap.mmap(self.file.fileno(), 0, access=mmap.ACCESS_READ)
+        except ValueError:
+            # For example, self.file is an empty file.
+            self.mm = None

     def read_buffer(self, offset: int, size: int) -> bytes:
         if offset < 0 or size < 0 or offset + size > len(self.mm):
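Note on the empty-file case in `swift/llm/dataset/utils.py`: `mmap.mmap` raises `ValueError` when asked to map a zero-length file, which is why the constructor now falls back to `self.mm = None`. The visible context line in `read_buffer` still calls `len(self.mm)`, and the rest of that method is not shown in the hunk, so the following is only a minimal sketch of one way a reader could treat an unmapped file as having length 0. `EmptySafeBinReader`, its `close` method, and `make_temp_file` are hypothetical names for illustration and are not part of ms-swift.

```python
import mmap
import os
import tempfile


class EmptySafeBinReader:
    """Hypothetical stand-in for BinReader (illustration only, not ms-swift code)."""

    def __init__(self, bin_path: str):
        self.bin_path = bin_path
        self.file = open(bin_path, 'rb')
        try:
            self.mm = mmap.mmap(self.file.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # mmap cannot map a zero-length file, so keep no mapping at all.
            self.mm = None

    def read_buffer(self, offset: int, size: int) -> bytes:
        # Assumption: an unmapped (empty) file behaves like a buffer of length 0.
        total = 0 if self.mm is None else len(self.mm)
        if offset < 0 or size < 0 or offset + size > total:
            raise ValueError(f'Invalid range: offset={offset}, size={size}, total={total}')
        if self.mm is None:
            return b''
        return self.mm[offset:offset + size]

    def close(self) -> None:
        if self.mm is not None:
            self.mm.close()
        self.file.close()


def make_temp_file(payload: bytes) -> str:
    """Write payload to a fresh temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix='.bin')
    with os.fdopen(fd, 'wb') as f:
        f.write(payload)
    return path
```

With this shape, a zero-byte read on an empty file returns `b''`, and any non-empty read raises `ValueError` rather than failing on `len(None)`.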
1 change: 1 addition & 0 deletions swift/llm/model/constant.py
@@ -92,6 +92,7 @@ class LLMModelType:
     phi4 = 'phi4'

     minimax = 'minimax'
+    minimax_m1 = 'minimax_m1'

     gemma = 'gemma'
     gemma2 = 'gemma2'
22 changes: 18 additions & 4 deletions swift/llm/model/model/minimax.py
@@ -111,10 +111,11 @@ def get_model_tokenizer_minimax_text(model_dir: str,
     device_ids = list(range(max(local_rank, 0), n_gpu, local_world_size))
     config = AutoConfig.from_pretrained(model_dir, trust_remote_code=True)
     kwargs['model_config'] = config
-    if kwargs.get('attn_impl') == 'flash_attn':
-        config.attn_type_list = [1] * len(config.attn_type_list)
-    else:
-        config.attn_type_list = [0] * len(config.attn_type_list)
+    if hasattr(config, 'attn_type_list'):
+        if kwargs.get('attn_impl') == 'flash_attn':
+            config.attn_type_list = [1] * len(config.attn_type_list)
+        else:
+            config.attn_type_list = [0] * len(config.attn_type_list)
     if 'quantization_config' in model_kwargs:
         quantization_config = model_kwargs['quantization_config']
         from transformers import QuantoConfig
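Note on the `hasattr` guard: before this change, a config object without an `attn_type_list` attribute would raise `AttributeError` at `len(config.attn_type_list)`. The diff does not say which of the newly registered checkpoints lacks the field, nor what the 0/1 values mean. The sketch below only mirrors the guarded branch on stand-in objects; `override_attn_type_list` and the `SimpleNamespace` configs are hypothetical and not part of ms-swift.

```python
from types import SimpleNamespace
from typing import Any, Optional


def override_attn_type_list(config: Any, attn_impl: Optional[str]) -> Any:
    """Mirror the guarded branch from the hunk on any config-like object."""
    if hasattr(config, 'attn_type_list'):
        # Same rule as the hunk: all 1s for 'flash_attn', all 0s otherwise.
        flag = 1 if attn_impl == 'flash_attn' else 0
        config.attn_type_list = [flag] * len(config.attn_type_list)
    return config


# Stand-in configs (hypothetical values, not real MiniMax configs).
with_list = SimpleNamespace(attn_type_list=[0, 1, 0, 1])
without_list = SimpleNamespace(hidden_size=8)

override_attn_type_list(with_list, 'flash_attn')     # list is rewritten in place
override_attn_type_list(without_list, 'flash_attn')  # left untouched, no error
```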
@@ -149,8 +150,21 @@ def get_model_tokenizer_minimax_text(model_dir: str,
         LLMModelType.minimax, [
             ModelGroup([
                 Model('MiniMax/MiniMax-Text-01', 'MiniMaxAI/MiniMax-Text-01'),
+                Model('MiniMax/MiniMax-Text-01-hf', 'MiniMaxAI/MiniMax-Text-01-hf'),
             ]),
         ],
         TemplateType.minimax,
         get_model_tokenizer_minimax_text,
         architectures=['MiniMaxText01ForCausalLM']))
+
+register_model(
+    ModelMeta(
+        LLMModelType.minimax_m1, [
+            ModelGroup([
+                Model('MiniMax/MiniMax-M1-40k', 'MiniMaxAI/MiniMax-M1-40k'),
+                Model('MiniMax/MiniMax-M1-80k', 'MiniMaxAI/MiniMax-M1-80k'),
+            ]),
+        ],
+        TemplateType.minimax_m1,
+        get_model_tokenizer_minimax_text,
+        architectures=['MiniMaxM1ForCausalLM']))
1 change: 1 addition & 0 deletions swift/llm/template/constant.py
@@ -28,6 +28,7 @@ class LLMTemplateType:
     sus = 'sus'

     minimax = 'minimax'
+    minimax_m1 = 'minimax_m1'
     minimax_vl = 'minimax_vl'

     numina = 'numina'
9 changes: 9 additions & 0 deletions swift/llm/template/template/minimax.py
@@ -27,6 +27,15 @@ class MinimaxTemplateMeta(TemplateMeta):

 register_template(MinimaxTemplateMeta(LLMTemplateType.minimax))

+register_template(
+    MinimaxTemplateMeta(
+        LLMTemplateType.minimax_m1,
+        prefix=['<begin_of_document>'],
+        system_prefix=[
+            '<begin_of_document><beginning_of_sentence>system ai_setting=assistant\n{{SYSTEM}}<end_of_sentence>\n'
+        ],
+    ))
+

 class MinimaxVLTemplate(Template):
     image_placeholder = ['<image>']
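Note on the `minimax_m1` template: the registration supplies two alternative openings, a bare `prefix` and a `system_prefix` containing a `{{SYSTEM}}` placeholder. The sketch below shows what each would produce, assuming the template engine uses `system_prefix` when a system prompt is present, falls back to `prefix` otherwise, and fills `{{SYSTEM}}` by plain string substitution. That selection logic is an assumption, since it is not part of this diff, and `render_prompt_prefix` is a hypothetical helper.

```python
from typing import Optional

# Strings copied from the minimax_m1 registration in the hunk above.
PREFIX = '<begin_of_document>'
SYSTEM_PREFIX = ('<begin_of_document><beginning_of_sentence>system ai_setting=assistant\n'
                 '{{SYSTEM}}<end_of_sentence>\n')


def render_prompt_prefix(system: Optional[str] = None) -> str:
    """Assumed behavior: system_prefix when a system prompt is given, else prefix."""
    if system is None:
        return PREFIX
    # Assumed: the placeholder is filled by simple string replacement.
    return SYSTEM_PREFIX.replace('{{SYSTEM}}', system)
```

For example, `render_prompt_prefix()` yields only `<begin_of_document>`, while `render_prompt_prefix('Be brief.')` wraps the system text between `<beginning_of_sentence>system ai_setting=assistant` and `<end_of_sentence>`.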
2 changes: 2 additions & 0 deletions swift/llm/train/sft.py
@@ -255,6 +255,8 @@ def _encode_dataset(self, train_dataset, val_dataset):
         elif hasattr(train_dataset, '__len__'):
             # Avoid the random mismatch issue in LazyLLMDataset.
             inputs = train_dataset[0]
+        if val_dataset is not None and hasattr(val_dataset, '__len__') and len(val_dataset) == 0:
+            val_dataset = None
         if isinstance(train_dataset, (HfDataset, PackingDataset)):
             self.train_msg['train_dataset'] = self._stat_dataset(train_dataset)
             if val_dataset is not None and not predict_with_generate:
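Note on the `val_dataset` check in `swift/llm/train/sft.py`: the added condition turns an empty, sized validation set into `None`, so the later `val_dataset is not None` branch is skipped. Datasets without `__len__`, such as streaming iterables, pass through unchanged. A minimal sketch of that rule as a standalone function follows; `drop_empty_val_dataset` and `stream` are hypothetical names, and plain lists stand in for real dataset objects.

```python
from typing import Any, Optional


def drop_empty_val_dataset(val_dataset: Optional[Any]) -> Optional[Any]:
    """Return None for an empty sized dataset; otherwise return the input unchanged."""
    if val_dataset is not None and hasattr(val_dataset, '__len__') and len(val_dataset) == 0:
        return None
    return val_dataset


def stream():
    """Stand-in for a streaming dataset: an iterable with no __len__."""
    yield {'text': 'sample'}
```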
1 change: 1 addition & 0 deletions swift/megatron/model/gpt/model.py
@@ -6,6 +6,7 @@
 from ..gpt_model import GPTModel


+# Code borrowed from NVIDIA/Megatron-LM
 def model_provider(pre_process=True, post_process=True):
     args = get_args()
     config = core_transformer_config_from_args(args)
3 changes: 1 addition & 2 deletions swift/megatron/train/utils.py
@@ -46,9 +46,8 @@ def build_streaming_dataloader(args, dataset, collate_fn):
     return iter(cyclic_iter(MegatronDataLoaderDispatcher(base_dataloader)))


+# Code borrowed from NVIDIA/Megatron-LM
 def get_batch_on_this_tp_rank(data_iterator):
-    # copy from megatron-lm
-
     args = get_args()

     def _broadcast(item):