Commit b3f9f6e

Merge branch 'main' into release/3.2

2 parents: e5463b2 + 84a5a6d

File tree: 9 files changed, +124 −41 lines (one of the nine is a 2.2 MB binary file).
docs/source/BestPractices/GRPO多模态训练.md (+66)

@@ -232,3 +232,69 @@ User:
 Assistant:
 "<think>\nTo find the measure of angle ADC, we first need to understand the properties of the tangent and the circle. Since AB is a tangent to the circle at point A, the line segment OA (which is perpendicular to the tangent AB) is perpendicular to AB. This means that angle OAB is 90 degrees.\n\nGiven that angle ABO is 32 degrees, we can find angle BAO by subtracting angle ABO from 90 degrees:\n∠BAO = 90° - ∠ABO = 90° - 32° = 58°\n\nSince angle BAO is an angle in the minor arc AD, the angle ADC, which is an angle in the minor arc AD, is half the angle BAO. Therefore, angle ADC is:\n∠ADC = 1/2 × ∠BAO = 1/2 × 58° = 29°\n\nSo, the measure of angle ADC is 29 degrees.\n</think>\n<answer>\nThe measure of angle ADC is 29 degrees.\n</answer>"
 ```
+
+## Multimodal Open R1 Dataset Experiment
+
+### Task and Dataset Definition
+
+This experiment follows [open-r1-multimodal](https://github.com/EvolvingLMMs-Lab/open-r1-multimodal.git) and uses the [lmms-lab/multimodal-open-r1-8k-verified](https://www.modelscope.cn/datasets/lmms-lab/multimodal-open-r1-8k-verified) dataset, which focuses on multimodal mathematical reasoning. The data was generated by GPT-4o from the `Math360K` and `Geo170K` datasets and contains reasoning traces together with verifiable answers. The dataset already provides the image, problem, and solution fields, and the prompt needs no modification, so no extra dataset definition is required.
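As a sanity check before training, the fields can be inspected directly. A minimal sketch, assuming the ModelScope `MsDataset` API (not part of the original doc):

```python
# Peek at the dataset fields (image / problem / solution).
# MsDataset comes from the `modelscope` package: pip install modelscope
from modelscope.msdatasets import MsDataset

ds = MsDataset.load('lmms-lab/multimodal-open-r1-8k-verified', split='train')
row = next(iter(ds))
print(row['problem'])       # the question posed to the model
print(row['solution'])      # reasoning trace with a verifiable final answer
print(type(row['image']))   # the associated image
```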
+### Reward Function
+
+We directly reuse the `MultiModalAccuracyORM` reward function defined above.
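As a rough illustration of what an accuracy-style ORM reward does, here is a hypothetical simplification; the real `MultiModalAccuracyORM` lives in `examples/train/grpo/plugin/plugin.py` (wired in below via `--external_plugins`), and its exact interface may differ:

```python
# Hypothetical sketch of an accuracy-style ORM reward for GRPO.
# Assumed interface: called with the sampled completions and the dataset's
# `solution` column, returning one float reward per completion.
import re
from typing import List


class MultiModalAccuracyORM:

    def __call__(self, completions: List[str], solution: List[str], **kwargs) -> List[float]:
        rewards = []
        for completion, sol in zip(completions, solution):
            # Pull the model's final answer out of the <answer>...</answer> tags.
            match = re.search(r'<answer>(.*?)</answer>', completion, re.DOTALL)
            answer = match.group(1).strip() if match else completion.strip()
            # Exact (case-insensitive) match against the verifiable answer.
            rewards.append(1.0 if answer.lower() == sol.strip().lower() else 0.0)
        return rewards
```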
+### GRPO Training Experiment Log
+
+#### Training parameters
+
+The chosen model and most hyperparameters are similar to the previous experiment. Since we hit OOM during training, we set `MAX_PIXELS=262144` to reduce GPU memory usage.
+
+```shell
+WANDB_API_KEY=your_wandb_api_key \
+MAX_PIXELS=262144 \
+MASTER_PORT=29600 \
+NPROC_PER_NODE=6 \
+swift rlhf \
+    --rlhf_type grpo \
+    --model Qwen/Qwen2.5-VL-3B-Instruct \
+    --external_plugins examples/train/grpo/plugin/plugin.py \
+    --reward_funcs external_r1v_acc format \
+    --use_vllm true \
+    --vllm_device auto \
+    --vllm_gpu_memory_utilization 0.6 \
+    --train_type full \
+    --torch_dtype bfloat16 \
+    --dataset 'lmms-lab/multimodal-open-r1-8k-verified' \
+    --max_length 8192 \
+    --max_completion_length 1024 \
+    --num_train_epochs 1 \
+    --per_device_train_batch_size 8 \
+    --per_device_eval_batch_size 8 \
+    --learning_rate 1e-6 \
+    --gradient_accumulation_steps 2 \
+    --save_strategy 'steps' \
+    --eval_strategy 'steps' \
+    --eval_steps 400 \
+    --save_steps 400 \
+    --save_total_limit 10 \
+    --logging_steps 1 \
+    --output_dir output/GRPO_GEOQA \
+    --warmup_ratio 0.05 \
+    --dataloader_num_workers 4 \
+    --num_generations 8 \
+    --temperature 1.0 \
+    --repetition_penalty 1.1 \
+    --system 'examples/train/grpo/prompt.txt' \
+    --deepspeed zero3 \
+    --log_completions true \
+    --report_to wandb \
+    --num_iterations 2 \
+    --num_infer_workers 2 \
+    --async_generate false \
+    --beta 0.001 \
+    --max_grad_norm 0.5
+```
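Note that `--reward_funcs external_r1v_acc format` stacks the accuracy reward with a format reward that only checks output structure. A minimal sketch of a format-style reward, assuming a simple regex check over the think/answer template (not swift's actual built-in implementation):

```python
# Hypothetical sketch of a 'format'-style reward: 1.0 if the completion
# follows the <think>...</think><answer>...</answer> template, else 0.0.
import re
from typing import List


def format_reward(completions: List[str], **kwargs) -> List[float]:
    pattern = r'^<think>.*?</think>\s*<answer>.*?</answer>$'
    return [1.0 if re.match(pattern, c.strip(), re.DOTALL) else 0.0
            for c in completions]
```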
+
+#### Experimental observations
+
+![image.png](../../resources/grpo_openr1_multimodal.png)
+
+The training curves are shown in the figure above.
+
+With 8k samples in total, training ran for 1268 steps. The accuracy reward converges to around 0.5, the completion length largely settles around 200 tokens, and reward_std ends up oscillating around 0.2.
+
+Below is a sample completion from the trained model:
+
+```
+User:
+"Based on the map, which state falls into the lowest percentage range of lots, and what is that percentage range?"
+Assistant:
+"<think>\nThe image provided shows a color-coded map of the USA indicating different ranges of lot percentages across various states. According to the legend at the bottom, the lowest percentage range (6.0% - 6.5%) is represented by white. In the image, Alabama (AL) is shaded in white, which corresponds to the 6.0% - 6.5% category. Therefore, based on the map, the state that falls into the lowest percentage range of lots is Alabama, with the percentage range of 6.0% - 6.5%.\nTherefore, the answer is 6.0% - 6.5%.\n</think>\n<answer>Alabama</answer>"
+```

docs/source/Instruction/支持的模型和数据集.md (+6, −6)

@@ -545,12 +545,12 @@
 |[AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4](https://modelscope.cn/models/AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4)|ovis1_6|ovis1_6|transformers>=4.42|vision|[AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4)|
 |[AIDC-AI/Ovis1.6-Gemma2-27B](https://modelscope.cn/models/AIDC-AI/Ovis1.6-Gemma2-27B)|ovis1_6|ovis1_6|transformers>=4.42|vision|[AIDC-AI/Ovis1.6-Gemma2-27B](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-27B)|
 |[AIDC-AI/Ovis1.6-Llama3.2-3B](https://modelscope.cn/models/AIDC-AI/Ovis1.6-Llama3.2-3B)|ovis1_6_llama3|ovis1_6_llama3|-|vision|[AIDC-AI/Ovis1.6-Llama3.2-3B](https://huggingface.co/AIDC-AI/Ovis1.6-Llama3.2-3B)|
-|[AIDC-AI/Ovis2-1B](https://modelscope.cn/models/AIDC-AI/Ovis2-1B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-1B](https://huggingface.co/AIDC-AI/Ovis2-1B)|
-|[AIDC-AI/Ovis2-2B](https://modelscope.cn/models/AIDC-AI/Ovis2-2B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-2B](https://huggingface.co/AIDC-AI/Ovis2-2B)|
-|[AIDC-AI/Ovis2-4B](https://modelscope.cn/models/AIDC-AI/Ovis2-4B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-4B](https://huggingface.co/AIDC-AI/Ovis2-4B)|
-|[AIDC-AI/Ovis2-8B](https://modelscope.cn/models/AIDC-AI/Ovis2-8B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-8B](https://huggingface.co/AIDC-AI/Ovis2-8B)|
-|[AIDC-AI/Ovis2-16B](https://modelscope.cn/models/AIDC-AI/Ovis2-16B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-16B](https://huggingface.co/AIDC-AI/Ovis2-16B)|
-|[AIDC-AI/Ovis2-34B](https://modelscope.cn/models/AIDC-AI/Ovis2-34B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-34B](https://huggingface.co/AIDC-AI/Ovis2-34B)|
+|[AIDC-AI/Ovis2-1B](https://modelscope.cn/models/AIDC-AI/Ovis2-1B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-1B](https://huggingface.co/AIDC-AI/Ovis2-1B)|
+|[AIDC-AI/Ovis2-2B](https://modelscope.cn/models/AIDC-AI/Ovis2-2B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-2B](https://huggingface.co/AIDC-AI/Ovis2-2B)|
+|[AIDC-AI/Ovis2-4B](https://modelscope.cn/models/AIDC-AI/Ovis2-4B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-4B](https://huggingface.co/AIDC-AI/Ovis2-4B)|
+|[AIDC-AI/Ovis2-8B](https://modelscope.cn/models/AIDC-AI/Ovis2-8B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-8B](https://huggingface.co/AIDC-AI/Ovis2-8B)|
+|[AIDC-AI/Ovis2-16B](https://modelscope.cn/models/AIDC-AI/Ovis2-16B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-16B](https://huggingface.co/AIDC-AI/Ovis2-16B)|
+|[AIDC-AI/Ovis2-34B](https://modelscope.cn/models/AIDC-AI/Ovis2-34B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-34B](https://huggingface.co/AIDC-AI/Ovis2-34B)|
 |[ZhipuAI/glm-4v-9b](https://modelscope.cn/models/ZhipuAI/glm-4v-9b)|glm4v|glm4v|transformers>=4.42,<4.45|-|[THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)|
 |[ZhipuAI/cogagent-9b-20241220](https://modelscope.cn/models/ZhipuAI/cogagent-9b-20241220)|glm4v|glm4v|transformers>=4.42|-|[THUDM/cogagent-9b-20241220](https://huggingface.co/THUDM/cogagent-9b-20241220)|
 |[ZhipuAI/glm-edge-v-2b](https://modelscope.cn/models/ZhipuAI/glm-edge-v-2b)|glm_edge_v|glm_edge_v|transformers>=4.46|vision|[THUDM/glm-edge-v-2b](https://huggingface.co/THUDM/glm-edge-v-2b)|

docs/source/index.rst (−1)

@@ -29,7 +29,6 @@ Swift DOCUMENTATION
    Instruction/支持的模型和数据集.md
    Instruction/使用tuners.md
    Instruction/智能体的支持.md
-   Instruction/ReleaseNote3.0.md
    Instruction/常见问题整理.md

 .. toctree::

docs/source_en/Instruction/Supported-models-and-datasets.md (+6, −6)

@@ -545,12 +545,12 @@ The table below introduces the models integrated with ms-swift:
 |[AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4](https://modelscope.cn/models/AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4)|ovis1_6|ovis1_6|transformers>=4.42|vision|[AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4)|
 |[AIDC-AI/Ovis1.6-Gemma2-27B](https://modelscope.cn/models/AIDC-AI/Ovis1.6-Gemma2-27B)|ovis1_6|ovis1_6|transformers>=4.42|vision|[AIDC-AI/Ovis1.6-Gemma2-27B](https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-27B)|
 |[AIDC-AI/Ovis1.6-Llama3.2-3B](https://modelscope.cn/models/AIDC-AI/Ovis1.6-Llama3.2-3B)|ovis1_6_llama3|ovis1_6_llama3|-|vision|[AIDC-AI/Ovis1.6-Llama3.2-3B](https://huggingface.co/AIDC-AI/Ovis1.6-Llama3.2-3B)|
-|[AIDC-AI/Ovis2-1B](https://modelscope.cn/models/AIDC-AI/Ovis2-1B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-1B](https://huggingface.co/AIDC-AI/Ovis2-1B)|
-|[AIDC-AI/Ovis2-2B](https://modelscope.cn/models/AIDC-AI/Ovis2-2B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-2B](https://huggingface.co/AIDC-AI/Ovis2-2B)|
-|[AIDC-AI/Ovis2-4B](https://modelscope.cn/models/AIDC-AI/Ovis2-4B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-4B](https://huggingface.co/AIDC-AI/Ovis2-4B)|
-|[AIDC-AI/Ovis2-8B](https://modelscope.cn/models/AIDC-AI/Ovis2-8B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-8B](https://huggingface.co/AIDC-AI/Ovis2-8B)|
-|[AIDC-AI/Ovis2-16B](https://modelscope.cn/models/AIDC-AI/Ovis2-16B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-16B](https://huggingface.co/AIDC-AI/Ovis2-16B)|
-|[AIDC-AI/Ovis2-34B](https://modelscope.cn/models/AIDC-AI/Ovis2-34B)|ovis2|ovis2|transformers>=4.46.2|vision|[AIDC-AI/Ovis2-34B](https://huggingface.co/AIDC-AI/Ovis2-34B)|
+|[AIDC-AI/Ovis2-1B](https://modelscope.cn/models/AIDC-AI/Ovis2-1B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-1B](https://huggingface.co/AIDC-AI/Ovis2-1B)|
+|[AIDC-AI/Ovis2-2B](https://modelscope.cn/models/AIDC-AI/Ovis2-2B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-2B](https://huggingface.co/AIDC-AI/Ovis2-2B)|
+|[AIDC-AI/Ovis2-4B](https://modelscope.cn/models/AIDC-AI/Ovis2-4B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-4B](https://huggingface.co/AIDC-AI/Ovis2-4B)|
+|[AIDC-AI/Ovis2-8B](https://modelscope.cn/models/AIDC-AI/Ovis2-8B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-8B](https://huggingface.co/AIDC-AI/Ovis2-8B)|
+|[AIDC-AI/Ovis2-16B](https://modelscope.cn/models/AIDC-AI/Ovis2-16B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-16B](https://huggingface.co/AIDC-AI/Ovis2-16B)|
+|[AIDC-AI/Ovis2-34B](https://modelscope.cn/models/AIDC-AI/Ovis2-34B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|vision|[AIDC-AI/Ovis2-34B](https://huggingface.co/AIDC-AI/Ovis2-34B)|
 |[ZhipuAI/glm-4v-9b](https://modelscope.cn/models/ZhipuAI/glm-4v-9b)|glm4v|glm4v|transformers>=4.42,<4.45|-|[THUDM/glm-4v-9b](https://huggingface.co/THUDM/glm-4v-9b)|
 |[ZhipuAI/cogagent-9b-20241220](https://modelscope.cn/models/ZhipuAI/cogagent-9b-20241220)|glm4v|glm4v|transformers>=4.42|-|[THUDM/cogagent-9b-20241220](https://huggingface.co/THUDM/cogagent-9b-20241220)|
 |[ZhipuAI/glm-edge-v-2b](https://modelscope.cn/models/ZhipuAI/glm-edge-v-2b)|glm_edge_v|glm_edge_v|transformers>=4.46|vision|[THUDM/glm-edge-v-2b](https://huggingface.co/THUDM/glm-edge-v-2b)|

docs/source_en/index.rst (−1)

@@ -29,7 +29,6 @@ Swift DOCUMENTATION
    Instruction/Supported-models-and-datasets.md
    Instruction/Use-tuners.md
    Instruction/Agent-support.md
-   Instruction/ReleaseNote3.0
    Instruction/Frequently-asked-questions.md

swift/llm/app/build_ui.py (+38, −22)

@@ -10,12 +10,12 @@


 def clear_session():
-    return '', []
+    return '', [], []


 def modify_system_session(system: str):
     system = system or ''
-    return system, '', []
+    return system, '', [], []


 def _history_to_messages(history: History, system: Optional[str]):
@@ -43,12 +43,19 @@ def _history_to_messages(history: History, system: Optional[str]):
     return messages


-async def model_chat(history: History, system: Optional[str], *, client, model: str,
+def _parse_text(text: str) -> str:
+    mapping = {'<': '&lt;', '>': '&gt;', '*': '&ast;'}
+    for k, v in mapping.items():
+        text = text.replace(k, v)
+    return text
+
+
+async def model_chat(history: History, real_history: History, system: Optional[str], *, client, model: str,
                      request_config: Optional['RequestConfig']):
     if history:
         from swift.llm import InferRequest

-        messages = _history_to_messages(history, system)
+        messages = _history_to_messages(real_history, system)
         resp_or_gen = await client.infer_async(
             InferRequest(messages=messages), request_config=request_config, model=model)
         if request_config and request_config.stream:
@@ -57,28 +64,34 @@ async def model_chat(history: History, system: Optional[str], *, client, model:
                 if resp is None:
                     continue
                 response += resp.choices[0].delta.content
-                history[-1][1] = response
-                yield history
+                history[-1][1] = _parse_text(response)
+                real_history[-1][-1] = response
+                yield history, real_history

         else:
             response = resp_or_gen.choices[0].message.content
-            history[-1][1] = response
-            yield history
+            history[-1][1] = _parse_text(response)
+            real_history[-1][-1] = response
+            yield history, real_history

     else:
-        yield []
+        yield [], []


-def add_text(history: History, query: str):
+def add_text(history: History, real_history: History, query: str):
     history = history or []
-    history.append([query, None])
-    return history, ''
+    real_history = real_history or []
+    history.append([_parse_text(query), None])
+    real_history.append([query, None])
+    return history, real_history, ''


-def add_file(history: History, file):
+def add_file(history: History, real_history: History, file):
     history = history or []
+    real_history = real_history or []
     history.append([(file.name, ), None])
-    return history
+    real_history.append([(file.name, ), None])
+    return history, real_history


 def build_ui(base_url: str,
@@ -110,14 +123,17 @@ def build_ui(base_url: str,
             clear_history = gr.Button(locale_mapping['clear_history'][lang])

     system_state = gr.State(value=default_system)
+    history_state = gr.State(value=[])
     model_chat_ = partial(model_chat, client=client, model=model, request_config=request_config)

-    upload.upload(add_file, [chatbot, upload], [chatbot])
-    textbox.submit(add_text, [chatbot, textbox], [chatbot, textbox]).then(model_chat_, [chatbot, system_state],
-                                                                          [chatbot])
-    submit.click(add_text, [chatbot, textbox], [chatbot, textbox]).then(model_chat_, [chatbot, system_state],
-                                                                        [chatbot])
-    regenerate.click(model_chat_, [chatbot, system_state], [chatbot])
-    clear_history.click(clear_session, [], [textbox, chatbot])
-    modify_system.click(modify_system_session, [system_input], [system_state, textbox, chatbot])
+    upload.upload(add_file, [chatbot, history_state, upload], [chatbot, history_state])
+    textbox.submit(add_text, [chatbot, history_state, textbox],
+                   [chatbot, history_state, textbox]).then(model_chat_, [chatbot, history_state, system_state],
+                                                           [chatbot, history_state])
+    submit.click(add_text, [chatbot, history_state, textbox],
+                 [chatbot, history_state, textbox]).then(model_chat_, [chatbot, history_state, system_state],
+                                                         [chatbot, history_state])
+    regenerate.click(model_chat_, [chatbot, history_state, system_state], [chatbot, history_state])
+    clear_history.click(clear_session, [], [textbox, chatbot, history_state])
+    modify_system.click(modify_system_session, [system_input], [system_state, textbox, chatbot, history_state])
     return demo
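The motivation for the two histories: gradio's Chatbot renders markdown/HTML, so raw model output such as `<think>...</think>` would be interpreted as tags and vanish from the UI. The displayed history is therefore escaped with `_parse_text`, while `real_history` keeps the raw text that is replayed to the model on the next turn. A standalone illustration mirroring `_parse_text` from the diff:

```python
def _parse_text(text: str) -> str:
    # Escape characters the gradio Chatbot would otherwise render as HTML/markdown.
    mapping = {'<': '&lt;', '>': '&gt;', '*': '&ast;'}
    for k, v in mapping.items():
        text = text.replace(k, v)
    return text


raw = '<think>step-by-step reasoning</think><answer>42</answer>'
history = [[_parse_text(raw), None]]   # escaped copy, safe to display
real_history = [[raw, None]]           # raw copy, sent back to the model
print(history[0][0])  # &lt;think&gt;step-by-step reasoning&lt;/think&gt;...
```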

swift/llm/infer/deploy.py (+7, −4)

@@ -121,14 +121,17 @@ def _post_process(self, request_info, response, return_cmpl_response: bool = False):
         is_finished = all(response.choices[i].finish_reason for i in range(len(response.choices)))
         if return_cmpl_response:
             response = response.to_cmpl_response()
+        if 'stream' in response.__class__.__name__.lower():
+            request_info['response'] += response.choices[0].delta.content
+        else:
+            request_info['response'] = response.choices[0].message.content
         if is_finished:
             if args.log_interval > 0:
                 self.infer_stats.update(response)
-            data = {'response': asdict(response), **request_info}
             if self.jsonl_writer:
-                self.jsonl_writer.append(data)
+                self.jsonl_writer.append(request_info)
             if self.args.verbose:
-                logger.info(data)
+                logger.info(request_info)
         return response

     def _set_request_config(self, request_config) -> None:
@@ -157,7 +160,7 @@ async def create_chat_completion(self,

         infer_request, request_config = request.parse()
         self._set_request_config(request_config)
-        request_info = {'infer_request': infer_request.to_printable()}
+        request_info = {'response': '', 'infer_request': infer_request.to_printable()}

         def pre_infer_hook(kwargs):
             request_info['generation_config'] = kwargs['generation_config']
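The effect of this change: for streamed requests, `request_info['response']` now accumulates every chunk's delta, so the jsonl record written when the response finishes carries the full generated text rather than a dump of the final chunk object. A toy sketch of the accumulation (hypothetical values):

```python
# Toy illustration: streamed deltas are accumulated into request_info,
# which is only written out once the response is finished.
request_info = {'response': '', 'infer_request': '...'}

for delta in ['The ', 'answer ', 'is ', '42.']:  # stand-ins for stream chunks
    request_info['response'] += delta            # the streaming branch above

assert request_info['response'] == 'The answer is 42.'
# jsonl_writer.append(request_info) would now log the complete text.
```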

swift/llm/model/model/qwen.py (+1, −1)

@@ -729,7 +729,7 @@ def update(self, key_states: torch.Tensor, value_states: torch.Tensor, layer_idx
         model_arch=ModelArch.ovis1_6,
         architectures=['Ovis'],
         tags=['vision'],
-        requires=['transformers>=4.46.2'],
+        requires=['transformers>=4.46.2', 'moviepy<2'],
     ))

 register_model(
