GRPO training error: Fatal Python error: none_dealloc: deallocating None: bug likely caused by a refcount error in a C extension #3864

Open
winni0 opened this issue Apr 14, 2025 · 23 comments

winni0 commented Apr 14, 2025

Describe the bug
What the bug is, and how to reproduce, better with screenshots
Partial error output:
INFO 04-12 20:17:34 worker.py:133] Sleep mode freed 36.81 GiB memory, 18.94 GiB memory is still in use.
INFO 04-12 20:17:34 executor_base.py:208] It took 0.316247 seconds to fall asleep.
INFO 04-12 20:17:34 worker.py:133] Sleep mode freed 36.81 GiB memory, 18.35 GiB memory is still in use.
INFO 04-12 20:17:34 executor_base.py:208] It took 0.316826 seconds to fall asleep.
{'loss': 0.02363645, 'grad_norm': 0.0, 'learning_rate': 9.2e-07, 'memory(GiB)': 54.45, 'train_speed(iter/s)': 0.073447, 'completion_length': 900.875, 'response_clip_ratio': 0.375, 'rewards/Format': 0.475, 'rewards/RepetitionPenalty': 0.0, 'reward': 0.475, 'reward_std': 0.15773503, 'kl': 0.0, 'clip_ratio': 0.0, 'epoch': 0.23, 'global_step/max_steps': '5825/25499', 'percentage': '22.84%', 'elapsed_time': '22h 1m 1s', 'remaining_time': '3d 2h 21m 45s'}
Train: 23%|███████████████████████▉ | 5825/25499 [22:01:01<60:01:23, 10.98s/it]
INFO 04-12 20:17:37 executor_base.py:219] It took 0.193484 seconds to wake up.
INFO 04-12 20:17:37 executor_base.py:219] It took 0.209495 seconds to wake up.
INFO 04-12 20:17:37 executor_base.py:219] It took 0.194383 seconds to wake up.
INFO 04-12 20:17:37 executor_base.py:219] It took 0.206503 seconds to wake up.
INFO 04-12 20:17:37 prefix_caching_block.py:479] Successfully reset prefix cache
INFO 04-12 20:17:37 prefix_caching_block.py:479] Successfully reset prefix cache
INFO 04-12 20:17:37 prefix_caching_block.py:479] Successfully reset prefix cache
INFO 04-12 20:17:37 prefix_caching_block.py:479] Successfully reset prefix cache
INFO 04-12 20:17:37 prefix_caching_block.py:479] Successfully reset prefix cache
INFO 04-12 20:17:37 prefix_caching_block.py:479] Successfully reset prefix cache
INFO 04-12 20:17:37 prefix_caching_block.py:479] Successfully reset prefix cache
INFO 04-12 20:17:37 prefix_caching_block.py:479] Successfully reset prefix cache
Fatal Python error: none_dealloc: deallocating None: bug likely caused by a refcount error in a C extension
Python runtime state: initialized

Thread 0x00007f22fd2fb640 (most recent call first):

Thread 0x00007f22fdafc640 (most recent call first):

Thread 0x00007f22fcafa640 (most recent call first):

Thread 0x00007f22fe2fd640 (most recent call first):

Thread 0x00007f22d4fd1640 (most recent call first):
File "/usr/local/lib/python3.11/threading.py", line 327 in wait
File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 231 in _feed
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f22d47d0640 (most recent call first):
File "/usr/local/lib/python3.11/threading.py", line 327 in wait
File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 231 in _feed
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 327 in wait
File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 231 in _feed
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f22d37ce640 (most recent call first):
File "/usr/local/lib/python3.11/threading.py", line 327 in wait
File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 231 in _feed
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f22ff0fe640 (most recent call first):
File "/usr/local/lib/python3.11/selectors.py", line 415 in select
File "/usr/local/lib/python3.11/multiprocessing/connection.py", line 948 in wait
File "/usr/local/lib/python3.11/multiprocessing/connection.py", line 440 in _poll
File "/usr/local/lib/python3.11/multiprocessing/connection.py", line 257 in poll
File "/usr/local/lib/python3.11/multiprocessing/queues.py", line 113 in get
File "/usr/local/lib/python3.11/site-packages/torch/utils/data/_utils/pin_memory.py", line 35 in do_one_step
File "/usr/local/lib/python3.11/site-packages/torch/utils/data/_utils/pin_memory.py", line 59 in _pin_memory_loop
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f2511fff640 (most recent call first):
File "/usr/local/lib/python3.11/site-packages/vllm/usage/usage_lib.py", line 220 in _report_continous_usage
File "/usr/local/lib/python3.11/site-packages/vllm/usage/usage_lib.py", line 163 in _report_usage_worker
File "/usr/local/lib/python3.11/threading.py", line 982 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f2520ff9640 (most recent call first):
File "/usr/local/lib/python3.11/threading.py", line 331 in wait
File "/usr/local/lib/python3.11/threading.py", line 629 in wait
File "/usr/local/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f27b5a6f640 (most recent call first):
File "/usr/local/lib/python3.11/threading.py", line 331 in wait
File "/usr/local/lib/python3.11/threading.py", line 629 in wait
File "/usr/local/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
File "/usr/local/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002 in _bootstrap
Current thread 0x00007f2b06257e00 (most recent call first):
File "/usr/local/lib/python3.11/site-packages/vllm/core/scheduler.py", line 779 in _schedule_running
File "/usr/local/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1244 in _schedule_default
File "/usr/local/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1445 in _schedule
File "/usr/local/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1486 in schedule
File "/usr/local/lib/python3.11/site-packages/swift/llm/infer/infer_engine/utils.py", line 612 in new_step
File "/usr/local/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 1397 in _run_engine
File "/usr/local/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 469 in generate
File "/usr/local/lib/python3.11/site-packages/vllm/utils.py", line 1057 in inner
File "/usr/local/lib/python3.11/site-packages/swift/llm/infer/infer_engine/grpo_vllm_engine.py", line 149 in infer
File "/usr/local/lib/python3.11/site-packages/swift/trainers/rlhf_trainer/grpo_trainer.py", line 637 in _infer_multi_turn
File "/usr/local/lib/python3.11/site-packages/swift/trainers/rlhf_trainer/grpo_trainer.py", line 775 in _fast_infer
File "/usr/local/lib/python3.11/site-packages/swift/trainers/rlhf_trainer/grpo_trainer.py", line 817 in _generate_and_score_completions
File "/usr/local/lib/python3.11/site-packages/trl/trainer/grpo_trainer.py", line 647 in _prepare_inputs
File "/usr/local/lib/python3.11/site-packages/trl/extras/profiling.py", line 87 in wrapper
File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 3669 in training_step
File "/usr/local/lib/python3.11/site-packages/swift/trainers/rlhf_trainer/grpo_trainer.py", line 1047 in training_step
File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 2531 in _inner_training_loop
File "/usr/local/lib/python3.11/site-packages/transformers/trainer.py", line 2171 in train
File "/usr/local/lib/python3.11/site-packages/swift/trainers/mixin.py", line 289 in train
File "/usr/local/lib/python3.11/site-packages/swift/llm/train/sft.py", line 202 in train
File "/usr/local/lib/python3.11/site-packages/swift/llm/train/sft.py", line 142 in run
File "/usr/local/lib/python3.11/site-packages/swift/llm/base.py", line 47 in main
File "/usr/local/lib/python3.11/site-packages/swift/llm/train/rlhf.py", line 98 in rlhf_main
File "/usr/local/lib/python3.11/site-packages/swift/cli/rlhf.py", line 5 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, zstandard.backend_c, charset_normalizer.md, simplejson._speedups, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, markupsafe._speedups, PIL._imaging, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, kiwisolver._cext, google._upb._message, _cffi_backend, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, xxhash._xxhash, pyarrow._json, pyarrow._acero, pyarrow._csv, pyarrow._substrait, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linal
frame, av.video.stream, av.codec.hwaccel, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.filter.pad, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.filter.loudnorm, av.audio.resampler, av.audio.codeccontext, av.audio.fifo, av.bitstream, av.video.codeccontext, regex._regex, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, msgspec._core, sentencepiece._sentencepiece, zmq.backend.cython._zmq, msgpack._cmsgpack, setproctitle, uvloop.loop, ray._raylet, vllm.cumem_allocator, cuda_utils, __triton_launcher (total: 268)
W0412 20:17:38.097000 20231 usr/local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 20296 closing signal SIGTERM
W0412 20:17:38.098000 20231 usr/local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 20298 closing signal SIGTERM
W0412 20:17:38.100000 20231 usr/local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 20299 closing signal SIGTERM
E0412 20:17:38.104000 20231 usr/local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: -6) local_rank: 1 (pid: 20297) of binary: /usr/local/bin/python
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/site-packages/torch/distributed/run.py", line 923, in <module>
    main()
  File "/usr/local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/distributed/run.py", line 919, in main
    run(args)
  File "/usr/local/lib/python3.11/site-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/usr/local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/usr/local/lib/python3.11/site-packages/swift/cli/rlhf.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2025-04-12_20:17:38
host : k8s-gpu
rank : 1 (local_rank: 1)
exitcode : -6 (pid: 20297)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 20297

/usr/local/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up
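
A note on reading the launcher report above: a negative exit code conventionally means the worker was killed by a signal, so `exitcode: -6` and `Signal 6 (SIGABRT)` describe the same event, which here is the abort raised after the fatal Python error. A minimal sketch of that mapping using only the standard library (the helper name `describe_exitcode` is made up for illustration):

```python
import signal


def describe_exitcode(exitcode: int) -> str:
    """Translate a subprocess-style exit code into a readable description."""
    if exitcode < 0:
        # A negative value means the process was killed by signal number -exitcode.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-exitcode} ({name})"
    return f"exited with status {exitcode}"


if __name__ == "__main__":
    # The failing rank in the report above had exitcode -6.
    print(describe_exitcode(-6))
```

This only decodes the number; it says nothing about which extension module caused the abort.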

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here
Official image used: modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.5.1-modelscope1.25.0-swift3.2.2
CUDA 12.4, Linux, 4 × H100 80 GB, torch 2.5.1

Additional context
Add any other context about the problem here
The model used is the one produced by LoRA SFT.
Command:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
WANDB_API_KEY=XXX \
swift rlhf \
--rlhf_type grpo \
--model /nfs/largemodel/wangjuan/outputs/Qwen2.5-7B-110K-sft5-0408/v1-20250408-062835/checkpoint-7360-merged \
--train_type lora \
--dataset '/nfs/largemodel/wangjuan/data/law_chinese1/DISC-Law-SFT-Pair-QA-released_alpaca.json' '/nfs/largemodel/wangjuan/data/law_chinese1/DISC-Law-SFT-Triplet-QA-released_alpaca.json' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--max_length 1024 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--eval_steps 1000 \
--save_steps 1000 \
--learning_rate 1e-6 \
--save_total_limit 2 \
--logging_steps 5 \
--output_dir /nfs/largemodel/wangjuan/outputs/Qwen2.5-7B-swift-GRPO-0411 \
--warmup_ratio 0.05 \
--dataloader_num_workers 4 \
--max_completion_length 1024 \
--reward_funcs format repetition \
--num_generations 4 \
--system '回复的格式如下:

...


...
' \
--use_vllm true \
--vllm_gpu_memory_utilization 0.5 \
--vllm_max_model_len 2048 \
--deepspeed zero3 \
--temperature 1.0 \
--top_p 1.0 \
--top_k 80 \
--log_completions true \
--num_infer_workers 4 \
--tensor_parallel_size 2 \
--async_generate false \
--move_model_batches 16 \
--offload_optimizer true \
--offload_model true \
--gc_collect_after_offload true \
--report_to 'wandb' \
--sleep_level 1

Collaborator

hjh0119 commented Apr 14, 2025

Can you try removing --move_model_batches 16? Will that cause any issues?

Author

winni0 commented Apr 16, 2025

> Can you try removing --move_model_batches 16? Will that cause any issues?

I have already tested with --move_model_batches 16 removed, but the same bug still occurs.
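
Since the error message itself points at a reference-count bug on `None`, one diagnostic that might help narrow this down is to sample `sys.getrefcount(None)` periodically during training and see whether it drifts steadily downward. The sketch below is only an illustration: the class `NoneRefcountMonitor` and its methods are hypothetical names, not part of swift, trl, or vLLM, and the probe is injectable so the arithmetic can be checked without a real leak. On Python 3.12 and later `None` is immortal and the reported count stays constant, so this is only informative on 3.11 and earlier, such as the interpreter in the log above.

```python
import sys
from typing import Callable, List, Optional


class NoneRefcountMonitor:
    """Record the reference count of None at intervals and report the trend."""

    def __init__(self, probe: Optional[Callable[[], int]] = None):
        # The probe is injectable (an assumption made for testability);
        # by default it reads the live reference count of None.
        self._probe = probe or (lambda: sys.getrefcount(None))
        self.samples: List[int] = []

    def sample(self) -> int:
        """Take one reading and remember it."""
        value = self._probe()
        self.samples.append(value)
        return value

    def average_delta(self) -> float:
        """Average change per sample; a steadily negative value is suspicious."""
        if len(self.samples) < 2:
            return 0.0
        return (self.samples[-1] - self.samples[0]) / (len(self.samples) - 1)


if __name__ == "__main__":
    monitor = NoneRefcountMonitor()
    for _ in range(3):
        monitor.sample()
    print(monitor.samples, monitor.average_delta())
```

In a real run, `sample()` could be called once every few training steps (for example from a trainer callback) and the values logged per rank. The count fluctuates normally, so only a persistent downward trend over many steps would be meaningful.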


MrRace commented Apr 17, 2025

Same error when using GRPO.


zl-liu commented Apr 19, 2025

I have the same error using examples/train/grpo/train_72b_4gpu.sh


DogeWatch commented Apr 22, 2025

I ran into a similar error after training for 6k steps:

Fatal Python error: Fatal Python error: none_dealloc: none_dealloc: deallocating None
Fatal Python error: deallocating None
Python runtime state: initializednone_dealloc: Python runtime state: initialized

deallocating None


Thread 0x00007f3b6f3f6700Python runtime state: initializedThread 0x00007f4c52bf6700Fatal Python error:  (most recent call first):
  <no Python frame>


 (most recent call first):
  <no Python frame>
none_dealloc:
Thread 0xThread 0x00007f36bf7f6700
Thread 0xdeallocating None
00007f3b71bf7700 (most recent call first):
 (most recent call first):
  <no Python frame>
00007f4c553f7700 (most recent call first):
Python runtime state: initialized  <no Python frame>


Thread 0x  <no Python frame>

Thread 0x00007f36bfff7700 (most recent call first):
Thread 0x00007f4c57bf8700Thread 0x00007f032b1f670000007f3b743f8700 (most recent call first):
  <no Python frame>

 (most recent call first):
  <no Python frame>
 (most recent call first):
  <no Python frame>
  <no Python frame>

Thread 0x00007f36c27f8700
Thread 0x
Thread 0xThread 0x (most recent call first):
  <no Python frame>
00007f4c5c3f9700 (most recent call first):
00007f032d9f7700 (most recent call first):
00007f3b76bf9700 (most recent call first):

Thread 0x  <no Python frame>

  <no Python frame>

  <no Python frame>

00007f36c4ff9700 (most recent call first):
Thread 0x00007f4c4dbf4700Thread 0x00007f03301f8700Thread 0x00007f3b839ff700  <no Python frame>

 (most recent call first):
  <no Python frame>
 (most recent call first):
  <no Python frame>
 (most recent call first):
  <no Python frame>
Thread 0x00007f36ceffd700
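
The dump above is hard to read because several processes wrote to the same stderr at once. One possible way to get a clean dump per process is to point the standard `faulthandler` module at a separate file for each rank early in the training script. This is a sketch under assumptions: the function name `enable_per_rank_faulthandler` and the `fault_logs` directory are made up for illustration, and it relies on the launcher exporting `RANK` (torchrun does). The interpreter's own "Fatal Python error" banner may still go to stderr; the file receives the traceback that faulthandler writes when the process gets a fatal signal such as SIGABRT or SIGSEGV.

```python
import faulthandler
import os
from pathlib import Path


def enable_per_rank_faulthandler(log_dir: str = "fault_logs"):
    """Send faulthandler dumps for this process to its own file.

    Returns the open file object. The caller must keep a reference to it,
    because faulthandler writes to the underlying file descriptor.
    """
    rank = os.environ.get("RANK", "0")  # falls back to "0" outside torchrun
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    log_path = directory / f"fault_rank{rank}_pid{os.getpid()}.log"
    log_file = open(str(log_path), "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Keep the handle alive for the lifetime of the process.
    _fault_log = enable_per_rank_faulthandler()
    print("faulthandler enabled:", faulthandler.is_enabled())
```

Processes spawned separately (for example vLLM workers) would need the same call in their own entry point to be covered.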

Training launch arguments:

swift rlhf '
                 f'--rlhf_type grpo '
                 f'--beta 0.0 '
                 f'--use_vllm true '
                 f'--vllm_gpu_memory_utilization 0.65 '
                 f'--vllm_max_model_len 10240 '
                 f'--num_infer_workers 8 '
                 f'--tensor_parallel_size 4 '
                 f'--async_generate false '
                 f'--model {model} --model_type {model_type} '
                 f'--dataset {args.data_path} '
                 f'--max_length {args.max_length} '
                 f'--max_completion_length {args.max_new_tokens} '
                 f'--num_train_epochs {args.train_epoch} '
                 f'--load_args false '
                 f'--train_type full '
                 f'--eval_strategy no '
                 f'--split_dataset_ratio 0 '
                 f'--per_device_train_batch_size {args.train_batch_size} '
                 f'--per_device_eval_batch_size 4 '
                 f'--mini_batch_size 2 '
                 f'--num_generations 32 '
                 f'--gradient_accumulation_steps {ga} '
                 f'--save_steps 1000 '
                 f'--eval_steps 1000 '
                 f'--save_total_limit 2 '
                 f'--logging_steps 100 '
                 f'--save_strategy steps '
                 f'--deepspeed zero3_offload  '
                 f'--sleep_level 1 '
                 f'--torch_dtype bfloat16 '
                 f'--learning_rate {args.train_lr} '
                 f'--temperature 1.0 '
                 f'--top_p 1.0 '
                 f'--top_k 50 '
                 f'--external_plugins functions/plugin.py '
                 f'--reward_funcs external_format_prm external_acc_orm '
                 f'--output_dir {args.save_path} '
                 f'--add_version false '
                 f'--attn_impl flash_attn '
                 f'--system "{args.system_prompt}" '
                 f'--logging_dir {tensorboard_dir} '
                 # DAPO
                 f'--epsilon_high 0.28 '
                 f'--dynamic_sample true '
                 f'--max_resample_times 3 '
                 f'--overlong_filter true ')

Collaborator

hjh0119 commented Apr 23, 2025

What is the version of vLLM that is causing the error?
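
For anyone answering this, a small standard-library snippet can print the installed versions in one go. It is only a convenience sketch: the helper name `collect_versions` is made up, and the distribution names in the default tuple (in particular `ms-swift`) are assumptions about how the packages were installed.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("vllm", "ms-swift", "trl", "transformers", "torch", "deepspeed")):
    """Return a mapping of distribution name to installed version string."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Installed under another name, or not installed at all.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, ver in collect_versions().items():
        print(name, ver)
```

Pasting this output together with the traceback makes it easier to compare the environments of the different reporters.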

@chengximeng67

Training fails at step 1000 with the error below. It is 100% reproducible, and after resuming training it fails again at step 2000.

{'loss': 0.00618663, 'grad_norm': 0.67167018, 'learning_rate': 9.3e-07, 'memory(GiB)': 45.7, 'train_speed(iter/s)': 0.029422, 'completions/mean_length': 651.25, 'completions/min_length': 597.0, 'completions/max_length': 710.0, 'completions/clipped_ratio': 0.0, 'rewards/MultiModal_Iou_Shaped/mean': 0.91099524, 'rewards/MultiModal_Iou_Shaped/std': 0.02873231, 'rewards/Consistency_Reward/mean': 0.91666669, 'rewards/Consistency_Reward/std': 0.15430333, 'rewards/Multimodal_Format/mean': 1.0, 'rewards/Multimodal_Format/std': 0.0, 'reward': 2.82766199, 'reward_std': 0.11247847, 'kl': 0.00708008, 'clip_ratio': 0.0, 'epoch': 0.18, 'global_step/max_steps': '1095/5932', 'percentage': '18.46%', 'elapsed_time': '10h 18m 14s', 'remaining_time': '1d 21h 31m 1s'}
Train:  18%|█████████████████████▏                                                                                             | 1095/5932 [10:18:14<28:13:59, 21.01s/it]
INFO 04-25 15:32:28 [executor_base.py:226] It took 0.365433 seconds to wake up tags {'kv_cache', 'weights'}.
INFO 04-25 15:32:28 [executor_base.py:226] It took 0.416451 seconds to wake up tags {'weights', 'kv_cache'}.
INFO 04-25 15:32:28 [executor_base.py:226] It took 0.419207 seconds to wake up tags {'weights', 'kv_cache'}.
INFO 04-25 15:32:28 [executor_base.py:226] It took 0.423997 seconds to wake up tags {'weights', 'kv_cache'}.
Fatal Python error: none_dealloc: deallocating None
Python runtime state: initialized

Thread 0x000071e302a00640 (most recent call first):
  <no Python frame>

Thread 0x000071e2ffe00640 (most recent call first):
  <no Python frame>

Thread 0x000071e2ff400640 (most recent call first):
  <no Python frame>

Thread 0x000071e303400640 (most recent call first):
  <no Python frame>

Thread 0x000071e303e00640 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 320 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/multiprocessing/queues.py", line 231 in _feed
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 953 in run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000071e306a00640 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 320 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/multiprocessing/queues.py", line 231 in _feed
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 953 in run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000071e307400640 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 320 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/multiprocessing/queues.py", line 231 in _feed
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 953 in run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000071e307e00640 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 320 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/multiprocessing/queues.py", line 231 in _feed
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 953 in run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000071e30aa00640 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/selectors.py", line 416 in select
  File "/data2/anaconda3/envs/chxm/lib/python3.10/multiprocessing/connection.py", line 931 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/multiprocessing/connection.py", line 424 in _poll
  File "/data2/anaconda3/envs/chxm/lib/python3.10/multiprocessing/connection.py", line 257 in poll
  File "/data2/anaconda3/envs/chxm/lib/python3.10/multiprocessing/queues.py", line 113 in get
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 35 in do_one_step
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 59 in _pin_memory_loop
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 953 in run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000071e30b400640 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 324 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 607 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000071e30be00640 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 324 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/queue.py", line 180 in get
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/tensorboard/summary/writer/event_file_writer.py", line 269 in _run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/tensorboard/summary/writer/event_file_writer.py", line 244 in run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000071e311000640 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/usage/usage_lib.py", line 220 in _report_continous_usage
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/usage/usage_lib.py", line 163 in _report_usage_worker
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 953 in run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 973 in _bootstrap

Thread 0x000071e3dac00640 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 324 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 607 in wait
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/data2/anaconda3/envs/chxm/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x000071e51e4af740 (most recent call first):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/core/scheduler.py", line 794 in _schedule_running
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/core/scheduler.py", line 1259 in _schedule_default
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/core/scheduler.py", line 1460 in _schedule
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/core/scheduler.py", line 1501 in schedule
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1375 in step
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 1409 in _run_engine
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 470 in generate
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/vllm/utils.py", line 1134 in inner
  File "/data2/chxm/ms-swift-main/swift/llm/infer/infer_engine/grpo_vllm_engine.py", line 149 in infer
  File "/data2/chxm/ms-swift-main/swift/trainers/rlhf_trainer/grpo_trainer.py", line 599 in _infer_multi_turn
  File "/data2/chxm/ms-swift-main/swift/trainers/rlhf_trainer/grpo_trainer.py", line 759 in _fast_infer
  File "/data2/chxm/ms-swift-main/swift/trainers/rlhf_trainer/grpo_trainer.py", line 793 in _generate_completions
  File "/data2/chxm/ms-swift-main/swift/trainers/rlhf_trainer/grpo_trainer.py", line 823 in _generate_and_score_completions
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/trl/trainer/grpo_trainer.py", line 647 in _prepare_inputs
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/trl/extras/profiling.py", line 87 in wrapper
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/transformers/trainer.py", line 3730 in training_step
  File "/data2/chxm/ms-swift-main/swift/trainers/rlhf_trainer/grpo_trainer.py", line 1147 in training_step
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/transformers/trainer.py", line 2560 in _inner_training_loop
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/transformers/trainer.py", line 2245 in train
  File "/data2/chxm/ms-swift-main/swift/trainers/mixin.py", line 294 in train
  File "/data2/chxm/ms-swift-main/swift/llm/train/sft.py", line 204 in train
  File "/data2/chxm/ms-swift-main/swift/llm/train/sft.py", line 144 in run
  File "/data2/chxm/ms-swift-main/swift/llm/base.py", line 47 in main
  File "/data2/chxm/ms-swift-main/swift/llm/train/rlhf.py", line 98 in rlhf_main
  File "/data2/chxm/ms-swift-main/swift/cli/rlhf.py", line 5 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, _brotli, zstandard.backend_c, charset_normalizer.md, simplejson._speedups, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, markupsafe._speedups, PIL._imaging, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, kiwisolver._cext, google._upb._message, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, multidict._multidict, 
yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, xxhash._xxhash, pyarrow._json, pyarrow._acero, pyarrow._csv, pyarrow._substrait, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, sklearn.__check_build._check_build, psutil._psutil_linux, psutil._psutil_posix, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, 
scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, PIL._imagingft, av._core, av.logging, av.bytesource, av.buffer, av.audio.format, av.error, av.dictionary, av.container.pyio, av.utils, av.option, av.descriptor, av.format, av.stream, av.container.streams, av.sidedata.motionvectors, av.sidedata.sidedata, av.opaque, av.packet, av.container.input, 
av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.hwaccel, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.filter.pad, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.filter.loudnorm, av.audio.resampler, av.audio.codeccontext, av.audio.fifo, av.bitstream, av.video.codeccontext, cuda_utils, msgpack._cmsgpack, regex._regex, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, zmq.backend.cython._zmq, msgspec._core, setproctitle, uvloop.loop, ray._raylet, vllm.cumem_allocator, sentencepiece._sentencepiece, __triton_launcher, PIL._imagingmath (total: 269)
W0425 15:32:31.625000 1224982 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1225055 closing signal SIGTERM
W0425 15:32:31.628000 1224982 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1225056 closing signal SIGTERM
W0425 15:32:31.629000 1224982 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1225057 closing signal SIGTERM
E0425 15:32:37.269000 1224982 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: -6) local_rank: 0 (pid: 1225054) of binary: /data2/anaconda3/envs/chxm/bin/python
Traceback (most recent call last):
  File "/data2/anaconda3/envs/chxm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data2/anaconda3/envs/chxm/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/torch/distributed/run.py", line 922, in <module>
    main()
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
    run(args)
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
    elastic_launch(
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/data2/anaconda3/envs/chxm/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
========================================================
/data2/chxm/ms-swift-main/swift/cli/rlhf.py FAILED
--------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
--------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-04-25_15:32:31
  host      : ubuntn
  rank      : 0 (local_rank: 0)
  exitcode  : -6 (pid: 1225054)
  error_file: <N/A>
  traceback : Signal 6 (SIGABRT) received by PID 1225054
========================================================
(chxm) (base) member@ubuntn:~$ pip list
Package                                  Version        Editable project location
---------------------------------------- -------------- -------------------------
absl-py                                  2.2.2
accelerate                               1.6.0
addict                                   2.4.0
aiofiles                                 24.1.0
aiohappyeyeballs                         2.6.1
aiohttp                                  3.11.16
aiosignal                                1.3.2
airportsdata                             20250224
aliyun-python-sdk-core                   2.16.0
aliyun-python-sdk-kms                    2.16.5
altair                                   5.5.0
annotated-types                          0.7.0
antlr4-python3-runtime                   4.7.2
anyio                                    4.9.0
argon2-cffi                              23.1.0
argon2-cffi-bindings                     21.2.0
arrow                                    1.3.0
arxiv                                    2.2.0
astor                                    0.8.1
asttokens                                3.0.0
async-lru                                2.0.5
async-timeout                            5.0.1
attrdict                                 2.0.1
attrs                                    25.3.0
auto_gptq                                0.7.1
av                                       14.3.0
babel                                    2.17.0
beautifulsoup4                           4.13.3
binpacking                               1.5.2
bitsandbytes                             0.45.5
blake3                                   1.0.4
bleach                                   6.2.0
blinker                                  1.9.0
boto3                                    1.37.32
botocore                                 1.37.32
Brotli                                   1.1.0
cachetools                               5.5.2
certifi                                  2025.1.31
cffi                                     1.17.1
charset-normalizer                       3.4.1
click                                    8.1.8
cloudpickle                              3.1.1
colorama                                 0.4.6
comm                                     0.2.2
compressed-tensors                       0.9.3
contourpy                                1.3.1
cpm-kernels                              1.0.11
crcmod                                   1.7
cryptography                             43.0.3
cupy-cuda12x                             13.4.1
cycler                                   0.12.1
dacite                                   1.9.2
datasets                                 3.2.0
debugpy                                  1.8.14
decorator                                5.2.1
decord                                   0.6.0
deepspeed                                0.16.5
defusedxml                               0.7.1
Deprecated                               1.2.18
depyf                                    0.18.0
dill                                     0.3.8
diskcache                                5.6.3
distro                                   1.9.0
dnspython                                2.7.0
docker-pycreds                           0.4.0
duckduckgo_search                        5.3.1b1
einops                                   0.8.1
email_validator                          2.2.0
et_xmlfile                               2.0.0
evalscope                                0.14.0
evaluate                                 0.4.3
exceptiongroup                           1.2.2
executing                                2.2.0
fastapi                                  0.115.12
fastapi-cli                              0.0.7
fastjsonschema                           2.21.1
fastrlock                                0.8.3
feedparser                               6.0.11
ffmpy                                    0.5.0
filelock                                 3.18.0
fire                                     0.7.0
flash_attn                               2.7.4.post1
fonttools                                4.57.0
fqdn                                     1.5.1
frozenlist                               1.5.0
fsspec                                   2024.9.0
func_timeout                             4.3.5
future                                   1.0.0
fuzzywuzzy                               0.18.0
gekko                                    1.3.0
gguf                                     0.14.0
gitdb                                    4.0.12
GitPython                                3.1.44
google-ai-generativelanguage             0.6.15
google-api-core                          2.24.2
google-api-python-client                 2.167.0
google-auth                              2.39.0
google-auth-httplib2                     0.2.0
google-generativeai                      0.8.5
googleapis-common-protos                 1.70.0
gradio                                   5.24.0
gradio_client                            1.8.0
griffe                                   0.49.0
groovy                                   0.1.2
grpcio                                   1.71.0
grpcio-status                            1.71.0
h11                                      0.14.0
h2                                       4.2.0
h5py                                     3.13.0
hf-xet                                   1.0.3
hjson                                    3.1.0
hpack                                    4.1.0
httpcore                                 1.0.7
httplib2                                 0.22.0
httptools                                0.6.4
httpx                                    0.28.1
huggingface-hub                          0.30.2
human-eval                               1.0.3
hyperframe                               6.1.0
idna                                     3.10
imageio                                  2.37.0
immutabledict                            4.2.1
importlib_metadata                       8.0.0
interegular                              0.3.3
ipykernel                                6.29.5
ipython                                  8.35.0
ipywidgets                               8.1.6
isoduration                              20.11.0
jedi                                     0.19.2
jieba                                    0.42.1
Jinja2                                   3.1.6
jiter                                    0.9.0
jmespath                                 0.10.0
joblib                                   1.4.2
json5                                    0.12.0
jsonlines                                4.0.0
jsonpointer                              3.0.0
jsonschema                               4.23.0
jsonschema-specifications                2024.10.1
jupyter                                  1.1.1
jupyter_client                           8.6.3
jupyter-console                          6.6.3
jupyter_core                             5.7.2
jupyter-events                           0.12.0
jupyter-lsp                              2.2.5
jupyter_server                           2.15.0
jupyter_server_terminals                 0.5.3
jupyterlab                               4.4.0
jupyterlab_pygments                      0.3.0
jupyterlab_server                        2.27.3
jupyterlab_widgets                       3.0.14
kiwisolver                               1.4.8
lagent                                   0.2.4
langdetect                               1.0.9
lark                                     1.2.2
latex2sympy2                             1.9.1
latex2sympy2_extended                    1.10.1
lazy_loader                              0.4
Levenshtein                              0.27.1
lightning-utilities                      0.14.3
llguidance                               0.7.16
llvmlite                                 0.44.0
lm-format-enforcer                       0.10.11
lxml                                     5.3.2
Markdown                                 3.7
markdown-it-py                           3.0.0
MarkupSafe                               3.0.2
math-verify                              0.7.0
matplotlib                               3.10.1
matplotlib-inline                        0.1.7
mdurl                                    0.1.2
mistral_common                           1.5.4
mistune                                  3.1.3
mmengine                                 0.10.7
mmengine-lite                            0.10.7
modelscope                               1.25.0
mpmath                                   1.3.0
ms-opencompass                           0.1.6
ms_swift                                 3.4.0.dev0     /data2/chxm/ms-swift-main
ms-vlmeval                               0.0.16
msgpack                                  1.1.0
msgspec                                  0.19.0
multidict                                6.4.3
multiprocess                             0.70.16
narwhals                                 1.34.1
nbclient                                 0.10.2
nbconvert                                7.16.6
nbformat                                 5.10.4
nest-asyncio                             1.6.0
networkx                                 3.4.2
ninja                                    1.11.1.4
nltk                                     3.9.1
notebook                                 7.4.0
notebook_shim                            0.2.4
numba                                    0.61.2
numpy                                    1.26.4
nvidia-cublas-cu12                       12.4.5.8
nvidia-cuda-cupti-cu12                   12.4.127
nvidia-cuda-nvrtc-cu12                   12.4.127
nvidia-cuda-runtime-cu12                 12.4.127
nvidia-cudnn-cu12                        9.1.0.70
nvidia-cufft-cu12                        11.2.1.3
nvidia-curand-cu12                       10.3.5.147
nvidia-cusolver-cu12                     11.6.1.9
nvidia-cusparse-cu12                     12.3.1.170
nvidia-cusparselt-cu12                   0.6.2
nvidia-ml-py                             12.570.86
nvidia-nccl-cu12                         2.21.5
nvidia-nvjitlink-cu12                    12.4.127
nvidia-nvtx-cu12                         12.4.127
nvitop                                   1.4.2
omegaconf                                2.0.0
openai                                   1.72.0
OpenCC                                   1.1.9
opencv-python                            4.11.0.86
opencv-python-headless                   4.11.0.86
openpyxl                                 3.1.5
opentelemetry-api                        1.26.0
opentelemetry-exporter-otlp              1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc   1.26.0
opentelemetry-exporter-otlp-proto-http   1.26.0
opentelemetry-proto                      1.26.0
opentelemetry-sdk                        1.26.0
opentelemetry-semantic-conventions       0.47b0
opentelemetry-semantic-conventions-ai    0.4.3
orjson                                   3.10.16
oss2                                     2.19.1
outlines                                 0.1.11
outlines_core                            0.1.26
overrides                                7.7.0
packaging                                24.2
pandas                                   2.2.3
pandocfilters                            1.5.1
parso                                    0.8.4
partial-json-parser                      0.2.1.1.post5
peft                                     0.14.0
pexpect                                  4.9.0
phx-class-registry                       4.1.0
pillow                                   11.1.0
pip                                      25.0
platformdirs                             4.3.7
portalocker                              3.1.1
prettytable                              3.16.0
prometheus_client                        0.21.1
prometheus-fastapi-instrumentator        7.1.0
prompt_toolkit                           3.0.50
propcache                                0.3.1
proto-plus                               1.26.1
protobuf                                 5.29.4
psutil                                   7.0.0
ptyprocess                               0.7.0
pure_eval                                0.2.3
py-cpuinfo                               9.0.0
pyarrow                                  19.0.1
pyasn1                                   0.6.1
pyasn1_modules                           0.4.2
pycocotools                              2.0.8
pycountry                                24.6.1
pycparser                                2.22
pycryptodome                             3.22.0
pydantic                                 2.11.3
pydantic_core                            2.33.1
pydeck                                   0.9.1
pydub                                    0.25.1
Pygments                                 2.19.1
pynvml                                   12.0.0
pyparsing                                3.2.3
pypinyin                                 0.54.0
python-dateutil                          2.9.0.post0
python-dotenv                            1.1.0
python-json-logger                       3.3.0
python-Levenshtein                       0.27.1
python-multipart                         0.0.20
pytz                                     2025.2
PyYAML                                   6.0.2
pyzmq                                    26.4.0
qwen-vl-utils                            0.0.10
rank-bm25                                0.2.2
RapidFuzz                                3.13.0
ray                                      2.43.0
referencing                              0.36.2
regex                                    2024.11.6
requests                                 2.32.3
rfc3339-validator                        0.1.4
rfc3986-validator                        0.1.1
rich                                     14.0.0
rich-toolkit                             0.14.1
rouge                                    1.0.1
rouge-chinese                            1.0.3
rouge_score                              0.1.2
rpds-py                                  0.24.0
rsa                                      4.9.1
ruff                                     0.11.5
s3transfer                               0.11.4
sacrebleu                                2.5.1
safehttpx                                0.1.6
safetensors                              0.5.3
scikit-image                             0.25.2
scikit-learn                             1.6.1
scipy                                    1.15.2
seaborn                                  0.13.2
semantic-version                         2.10.0
Send2Trash                               1.8.3
sentence-transformers                    4.0.2
sentencepiece                            0.2.0
sentry-sdk                               2.26.1
setproctitle                             1.3.5
setuptools                               69.5.1
sgmllib3k                                1.0.0
shellingham                              1.5.4
simplejson                               3.20.1
six                                      1.17.0
smmap                                    5.0.2
sniffio                                  1.3.1
socksio                                  1.0.0
sortedcontainers                         2.4.0
soupsieve                                2.6
stack-data                               0.6.3
starlette                                0.46.1
streamlit                                1.44.1
sty                                      1.0.6
swankit                                  0.1.7
swanlab                                  0.5.5
sympy                                    1.13.1
tabulate                                 0.9.0
tenacity                                 9.1.2
tensorboard                              2.19.0
tensorboard-data-server                  0.7.2
termcolor                                3.0.1
terminado                                0.18.1
threadpoolctl                            3.6.0
tifffile                                 2025.3.30
tiktoken                                 0.9.0
timeout-decorator                        0.5.0
tinycss2                                 1.4.0
tokenizers                               0.21.1
toml                                     0.10.2
tomli                                    2.2.1
tomlkit                                  0.13.2
torch                                    2.6.0
torchaudio                               2.6.0
torchmetrics                             1.7.1
torchvision                              0.21.0
tornado                                  6.4.2
tqdm                                     4.67.1
traitlets                                5.14.3
transformers                             4.51.3
transformers-stream-generator            0.0.5
triton                                   3.2.0
trl                                      0.16.1
typer                                    0.15.2
types-python-dateutil                    2.9.0.20241206
typing_extensions                        4.13.2
typing-inspection                        0.4.0
tzdata                                   2025.2
uri-template                             1.3.0
uritemplate                              4.1.1
urllib3                                  2.4.0
uvicorn                                  0.34.0
uvloop                                   0.21.0
validators                               0.34.0
vllm                                     0.8.4
volcengine-python-sdk                    1.1.5
wandb                                    0.19.9
watchdog                                 6.0.0
watchfiles                               1.0.5
wcwidth                                  0.2.13
webcolors                                24.11.1
webencodings                             0.5.1
websocket-client                         1.8.0
websockets                               15.0.1
Werkzeug                                 3.1.3
wheel                                    0.45.1
widgetsnbextension                       4.0.14
word2number                              1.1
wrapt                                    1.17.2
xformers                                 0.0.29.post2
xgrammar                                 0.1.18
XlsxWriter                               3.2.2
xtuner                                   0.1.23
xxhash                                   3.5.0
yapf                                     0.43.0
yarl                                     1.19.0
zipp                                     3.21.0
zstandard                                0.23.0
(chxm) (base) member@ubuntn:~$ conda list
# packages in environment at /data2/anaconda3/envs/chxm:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
absl-py                   2.2.2                    pypi_0    pypi
accelerate                1.6.0                    pypi_0    pypi
addict                    2.4.0                    pypi_0    pypi
aiofiles                  24.1.0                   pypi_0    pypi
aiohappyeyeballs          2.6.1                    pypi_0    pypi
aiohttp                   3.11.16                  pypi_0    pypi
aiosignal                 1.3.2                    pypi_0    pypi
airportsdata              20250224                 pypi_0    pypi
aliyun-python-sdk-core    2.16.0                   pypi_0    pypi
aliyun-python-sdk-kms     2.16.5                   pypi_0    pypi
altair                    5.5.0                    pypi_0    pypi
annotated-types           0.7.0                    pypi_0    pypi
antlr4-python3-runtime    4.7.2                    pypi_0    pypi
anyio                     4.9.0                    pypi_0    pypi
argon2-cffi               23.1.0                   pypi_0    pypi
argon2-cffi-bindings      21.2.0                   pypi_0    pypi
arrow                     1.3.0                    pypi_0    pypi
arxiv                     2.2.0                    pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
asttokens                 3.0.0                    pypi_0    pypi
async-lru                 2.0.5                    pypi_0    pypi
async-timeout             5.0.1                    pypi_0    pypi
attrdict                  2.0.1                    pypi_0    pypi
attrs                     25.3.0                   pypi_0    pypi
auto-gptq                 0.7.1                    pypi_0    pypi
av                        14.3.0                   pypi_0    pypi
babel                     2.17.0                   pypi_0    pypi
beautifulsoup4            4.13.3                   pypi_0    pypi
binpacking                1.5.2                    pypi_0    pypi
bitsandbytes              0.45.5                   pypi_0    pypi
blake3                    1.0.4                    pypi_0    pypi
bleach                    6.2.0                    pypi_0    pypi
blinker                   1.9.0                    pypi_0    pypi
boto3                     1.37.32                  pypi_0    pypi
botocore                  1.37.32                  pypi_0    pypi
brotli                    1.1.0                    pypi_0    pypi
bzip2                     1.0.8                h5eee18b_6  
ca-certificates           2025.2.25            h06a4308_0  
cachetools                5.5.2                    pypi_0    pypi
certifi                   2025.1.31                pypi_0    pypi
cffi                      1.17.1                   pypi_0    pypi
charset-normalizer        3.4.1                    pypi_0    pypi
click                     8.1.8                    pypi_0    pypi
cloudpickle               3.1.1                    pypi_0    pypi
colorama                  0.4.6                    pypi_0    pypi
comm                      0.2.2                    pypi_0    pypi
compressed-tensors        0.9.3                    pypi_0    pypi
contourpy                 1.3.1                    pypi_0    pypi
cpm-kernels               1.0.11                   pypi_0    pypi
crcmod                    1.7                      pypi_0    pypi
cryptography              43.0.3                   pypi_0    pypi
cupy-cuda12x              13.4.1                   pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
dacite                    1.9.2                    pypi_0    pypi
datasets                  3.2.0                    pypi_0    pypi
debugpy                   1.8.14                   pypi_0    pypi
decorator                 5.2.1                    pypi_0    pypi
decord                    0.6.0                    pypi_0    pypi
deepspeed                 0.16.5                   pypi_0    pypi
defusedxml                0.7.1                    pypi_0    pypi
deprecated                1.2.18                   pypi_0    pypi
depyf                     0.18.0                   pypi_0    pypi
dill                      0.3.8                    pypi_0    pypi
diskcache                 5.6.3                    pypi_0    pypi
distro                    1.9.0                    pypi_0    pypi
dnspython                 2.7.0                    pypi_0    pypi
docker-pycreds            0.4.0                    pypi_0    pypi
duckduckgo-search         5.3.1b1                  pypi_0    pypi
einops                    0.8.1                    pypi_0    pypi
email-validator           2.2.0                    pypi_0    pypi
et-xmlfile                2.0.0                    pypi_0    pypi
evalscope                 0.14.0                   pypi_0    pypi
evaluate                  0.4.3                    pypi_0    pypi
exceptiongroup            1.2.2                    pypi_0    pypi
executing                 2.2.0                    pypi_0    pypi
fastapi                   0.115.12                 pypi_0    pypi
fastapi-cli               0.0.7                    pypi_0    pypi
fastjsonschema            2.21.1                   pypi_0    pypi
fastrlock                 0.8.3                    pypi_0    pypi
feedparser                6.0.11                   pypi_0    pypi
ffmpy                     0.5.0                    pypi_0    pypi
filelock                  3.18.0                   pypi_0    pypi
fire                      0.7.0                    pypi_0    pypi
flash-attn                2.7.4.post1              pypi_0    pypi
fonttools                 4.57.0                   pypi_0    pypi
fqdn                      1.5.1                    pypi_0    pypi
frozenlist                1.5.0                    pypi_0    pypi
fsspec                    2024.9.0                 pypi_0    pypi
func-timeout              4.3.5                    pypi_0    pypi
future                    1.0.0                    pypi_0    pypi
fuzzywuzzy                0.18.0                   pypi_0    pypi
gekko                     1.3.0                    pypi_0    pypi
gguf                      0.14.0                   pypi_0    pypi
gitdb                     4.0.12                   pypi_0    pypi
gitpython                 3.1.44                   pypi_0    pypi
google-ai-generativelanguage 0.6.15                   pypi_0    pypi
google-api-core           2.24.2                   pypi_0    pypi
google-api-python-client  2.167.0                  pypi_0    pypi
google-auth               2.39.0                   pypi_0    pypi
google-auth-httplib2      0.2.0                    pypi_0    pypi
google-generativeai       0.8.5                    pypi_0    pypi
googleapis-common-protos  1.70.0                   pypi_0    pypi
gradio                    5.24.0                   pypi_0    pypi
gradio-client             1.8.0                    pypi_0    pypi
griffe                    0.49.0                   pypi_0    pypi
groovy                    0.1.2                    pypi_0    pypi
grpcio                    1.71.0                   pypi_0    pypi
grpcio-status             1.71.0                   pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
h2                        4.2.0                    pypi_0    pypi
h5py                      3.13.0                   pypi_0    pypi
hf-xet                    1.0.3                    pypi_0    pypi
hjson                     3.1.0                    pypi_0    pypi
hpack                     4.1.0                    pypi_0    pypi
httpcore                  1.0.7                    pypi_0    pypi
httplib2                  0.22.0                   pypi_0    pypi
httptools                 0.6.4                    pypi_0    pypi
httpx                     0.28.1                   pypi_0    pypi
huggingface-hub           0.30.2                   pypi_0    pypi
human-eval                1.0.3                    pypi_0    pypi
hyperframe                6.1.0                    pypi_0    pypi
idna                      3.10                     pypi_0    pypi
imageio                   2.37.0                   pypi_0    pypi
immutabledict             4.2.1                    pypi_0    pypi
importlib-metadata        8.0.0                    pypi_0    pypi
interegular               0.3.3                    pypi_0    pypi
ipykernel                 6.29.5                   pypi_0    pypi
ipython                   8.35.0                   pypi_0    pypi
ipywidgets                8.1.6                    pypi_0    pypi
isoduration               20.11.0                  pypi_0    pypi
jedi                      0.19.2                   pypi_0    pypi
jieba                     0.42.1                   pypi_0    pypi
jinja2                    3.1.6                    pypi_0    pypi
jiter                     0.9.0                    pypi_0    pypi
jmespath                  0.10.0                   pypi_0    pypi
joblib                    1.4.2                    pypi_0    pypi
json5                     0.12.0                   pypi_0    pypi
jsonlines                 4.0.0                    pypi_0    pypi
jsonpointer               3.0.0                    pypi_0    pypi
jsonschema                4.23.0                   pypi_0    pypi
jsonschema-specifications 2024.10.1                pypi_0    pypi
jupyter                   1.1.1                    pypi_0    pypi
jupyter-client            8.6.3                    pypi_0    pypi
jupyter-console           6.6.3                    pypi_0    pypi
jupyter-core              5.7.2                    pypi_0    pypi
jupyter-events            0.12.0                   pypi_0    pypi
jupyter-lsp               2.2.5                    pypi_0    pypi
jupyter-server            2.15.0                   pypi_0    pypi
jupyter-server-terminals  0.5.3                    pypi_0    pypi
jupyterlab                4.4.0                    pypi_0    pypi
jupyterlab-pygments       0.3.0                    pypi_0    pypi
jupyterlab-server         2.27.3                   pypi_0    pypi
jupyterlab-widgets        3.0.14                   pypi_0    pypi
kiwisolver                1.4.8                    pypi_0    pypi
lagent                    0.2.4                    pypi_0    pypi
langdetect                1.0.9                    pypi_0    pypi
lark                      1.2.2                    pypi_0    pypi
latex2sympy2              1.9.1                    pypi_0    pypi
latex2sympy2-extended     1.10.1                   pypi_0    pypi
lazy-loader               0.4                      pypi_0    pypi
ld_impl_linux-64          2.40                 h12ee557_0  
levenshtein               0.27.1                   pypi_0    pypi
libffi                    3.4.4                h6a678d5_1  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
lightning-utilities       0.14.3                   pypi_0    pypi
llguidance                0.7.16                   pypi_0    pypi
llvmlite                  0.44.0                   pypi_0    pypi
lm-format-enforcer        0.10.11                  pypi_0    pypi
lxml                      5.3.2                    pypi_0    pypi
markdown                  3.7                      pypi_0    pypi
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                3.0.2                    pypi_0    pypi
math-verify               0.7.0                    pypi_0    pypi
matplotlib                3.10.1                   pypi_0    pypi
matplotlib-inline         0.1.7                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mistral-common            1.5.4                    pypi_0    pypi
mistune                   3.1.3                    pypi_0    pypi
mmengine                  0.10.7                   pypi_0    pypi
mmengine-lite             0.10.7                   pypi_0    pypi
modelscope                1.25.0                   pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
ms-opencompass            0.1.6                    pypi_0    pypi
ms-swift                  3.4.0.dev0                dev_0    <develop>
ms-vlmeval                0.0.16                   pypi_0    pypi
msgpack                   1.1.0                    pypi_0    pypi
msgspec                   0.19.0                   pypi_0    pypi
multidict                 6.4.3                    pypi_0    pypi
multiprocess              0.70.16                  pypi_0    pypi
narwhals                  1.34.1                   pypi_0    pypi
nbclient                  0.10.2                   pypi_0    pypi
nbconvert                 7.16.6                   pypi_0    pypi
nbformat                  5.10.4                   pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
nest-asyncio              1.6.0                    pypi_0    pypi
networkx                  3.4.2                    pypi_0    pypi
ninja                     1.11.1.4                 pypi_0    pypi
nltk                      3.9.1                    pypi_0    pypi
notebook                  7.4.0                    pypi_0    pypi
notebook-shim             0.2.4                    pypi_0    pypi
numba                     0.61.2                   pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
nvidia-cublas-cu12        12.4.5.8                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.4.127                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.4.127                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.4.127                 pypi_0    pypi
nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
nvidia-cufft-cu12         11.2.1.3                 pypi_0    pypi
nvidia-curand-cu12        10.3.5.147               pypi_0    pypi
nvidia-cusolver-cu12      11.6.1.9                 pypi_0    pypi
nvidia-cusparse-cu12      12.3.1.170               pypi_0    pypi
nvidia-cusparselt-cu12    0.6.2                    pypi_0    pypi
nvidia-ml-py              12.570.86                pypi_0    pypi
nvidia-nccl-cu12          2.21.5                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.4.127                 pypi_0    pypi
nvidia-nvtx-cu12          12.4.127                 pypi_0    pypi
nvitop                    1.4.2                    pypi_0    pypi
omegaconf                 2.0.0                    pypi_0    pypi
openai                    1.72.0                   pypi_0    pypi
opencc                    1.1.9                    pypi_0    pypi
opencv-python             4.11.0.86                pypi_0    pypi
opencv-python-headless    4.11.0.86                pypi_0    pypi
openpyxl                  3.1.5                    pypi_0    pypi
openssl                   3.0.16               h5eee18b_0  
opentelemetry-api         1.26.0                   pypi_0    pypi
opentelemetry-exporter-otlp 1.26.0                   pypi_0    pypi
opentelemetry-exporter-otlp-proto-common 1.26.0                   pypi_0    pypi
opentelemetry-exporter-otlp-proto-grpc 1.26.0                   pypi_0    pypi
opentelemetry-exporter-otlp-proto-http 1.26.0                   pypi_0    pypi
opentelemetry-proto       1.26.0                   pypi_0    pypi
opentelemetry-sdk         1.26.0                   pypi_0    pypi
opentelemetry-semantic-conventions 0.47b0                   pypi_0    pypi
opentelemetry-semantic-conventions-ai 0.4.3                    pypi_0    pypi
orjson                    3.10.16                  pypi_0    pypi
oss2                      2.19.1                   pypi_0    pypi
outlines                  0.1.11                   pypi_0    pypi
outlines-core             0.1.26                   pypi_0    pypi
overrides                 7.7.0                    pypi_0    pypi
packaging                 24.2                     pypi_0    pypi
pandas                    2.2.3                    pypi_0    pypi
pandocfilters             1.5.1                    pypi_0    pypi
parso                     0.8.4                    pypi_0    pypi
partial-json-parser       0.2.1.1.post5            pypi_0    pypi
peft                      0.14.0                   pypi_0    pypi
pexpect                   4.9.0                    pypi_0    pypi
phx-class-registry        4.1.0                    pypi_0    pypi
pillow                    11.1.0                   pypi_0    pypi
pip                       25.0            py310h06a4308_0  
platformdirs              4.3.7                    pypi_0    pypi
portalocker               3.1.1                    pypi_0    pypi
prettytable               3.16.0                   pypi_0    pypi
prometheus-client         0.21.1                   pypi_0    pypi
prometheus-fastapi-instrumentator 7.1.0                    pypi_0    pypi
prompt-toolkit            3.0.50                   pypi_0    pypi
propcache                 0.3.1                    pypi_0    pypi
proto-plus                1.26.1                   pypi_0    pypi
protobuf                  5.29.4                   pypi_0    pypi
psutil                    7.0.0                    pypi_0    pypi
ptyprocess                0.7.0                    pypi_0    pypi
pure-eval                 0.2.3                    pypi_0    pypi
py-cpuinfo                9.0.0                    pypi_0    pypi
pyarrow                   19.0.1                   pypi_0    pypi
pyasn1                    0.6.1                    pypi_0    pypi
pyasn1-modules            0.4.2                    pypi_0    pypi
pycocotools               2.0.8                    pypi_0    pypi
pycountry                 24.6.1                   pypi_0    pypi
pycparser                 2.22                     pypi_0    pypi
pycryptodome              3.22.0                   pypi_0    pypi
pydantic                  2.11.3                   pypi_0    pypi
pydantic-core             2.33.1                   pypi_0    pypi
pydeck                    0.9.1                    pypi_0    pypi
pydub                     0.25.1                   pypi_0    pypi
pygments                  2.19.1                   pypi_0    pypi
pynvml                    12.0.0                   pypi_0    pypi
pyparsing                 3.2.3                    pypi_0    pypi
pypinyin                  0.54.0                   pypi_0    pypi
python                    3.10.16              he870216_1  
python-dateutil           2.9.0.post0              pypi_0    pypi
python-dotenv             1.1.0                    pypi_0    pypi
python-json-logger        3.3.0                    pypi_0    pypi
python-levenshtein        0.27.1                   pypi_0    pypi
python-multipart          0.0.20                   pypi_0    pypi
pytz                      2025.2                   pypi_0    pypi
pyyaml                    6.0.2                    pypi_0    pypi
pyzmq                     26.4.0                   pypi_0    pypi
qwen-vl-utils             0.0.10                   pypi_0    pypi
rank-bm25                 0.2.2                    pypi_0    pypi
rapidfuzz                 3.13.0                   pypi_0    pypi
ray                       2.43.0                   pypi_0    pypi
readline                  8.2                  h5eee18b_0  
referencing               0.36.2                   pypi_0    pypi
regex                     2024.11.6                pypi_0    pypi
requests                  2.32.3                   pypi_0    pypi
rfc3339-validator         0.1.4                    pypi_0    pypi
rfc3986-validator         0.1.1                    pypi_0    pypi
rich                      14.0.0                   pypi_0    pypi
rich-toolkit              0.14.1                   pypi_0    pypi
rouge                     1.0.1                    pypi_0    pypi
rouge-chinese             1.0.3                    pypi_0    pypi
rouge-score               0.1.2                    pypi_0    pypi
rpds-py                   0.24.0                   pypi_0    pypi
rsa                       4.9.1                    pypi_0    pypi
ruff                      0.11.5                   pypi_0    pypi
s3transfer                0.11.4                   pypi_0    pypi
sacrebleu                 2.5.1                    pypi_0    pypi
safehttpx                 0.1.6                    pypi_0    pypi
safetensors               0.5.3                    pypi_0    pypi
scikit-image              0.25.2                   pypi_0    pypi
scikit-learn              1.6.1                    pypi_0    pypi
scipy                     1.15.2                   pypi_0    pypi
seaborn                   0.13.2                   pypi_0    pypi
semantic-version          2.10.0                   pypi_0    pypi
send2trash                1.8.3                    pypi_0    pypi
sentence-transformers     4.0.2                    pypi_0    pypi
sentencepiece             0.2.0                    pypi_0    pypi
sentry-sdk                2.26.1                   pypi_0    pypi
setproctitle              1.3.5                    pypi_0    pypi
setuptools                69.5.1                   pypi_0    pypi
sgmllib3k                 1.0.0                    pypi_0    pypi
shellingham               1.5.4                    pypi_0    pypi
simplejson                3.20.1                   pypi_0    pypi
six                       1.17.0                   pypi_0    pypi
smmap                     5.0.2                    pypi_0    pypi
sniffio                   1.3.1                    pypi_0    pypi
socksio                   1.0.0                    pypi_0    pypi
sortedcontainers          2.4.0                    pypi_0    pypi
soupsieve                 2.6                      pypi_0    pypi
sqlite                    3.45.3               h5eee18b_0  
stack-data                0.6.3                    pypi_0    pypi
starlette                 0.46.1                   pypi_0    pypi
streamlit                 1.44.1                   pypi_0    pypi
sty                       1.0.6                    pypi_0    pypi
swankit                   0.1.7                    pypi_0    pypi
swanlab                   0.5.5                    pypi_0    pypi
sympy                     1.13.1                   pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tenacity                  9.1.2                    pypi_0    pypi
tensorboard               2.19.0                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
termcolor                 3.0.1                    pypi_0    pypi
terminado                 0.18.1                   pypi_0    pypi
threadpoolctl             3.6.0                    pypi_0    pypi
tifffile                  2025.3.30                pypi_0    pypi
tiktoken                  0.9.0                    pypi_0    pypi
timeout-decorator         0.5.0                    pypi_0    pypi
tinycss2                  1.4.0                    pypi_0    pypi
tk                        8.6.14               h39e8969_0  
tokenizers                0.21.1                   pypi_0    pypi
toml                      0.10.2                   pypi_0    pypi
tomli                     2.2.1                    pypi_0    pypi
tomlkit                   0.13.2                   pypi_0    pypi
torch                     2.6.0                    pypi_0    pypi
torchaudio                2.6.0                    pypi_0    pypi
torchmetrics              1.7.1                    pypi_0    pypi
torchvision               0.21.0                   pypi_0    pypi
tornado                   6.4.2                    pypi_0    pypi
tqdm                      4.67.1                   pypi_0    pypi
traitlets                 5.14.3                   pypi_0    pypi
transformers              4.51.3                   pypi_0    pypi
transformers-stream-generator 0.0.5                    pypi_0    pypi
triton                    3.2.0                    pypi_0    pypi
trl                       0.16.1                   pypi_0    pypi
typer                     0.15.2                   pypi_0    pypi
types-python-dateutil     2.9.0.20241206           pypi_0    pypi
typing-extensions         4.13.2                   pypi_0    pypi
typing-inspection         0.4.0                    pypi_0    pypi
tzdata                    2025.2                   pypi_0    pypi
uri-template              1.3.0                    pypi_0    pypi
uritemplate               4.1.1                    pypi_0    pypi
urllib3                   2.4.0                    pypi_0    pypi
uvicorn                   0.34.0                   pypi_0    pypi
uvloop                    0.21.0                   pypi_0    pypi
validators                0.34.0                   pypi_0    pypi
vllm                      0.8.4                    pypi_0    pypi
volcengine-python-sdk     1.1.5                    pypi_0    pypi
wandb                     0.19.9                   pypi_0    pypi
watchdog                  6.0.0                    pypi_0    pypi
watchfiles                1.0.5                    pypi_0    pypi
wcwidth                   0.2.13                   pypi_0    pypi
webcolors                 24.11.1                  pypi_0    pypi
webencodings              0.5.1                    pypi_0    pypi
websocket-client          1.8.0                    pypi_0    pypi
websockets                15.0.1                   pypi_0    pypi
werkzeug                  3.1.3                    pypi_0    pypi
wheel                     0.45.1          py310h06a4308_0  
widgetsnbextension        4.0.14                   pypi_0    pypi
word2number               1.1                      pypi_0    pypi
wrapt                     1.17.2                   pypi_0    pypi
xformers                  0.0.29.post2             pypi_0    pypi
xgrammar                  0.1.18                   pypi_0    pypi
xlsxwriter                3.2.2                    pypi_0    pypi
xtuner                    0.1.23                   pypi_0    pypi
xxhash                    3.5.0                    pypi_0    pypi
xz                        5.6.4                h5eee18b_1  
yapf                      0.43.0                   pypi_0    pypi
yarl                      1.19.0                   pypi_0    pypi
zipp                      3.21.0                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_1  
zstandard                 0.23.0                   pypi_0    pypi

@hjh0119
Collaborator

hjh0119 commented Apr 27, 2025

I can't reproduce this issue on the main branch. Does anyone have a clean environment and a script that reproduces it reliably?

Below is my repro script and environment:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export NPROC_PER_NODE=8

swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-1.5B-Instruct \
    --reward_funcs accuracy format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.6 \
    --vllm_max_model_len 2048 \
    --train_type lora \
    --torch_dtype bfloat16 \
    --dataset AI-MO/NuminaMath-TIR \
    --max_completion_length 2048 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 1 \
    --eval_steps 200 \
    --save_steps 200 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 4096 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 4 \
    --temperature 1.0 \
    --log_completions true \
    --beta 0.001 \
    --sleep_level 1 \
    --system swift/examples/train/grpo/prompt.txt \
    --num_infer_workers 8 \
    --deepspeed zero3

cuda: 12.1
python: 3.11
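
The reporter's `conda list` output and the `pip list` output in this comment differ in several package versions, so one way to narrow down the crash is to diff the two listings. The following is a minimal sketch, not part of ms-swift: `parse_package_list` and `diff_environments` are hypothetical helper names, and the sample rows are a few lines copied from the two listings in this thread. It only reports version differences; it does not identify which package, if any, causes the refcount error.

```python
import re


def parse_package_list(text):
    """Parse `conda list` or `pip list` output into {normalized_name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, conda comment lines, and the pip header/separator rows.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        if parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        # Normalize names so that e.g. flash_attn and flash-attn compare equal.
        name = re.sub(r"[-_.]+", "-", parts[0]).lower()
        packages[name] = parts[1]
    return packages


def diff_environments(a, b):
    """Return {name: (version_in_a, version_in_b)} for entries that differ.

    A version of None means the package is missing from that environment.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # A few rows copied from the two listings in this thread.
    reporter = parse_package_list(
        "peft        0.14.0   pypi_0    pypi\n"
        "math-verify 0.7.0    pypi_0    pypi\n"
        "protobuf    5.29.4   pypi_0    pypi\n"
        "numpy       1.26.4   pypi_0    pypi\n"
    )
    maintainer = parse_package_list(
        "Package     Version\n"
        "----------- -------\n"
        "peft        0.15.1\n"
        "math-verify 0.5.2\n"
        "protobuf    4.25.6\n"
        "numpy       1.26.4\n"
    )
    for name, (ours, theirs) in diff_environments(reporter, maintainer).items():
        print(f"{name}: reporter={ours} maintainer={theirs}")
```

In practice the two full listings would be saved to files and read with `open(path).read()` before being passed to `parse_package_list`.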

Package                                  Version       Editable project location
---------------------------------------- ------------- -----------------------------------
absl-py                                  2.2.2
accelerate                               1.4.0
addict                                   2.4.0
aiofiles                                 24.1.0
aiohappyeyeballs                         2.6.1
aiohttp                                  3.11.16
aiosignal                                1.3.2
airportsdata                             20250224
aliyun-python-sdk-core                   2.16.0
aliyun-python-sdk-kms                    2.16.5
annotated-types                          0.7.0
antlr4-python3-runtime                   4.13.2
anyio                                    4.9.0
astor                                    0.8.1
asttokens                                3.0.0
attrdict                                 2.0.1
attrs                                    25.3.0
av                                       14.3.0
binpacking                               1.5.2
bitsandbytes                             0.45.5
blake3                                   1.0.4
cachetools                               5.5.2
certifi                                  2025.1.31
cffi                                     1.17.1
charset-normalizer                       3.4.1
click                                    8.1.8
cloudpickle                              3.1.1
colorama                                 0.4.6
comm                                     0.2.2
compressed-tensors                       0.9.3
contourpy                                1.3.1
cpm-kernels                              1.0.11
crcmod                                   1.7
cryptography                             44.0.2
cupy-cuda12x                             13.4.1
cycler                                   0.12.1
dacite                                   1.9.2
datasets                                 3.2.0
debugpy                                  1.8.14
decorator                                5.2.1
deepspeed                                0.16.7
Deprecated                               1.2.18
depyf                                    0.18.0
dill                                     0.3.8
diskcache                                5.6.3
distro                                   1.9.0
dnspython                                2.7.0
docker-pycreds                           0.4.0
einops                                   0.8.1
email_validator                          2.2.0
evalscope                                0.14.0
executing                                2.2.0
fastapi                                  0.115.12
fastapi-cli                              0.0.7
fastrlock                                0.8.3
ffmpy                                    0.5.0
filelock                                 3.18.0
flash_attn                               2.7.4.post1
fonttools                                4.57.0
frozenlist                               1.5.0
fsspec                                   2024.9.0
future                                   1.0.0
gguf                                     0.16.2
gitdb                                    4.0.12
GitPython                                3.1.44
googleapis-common-protos                 1.70.0
gradio                                   5.25.0
gradio_client                            1.8.0
groovy                                   0.1.2
grpcio                                   1.71.0
h11                                      0.14.0
hf_transfer                              0.1.9
hf-xet                                   1.0.3
hjson                                    3.1.0
httpcore                                 1.0.8
httptools                                0.6.4
httpx                                    0.28.1
huggingface-hub                          0.30.2
idna                                     3.10
immutabledict                            4.2.1
importlib_metadata                       8.0.0
iniconfig                                2.1.0
inquirerpy                               0.3.4
interegular                              0.3.3
ipykernel                                6.29.5
ipython                                  9.1.0
ipython_pygments_lexers                  1.1.1
jedi                                     0.19.2
jieba                                    0.42.1
Jinja2                                   3.1.6
jiter                                    0.9.0
jmespath                                 0.10.0
joblib                                   1.4.2
jsonlines                                4.0.0
jsonschema                               4.23.0
jsonschema-specifications                2024.10.1
jupyter_client                           8.6.3
jupyter_core                             5.7.2
kiwisolver                               1.4.8
langdetect                               1.0.9
lark                                     1.2.2
latex2sympy2                             1.9.1
latex2sympy2_extended                    1.0.6
liger_kernel                             0.5.8
llguidance                               0.7.16
llvmlite                                 0.44.0
lm-format-enforcer                       0.10.11
lxml                                     5.3.2
Markdown                                 3.8
markdown-it-py                           3.0.0
MarkupSafe                               3.0.2
math-verify                              0.5.2
matplotlib                               3.10.1
matplotlib-inline                        0.1.7
mdurl                                    0.1.2
mistral_common                           1.5.4
modelscope                               1.25.0
mpmath                                   1.3.0
ms_swift                                 3.4.0.dev0    /mnt/nas2/hujinghan.hjh/swift
msgpack                                  1.1.0
msgspec                                  0.19.0
multidict                                6.4.3
multiprocess                             0.70.16
nanobind                                 2.7.0
nest-asyncio                             1.6.0
networkx                                 3.4.2
ninja                                    1.11.1.4
nltk                                     3.9.1
numba                                    0.61.2
numpy                                    1.26.4
nvidia-cublas-cu11                       11.11.3.6
nvidia-cublas-cu12                       12.4.5.8
nvidia-cuda-cupti-cu11                   11.8.87
nvidia-cuda-cupti-cu12                   12.4.127
nvidia-cuda-nvrtc-cu11                   11.8.89
nvidia-cuda-nvrtc-cu12                   12.4.127
nvidia-cuda-runtime-cu11                 11.8.89
nvidia-cuda-runtime-cu12                 12.4.127
nvidia-cudnn-cu11                        9.1.0.70
nvidia-cudnn-cu12                        9.1.0.70
nvidia-cufft-cu11                        10.9.0.58
nvidia-cufft-cu12                        11.2.1.3
nvidia-curand-cu11                       10.3.0.86
nvidia-curand-cu12                       10.3.5.147
nvidia-cusolver-cu11                     11.4.1.48
nvidia-cusolver-cu12                     11.6.1.9
nvidia-cusparse-cu11                     11.7.5.86
nvidia-cusparse-cu12                     12.3.1.170
nvidia-cusparselt-cu12                   0.6.2
nvidia-ml-py                             12.570.86
nvidia-nccl-cu11                         2.21.5
nvidia-nccl-cu12                         2.21.5
nvidia-nvjitlink-cu12                    12.4.127
nvidia-nvtx-cu11                         11.8.86
nvidia-nvtx-cu12                         12.4.127
open-r1                                  0.1.0.dev0    /mnt/nas2/hujinghan.hjh/open-r1/src
openai                                   1.73.0
opencv-python-headless                   4.11.0.86
opentelemetry-api                        1.26.0
opentelemetry-exporter-otlp              1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc   1.26.0
opentelemetry-exporter-otlp-proto-http   1.26.0
opentelemetry-proto                      1.26.0
opentelemetry-sdk                        1.26.0
opentelemetry-semantic-conventions       0.47b0
opentelemetry-semantic-conventions-ai    0.4.3
orjson                                   3.10.16
oss2                                     2.19.1
outlines                                 0.1.11
outlines_core                            0.1.26
packaging                                24.2
pandas                                   2.2.3
parso                                    0.8.4
partial-json-parser                      0.2.1.1.post5
peft                                     0.15.1
pexpect                                  4.9.0
pfzy                                     0.3.4
pillow                                   11.2.1
pip                                      25.0
platformdirs                             4.3.7
pluggy                                   1.5.0
portalocker                              3.1.1
prometheus_client                        0.21.1
prometheus-fastapi-instrumentator        7.1.0
prompt_toolkit                           3.0.50
propcache                                0.3.1
protobuf                                 4.25.6
psutil                                   7.0.0
ptyprocess                               0.7.0
pure_eval                                0.2.3
py-cpuinfo                               9.0.0
pyarrow                                  19.0.1
pybind11                                 2.13.6
pycountry                                24.6.1
pycparser                                2.22
pycryptodome                             3.22.0
pydantic                                 2.11.3
pydantic_core                            2.33.1
pydub                                    0.25.1
Pygments                                 2.19.1
pyparsing                                3.2.3
pytest                                   8.3.5
python-dateutil                          2.9.0.post0
python-dotenv                            1.1.0
python-json-logger                       3.3.0
python-multipart                         0.0.20
pytz                                     2025.2
PyYAML                                   6.0.2
pyzmq                                    26.4.0
qwen-vl-utils                            0.0.10
ray                                      2.43.0
referencing                              0.36.2
regex                                    2024.11.6
requests                                 2.32.3
rich                                     14.0.0
rich-toolkit                             0.14.1
rouge                                    1.0.1
rouge-chinese                            1.0.3
rouge_score                              0.1.2
rpds-py                                  0.24.0
ruff                                     0.11.5
sacrebleu                                2.5.1
safehttpx                                0.1.6
safetensors                              0.5.3
scikit-learn                             1.6.1
scipy                                    1.15.2
seaborn                                  0.13.2
semantic-version                         2.10.0
sentencepiece                            0.2.0
sentry-sdk                               2.26.0
setproctitle                             1.3.5
setuptools                               69.5.1
shellingham                              1.5.4
simplejson                               3.20.1
six                                      1.17.0
smmap                                    5.0.2
sniffio                                  1.3.1
sortedcontainers                         2.4.0
stack-data                               0.6.3
starlette                                0.46.2
sympy                                    1.13.1
tabulate                                 0.9.0
tensorboard                              2.19.0
tensorboard-data-server                  0.7.2
threadpoolctl                            3.6.0
tiktoken                                 0.9.0
timm                                     1.0.15
tokenizers                               0.21.1
tomlkit                                  0.13.2
torch                                    2.6.0
torchaudio                               2.6.0
torchvision                              0.21.0
tornado                                  6.4.2
tqdm                                     4.67.1
traitlets                                5.14.3
transformers                             4.51.2
transformers-stream-generator            0.0.5
triton                                   3.2.0
trl                                      0.17.0.dev0   /mnt/nas2/hujinghan.hjh/trl
typer                                    0.15.2
typing_extensions                        4.13.2
typing-inspection                        0.4.0
tzdata                                   2025.2
urllib3                                  2.4.0
uvicorn                                  0.34.1
uvloop                                   0.21.0
vllm                                     0.8.4
wandb                                    0.19.9
watchfiles                               1.0.5
wcwidth                                  0.2.13
websockets                               15.0.1
Werkzeug                                 3.1.3
wheel                                    0.45.1
word2number                              1.1
wrapt                                    1.17.2
xformers                                 0.0.29.post2
xgrammar                                 0.1.18
xxhash                                   3.5.0
yarl                                     1.19.0
zipp                                     3.21.0
zstandard                                0.23.0

@winni0
Author

winni0 commented Apr 27, 2025

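I cannot reproduce this issue on the main branch. Does anyone have a clean environment and a script that reproduces it reliably?

Here is my reproduction script: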

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export NPROC_PER_NODE=8

swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-1.5B-Instruct \
    --reward_funcs accuracy format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.6 \
    --vllm_max_model_len 2048 \
    --train_type lora \
    --torch_dtype bfloat16 \
    --dataset AI-MO/NuminaMath-TIR \
    --max_completion_length 2048 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 1 \
    --eval_steps 200 \
    --save_steps 200 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 4096 \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 4 \
    --temperature 1.0 \
    --log_completions true \
    --beta 0.001 \
    --sleep_level 1 \
    --system swift/examples/train/grpo/prompt.txt \
    --num_infer_workers 8 \
    --deepspeed zero3

CUDA: 12.1, Python: 3.11

Package                                  Version       Editable project location
---------------------------------------- ------------- -----------------------------------
absl-py                                  2.2.2
accelerate                               1.4.0
addict                                   2.4.0
aiofiles                                 24.1.0
aiohappyeyeballs                         2.6.1
aiohttp                                  3.11.16
aiosignal                                1.3.2
airportsdata                             20250224
aliyun-python-sdk-core                   2.16.0
aliyun-python-sdk-kms                    2.16.5
annotated-types                          0.7.0
antlr4-python3-runtime                   4.13.2
anyio                                    4.9.0
astor                                    0.8.1
asttokens                                3.0.0
attrdict                                 2.0.1
attrs                                    25.3.0
av                                       14.3.0
binpacking                               1.5.2
bitsandbytes                             0.45.5
blake3                                   1.0.4
cachetools                               5.5.2
certifi                                  2025.1.31
cffi                                     1.17.1
charset-normalizer                       3.4.1
click                                    8.1.8
cloudpickle                              3.1.1
colorama                                 0.4.6
comm                                     0.2.2
compressed-tensors                       0.9.3
contourpy                                1.3.1
cpm-kernels                              1.0.11
crcmod                                   1.7
cryptography                             44.0.2
cupy-cuda12x                             13.4.1
cycler                                   0.12.1
dacite                                   1.9.2
datasets                                 3.2.0
debugpy                                  1.8.14
decorator                                5.2.1
deepspeed                                0.16.7
Deprecated                               1.2.18
depyf                                    0.18.0
dill                                     0.3.8
diskcache                                5.6.3
distro                                   1.9.0
dnspython                                2.7.0
docker-pycreds                           0.4.0
einops                                   0.8.1
email_validator                          2.2.0
evalscope                                0.14.0
executing                                2.2.0
fastapi                                  0.115.12
fastapi-cli                              0.0.7
fastrlock                                0.8.3
ffmpy                                    0.5.0
filelock                                 3.18.0
flash_attn                               2.7.4.post1
fonttools                                4.57.0
frozenlist                               1.5.0
fsspec                                   2024.9.0
future                                   1.0.0
gguf                                     0.16.2
gitdb                                    4.0.12
GitPython                                3.1.44
googleapis-common-protos                 1.70.0
gradio                                   5.25.0
gradio_client                            1.8.0
groovy                                   0.1.2
grpcio                                   1.71.0
h11                                      0.14.0
hf_transfer                              0.1.9
hf-xet                                   1.0.3
hjson                                    3.1.0
httpcore                                 1.0.8
httptools                                0.6.4
httpx                                    0.28.1
huggingface-hub                          0.30.2
idna                                     3.10
immutabledict                            4.2.1
importlib_metadata                       8.0.0
iniconfig                                2.1.0
inquirerpy                               0.3.4
interegular                              0.3.3
ipykernel                                6.29.5
ipython                                  9.1.0
ipython_pygments_lexers                  1.1.1
jedi                                     0.19.2
jieba                                    0.42.1
Jinja2                                   3.1.6
jiter                                    0.9.0
jmespath                                 0.10.0
joblib                                   1.4.2
jsonlines                                4.0.0
jsonschema                               4.23.0
jsonschema-specifications                2024.10.1
jupyter_client                           8.6.3
jupyter_core                             5.7.2
kiwisolver                               1.4.8
langdetect                               1.0.9
lark                                     1.2.2
latex2sympy2                             1.9.1
latex2sympy2_extended                    1.0.6
liger_kernel                             0.5.8
llguidance                               0.7.16
llvmlite                                 0.44.0
lm-format-enforcer                       0.10.11
lxml                                     5.3.2
Markdown                                 3.8
markdown-it-py                           3.0.0
MarkupSafe                               3.0.2
math-verify                              0.5.2
matplotlib                               3.10.1
matplotlib-inline                        0.1.7
mdurl                                    0.1.2
mistral_common                           1.5.4
modelscope                               1.25.0
mpmath                                   1.3.0
ms_swift                                 3.4.0.dev0    /mnt/nas2/hujinghan.hjh/swift
msgpack                                  1.1.0
msgspec                                  0.19.0
multidict                                6.4.3
multiprocess                             0.70.16
nanobind                                 2.7.0
nest-asyncio                             1.6.0
networkx                                 3.4.2
ninja                                    1.11.1.4
nltk                                     3.9.1
numba                                    0.61.2
numpy                                    1.26.4
nvidia-cublas-cu11                       11.11.3.6
nvidia-cublas-cu12                       12.4.5.8
nvidia-cuda-cupti-cu11                   11.8.87
nvidia-cuda-cupti-cu12                   12.4.127
nvidia-cuda-nvrtc-cu11                   11.8.89
nvidia-cuda-nvrtc-cu12                   12.4.127
nvidia-cuda-runtime-cu11                 11.8.89
nvidia-cuda-runtime-cu12                 12.4.127
nvidia-cudnn-cu11                        9.1.0.70
nvidia-cudnn-cu12                        9.1.0.70
nvidia-cufft-cu11                        10.9.0.58
nvidia-cufft-cu12                        11.2.1.3
nvidia-curand-cu11                       10.3.0.86
nvidia-curand-cu12                       10.3.5.147
nvidia-cusolver-cu11                     11.4.1.48
nvidia-cusolver-cu12                     11.6.1.9
nvidia-cusparse-cu11                     11.7.5.86
nvidia-cusparse-cu12                     12.3.1.170
nvidia-cusparselt-cu12                   0.6.2
nvidia-ml-py                             12.570.86
nvidia-nccl-cu11                         2.21.5
nvidia-nccl-cu12                         2.21.5
nvidia-nvjitlink-cu12                    12.4.127
nvidia-nvtx-cu11                         11.8.86
nvidia-nvtx-cu12                         12.4.127
open-r1                                  0.1.0.dev0    /mnt/nas2/hujinghan.hjh/open-r1/src
openai                                   1.73.0
opencv-python-headless                   4.11.0.86
opentelemetry-api                        1.26.0
opentelemetry-exporter-otlp              1.26.0
opentelemetry-exporter-otlp-proto-common 1.26.0
opentelemetry-exporter-otlp-proto-grpc   1.26.0
opentelemetry-exporter-otlp-proto-http   1.26.0
opentelemetry-proto                      1.26.0
opentelemetry-sdk                        1.26.0
opentelemetry-semantic-conventions       0.47b0
opentelemetry-semantic-conventions-ai    0.4.3
orjson                                   3.10.16
oss2                                     2.19.1
outlines                                 0.1.11
outlines_core                            0.1.26
packaging                                24.2
pandas                                   2.2.3
parso                                    0.8.4
partial-json-parser                      0.2.1.1.post5
peft                                     0.15.1
pexpect                                  4.9.0
pfzy                                     0.3.4
pillow                                   11.2.1
pip                                      25.0
platformdirs                             4.3.7
pluggy                                   1.5.0
portalocker                              3.1.1
prometheus_client                        0.21.1
prometheus-fastapi-instrumentator        7.1.0
prompt_toolkit                           3.0.50
propcache                                0.3.1
protobuf                                 4.25.6
psutil                                   7.0.0
ptyprocess                               0.7.0
pure_eval                                0.2.3
py-cpuinfo                               9.0.0
pyarrow                                  19.0.1
pybind11                                 2.13.6
pycountry                                24.6.1
pycparser                                2.22
pycryptodome                             3.22.0
pydantic                                 2.11.3
pydantic_core                            2.33.1
pydub                                    0.25.1
Pygments                                 2.19.1
pyparsing                                3.2.3
pytest                                   8.3.5
python-dateutil                          2.9.0.post0
python-dotenv                            1.1.0
python-json-logger                       3.3.0
python-multipart                         0.0.20
pytz                                     2025.2
PyYAML                                   6.0.2
pyzmq                                    26.4.0
qwen-vl-utils                            0.0.10
ray                                      2.43.0
referencing                              0.36.2
regex                                    2024.11.6
requests                                 2.32.3
rich                                     14.0.0
rich-toolkit                             0.14.1
rouge                                    1.0.1
rouge-chinese                            1.0.3
rouge_score                              0.1.2
rpds-py                                  0.24.0
ruff                                     0.11.5
sacrebleu                                2.5.1
safehttpx                                0.1.6
safetensors                              0.5.3
scikit-learn                             1.6.1
scipy                                    1.15.2
seaborn                                  0.13.2
semantic-version                         2.10.0
sentencepiece                            0.2.0
sentry-sdk                               2.26.0
setproctitle                             1.3.5
setuptools                               69.5.1
shellingham                              1.5.4
simplejson                               3.20.1
six                                      1.17.0
smmap                                    5.0.2
sniffio                                  1.3.1
sortedcontainers                         2.4.0
stack-data                               0.6.3
starlette                                0.46.2
sympy                                    1.13.1
tabulate                                 0.9.0
tensorboard                              2.19.0
tensorboard-data-server                  0.7.2
threadpoolctl                            3.6.0
tiktoken                                 0.9.0
timm                                     1.0.15
tokenizers                               0.21.1
tomlkit                                  0.13.2
torch                                    2.6.0
torchaudio                               2.6.0
torchvision                              0.21.0
tornado                                  6.4.2
tqdm                                     4.67.1
traitlets                                5.14.3
transformers                             4.51.2
transformers-stream-generator            0.0.5
triton                                   3.2.0
trl                                      0.17.0.dev0   /mnt/nas2/hujinghan.hjh/trl
typer                                    0.15.2
typing_extensions                        4.13.2
typing-inspection                        0.4.0
tzdata                                   2025.2
urllib3                                  2.4.0
uvicorn                                  0.34.1
uvloop                                   0.21.0
vllm                                     0.8.4
wandb                                    0.19.9
watchfiles                               1.0.5
wcwidth                                  0.2.13
websockets                               15.0.1
Werkzeug                                 3.1.3
wheel                                    0.45.1
word2number                              1.1
wrapt                                    1.17.2
xformers                                 0.0.29.post2
xgrammar                                 0.1.18
xxhash                                   3.5.0
yarl                                     1.19.0
zipp                                     3.21.0
zstandard                                0.23.0

I am using this image:
modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.4.0-py311-torch2.5.1-modelscope1.25.0-swift3.2.2

@hjh0119
Collaborator

hjh0119 commented Apr 27, 2025

@winni0 do you have any reproducible scripts? (Please use open-source models and datasets to ensure experimental consistency)

@chengximeng67

nproc_per_node=4
MAX_PIXELS=250880 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=$nproc_per_node \
swift rlhf \
    --rlhf_type grpo \
    --model /home/member/data1/chxm/MultiModal-R1/GRPO_RESIZE/multimodal_cot_2epoch/v6-20250426-002814/checkpoint-1100 \
    --external_plugins /home/member/data2/chxm/ms-swift-main/examples/train/grpo/plugin/plugin.py \
    --reward_funcs MultiModal_Iou_Shaped Consistency_Reward Multimodal_Format \
    --use_vllm true \
    --vllm_device auto \
    --vllm_gpu_memory_utilization 0.7 \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset '/home/member/data2/chxm/MultiModal-R1/dataset/LLVIP/LLVIP_CoT/step7_grpo_train_dataset_detection_raw_gt.json' \
    --max_length 2048 \
    --max_completion_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --learning_rate 1e-6 \
    --vllm_limit_mm_per_prompt '{"image": 2}' \
    --gradient_accumulation_steps 4 \
    --save_strategy 'steps' \
    --eval_strategy 'steps' \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 50 \
    --logging_steps 1 \
    --output_dir /data1/chxm/MultiModal-R1/GRPO_RESIZE/GRPO \
    --warmup_ratio 0.01 \
    --dataloader_num_workers 4 \
    --num_generations 4 \
    --temperature 1.0 \
    --repetition_penalty 1 \
    --system 'You are a helpful assistant.' \
    --deepspeed zero3 \
    --log_completions true \
    --num_iterations 1 \
    --num_infer_workers 4 \
    --async_generate false \
    --beta 0.05 \
    --max_grad_norm 0.7 \
    --sleep_level 1 \
    --attn_impl flash_attn

data_format:LLVIP

  {
    "images": [
      "/home/member/data2/chxm/MultiModal-R1/dataset/LLVIP/LLVIP_merged_distorted/visible/train/080335.jpg",
      "/home/member/data2/chxm/MultiModal-R1/dataset/LLVIP/LLVIP_merged_distorted/infrared/train/080335.jpg"
    ],
    "messages": [
      {
        "role": "user",
        "content": "question"
      }
    ],
    "solution": "<answer>\n[{\"bbox_2d\":[37,90,88,184],\"label\":\"person\"},{\"bbox_2d\":[462,169,502,275],\"label\":\"person\"},{\"bbox_2d\":[502,200,542,303],\"label\":\"person\"},{\"bbox_2d\":[483,337,518,447],\"label\":\"person\"},{\"bbox_2d\":[537,178,559,294],\"label\":\"person\"},{\"bbox_2d\":[479,303,547,388],\"label\":\"person\"}]\n</answer>"
  },

torch2.4
python3.10.16
cuda12.4
flash_attn=True
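For readers who want to sanity-check records in this format before training, here is a minimal sketch of reading the `solution` field. It assumes the answer is always a JSON list wrapped in `<answer>...</answer>`, as in the sample above. The shortened image paths and the helper name `parse_solution` are hypothetical, and this is not the code of the reward plugin used in the script.

```python
import json
import re

# One record in the format shown above. The image paths are shortened,
# hypothetical stand-ins, and only two of the boxes are kept.
record = {
    "images": ["visible/080335.jpg", "infrared/080335.jpg"],
    "messages": [{"role": "user", "content": "question"}],
    "solution": '<answer>\n[{"bbox_2d":[37,90,88,184],"label":"person"},'
                '{"bbox_2d":[462,169,502,275],"label":"person"}]\n</answer>',
}


def parse_solution(solution: str) -> list[dict]:
    """Extract the JSON list of boxes wrapped in <answer>...</answer>."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", solution, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <answer> block found")
    return json.loads(match.group(1))


boxes = parse_solution(record["solution"])
for box in boxes:
    print(box["label"], box["bbox_2d"])
```

Running a check like this over the whole dataset is a cheap way to rule out malformed records as the trigger, although nothing in the thread suggests the data is at fault.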

@chengximeng67


swift=3.4.0

@alanMachineLeraning

I am hitting the same issue. It appears every time after a little more than 1,000 training iterations and reproduces reliably.
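A crash that arrives after a similar number of steps on every run is at least consistent with a slow reference-count leak on `None`. On CPython 3.11 and earlier, a C extension that releases a reference to `None` it does not own lowers the count a little each time, and the interpreter aborts with `none_dealloc` once it reaches zero. The sketch below is one way to watch for such a trend. The class name and the loop are hypothetical stand-ins, and in a real run `record` would be called from the training loop (for example from a trainer callback, which is an assumption about how one might wire it in). On Python 3.12 and later `None` is immortal, so the count stays constant and this check tells you nothing.

```python
import sys


def none_refcount() -> int:
    """Return the current reference count of the None singleton."""
    return sys.getrefcount(None)


class NoneRefcountMonitor:
    """Record None's refcount at chosen steps and report the average drift."""

    def __init__(self) -> None:
        self.samples: list[tuple[int, int]] = []

    def record(self, step: int) -> None:
        self.samples.append((step, none_refcount()))

    def drift_per_step(self) -> float:
        """Average refcount change per step between first and last sample."""
        if len(self.samples) < 2:
            return 0.0
        (s0, c0), (s1, c1) = self.samples[0], self.samples[-1]
        if s1 == s0:
            return 0.0
        return (c1 - c0) / (s1 - s0)


monitor = NoneRefcountMonitor()
for step in range(0, 100, 10):  # stand-in for the real training loop
    monitor.record(step)

print("last sample:", monitor.samples[-1])
print("drift per step:", monitor.drift_per_step())
```

A drift that stays clearly negative over many steps would point at a native extension rather than at the dataset. A flat or noisy count would not rule anything out.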

@alanMachineLeraning

swift.__version__
'3.3.1'

@chengximeng67

chengximeng67 commented May 2, 2025

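Do you have any reproducible scripts? (Please use open-source models and datasets to ensure experimental consistency)

We use a dataset rebuilt from an open-source dataset. What sets it apart from ordinary training data is that we pass the 2_images parameter to vLLM and train with multiple images per sample.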

@alanMachineLeraning

How did everyone end up solving this? I can reproduce this error reliably. My command script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
MAX_PIXELS=50176 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen2.5-VL-32B-Instruct \
    --external_plugins swift/qwen_fine_tuning/plugin/plugin.py \
    --train_type lora \
    --dataset 'lmms-lab/multimodal-open-r1-8k-verified' \
    --torch_dtype bfloat16 \
    --num_train_epochs 3 \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --eval_steps 1000 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --output_dir output/GRPO \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --max_completion_length 1024 \
    --reward_funcs accuracy format \
    --num_generations 4 \
    --system ms-swift/examples/train/grpo/prompt.txt \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.4 \
    --vllm_max_model_len 2048 \
    --deepspeed zero3 \
    --temperature 1.0 \
    --top_p 1.0 \
    --log_completions true \
    --num_infer_workers 4 \
    --tensor_parallel_size 4 \
    --async_generate false \
    --move_model_batches 16 \
    --offload_optimizer true \
    --offload_model true \
    --gc_collect_after_offload true \
    --sleep_level 1

@chengximeng67

How did everyone end up solving this? I can reproduce this error reliably. My command script is as follows: CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 MAX_PIXELS=50176 swift rlhf --rlhf_type grpo --model Qwen2.5-VL-32B-Instruct --external_plugins swift/qwen_fine_tuning/plugin/plugin.py --train_type lora --dataset 'lmms-lab/multimodal-open-r1-8k-verified' --torch_dtype bfloat16 --num_train_epochs 3 --max_length 2048 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 --eval_steps 1000 --save_steps 1000 --learning_rate 1e-5 --save_total_limit 2 --logging_steps 5 --output_dir output/GRPO --warmup_ratio 0.05 --dataloader_num_workers 4 --max_completion_length 1024 --reward_funcs accuracy format --num_generations 4 --system ms-swift/examples/train/grpo/prompt.txt --use_vllm true --vllm_gpu_memory_utilization 0.4 --vllm_max_model_len 2048 --deepspeed zero3 --temperature 1.0 --top_p 1.0 --log_completions true --num_infer_workers 4 --tensor_parallel_size 4 --async_generate false --move_model_batches 16 --offload_optimizer true --offload_model true --gc_collect_after_offload true --sleep_level 1

Same here: I am also using --sleep_level 1 --deepspeed zero3.
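Since several reports share flags such as `--sleep_level 1` and `--deepspeed zero3`, one way to narrow things down is to intersect the flags of the failing commands mechanically. The sketch below does that for two shortened excerpts of the scripts posted in this thread. The helper `parse_flags` is hypothetical, and the excerpts omit most flags for brevity.

```python
import shlex


def parse_flags(command: str) -> dict[str, str]:
    """Map each --flag in a shell command to its space-joined value."""
    # Join backslash-continued lines before tokenizing.
    tokens = shlex.split(command.replace("\\\n", " "))
    flags: dict[str, str] = {}
    current = None
    for token in tokens:
        if token.startswith("--"):
            current = token[2:]
            flags[current] = ""
        elif current is not None:
            flags[current] = (flags[current] + " " + token).strip()
    return flags


# Shortened excerpts of two failing commands from this thread.
script_a = """
swift rlhf --rlhf_type grpo --use_vllm true --train_type full
    --deepspeed zero3 --sleep_level 1 --num_infer_workers 4
    --async_generate false
"""
script_b = """
swift rlhf --rlhf_type grpo --use_vllm true --train_type lora
    --deepspeed zero3 --sleep_level 1 --num_infer_workers 4
    --tensor_parallel_size 4 --async_generate false
"""

flags_a = parse_flags(script_a)
flags_b = parse_flags(script_b)

# Flags present in both commands with identical values.
common = {k: v for k, v in flags_a.items() if flags_b.get(k) == v}
for name, value in sorted(common.items()):
    print(f"--{name} {value}")
```

Shared flags only show correlation. Confirming a cause would still require toggling one candidate at a time on a script that reproduces the crash.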

@winni0
Author

winni0 commented May 2, 2025

How did everyone end up solving this? I can reproduce this error reliably. My command script is as follows: CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 MAX_PIXELS=50176 swift rlhf --rlhf_type grpo --model Qwen2.5-VL-32B-Instruct --external_plugins swift/qwen_fine_tuning/plugin/plugin.py --train_type lora --dataset 'lmms-lab/multimodal-open-r1-8k-verified' --torch_dtype bfloat16 --num_train_epochs 3 --max_length 2048 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 --eval_steps 1000 --save_steps 1000 --learning_rate 1e-5 --save_total_limit 2 --logging_steps 5 --output_dir output/GRPO --warmup_ratio 0.05 --dataloader_num_workers 4 --max_completion_length 1024 --reward_funcs accuracy format --num_generations 4 --system ms-swift/examples/train/grpo/prompt.txt --use_vllm true --vllm_gpu_memory_utilization 0.4 --vllm_max_model_len 2048 --deepspeed zero3 --temperature 1.0 --top_p 1.0 --log_completions true --num_infer_workers 4 --tensor_parallel_size 4 --async_generate false --move_model_batches 16 --offload_optimizer true --offload_model true --gc_collect_after_offload true --sleep_level 1

It still has not been resolved on my side.

@winni0
Author

winni0 commented May 2, 2025

@winni0 do you have any reproducible scripts? (Please use open-source models and datasets to ensure experimental consistency)

The dataset is an open-source dataset from ModelScope, and the model is the open-source Qwen2.5-7B-Instruct.

@hjh0119
Collaborator

hjh0119 commented May 6, 2025

The refactoring of the internal vLLM codebase is currently a work in progress.
It is recommended to use the external vLLM server for now.

@littttttlebird

Has this been resolved? I can reproduce it reliably: the error occurs at step 1890. I had assumed something was wrong with my data.

@hjh0119 hjh0119 mentioned this issue May 7, 2025
9 tasks
@hjh0119 hjh0119 self-assigned this May 8, 2025
@hjh0119
Collaborator

hjh0119 commented May 13, 2025

The code for GRPOTrainer has been refactored. Please try again using the main branch.

@DogeWatch

The code for GRPOTrainer has been refactored. Please try again using the main branch.

Is this fixed in v3.4.1?
