Skip to content

开启了ignore_empty_think,框架会自动删除<think>\n\n</think>\n\n,导致模型不思考 #4854

Closed
@leipan797

Description

@leipan797

Describe the bug
在混合推理训练的时候,开启了ignore_empty_think,并且数据集中非思考数据开头也加上了\n\n\n\n,训练后,并不能达到混合推理的效果,模型完全不思考

Additional context
NNODES=$NNODES
NPROC_PER_NODE=8
NODE_RANK=$NODE_RANK
megatron sft
--load Qwen3-1.7B-Base-mcore
--dataset ''
--loss_scale ignore_empty_think
--tensor_model_parallel_size 1
--sequence_parallel true
--micro_batch_size 1
--global_batch_size 128
--packing true
--recompute_granularity full
--recompute_method uniform
--recompute_num_layers 1
--max_epochs 3
--finetune true
--cross_entropy_loss_fusion true
--lr 2e-4
--min_lr 1e-12
--save megatron_output/random_select_1m_mixed_think
--save_interval 1500
--max_length 16384
--num_workers 8
--dataset_num_proc 8
--no_save_optim true
--no_save_rng true
--sequence_parallel true
--log_interval 1
--use_flash_attn true
--log_interval 1
--packing_cache shared_cache

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions