
Fine-tuning PPDocBee on a single A800 GPU fails with a missing-file error #1358

@xdaiycl

Description


Downloading from https://bj.bcebos.com/paddlenlp/models/community/PaddleMIX/PPDocBee-2B-1129/image_preprocessor_config.json failed with code 404!

The shell script is as follows:
set -x

GPUS=${GPUS:-1}
BATCH_SIZE=${BATCH_SIZE:-8}
PER_DEVICE_BATCH_SIZE=${PER_DEVICE_BATCH_SIZE:-1}

GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))
tensor_parallel_degree=${tensor_parallel_degree:-1}
sharding_parallel_degree=$((GPUS / tensor_parallel_degree))

export PYTHONPATH="${PYTHONPATH}:$(pwd)"
export MASTER_PORT=34229
export TF_CPP_MIN_LOG_LEVEL=3

OUTPUT_DIR='work_dirs/ppdocbee_public_dataset'

if [ ! -d "$OUTPUT_DIR" ]; then
mkdir -p "$OUTPUT_DIR"
fi

TRAINING_MODEL_RESUME="None"
TRAINER_INSTANCES='127.0.0.1'
MASTER='127.0.0.1:8080'

meta_path="/root/autodl-tmp/Jsons/train.json"

meta_path="paddlemix/examples/ppdocbee/configs/ppdocbee_public_dataset.json"

TRAINING_PYTHON="python -m paddle.distributed.launch --master ${MASTER} --nnodes 1 --nproc_per_node ${GPUS} --rank 0 --ips ${TRAINER_INSTANCES} --run_mode=collective"
${TRAINING_PYTHON} --log_dir ${OUTPUT_DIR}/paddle_distributed_logs \
  paddlemix/examples/ppdocbee/ppdocbee_finetune.py \
  --do_train \
  --model_name_or_path "PaddleMIX/PPDocBee-2B-1129" \
  --output_dir ${OUTPUT_DIR} \
  --logging_dir ${OUTPUT_DIR}/logs \
  --meta_path ${meta_path} \
  --overwrite_output_dir True \
  --dataloader_num_workers 8 \
  --bf16 True \
  --fp16 False \
  --fp16_opt_level "O2" \
  --num_train_epochs 1 \
  --per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE} \
  --gradient_accumulation_steps ${GRADIENT_ACC} \
  --freeze_vit True \
  --max_seq_length 8192 \
  --image_resolution 512 \
  --recompute False \
  --max_grad_norm 1.0 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 1000 \
  --save_total_limit 1 \
  --learning_rate 1e-8 \
  --warmup_ratio 0.1 \
  --warmup_steps 100 \
  --weight_decay 0.1 \
  --optim "adamw" \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --report_to "visualdl" \
  --tensor_parallel_degree=${tensor_parallel_degree} \
  --sharding_parallel_degree=${sharding_parallel_degree} \
  --pipeline_parallel_degree=1 \
  --sep_parallel_degree=1 \
  --sharding="stage1" \
  --amp_master_grad=1 \
  --hybrid_parallel_topo_order="sharding_first" \
  2>&1 | tee -a "${OUTPUT_DIR}/training_log.txt"
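For reference, with the defaults in the script above (GPUS=1, BATCH_SIZE=8, PER_DEVICE_BATCH_SIZE=1, tensor_parallel_degree=1), the derived parallelism settings work out as follows; this is a minimal sketch of the same arithmetic, not part of the original script:

```shell
#!/bin/sh
# Reproduce the derived values from the launch script with its defaults.
GPUS=1
BATCH_SIZE=8
PER_DEVICE_BATCH_SIZE=1
tensor_parallel_degree=1

# Gradient accumulation steps: global batch / (per-device batch * number of GPUs)
GRADIENT_ACC=$((BATCH_SIZE / PER_DEVICE_BATCH_SIZE / GPUS))

# Sharding degree: data-parallel ranks remaining after tensor parallelism
sharding_parallel_degree=$((GPUS / tensor_parallel_degree))

echo "GRADIENT_ACC=${GRADIENT_ACC}"                         # 8
echo "sharding_parallel_degree=${sharding_parallel_degree}" # 1
```

So on a single A800 the global batch size of 8 is reached entirely through gradient accumulation, and sharding is effectively a no-op.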
