Open
Labels: bug (Something isn't working)
Description
Describe the bug
(spark-rapids-benchmarks) ➜ nds git:(main) ✗ python nds_gen_data.py local 1 2 /data/tpcds/sf=1/updates --update 20 --range 1,2
Warning: This scale factor is valid for QUALIFICATION ONLY
dsdgen Population Generator (Version 3.2.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2021
Writing s_customer_address ... Done
Writing s_call_center ... Done
Writing s_catalog_order and s_catalog_order_lineitem ... Done
Writing s_catalog_page ... Done
Writing s_customer ... Done
Writing s_inventory ... Done
Writing s_item ... Done
Writing s_promotion ... Done
Writing s_purchase and s_purchase_lineitem ... Done
Writing s_store ... Done
Writing s_warehouse ... Done
Writing s_web_order and s_web_order_lineitem ... Done
Writing s_web_page ... Done
Writing s_web_site ... Done
Writing s_zip_to_gmt ... Done
ERROR: /data/tpcds/sf=1/updates/delete_20.dat exists. Either remove it or use the FORCE option to overwrite it.
The error occurs because the native dsdgen (compiled by make in the tpcds-gen folder) generates the same delete_n.dat file for different child numbers, so the second child finds the file already in place. A typical repro looks like this:
~/spark-rapids-benchmarks/nds/tpcds-gen/target/tools$ ./dsdgen -scale 1 -dir $PWD/sf1 -parallel 2 -child 1 -verbose -update 20
Warning: This scale factor is valid for QUALIFICATION ONLY
dsdgen Population Generator (Version 3.2.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2021
Writing s_customer_address ... Done
Writing s_call_center ... Done
Writing s_catalog_order and s_catalog_order_lineitem ... Done
Writing s_catalog_page ... Done
Writing s_customer ... Done
Writing s_inventory ... Done
Writing s_item ... Done
Writing s_promotion ... Done
Writing s_purchase and s_purchase_lineitem ... Done
Writing s_store ... Done
Writing s_warehouse ... Done
Writing s_web_order and s_web_order_lineitem ... Done
Writing s_web_page ... Done
Writing s_web_site ... Done
Writing s_zip_to_gmt ... Done
~/spark-rapids-benchmarks/nds/tpcds-gen/target/tools$ ./dsdgen -scale 1 -dir $PWD/sf1 -parallel 2 -child 2 -verbose -update 20
ERROR: ~/spark-rapids-benchmarks/nds/tpcds-gen/target/tools/sf1/delete_20.dat exists. Either remove it or use the FORCE option to overwrite it.
A simple fix is to detect the update flag and, whenever update is on, always honor overwrite_output as well, so that later children overwrite the existing delete_n.dat instead of failing.
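A minimal sketch of that fix, not the actual nds_gen_data.py code: the helper name `build_dsdgen_args` and its parameters are hypothetical, and it assumes the FORCE option named in the error message is passed to dsdgen as `-force Y`. The other flags are the ones used in the repro above.

```python
from typing import List, Optional

# Assumption: dsdgen's FORCE option is spelled "-force Y" on the command line.
FORCE_ARGS = ["-force", "Y"]


def build_dsdgen_args(scale: int,
                      data_dir: str,
                      parallel: int,
                      child: int,
                      update: Optional[int] = None,
                      overwrite_output: bool = False) -> List[str]:
    """Build the dsdgen argument list for one child (hypothetical helper)."""
    args = [
        "-scale", str(scale),
        "-dir", data_dir,
        "-parallel", str(parallel),
        "-child", str(child),
        "-verbose",
    ]
    if update is not None:
        args += ["-update", str(update)]
    # Force overwrite when the user asked for it, and always when update is
    # on, because every child writes the same delete_<n>.dat file.
    if overwrite_output or update is not None:
        args += FORCE_ARGS
    return args


if __name__ == "__main__":
    # Child 2 of the repro: the argument list now carries the force option.
    print(" ".join(["./dsdgen"] + build_dsdgen_args(1, "sf1", 2, 2, update=20)))
```

This relies on the observation above that each child produces the same delete_n.dat, so overwriting it loses nothing. Runs without the update flag keep their current behavior and overwrite only when overwrite_output is requested.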