Utility modules for FLD, such as a corpus loader, a corpus serializer, and metrics calculators.
See the entry-point repository for an overview of the whole FLD project.
We currently have three branches:
- NeurIPS_2024 branch (2024-12)
- NLP_2024_KOBE_BEEF branch (2024-01-24)
- ICML_2023 branch (2023-08-22)
Please read the instructions in the other FLD repositories CAREFULLY to determine which branch you need.
pip install -e .
python -c "import nltk; nltk.download('punkt')"

Once the raw FLD corpora have been created by FLD-generator, prepare the prompt-output pairs for LLM training as follows:
python ./scripts/serialize.py \
--train {train_jsonl_path} \
--valid {valid_jsonl_path} \
--test {test_jsonl_path} \
--output-dir {output_dir}

This command outputs the examples with two added fields, prompt_serial and proof_serial, which hold the prompt and the output of the LLM, respectively.
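To feed these pairs into a training loop, something like the following minimal sketch could read them back. It assumes the serialized output is JSONL with one example per line; the function name iter_prompt_proof_pairs and any file path you pass to it are hypothetical and not part of this repository.

```python
import json
from typing import Iterator, Tuple


def iter_prompt_proof_pairs(jsonl_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (prompt, output) pairs from a serialized FLD JSONL file.

    Hypothetical helper: assumes one JSON object per line carrying the
    `prompt_serial` and `proof_serial` fields added by scripts/serialize.py.
    """
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            example = json.loads(line)
            # prompt_serial is the LLM input, proof_serial the target output
            yield example["prompt_serial"], example["proof_serial"]
```

For example, `for prompt, output in iter_prompt_proof_pairs("serialized_train.jsonl"): ...` (the file name is illustrative). A generator keeps memory use flat even for large corpora.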
Optionally, the serialized files can be uploaded to the Hugging Face Hub as a dataset:
python ./scripts/push_to_hub.py \
--train {serialized_train_jsonl_path} \
--valid {serialized_valid_jsonl_path} \
--test {serialized_test_jsonl_path} \
--repo-id {your_name/dataset_name} \
--config-name default
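Before pushing, a quick sanity check that each serialized split parses and carries both fields can save a failed upload. The helper below, count_serialized_examples, is a hypothetical sketch (not a script in this repository) and assumes the same one-example-per-line JSONL layout as the serializer output.

```python
import json
from typing import Dict

# Fields that scripts/serialize.py adds to every example
REQUIRED_FIELDS = ("prompt_serial", "proof_serial")


def count_serialized_examples(paths: Dict[str, str]) -> Dict[str, int]:
    """Return the number of examples per split.

    Hypothetical helper: raises ValueError on the first record that lacks
    one of REQUIRED_FIELDS, reporting the file and line number.
    """
    counts: Dict[str, int] = {}
    for split, path in paths.items():
        n = 0
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                missing = [k for k in REQUIRED_FIELDS if k not in record]
                if missing:
                    raise ValueError(f"{path}:{lineno} is missing {missing}")
                n += 1
        counts[split] = n
    return counts
```

For instance, `count_serialized_examples({"train": train_path, "valid": valid_path, "test": test_path})` returns a per-split count that you can compare with the original corpus sizes before running push_to_hub.py.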