Reproducing the results for BoN #44

@HossamAmer12


I tried running your code with the default setup given. I get the following results:

Model=Llama-3.2-1B
Reward model=RLHF Llama model
search_batch_size=25
n=4
4xV100 32GB GPUs
seed=42
search=BoN

| n | acc_naive | acc_weighted | acc_maj |
|---|-----------|--------------|---------|
| 1 | 24.6      | 24.6         | 24.6    |
| 2 | 31.4      | 31.4         | 24.6    |
| 4 | 33.4      | 35.0         | 31.0    |

They show the same increasing trend with test-time compute (TTC).
But why are they slightly different from the ones you report for BoN? Does the seed have an impact on the results?

I see this comment in your docs: `# Repeat for seeds 0-4` — why do you suggest that?
Looking at the results you report, the scores change noticeably from one seed to another. Is there a particular seed we should choose?
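For what it's worth, my reading of the seeds 0-4 instruction is that it is about reducing variance rather than picking a "good" seed: run each configuration once per seed and report the mean (and spread) across seeds. A minimal sketch of that aggregation — the per-seed accuracies below are made-up illustrative values, not numbers from the repo:

```python
from statistics import mean, stdev

# Hypothetical per-seed accuracies for n=4 (illustrative values only)
acc_by_seed = {0: 33.4, 1: 35.0, 2: 34.2, 3: 32.8, 4: 34.6}

scores = list(acc_by_seed.values())
avg, sd = mean(scores), stdev(scores)
print(f"acc_naive @ n=4: {avg:.1f} +/- {sd:.1f} over {len(scores)} seeds")
# → acc_naive @ n=4: 34.0 +/- 0.9 over 5 seeds
```

With that reporting convention, a single-seed run landing a point or two away from the published mean would be expected rather than a reproduction failure.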

Note: to get the model to run on my V100, I made the change below. I believe it should not cause a huge difference.

from vllm import LLM

llm = LLM(
    model=config.model_path,
    gpu_memory_utilization=config.gpu_memory_utilization,
    enable_prefix_caching=True,
    seed=config.seed,
    tensor_parallel_size=num_gpus,
    # V100 change: force full precision instead of the default dtype
    dtype="float32",
)
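For context on why that override is needed: V100 is compute capability 7.0, which predates bfloat16 support (Ampere, sm_80), so a model whose default dtype is bfloat16 cannot load without forcing another dtype. A small sketch of the capability check — `pick_dtype` is my own illustrative helper, not a vLLM API:

```python
def pick_dtype(compute_capability: tuple[int, int]) -> str:
    """Choose a vLLM dtype string from a GPU compute capability.

    bfloat16 requires Ampere (sm_80) or newer; older GPUs such as
    V100 (sm_70) must fall back to float32 (or float16).
    """
    major, _minor = compute_capability
    return "bfloat16" if major >= 8 else "float32"

# V100 is compute capability (7, 0); A100 is (8, 0)
print(pick_dtype((7, 0)))  # → float32
print(pick_dtype((8, 0)))  # → bfloat16
```

Numerically, float32 should match or slightly differ from half-precision runs, so I would not expect it to explain a multi-point accuracy gap on its own.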
