Description
I tried running your code with the default setup. I get the following results:

- Model: Llama-3.2-1B
- Reward model: RLHF Llama model
- search_batch_size: 25
- n: 4
- Hardware: 4x V100 32 GB GPUs
- seed: 42
- search: BoN
| n | acc_naive | acc_weighted | acc_maj |
|---|-----------|--------------|---------|
| 1 | 24.6 | 24.6 | 24.6 |
| 2 | 31.4 | 31.4 | 24.6 |
| 4 | 33.4 | 35.0 | 31.0 |
These show the same increasing trend with test-time compute (TTC). But why are they slightly different from the ones you report for BoN? Does the seed have an impact on the results?

I also see this comment in your docs: `# Repeat for seeds 0-4`. Why do you suggest this? Looking at the results you report, the scores change noticeably from one seed to another. Is there a particular seed we should choose?
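If the point of `# Repeat for seeds 0-4` is to average out seed variance rather than to pick a "good" seed, here is a minimal sketch of the aggregation I assumed (the `summarize_over_seeds` helper is mine, not from your repo; `acc_by_seed` would map each seed to one metric, e.g. acc_weighted at n=4):

```python
import statistics

def summarize_over_seeds(acc_by_seed: dict[int, float]) -> str:
    """Report mean +/- sample std of one accuracy metric across seeds."""
    scores = list(acc_by_seed.values())
    return f"{statistics.mean(scores):.1f} +/- {statistics.stdev(scores):.1f}"
```

Is reporting a mean and spread over the five seeds what you intended, rather than any single seed's number?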
Note: I made the following change to get the model to run on my V100s. I believe it should not cause a large difference.
```python
from vllm import LLM

llm = LLM(
    model=config.model_path,
    gpu_memory_utilization=config.gpu_memory_utilization,
    enable_prefix_caching=True,
    seed=config.seed,
    tensor_parallel_size=num_gpus,
    # V100 change: V100 does not support bfloat16, so fall back to float32
    dtype="float",
)
```
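For context on the change: V100s (compute capability 7.0) lack bfloat16 support, which is why the default dtype fails there. In vLLM, `dtype="float"` selects float32; `dtype="float16"` would roughly halve weight memory but, as I understand it, half precision can shift scores slightly, so I went with float32.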