
Commit ccf5f2b

Refactored the harness
1 parent 70b7aae commit ccf5f2b

24 files changed (+3552, -343 lines)

harness/Client/loadgen_client.py

Lines changed: 256 additions & 34 deletions

harness/DATASET_CONFIG_DESIGN.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
# Dataset Configuration Design

## Overview

This design provides a generic, configuration-driven approach to handling different datasets and models in MLPerf Inference harnesses. YAML configuration files define each dataset's field mappings and model-specific settings, so the same harness code can work with different datasets and scenarios.

## Key Components

### 1. Dataset Configuration System (`harness/data/dataset_config.py`)

- **DatasetConfigLoader**: Loads YAML configuration files for datasets
- **DatasetConfig**: Data class containing dataset field mappings
- **ModelDatasetConfig**: Model-specific dataset configuration
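
The two data classes could be modeled roughly as follows. This is a minimal sketch, not the code in `dataset_config.py`: the attribute names are assumed to mirror the YAML keys shown under Configuration Structure, and `parse_config` is a hypothetical helper that takes an already-parsed mapping (for example, the result of PyYAML's `yaml.safe_load`) so the sketch runs without third-party packages.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class DatasetConfig:
    """Column mappings for one dataset (assumed to mirror the `fields:` block)."""
    name: str
    input_column: str
    input_ids_column: str
    output_column: str
    input_lens_column: Optional[str] = None  # optional; may be null in YAML
    file_format: str = "auto"
    total_sample_count: Optional[int] = None
    description: str = ""


@dataclass
class ModelDatasetConfig:
    """Model-specific settings (assumed to mirror the `model_specific:` block)."""
    default_model_name: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)  # any other keys


def parse_config(raw: Dict[str, Any]) -> Tuple[DatasetConfig, ModelDatasetConfig]:
    """Hypothetical helper: build both config objects from a parsed YAML mapping."""
    fields = raw.get("fields", {})
    dataset = DatasetConfig(
        name=raw["name"],
        input_column=fields["input_column"],
        input_ids_column=fields["input_ids_column"],
        output_column=fields["output_column"],
        input_lens_column=fields.get("input_lens_column"),
        file_format=raw.get("file_format", "auto"),
        total_sample_count=raw.get("total_sample_count"),
        description=raw.get("description", ""),
    )
    # Copy so popping known keys does not mutate the caller's mapping.
    model_raw = dict(raw.get("model_specific") or {})
    model = ModelDatasetConfig(
        default_model_name=model_raw.pop("default_model_name", None),
        extra=model_raw,
    )
    return dataset, model
```

Keeping parsing separate from file I/O makes the mapping logic easy to test with plain dictionaries.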

### 2. Enhanced Dataset Processor (`harness/data/dataset_processor.py`)

- Automatically loads the dataset configuration from YAML files
- Uses the field mappings from the config (`input_column`, `input_ids_column`, `output_column`)
- Falls back to defaults if no config is available
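
One way to implement mapping with fallback is sketched below. The default column names are an assumption (borrowed from the example YAML), `resolve_fields` and `extract_columns` are hypothetical helpers, and rows are plain dicts rather than whatever structure the processor actually loads from the dataset file.

```python
from typing import Any, Dict, List, Mapping, Optional

# Hypothetical defaults used when no YAML config is found; the real
# defaults in dataset_processor.py may differ.
DEFAULT_FIELDS = {
    "input_column": "text_input",
    "input_ids_column": "tok_input",
    "output_column": "ref_output",
}


def resolve_fields(config_fields: Optional[Mapping[str, Optional[str]]]) -> Dict[str, str]:
    """Start from the defaults and override with any non-null mappings from the config."""
    resolved = dict(DEFAULT_FIELDS)
    if config_fields:
        resolved.update({k: v for k, v in config_fields.items() if v is not None})
    return resolved


def extract_columns(rows: List[Dict[str, Any]], fields: Mapping[str, str]) -> Dict[str, List[Any]]:
    """Pull each mapped column out of the row records, keyed by its logical role."""
    return {role: [row[column] for row in rows] for role, column in fields.items()}
```

Null entries such as `input_lens_column: null` are dropped, so an optional column only appears in the result when the config actually names it.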

### 3. Base Harness (`harness/harness/base_harness.py`)

- Works for both Offline and Server scenarios
- Automatically loads the dataset configuration
- Provides hooks for model-specific customizations:
  - `_pre_run_setup()`: Pre-run initialization
  - `_post_run_processing()`: Post-run processing
  - `_cleanup_custom()`: Custom cleanup
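
These hooks follow a template-method pattern: the base class fixes the order of a run, and subclasses override only the steps they need. The sketch below uses a hypothetical stand-in class, `BaseHarnessSketch`; the call order (setup, scenario run, post-processing, then cleanup in a `finally` block) is an assumption about how `BaseHarness.run()` might sequence the hooks, not a description of its actual code.

```python
class BaseHarnessSketch:
    """Hypothetical stand-in for BaseHarness showing where the hooks could fire."""

    VALID_SCENARIOS = ("Offline", "Server")

    def __init__(self, scenario: str = "Offline"):
        if scenario not in self.VALID_SCENARIOS:
            raise ValueError(f"Unsupported scenario: {scenario!r}")
        self.scenario = scenario
        self.results = {}
        self.events = []  # records call order for illustration only

    # Hooks: no-ops by default, overridden by model-specific subclasses.
    def _pre_run_setup(self):
        pass

    def _post_run_processing(self):
        pass

    def _cleanup_custom(self):
        pass

    def _run_scenario(self):
        # Placeholder: a real harness would issue queries through LoadGen here.
        self.events.append(f"run:{self.scenario}")
        return {"scenario": self.scenario}

    def run(self):
        try:
            self._pre_run_setup()
            self.results = self._run_scenario()
            self._post_run_processing()
            return self.results
        finally:
            # Cleanup runs even if setup or the benchmark raises.
            self._cleanup_custom()


class TracingHarness(BaseHarnessSketch):
    """Example subclass that overrides every hook."""

    def _pre_run_setup(self):
        self.events.append("pre")

    def _post_run_processing(self):
        self.events.append("post")
        self.results["post_processed"] = True

    def _cleanup_custom(self):
        self.events.append("cleanup")
```

Because cleanup sits in `finally`, a subclass can release resources without wrapping its own logic in `try` blocks.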

### 4. Model-Specific Harnesses

- **DeepSeek R1** (`language/deepseek-r1/harness_deepseek_r1.py`): Extends BaseHarness
- **Llama 3.1 8B** (`harness/harness_llama3.1_8b.py`): Extends BaseHarness

## Configuration Files

Configuration files are stored in `harness/data/configs/`:

- `llama3.1-8b.yaml`: Llama 3.1 8B dataset configuration
- `deepseek-r1.yaml`: DeepSeek R1 dataset configuration
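
Assuming each config is found through a file named `<dataset_name>.yaml` in that directory (a naming convention inferred from the two files listed, not confirmed by the source), discovery and lookup reduce to a few `pathlib` calls. `available_datasets` and `find_config` are hypothetical helpers; returning `None` from `find_config` leaves room for the fallback-to-defaults behavior.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def available_datasets(configs_dir: Path) -> List[str]:
    """List dataset names, assuming each name is a YAML file stem (hypothetical convention)."""
    return sorted(p.stem for p in configs_dir.glob("*.yaml"))


def find_config(configs_dir: Path, dataset_name: str) -> Optional[Path]:
    """Return the config path for dataset_name, or None so the caller can fall back to defaults."""
    path = configs_dir / f"{dataset_name}.yaml"
    return path if path.is_file() else None


if __name__ == "__main__":
    # Demo against a throwaway directory that mimics harness/data/configs/
    with tempfile.TemporaryDirectory() as tmp:
        configs = Path(tmp)
        for name in ("llama3.1-8b", "deepseek-r1"):
            (configs / f"{name}.yaml").write_text(f"name: {name}\n")
        print(available_datasets(configs))  # ['deepseek-r1', 'llama3.1-8b']
        print(find_config(configs, "missing"))  # None
```

Note that `Path.stem` strips only the final suffix, so `llama3.1-8b.yaml` maps to the name `llama3.1-8b`.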

### Configuration Structure

```yaml
name: dataset-name
description: "Description"

fields:
  input_column: "text_input"
  input_ids_column: "tok_input"
  output_column: "ref_output"
  input_lens_column: null  # Optional

file_format: "auto"
total_sample_count: 4388

model_specific:
  default_model_name: "model-name"
```
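
The document does not say what `file_format: "auto"` does. One plausible reading, sketched here purely as an assumption, is that the processor infers the format from the dataset file's extension and otherwise honors an explicit value. The extension table and `resolve_file_format` are hypothetical.

```python
from pathlib import Path

# Hypothetical extension-to-format table; the real processor may
# support a different set of formats.
_EXTENSION_FORMATS = {
    ".pkl": "pickle",
    ".pickle": "pickle",
    ".json": "json",
    ".jsonl": "jsonl",
    ".parquet": "parquet",
}


def resolve_file_format(dataset_path: str, file_format: str = "auto") -> str:
    """Return an explicit format as-is, or infer it from the extension when set to "auto"."""
    if file_format != "auto":
        return file_format
    suffix = Path(dataset_path).suffix.lower()
    try:
        return _EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(
            f"Cannot infer format for {dataset_path!r}; set file_format explicitly"
        ) from None
```

Failing loudly on an unknown extension, rather than guessing, keeps a mistyped path from being parsed with the wrong reader.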

## Usage Examples

### Using BaseHarness Directly

```python
from harness.base_harness import BaseHarness

harness = BaseHarness(
    model_name="deepseek-ai/DeepSeek-R1-0528",
    dataset_path="./dataset.pkl",
    dataset_name="deepseek-r1",
    scenario="Offline",  # or "Server"
    test_mode="performance",
)

results = harness.run()
```

### Creating a Model-Specific Harness

```python
from harness.base_harness import BaseHarness


class MyModelHarness(BaseHarness):
    def __init__(self, **kwargs):
        # Use this model's dataset config unless the caller passes dataset_name
        kwargs.setdefault("dataset_name", "my-dataset")
        super().__init__(**kwargs)

    def _pre_run_setup(self):
        # Model-specific setup goes here
        pass
```

## Benefits

1. **Code Reusability**: The same harness code works across datasets
2. **Easy Configuration**: New datasets are added by creating a YAML file
3. **Scenario Agnostic**: Works for both Offline and Server scenarios
4. **Extensible**: Model-specific customizations via subclass hooks
5. **Maintainable**: All dataset information is centralized in config files

## Adding New Datasets

1. Create a YAML config file in `harness/data/configs/`.
2. Define the field mappings.
3. Select it in the harness through the `dataset_name` parameter.

No harness code changes are needed.
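
Because new datasets are added without code changes, a malformed config would only surface at run time. A small pre-flight check can catch missing mappings early. Which keys are required is an assumption here (`name` plus the three mapped columns, with `input_lens_column` optional as in the example config), and `validate_config` is a hypothetical helper that takes an already-parsed YAML mapping.

```python
from typing import Any, Dict, List

# Assumed required mappings; input_lens_column is treated as optional.
REQUIRED_FIELDS = ("input_column", "input_ids_column", "output_column")


def validate_config(raw: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    if not raw.get("name"):
        problems.append("missing 'name'")
    fields = raw.get("fields")
    if not isinstance(fields, dict):
        problems.append("missing 'fields' mapping")
        fields = {}
    for key in REQUIRED_FIELDS:
        if not fields.get(key):
            problems.append(f"fields.{key} is required")
    count = raw.get("total_sample_count")
    # bool is a subclass of int, so exclude it explicitly.
    if count is not None and (not isinstance(count, int) or isinstance(count, bool) or count <= 0):
        problems.append("total_sample_count must be a positive integer")
    return problems
```

Returning every problem at once, instead of raising on the first, lets a config author fix a new file in a single pass.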

## Design Principles

- **Configuration over Code**: Dataset-specific information lives in YAML, not code
- **Inheritance Hierarchy**: BaseHarness → ModelHarness (if needed)
- **Backward Compatible**: Falls back to defaults if no config is available
- **Extensible**: Hooks for model-specific behavior

## File Structure

```
harness/
├── data/
│   ├── dataset_config.py       # Configuration loader
│   ├── dataset_processor.py    # Enhanced processor
│   └── configs/
│       ├── llama3.1-8b.yaml    # Llama config
│       └── deepseek-r1.yaml    # DeepSeek config
├── harness/
│   └── base_harness.py         # Base harness (with dataset config support)
└── harness_llama3.1_8b.py      # Extends BaseHarness

language/
└── deepseek-r1/
    └── harness_deepseek_r1.py  # New DeepSeek harness
```