Concerns about the HOIQA dataset #7

@Gidohub

Dear Authors,

Thank you very much for providing access to the HOIQA dataset and the accompanying code. We greatly appreciate your efforts and contribution to this research area. Upon examining the dataset and attempting to run evaluations, we have encountered a few discrepancies and would like to request your clarification on the following points.


1. Data Count Discrepancies

We have found that the number of samples mentioned in the paper does not match the quantities derived from the provided JSON files.

From the Paper

(The paper reports the per-dataset sample counts in a figure; the image is not reproduced here.)

Calculated from the Provided JSON Files

  • Epic-Kitchens

    • Total folders (videos): 480
    • Total files (images): 208,584

    Training Data: epic-kitchens-train.json

    • Total QA pairs: 899,257
    • Images: 191,679

    Test Data: epic-test.json

    • Total QA pairs: 9,668
    • Images: 9,662

    Test Data: epic-visor-test.json

    • Total QA pairs: 47,020
    • Images: 7,320

    Image Overlap

    • epic-kitchens-train.json & epic-test.json: 0
    • epic-kitchens-train.json & epic-visor-test.json: 72
    • epic-test.json & epic-visor-test.json: 5

    There is a small amount of overlap between epic-visor-test.json and the other two files. However, it remains unclear which Epic-Kitchens file should be used for evaluation, given these discrepancies and the absence of explicit instructions. (A sketch of the script we used to compute these counts and overlaps is included at the end of this section.)

  • Ego4D

    • Total folders (videos): 920
    • Total files (images): 270,005

    Training Data: ego4d-train.json

    • Total QA pairs: 798,663
    • Images: 164,226

    Test Data: ego4d-test.json

    • Total QA pairs: 308,814
    • Images: 105,779

    No image overlap was found between these files.

In both datasets, the number of samples we obtained is lower than the values stated in the paper. Furthermore, for Epic-Kitchens, it is not clear which dataset was intended for evaluation.
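
For reference, the counts and overlaps above were produced with a short script along the following lines. This is a minimal sketch rather than our exact code; it assumes the two file layouts and the image_path / frame_path field names shown in the samples in Section 2 below.

    import json

    def image_paths(path):
        """Set of image paths referenced by a HOIQA JSON file.

        Handles both layouts we observed: a dict keyed by sample id
        (epic-kitchens-train.json, epic-test.json, ego4d-*.json) and a
        list of question/answer records (epic-visor-test.json).
        """
        with open(path) as f:
            data = json.load(f)
        records = data.values() if isinstance(data, dict) else data
        return {r.get("image_path") or r.get("frame_path") for r in records}

    def qa_count(path):
        with open(path) as f:
            return len(json.load(f))  # one QA pair per entry in both layouts

    train = image_paths("epic-kitchens-train.json")
    test = image_paths("epic-test.json")
    visor = image_paths("epic-visor-test.json")

    print("train & test  overlap:", len(train & test))   # 0 in our run
    print("train & visor overlap:", len(train & visor))  # 72 in our run
    print("test  & visor overlap:", len(test & visor))   # 5 in our run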


2. Missing Evaluation Data for Epic-Kitchens

While we were able to run the evaluation code for Ego4D without major issues, there appear to be some problems with the Epic-Kitchens test data, which may require adjustments.

JSON File Structures

  • ego4d-test.json provides [bbox] and [noun] fields.

    {"a0705b91-51b7-489d-8b7d-09282f85db6e_9267f964-d5a1-4062-872f-657e2a4cbe81_0_90023-frame_0000090023-lhs": 
    	{"image_path": "Ego4D/v2/frames/9267f964-d5a1-4062-872f-657e2a4cbe81/frame_0000090023.jpg",
        "bbox": 
          [187.02,
          2.62,
          320.48,
          64.1],
        "noun": "left hand"},
     ...
     }
  • epic-test.json lacks [bbox] values, causing errors with the default evaluation code (a minimal illustration follows after this list).

    {"P01_11_0_57": {
        "frame_path": "EPIC_Kitchens/frames/P01_11/frame_0000000057.jpg",
        "verb": "take",
        "noun": "plate",
        "narration": "take plate"},
      ...
    }
  • epic-visor-test.json contains [bbox] values, and we initially considered extracting these values to match the structure of epic-test.json.

    [
      {
        "question": "[refer] Where is the left hand of the person?",
        "answer": [864, 939, 1132, 1080],
        "image_path": "EPIC_Kitchens/frames/P02_09/frame_0000108904.jpg",
        "id": "P02_09_frame_0000108904-001"
      },
      ...
    ]
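
We have not inspected every code path, so the following is only a hypothetical minimal loader that mirrors the ego4d-test.json layout above; the repository's actual evaluation code may differ, but it illustrates the generic failure mode: any code that indexes the [bbox] field directly raises a KeyError on epic-test.json entries.

    import json

    # Hypothetical minimal ground-truth loader mirroring the ego4d-test.json
    # layout; the real evaluation code may differ, but the failure mode is
    # the same wherever the "bbox" field is indexed directly.
    def load_ground_truth(path):
        with open(path) as f:
            data = json.load(f)
        return [
            {"image": rec.get("image_path") or rec.get("frame_path"),
             "bbox": rec["bbox"],  # KeyError for epic-test.json: no "bbox" field
             "noun": rec["noun"]}
            for rec in data.values()
        ]

    load_ground_truth("ego4d-test.json")  # works: every record has bbox/noun
    load_ground_truth("epic-test.json")   # raises KeyError: 'bbox'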

However, upon re-examining the overlap, we found only 5 images in common between epic-test.json and epic-visor-test.json, so the two files cover largely different data. If we choose epic-test.json for evaluation, the required [bbox] coordinates are absent; and since epic-visor-test.json mostly contains different images, it does not serve as a direct substitute. If we instead use epic-visor-test.json for evaluation, we would need to extract the "[refer]" questions and derive the [bbox] and [noun] fields from each question/answer pair (a sketch of such a conversion follows below).
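
For completeness, the conversion we are considering would look roughly like the sketch below. It assumes that every [refer] question follows the single template we observed ("[refer] Where is the <noun> of the person?") and that the answer array is a bounding box in the same coordinate convention as ego4d-test.json's [bbox]; both assumptions would need your confirmation.

    import json
    import re

    # Assumed question template; other phrasings would need their own patterns.
    NOUN_RE = re.compile(r"\[refer\] Where is the (.+?) of the person\?")

    with open("epic-visor-test.json") as f:
        visor = json.load(f)

    converted = {}
    for rec in visor:
        m = NOUN_RE.fullmatch(rec["question"])
        if m is None:  # skip non-[refer] / differently phrased questions
            continue
        converted[rec["id"]] = {
            "image_path": rec["image_path"],
            "bbox": rec["answer"],  # assumed [x1, y1, x2, y2]; needs confirmation
            "noun": m.group(1),
        }

    with open("epic-visor-test-converted.json", "w") as f:
        json.dump(converted, f, indent=2)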


Request for Clarification

In light of these observations, we kindly request your guidance on the following:

  1. Which Epic-Kitchens dataset was originally intended for evaluation, and how should it be handled given the discrepancies in data counts and overlaps?
  2. How should we handle the lower-than-expected data counts for both the Epic-Kitchens and Ego4D datasets to ensure proper reproducibility and fairness in evaluation?
  3. How can we properly perform the evaluation, especially for Epic-Kitchens, given that epic-test.json lacks [bbox] values and the limited overlap with epic-visor-test.json?

Any advice or additional instructions you could provide would be greatly appreciated. Thank you for your time and consideration.
