
Bug Report: KeyError for Scorer Despite Matching Header in .h5 #3024

Closed
@clive1154

Description


Is there an existing issue for this?

  • I have searched the existing issues

Operating System

Create training data: KeyError #739 - User encountered a KeyError during create_training_dataset() when using the "Crop and Label Data" feature. While not identical, it suggests DLC’s data generation step can misalign scorers or headers under certain conditions.

KeyError Keypoint Labeling #2999 - User encountered a KeyError during keypoint labeling, likely due to a mismatched config or header structure, which is again similar in spirit.

Image.SC Forum Thread - User reported his multi-animal project ran into KeyError issues when trying to access individuals in a pandas MultiIndex. The traceback is eerily close to mine, and the root cause was a mismatch between expected and actual column levels.

DeepLabCut version

2.3.11

What engine are you using?

tensorflow

DeepLabCut mode

single animal

Device type

Intel Core Ultra 7 265KF

Bug description 🐛

Description: DeepLabCut raises a KeyError for the scorer name when create_training_dataset() is called, even though that scorer is present at level 0 of the column MultiIndex in the .h5 file and matches the scorer in config.yaml.

Setup:

  • DLC version: 2.3.11
  • OS: Ubuntu 24.04.2 LTS
  • Python: 3.10
  • TensorFlow: Built with oneDNN (CPU fallback)
  • Single-animal project
  • Manually converted annotations (CVAT → DLC format)
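
Because the annotations were converted by hand, the shape of the resulting table is worth spelling out. The following is a minimal sketch of the single-animal column layout as I understand it (three column levels named scorer, bodyparts, coords). The helper name `make_dlc_frame`, the body part names, and the image paths are hypothetical, and the row index is simplified to plain path strings, which may differ from what a given DLC version writes.

```python
import numpy as np
import pandas as pd


def make_dlc_frame(scorer, bodyparts, image_paths, xy):
    """Build a single-animal, DLC-style label table (illustrative helper).

    xy has shape (n_images, n_bodyparts, 2); missing labels are NaN.
    """
    # Column order is scorer -> bodypart -> coordinate, matching the
    # row-major flattening of xy below.
    columns = pd.MultiIndex.from_product(
        [[scorer], bodyparts, ["x", "y"]],
        names=["scorer", "bodyparts", "coords"],
    )
    data = np.asarray(xy, dtype=float).reshape(len(image_paths), -1)
    return pd.DataFrame(data, index=image_paths, columns=columns)


# Hypothetical values, for illustration only.
frame = make_dlc_frame(
    scorer="name20250628",
    bodyparts=["nose", "tail"],
    image_paths=[
        "labeled-data/video1/img000.png",
        "labeled-data/video1/img001.png",
    ],
    xy=[
        [[10.0, 20.0], [30.0, 40.0]],
        [[11.0, 21.0], [np.nan, np.nan]],
    ],
)
```

A converted file whose column levels are unnamed, ordered differently, or missing one level could fail on scorer lookup even when the scorer string itself is spelled correctly.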

Steps To Reproduce

import deeplabcut
deeplabcut.create_training_dataset("config.yaml")

Relevant log output

This leads to:
KeyError: 'name20250628'

Despite the scorer being present when the file is inspected manually:
# Manual check
import pandas as pd
df = pd.read_hdf("...CollectedData_name20250628.h5")
print(df.columns.levels[0])
# Output: Index(['name20250628'], dtype='object', name='scorer')
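
One caveat about the check above: in pandas, `columns.levels[0]` lists every label the level has ever held, including labels that no column uses any more, so it can show a scorer that a lookup will still fail on. This has not been confirmed as the cause here. The sketch below uses a hypothetical helper, `describe_scorer_level`, and synthetic data to show a stricter check based on `get_level_values`, with `repr` to expose stray whitespace.

```python
import pandas as pd


def describe_scorer_level(df, expected):
    """Report how the scorer actually appears in the column index."""
    cols = df.columns
    if not isinstance(cols, pd.MultiIndex):
        return {"is_multiindex": False, "level_names": list(cols.names),
                "in_use": [], "exact_match": False}
    # get_level_values only returns labels attached to real columns.
    in_use = list(cols.get_level_values(0).unique())
    return {
        "is_multiindex": True,
        "level_names": list(cols.names),
        "in_use": [repr(v) for v in in_use],  # repr shows hidden whitespace
        "exact_match": expected in in_use,
    }


# Synthetic example: two scorers, then keep only the columns of "other".
cols = pd.MultiIndex.from_product(
    [["name20250628", "other"], ["nose"], ["x", "y"]],
    names=["scorer", "bodyparts", "coords"],
)
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=cols)
subset = df.loc[:, df.columns.get_level_values(0) == "other"]

stale_level_hit = "name20250628" in subset.columns.levels[0]
report = describe_scorer_level(subset, "name20250628")
```

In this synthetic case `stale_level_hit` is true while `report["exact_match"]` is false, and `subset["name20250628"]` would raise a KeyError. Running the same helper on the real DataFrame would show whether the scorer is actually in use.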


Also confirmed:
cfg = deeplabcut.auxiliaryfunctions.read_config("config.yaml")
print(cfg["scorer"])  # 'name20250628'

Anything else?

Additional Context:

  • The .h5 file is regenerated by DLC via a fallback to merge_annotateddatasets().
  • The file appears to save correctly, but create_training_dataset() fails immediately afterwards.
  • Debug statements confirm that the scorer string is identical in the config and in the DataFrame.
  • My suspicion is that DLC either accesses the DataFrame before it is fully written or reads an incorrectly cached copy.

Labels: bug (Something isn't working)
