Description
Is there an existing issue for this?
- I have searched the existing issues
Operating System
Ubuntu 24.04.2 LTS
Related reports found while searching:
- Create training data: KeyError #739. A user hit a KeyError in create_training_dataset() when using the "Crop and Label Data" feature. Not identical, but it suggests DLC's data generation step can misalign scorers or headers under certain conditions.
- KeyError Keypoint Labeling #2999. A user hit a KeyError during keypoint labeling, likely caused by a mismatched config or header structure, so it is similar in spirit.
- Image.SC Forum Thread. A user's multi-animal project raised a KeyError when accessing individuals in a pandas MultiIndex. The traceback is very close to mine, and the root cause there was a mismatch between the expected and actual column levels.
DeepLabCut version
2.3.11
What engine are you using?
tensorflow
DeepLabCut mode
single animal
Device type
Intel Core Ultra 7 265KF
Bug description 🐛
Description: DeepLabCut raises a KeyError for the scorer when create_training_dataset() is called, even though that scorer is present at level 0 of the column MultiIndex in the .h5 file.
Setup:
DLC version: 2.3.11
OS: Ubuntu 24.04.2 LTS
Python: 3.10
TensorFlow: Built with oneDNN (CPU fallback)
Single-animal project
Manually converted annotations (CVAT → DLC format)
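For reference, here is a minimal sketch of the table layout the manual CVAT conversion is meant to produce. It assumes the usual single-animal header (column levels named scorer, bodyparts, coords) and a row index built from the image path parts; the row layout is my understanding of what DLC 2.3 expects, not something confirmed here. The helper `to_dlc_frame`, the body part names, and the coordinates are hypothetical and only illustrate the shape.

```python
import numpy as np
import pandas as pd


def to_dlc_frame(records, scorer, bodyparts):
    """Build a DLC-style table from {path_parts: {bodypart: (x, y)}}.

    Hypothetical helper: it only illustrates the assumed layout.
    """
    # Three column levels: scorer -> bodyparts -> coords.
    columns = pd.MultiIndex.from_product(
        [[scorer], bodyparts, ["x", "y"]],
        names=["scorer", "bodyparts", "coords"],
    )
    # Assumption: rows are indexed by the image path split into its parts.
    index = pd.MultiIndex.from_tuples(list(records))
    # Unlabeled points stay NaN.
    data = np.full((len(records), len(columns)), np.nan)
    for row, points in enumerate(records.values()):
        for bodypart, (x, y) in points.items():
            col = bodyparts.index(bodypart) * 2  # x and y are adjacent
            data[row, col:col + 2] = (x, y)
    return pd.DataFrame(data, index=index, columns=columns)


# Made-up annotations standing in for the converted CVAT export.
records = {
    ("labeled-data", "video1", "img0000.png"): {"nose": (10.0, 20.0), "tail": (30.0, 40.0)},
    ("labeled-data", "video1", "img0001.png"): {"nose": (11.0, 21.0)},
}
frame = to_dlc_frame(records, "name20250628", ["nose", "tail"])
print(frame["name20250628"])  # selecting by scorer works on this layout
```

If a converted file deviates from this shape (for example, missing level names or a flat column index), selecting by scorer would be one place where a KeyError could surface.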
Steps To Reproduce
import deeplabcut
deeplabcut.create_training_dataset("config.yaml")
Relevant log output
This leads to:
KeyError: 'name20250628'
Despite:
# Manual check
import pandas as pd
df = pd.read_hdf("...CollectedData_name20250628.h5")
print(df.columns.levels[0])
# Output: Index(['name20250628'], dtype='object', name='scorer')
Also confirmed:
cfg = deeplabcut.auxiliaryfunctions.read_config("config.yaml")
print(cfg["scorer"]) # 'name20250628'
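The printed values above can hide small differences, so a stricter comparison may be worth running. The sketch below is a hypothetical helper (`scorer_report` is not part of DeepLabCut) that compares the expected scorer against the labels actually used in the header, using `repr` to expose stray whitespace. It uses a tiny in-memory table with made-up values; the real check would pass the DataFrame loaded from the .h5 file and `cfg["scorer"]`.

```python
import pandas as pd


def scorer_report(df: pd.DataFrame, expected: str) -> dict:
    """Summarise how the column header compares with the expected scorer."""
    cols = df.columns
    is_multi = isinstance(cols, pd.MultiIndex)
    # get_level_values reports labels that are actually in use, whereas
    # .levels can still list labels that no column refers to any more.
    in_use = list(cols.get_level_values(0).unique()) if is_multi else list(cols)
    return {
        "is_multiindex": is_multi,
        "level_names": list(cols.names),
        "scorers_in_use": [repr(s) for s in in_use],
        "exact_match": expected in in_use,
        "match_after_strip": expected.strip() in [str(s).strip() for s in in_use],
    }


# Stand-in table whose scorer label carries a trailing space on purpose.
columns = pd.MultiIndex.from_product(
    [["name20250628 "], ["nose", "tail"], ["x", "y"]],
    names=["scorer", "bodyparts", "coords"],
)
demo = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], columns=columns)
report = scorer_report(demo, "name20250628")
print(report)
```

An `exact_match` of False together with a `match_after_strip` of True would point at whitespace in the header rather than at DLC itself.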
Anything else?
Additional Context:
The .h5 file is regenerated by DLC via a fallback to merge_annotateddatasets()
The file appears to save correctly, but create_training_dataset() fails immediately afterwards
Debug statements confirm that the scorer string matches in both the config and the DataFrame
Suspicion: DLC either accesses the DataFrame before it has been fully written, or reads an incorrectly cached copy
Code of Conduct
- I agree to follow this project's Code of Conduct