To ensure animal welfare and effective management in pig farming, monitoring individual behavior is a crucial prerequisite. While monitoring tasks have traditionally been carried out manually, advances in machine learning have made it possible to collect individualized information in an increasingly automated way. Central to these methods is the localization of animals across space (object detection) and time (multi-object tracking). Despite extensive research on these two tasks in pig farming, a systematic benchmarking study has not yet been conducted. In this work, we address this gap by curating two datasets: PigDetect for object detection and PigTrack for multi-object tracking. The datasets are based on diverse image and video material from realistic barn conditions and include challenging scenarios such as occlusion or poor visibility. For object detection, we show that challenging training images improve detection performance beyond what is achievable with randomly sampled images alone. Comparing different approaches, we find that state-of-the-art models offer substantial improvements in detection quality over real-time alternatives. For multi-object tracking, we observe that SORT-based methods achieve superior detection performance compared to end-to-end trainable models. However, end-to-end models show better association performance, suggesting they could become strong alternatives in the future. We also investigate characteristic failure cases of end-to-end models, providing guidance for future improvements. The detection and tracking models trained on our datasets perform well in unseen pens, suggesting good generalization capabilities and highlighting the importance of high-quality training data. The datasets and research code are made publicly available to facilitate reproducibility, reuse and further development.
The preprint of the paper is available here. The datasets and pre-trained model weights associated with this work are available here and here. This repository also includes automatic download commands to obtain all necessary files for training and inference, so you do not need to download them manually; simply follow the instructions in this README.
For a quick demo of the detection and tracking models presented in our work, we prepared a Google Colab notebook:
If you want to use this repository on your own system, you need to first set up the environment. Before doing so, keep the following important information in mind:
- The setup has been tested on a Linux machine. We cannot provide guidance for other operating systems.
- Your GPUs and NVIDIA driver must be compatible with the CUDA version specified in our setup (version 11.8); the commands shown after this list can help you verify this. We cannot provide guidance for environment setup with other CUDA versions.
- MOTIP and MOTRv2 use custom CUDA operators, which require a suitable GCC compiler (e.g. version 11.4) to build. Furthermore, the compilation relies on several CUDA-related environment variables that might not be set automatically by your system. This part of the environment setup is therefore error-prone, but it is also not required if you do not plan to use these models.
- The setup script might emit some warnings and possibly an error about incompatibilities with the "requests" package. These can be ignored.
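If you are unsure whether your system meets these requirements, the following commands can help you check your NVIDIA driver, CUDA toolkit, GCC compiler and CUDA-related environment variables before running the setup. This is only a suggested pre-flight check, not part of the setup scripts, and the CUDA_HOME variable is only relevant if you plan to build the custom CUDA operators for MOTIP and MOTRv2:

nvidia-smi                   # driver version and the highest CUDA version it supports
nvcc --version               # installed CUDA toolkit version (should be 11.8)
gcc --version                # compiler used to build the custom CUDA operators
echo "CUDA_HOME=$CUDA_HOME"  # should point to your CUDA 11.8 installation if set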
To set up the environment, we recommend using Conda. If Conda is installed and activated, run one of the following two setup scripts, depending on which models you plan to use. If you only plan to use the detection models and SORT-based tracking models, run the following script:
source _setup/setup.sh
If you want to create an environment that supports all models in this repository (including MOTIP and MOTRv2), run the following script instead:
source _setup/setup_with_e2e_models.sh
This repository provides functionality for training, evaluation and inference of pig detection and tracking models. Information on how to use the detection functionality can be found in the detection guide; information on tracking can be found in the tracking guide. All models in this repository require GPU access for training. While inference may also work on a CPU (though we did not test this for all models), it is much slower than on a GPU. We therefore highly recommend using a GPU for inference as well.
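To quickly verify that the environment can see your GPU before starting a training or inference run, you can use the following one-liner inside the activated environment (a minimal check; it assumes PyTorch was installed by the setup scripts):

python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"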
This repository is a collection of several independent code bases. Please refer to the LICENSE file within each subdirectory for the specific licensing terms:
- detection/ – GNU GPL v3
- tracking/boxmot/ – GNU AGPL v3
- tracking/motip/ – Apache-2.0
- tracking/motrv2/ – Apache-2.0
Any code outside those subdirectories is licensed under the MIT license.
This work was funded with NextGenerationEU funds from the European Union by the Federal Ministry of Research, Technology and Space under the funding code 16DKWN038. The responsibility for the content of this publication lies with the authors.
This repository builds on several existing object detection and multi-object tracking code bases:
These code bases are in turn built on code from many previous works including TrackEval, MOTR, Deformable DETR, ByteTrack, YOLOX, OC-SORT, DanceTrack and BDD100K. We thank the authors of all of these projects for making their work publicly available.
