Yiming Li* , Ziang Cao* , Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao, Chen Feng
Egocentric 3D action target prediction is a challenging task. We propose a simple baseline that uses two backbone networks to learn representations for the two modalities separately, fuses the multimodal features by concatenation, and employs a recurrent neural network (RNN) to continuously update the predicted 3D action target.
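For intuition only, below is a minimal PyTorch sketch of this two-stream design. The module names, feature dimensions, the MLP-style backbones, and the 6-dimensional IMU input are illustrative assumptions and do not reflect the actual implementation in this repository.

import torch
import torch.nn as nn

class ActionTargetPredictorSketch(nn.Module):
    """Illustrative only: two per-modality backbones, concatenation fusion,
    and an LSTM that keeps updating the 3D action target over time."""

    def __init__(self, pc_feat_dim=256, imu_feat_dim=64, hidden_dim=256):
        super().__init__()
        # Hypothetical stand-in for the point-cloud backbone (the repository builds on PointConv)
        self.pc_backbone = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, pc_feat_dim),
        )
        # Hypothetical IMU backbone: a small MLP over each per-frame IMU vector (assumed 6-D)
        self.imu_backbone = nn.Sequential(
            nn.Linear(6, 64), nn.ReLU(),
            nn.Linear(64, imu_feat_dim),
        )
        # RNN that continuously refines the prediction as the clip unfolds
        self.rnn = nn.LSTM(pc_feat_dim + imu_feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)  # (x, y, z) of the action target

    def forward(self, pointclouds, imu):
        # pointclouds: (B, T, N, 3) point coordinates per frame; imu: (B, T, 6)
        pc_feat = self.pc_backbone(pointclouds).max(dim=2).values  # pool over points -> (B, T, pc_feat_dim)
        imu_feat = self.imu_backbone(imu)                          # (B, T, imu_feat_dim)
        fused = torch.cat([pc_feat, imu_feat], dim=-1)             # multimodal fusion by concatenation
        hidden, _ = self.rnn(fused)                                # (B, T, hidden_dim)
        return self.head(hidden)                                   # per-frame 3D target prediction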
This code has been tested on Ubuntu 20.04 with Python 3.7.0, PyTorch 1.9.0, and CUDA 11.2.
Please install the required libraries before running the code; the details are listed in ./requirement.txt.
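For example, with pip:

pip install -r requirement.txt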
Download the dataset from here and put it into ./benchmark/.
Dataset/
├── annotrain/                        # Annotations of the training set
│   ├── bathroomCabinet/              # One folder per scene
│   │   ├── bathroomCabinet_1.txt     # Ground truth of each clip
│   │   ├── bathroomCabinet_2.txt
│   │   ├── bathroomCabinet_3.txt
│   │   └── bathroomCabinet_4.txt
│   ├── bathroomCounter/
│   ├── ...
│   └── nightstand/
├── annonoveltest_final/              # Annotations of the test set (unseen scenes)
│   └── ...
├── annotest_final/                   # Annotations of the test set (seen scenes)
├── annovalidate_final/               # Annotations of the validation set
└── sequences/                        # Point cloud and IMU data of each scene
    ├── bathroomCabinet/
    │   ├── bathroomCabinet_1/
    │   │   ├── pointcloud/           # Point cloud files
    │   │   ├── transformation/       # Odometry files
    │   │   └── data.txt              # IMU data of bathroomCabinet_1
    │   ├── ...
    │   └── bathroomCabinet_6/
    ├── ...
    └── woodenTable/
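As a rough sketch of how a single clip could be loaded, the snippet below assumes the files under pointcloud/ are in an Open3D-readable format and that data.txt contains whitespace-separated numeric rows; the clip path and these format details are assumptions, not guarantees of this README.

import glob
import numpy as np
import open3d as o3d  # assumption: point-cloud files are readable by Open3D (e.g. .ply/.pcd)

# Hypothetical clip path; adjust to where the dataset actually sits under ./benchmark/
clip_dir = "./benchmark/sequences/bathroomCabinet/bathroomCabinet_1"

# Read every point-cloud frame of the clip, sorted to preserve temporal order
frames = [np.asarray(o3d.io.read_point_cloud(p).points)
          for p in sorted(glob.glob(f"{clip_dir}/pointcloud/*"))]

# Read the per-frame IMU readings (assumption: whitespace-separated numeric rows)
imu = np.loadtxt(f"{clip_dir}/data.txt")

print(len(frames), "point-cloud frames; IMU array shape:", imu.shape)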
To train the predictor model, run train.py with the desired configs:
python train.py
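The exact training options are defined in train.py; purely as a hypothetical example that mirrors the testing flags below, a configured run might look like:

python train.py \
    --model_name LSTM-based \
    --datapath ./data_path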
Download the pre-trained model here and set the checkpoint directory.
# --model_name: predictor name; --checkpoint: model path; --datapath: data path
python test.py \
    --model_name LSTM-based \
    --checkpoint ./experiment/LSTM-based.pth \
    --datapath ./data_path
# --model_name: predictor name; --checkpoint: model path; --datapath: data path
python validate.py \
    --model_name LSTM-based \
    --checkpoint ./experiment/LSTM-based.pth \
    --datapath ./data_path
The testing and validation results will be saved in the ./results/model_name directory.
# --data_path: path of the dataset; --model_name: name of the predictor;
# --vis 0/1: whether to view the point cloud; --visclip 0/1: whether to save the results of each clip
python eval.py \
    --data_path ./benchmark \
    --model_name LSTM \
    --vis 0/1 \
    --visclip 0/1
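For example, to evaluate the LSTM predictor with the point-cloud viewer enabled and without saving per-clip results (assuming the 0/1 flags above are passed literally):

python eval.py \
    --data_path ./benchmark \
    --model_name LSTM \
    --vis 1 \
    --visclip 0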
The evaluation and visualization results will be saved in the ./results/ directory.
The predictor code is implemented based on PointConv. We would like to express our sincere thanks to its contributors.
If you find our work useful in your research, please cite:
@InProceedings{Li_2022_CVPR,
title = {Egocentric Prediction of Action Target in 3D},
author = {Li, Yiming and Cao, Ziang and Liang, Andrew and Liang, Benjamin and Chen, Luoyao and Zhao, Hang and Feng, Chen},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022}
}