In this work, we propose a composite deep convolutional neural network architecture that learns to predict both the semantic category and motion status of each pixel from a pair of consecutive monocular images. Our SMSnet architecture consists of three components: a section that learns motion features from generated optical flow maps, a parallel section that generates features for semantic segmentation, and a fusion section that combines the motion and semantic features and further learns deep representations for pixel-wise semantic motion segmentation.
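The snippet below is a minimal PyTorch sketch of this three-component layout: a motion stream over precomputed optical flow, a semantic stream over the RGB frame, and a fusion head over the concatenated features. The layer sizes, depths, and the output parameterization (one logit per class/motion combination) are illustrative assumptions, not the configuration used in SMSnet.

```python
# Minimal sketch of the three-component layout described above.
# Layer sizes and depths are placeholders, not the SMSnet configuration.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch, stride=2):
    """Strided conv + BN + ReLU block used as a stand-in encoder stage."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SMSNetSketch(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # (1) motion stream: consumes a 2-channel optical flow map
        self.motion_stream = nn.Sequential(
            conv_block(2, 32), conv_block(32, 64), conv_block(64, 128)
        )
        # (2) semantic stream: consumes a 3-channel RGB image
        self.semantic_stream = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, 128)
        )
        # (3) fusion section: concatenates both feature maps, learns joint
        # representations, and predicts per-pixel logits for every
        # (semantic class, static/moving) combination.
        self.fusion = nn.Sequential(
            conv_block(256, 256, stride=1),
            conv_block(256, 256, stride=1),
            nn.Conv2d(256, 2 * num_classes, kernel_size=1),
        )
        # upsample the coarse logits back to the input resolution
        self.upsample = nn.Upsample(scale_factor=8, mode="bilinear",
                                    align_corners=False)

    def forward(self, image, flow):
        motion_feat = self.motion_stream(flow)
        semantic_feat = self.semantic_stream(image)
        fused = torch.cat([motion_feat, semantic_feat], dim=1)
        return self.upsample(self.fusion(fused))


if __name__ == "__main__":
    net = SMSNetSketch(num_classes=10)
    image = torch.randn(1, 3, 256, 512)  # RGB frame
    flow = torch.randn(1, 2, 256, 512)   # precomputed optical flow (u, v)
    print(net(image, flow).shape)        # torch.Size([1, 20, 256, 512])
```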
Please find the detailed description of the architecture in our IROS 2017 paper.
To facilitate training of neural networks for semantic motion segmentation and to allow for credible quantitative evaluation, we make the following datasets publicly available. Each of these datasets has pixel-wise semantic labels for 10 object classes and their motion status (static or moving). Annotations are provided for the following classes: sky, building, road, sidewalk, cyclist, vegetation, pole, car, sign and pedestrian.
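For illustration, a small piece of label bookkeeping for these annotations is sketched below. The class list is taken from the text above; the numeric IDs and the way the motion status is stored are assumptions for the example, not the datasets' actual encoding.

```python
# Illustrative label bookkeeping for the annotations described above.
# Numeric IDs and the motion-bit convention are assumptions, not the
# encoding shipped with the datasets.
SEMANTIC_CLASSES = [
    "sky", "building", "road", "sidewalk", "cyclist",
    "vegetation", "pole", "car", "sign", "pedestrian",
]
CLASS_TO_ID = {name: idx for idx, name in enumerate(SEMANTIC_CLASSES)}
MOTION_STATUS = {0: "static", 1: "moving"}


def decode_label(semantic_id: int, motion_bit: int) -> str:
    """Turn a (semantic id, motion bit) pair into a readable label string."""
    return f"{MOTION_STATUS[motion_bit]} {SEMANTIC_CLASSES[semantic_id]}"


print(decode_label(CLASS_TO_ID["car"], 1))  # "moving car"
```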
Please cite our work if you use the Cityscapes-Motion Dataset or the KITTI-Motion Dataset and report results based on it.
@InProceedings{Valada_2017_IROS,
  author    = {Johan Vertens and Abhinav Valada and Wolfram Burgard},
  title     = {SMSnet: Semantic Motion Segmentation using Deep Convolutional Neural Networks},
  booktitle = {Proc.~of the IEEE/RSJ Int.~Conf.~on Intelligent Robots and Systems (IROS)},
  year      = {2017},
  url       = {http://ais.informatik.uni-freiburg.de/publications/papers/valada17iros.pdf},
  address   = {Vancouver, Canada}
}
The data is provided for non-commercial use only. By downloading the data, you accept the license agreement which can be downloaded here. If you report results based on the Cityscapes-Motion or the KITTI-Motion datasets, please consider citing the first paper mentioned under publications.
The Cityscapes-Motion dataset is a supplement to the semantic annotations provided by the Cityscapes dataset, containing 2975 training images and 500 validation images. We provide manually annotated motion labels for the car class. The images are of resolution 2048×1024 pixels.
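A hypothetical loading sketch for one Cityscapes-Motion style sample is shown below. The file paths and the label encoding in the comments are placeholders for illustration; the downloaded archive defines the actual directory layout and format.

```python
# Hypothetical loading sketch for a Cityscapes-Motion style sample.
# Paths and the motion-label encoding are assumptions for illustration;
# consult the downloaded archive for the actual structure.
from pathlib import Path

import numpy as np
from PIL import Image


def load_sample(image_path: Path, motion_label_path: Path):
    """Load a 2048x1024 RGB frame and its per-pixel motion annotation."""
    image = np.array(Image.open(image_path).convert("RGB"))  # (1024, 2048, 3)
    motion = np.array(Image.open(motion_label_path))         # (1024, 2048), assumed 0=static, 1=moving
    assert image.shape[:2] == motion.shape[:2] == (1024, 2048)
    return image, motion


# Example call (placeholder paths):
# image, motion = load_sample(Path("path/to/frame.png"),
#                             Path("path/to/motion_label.png"))
```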
Johan Vertens, Abhinav Valada, Wolfram Burgard
SMSnet: Semantic Motion Segmentation using Deep Convolutional Neural Networks
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, 2017.