Supplementary material to the paper "Moving Object Reconstruction in Monocular Video Data Using Boundary Generation"
We present a method to reconstruct the three-dimensional shape of a moving instance of a known object category in video data. We exploit state-of-the-art semantic segmentation techniques to extract the object's two-dimensional shape in each frame. Therefore, our method is robust to occlusion, handles stationary objects and extends naturally to multiple video sequences. We apply Structure from Motion (SfM) to the previously generated object images in order to compute a three-dimensional representation of the object. Our approach allows us to remove outliers in SfM reconstructions and to compute clean object meshes by leveraging the previously computed semantic segmentations and virtual camera positions. We evaluate the accuracy of our method using a multi-view dataset of a moving vehicle, with a laser scan serving as ground truth. We applied our algorithm to publicly available video data and to 25 sequences from our dataset. The algorithm achieves an average point distance of 3.3 cm when evaluated on seven trajectories contained in the dataset.
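To make the reported error measure concrete, the following is a minimal sketch of one common way to compute an average point distance between a reconstruction and a reference scan. It assumes that both point sets are already aligned in the same metric coordinate frame and that the distance is taken from each reconstructed point to its nearest ground-truth point. The exact metric and alignment used in the paper are not specified here, and the function name and sample coordinates are hypothetical.

```python
import math


def mean_nearest_neighbor_distance(reconstructed, ground_truth):
    """Average distance from each reconstructed point to its closest
    ground-truth point (brute force; both inputs are lists of 3D points)."""
    if not reconstructed or not ground_truth:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for p in reconstructed:
        # Distance to the closest ground-truth point for this reconstructed point.
        total += min(math.dist(p, q) for q in ground_truth)
    return total / len(reconstructed)


# Hypothetical toy data in meters, not taken from the dataset.
scan = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
recon = [(0.0, 0.0, 0.03), (1.0, 0.04, 0.0)]
error_m = mean_nearest_neighbor_distance(recon, scan)
print(f"average point distance: {error_m * 100:.1f} cm")
```

The brute-force search is quadratic in the number of points. For a dense laser scan, a spatial index such as a k-d tree would be the usual choice.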
The dataset consists of 25 videos capturing a car moving on eight different trajectories. The video sequences cover a wide variety of viewing angles. A laser scan of the vehicle in the dataset serves as ground truth. The laser scans were acquired using a Zoller+Fröhlich TLS scanner, which estimates the distance to the reflecting object from the phase shift between the received and emitted signals. The scanning head was operated on a rigid tripod, which results in ranging accuracies of a few millimeters. The laser scans were registered automatically, and measurement noise was removed manually.
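The phase-shift ranging principle mentioned above can be illustrated with the standard relation d = c · Δφ / (4π · f), where Δφ is the measured phase shift and f is the modulation frequency of the emitted signal. The sketch below is only an illustration of that relation. The 10 MHz modulation frequency is a hypothetical value and not a specification of the Zoller+Fröhlich scanner, and the function names are assumptions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s in vacuum


def phase_shift_distance(phase_shift_rad, modulation_freq_hz):
    """Distance implied by a phase shift between emitted and received signal.
    The signal travels to the object and back, hence the factor 4*pi."""
    return SPEED_OF_LIGHT * phase_shift_rad / (4.0 * math.pi * modulation_freq_hz)


def unambiguous_range(modulation_freq_hz):
    """Largest distance measurable before the phase wraps around (2*pi)."""
    return SPEED_OF_LIGHT / (2.0 * modulation_freq_hz)


# Hypothetical modulation frequency, chosen only for illustration.
freq_hz = 10e6
distance_m = phase_shift_distance(math.pi, freq_hz)
print(f"distance at a phase shift of pi: {distance_m:.3f} m")
print(f"unambiguous range: {unambiguous_range(freq_hz):.3f} m")
```

Because the phase wraps every 2π, a single modulation frequency only determines the distance up to multiples of the unambiguous range. Practical scanners typically combine several frequencies to resolve this.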
Laser Scan (Ground Truth):
Example Reconstruction Results Using Boundary Generation
Sparse Reconstruction Result:
Boundary Generation Result:
Boundary Mesh Overlaid with Input Video:
S. Bullinger, C. Bodensteiner, S. Wuttke and M. Arens, "Moving Object Reconstruction in Monocular Video Data Using Boundary Generation", ICPR 2016.