Multi-Frame Subspace Flow (MFSF)

The Multi-Frame Subspace Flow (MFSF) algorithm estimates multi-frame optical flow over an image sequence. This is equivalent to estimating dense 2D tracks across the sequence, or a dense registration of every frame to a reference frame. The algorithm was introduced in the following publications, where more details can be found:

Ravi Garg, Anastasios Roussos, Lourdes Agapito, "A Variational Approach to Video Registration with Subspace Constraints", International Journal of Computer Vision, 104(3), 286-314, 2013.

Ravi Garg, Anastasios Roussos, Lourdes Agapito, "Robust Trajectory-Space TV-L1 Optical Flow for Non-rigid Sequences", Proc. 8th Int. Conf. on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), Saint Petersburg, Russia, July 25-27, 2011.
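
In brief, MFSF parameterises the 2D trajectory of each pixel across the sequence as a linear combination of R basis trajectories (e.g. DCT or PCA) and estimates the coefficient fields variationally. Roughly speaking (see the papers above for the precise formulation), the minimised energy has the form

    E(L_1,\dots,L_R) = \sum_{k=1}^{F} \int_\Omega \big| I_k(x + u_k(x)) - I_{\mathrm{ref}}(x) \big| \, dx
                     + \lambda \sum_{i=1}^{R} \int_\Omega \big| \nabla L_i(x) \big| \, dx,
    \qquad u_k(x) = \sum_{i=1}^{R} q_{k,i} \, L_i(x),

where I_ref is the reference frame, the q_{k,i} are fixed trajectory basis coefficients, and the L_i are the unknown coefficient fields; that is, a TV-L1 optical flow energy with the flow constrained to the trajectory subspace.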

On this webpage, we make the code publicly available, share the ground truth data of a benchmark dataset that we created, and present experimental results and comparisons.

To start with, here is a demo video of the Multi-Frame Subspace Flow algorithm:






Code (New!)

The code that implements the MFSF algorithm is written in MATLAB with CUDA kernels and requires an NVIDIA GPU. We have made the code available via Bitbucket. You can download it from here:
[Download code]

The corresponding Bitbucket repository is the following:
[Bitbucket repository]
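
Since the CUDA kernels require an NVIDIA GPU, a quick sanity check before running anything is to confirm from MATLAB (with the Parallel Computing Toolbox) that a CUDA-capable device is visible. This is a generic MATLAB check, not part of the MFSF code itself:

    gpuDevice   % prints the properties of the selected CUDA device; throws an error if no compatible NVIDIA GPU is found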




Ground Truth Data

The following zip file contains all the ground truth data as well as a readme file and demo scripts. MATLAB is needed to read the data and run the demo scripts:
[Download the data]


A short description of this benchmark follows. The data consist of ground truth 3D mesh and optical flow data of a waving flag. We use sparse motion capture (MOCAP) data from the work of R. White et al.:

White, R., Crane, K., Forsyth, D., "Capturing and Animating Occluded Cloth", ACM Transactions on Graphics, 2007.

The MOCAP data capture the real 3D deformations of a waving flag. We interpolated these sparse data to obtain a continuous dense 3D surface, using the motion capture markers as control points for a smooth spline interpolation. Here is a video that shows this procedure:


Video: MOCAP data and interpolated 3D surface.
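
As an illustration of this kind of interpolation (a sketch, not the exact script used to build the dataset), MATLAB's griddata with the 'v4' (biharmonic spline) option fits a smooth surface through scattered control points; all variable names below are hypothetical:

    % mu, mv     : 2D parameter coordinates of the MOCAP markers on the flag
    % mX, mY, mZ : 3D marker positions for one frame
    [uq, vq] = meshgrid(linspace(0, 1, 200));    % dense parameter grid
    Xs = griddata(mu, mv, mX, uq, vq, 'v4');     % biharmonic spline interpolation
    Ys = griddata(mu, mv, mY, uq, vq, 'v4');
    Zs = griddata(mu, mv, mZ, uq, vq, 'v4');
    surf(Xs, Ys, Zs)                             % visualise the dense surface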


This dense 3D surface is then projected synthetically onto the image plane using an orthographic camera. We use texture mapping to apply a texture to the surface while rendering 60 images of size 500x500 pixels. The advantage of this new sequence is that, since it is based on MOCAP data, it captures the complex natural deformations of a real non-rigid object while giving us access to dense ground truth optical flow. We have also created three degraded versions of the original rendered sequence by adding (a) synthetic occlusions, generated by superimposing black circles of radius 20 pixels moving in linear orbits, (b) Gaussian noise with standard deviation 0.2 relative to the range of image intensities, and (c) salt & pepper noise with density 10%. The following video shows the original rendered video as well as the rendered video with synthetic occlusions:


Also, these are the versions of the rendered video with added Gaussian noise and salt & pepper noise:
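
For illustration, the three degradations can be reproduced in MATLAB along these lines (a sketch, not the exact rendering script; I is a frame with intensities in [0,1], k is the frame index, and c0, velc are hypothetical orbit parameters; insertShape requires the Computer Vision Toolbox):

    Ig   = imnoise(I, 'gaussian', 0, 0.2^2);    % Gaussian noise, std 0.2 (variance 0.04)
    Isp  = imnoise(I, 'salt & pepper', 0.10);   % salt & pepper noise, density 10%
    c    = c0 + k*velc;                         % circle centre moving on a linear orbit
    Iocc = insertShape(I, 'FilledCircle', [c 20], 'Color', 'black', 'Opacity', 1);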






Videos with Results and Comparisons

We have evaluated our method and compared its performance with state-of-the-art optical flow and image registration algorithms.

Videos with results on several different sequences follow.

Benchmark Sequence



Video: comparison of the performance of the different algorithms on the original flag sequence. For our algorithm, we use a PCA basis of rank R = 75 and a full-rank DCT basis (R = 120).


Video: the same comparison as above, but in the presence of synthetic occlusions.
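
For reference, a truncated trajectory basis of the kind used in these comparisons can be built in MATLAB as follows; this is a sketch with hypothetical variable names, and with the x- and y-trajectories stacked a full-rank basis has R = 2F = 120 vectors for this 60-frame sequence:

    F   = 60;           % number of frames in the flag sequence
    D   = dctmtx(F);    % orthonormal DCT matrix (Image Processing Toolbox); rows are the basis vectors
    r   = 10;           % per-component rank (hypothetical choice)
    Phi = D(1:r, :)';   % F-by-r truncated DCT basis for one flow component
    % A PCA basis can instead be computed from pre-computed reliable tracks W (F-by-P):
    [U, ~, ~] = svd(W, 'econ');
    Phi_pca = U(:, 1:r);  % first r principal trajectory components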

Actor Sequence

This challenging sequence is a 39-frame clip from a well-known film, acquired at 25 frames per second with images of size 500 × 550 pixels. We apply our algorithm to both the grayscale (top row) and colour versions of this sequence. Our algorithm with subspace constraints outperforms the variant without subspace constraints, and using colour information in the data term leads to further improvements.


Video: Each frame of the sequence is warped back to the reference frame using the different optical flow results. Improvements can clearly be seen when using our approach with subspace constraints. The colour version of our algorithm further improves the optical flow.
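
The warp to the reference frame can be sketched in MATLAB with interp2 (hypothetical variable names; (u, v) is the estimated flow from the reference frame to frame k, and Ik is the k-th grayscale frame):

    [X, Y]  = meshgrid(1:W, 1:H);          % reference-frame pixel grid
    Ik_warp = interp2(Ik, X + u, Y + v);   % sample frame k where each reference pixel moved to
    % With an accurate flow, Ik_warp should closely resemble the reference frame.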



Video: Improvement in sparse tracking (overlaid on the image sequence) when using subspace constraints.

Bending paper sequences

The first input sequence is particularly challenging because of its length (100 frames) and the large rotation of the camera:


Video: Comparisons on a 100-frame paper-bending sequence.


The second input sequence is widely used in the structure-from-motion literature and contains 71 frames:


Video: Comparison on a 71-frame paper-bending sequence.

Application to Augmented Reality


Video: Augmented texture using the optical flow estimated by our method on the actor sequence.



Video: Augmented texture using the optical flow estimated by our method on the paper-bending sequence.
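
The augmentation amounts to transferring a texture placed in the reference frame to every other frame using the dense flow. A minimal grayscale sketch (hypothetical variable names; mask marks the texture region in the reference frame and logo holds its intensities):

    [X, Y] = meshgrid(1:W, 1:H);
    xs = round(X + u);   ys = round(Y + v);              % where each reference pixel lands in frame k
    in = mask & xs >= 1 & xs <= W & ys >= 1 & ys <= H;   % texture pixels that stay inside the image
    Iaug = Ik;                                           % start from the original frame k
    Iaug(sub2ind([H W], ys(in), xs(in))) = logo(in);     % nearest-neighbour splat of the texture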



Page created by Anastasios Roussos, Ravi Garg and Lourdes Agapito.
For comments or questions, please contact the authors.
Page last modified: May 2015