I am Michael Firman, and I work UCL in the Vision and Graphics group. I am currently working as a postdoc on the Engage project, making machine learning tools accessible to scientists across different disciplines.
My PhD has been supervised by Dr. Simon Julier and Dr. Jan Boehm. During my PhD I predominantly worked on the problem of inferring a full volumetric reconstruction of a scene, given only a single depth image as input. This problem has many applications in robotics, computer graphics and entertainment devices.
During the summer of 2012 I worked at the National Institute of Informatics, Tokyo, under the supervision of Prof. Akihiro Sugimoto.
There are many great lists of computer vision datasets on the web but no dedicated source for datasets captured by Kinect or similar devices. I created this list in an attempt to remedy the situation.
CVPR Workshop on Large Scale 3D Data: Acquisition, Modelling and Analysis 2016
Abstract: Since the launch of the Microsoft Kinect, scores of RGBD datasets have been released. These have propelled advances in areas from reconstruction to gesture recognition. In this paper we explore the field, reviewing datasets across eight categories: semantics, object pose estimation, camera tracking, scene reconstruction, object tracking, human actions, faces and identification. By extracting relevant information in each category we help researchers to find appropriate data for their needs, and we consider which datasets have succeeded in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which are currently underexplored, and suggest that future directions may include synthetic data and dense reconstructions of static and dynamic scenes.
Computer Vision and Pattern Recognition (CVPR) 2016 (Oral)
Abstract: Building a complete 3D model of a scene, given only a single depth image, is underconstrained. To gain a full volumetric model, one needs either multiple views, or a single view together with a library of unambiguous 3D models that will fit the shape of each individual object in the scene.
We hypothesize that objects of dissimilar semantic classes often share similar 3D shape components, enabling a limited dataset to model the shape of a wide range of objects, and hence estimate their hidden geometry. Exploring this hypothesis, we propose an algorithm that can complete the unobserved geometry of tabletop-sized objects, based on a supervised model trained on already available volumetric elements. Our model maps from a local observation in a single depth image to an estimate of the surface shape in the surrounding neighborhood. We validate our approach both qualitatively and quantitatively on a range of indoor object collections and challenging real scenes.
International Conference on Intelligent Robots and Systems (IROS) 2013
Abstract: We introduce a method to discover objects from RGB-D image collections which does not require a user to specify the number of objects expected to be found. We propose a probabilistic formulation to find pairwise similarity between image segments, using a classifier trained on labelled pairs from the recently released RGB-D Object Dataset. We then use a correlation clustering solver to both find the optimal clustering of all the segments in the collection and to recover the number of clusters. Unlike traditional supervised learning methods, our training data need not be of the same class or category as the objects we expect to discover. We show that this parameter-free supervised clustering method has superior performance to traditional clustering methods.
Further information: This work was begun during an internship at NII, Tokyo in the summer of 2012, and was partially supported by an NII internship grant. MATLAB code for feature learning and the clustering of segmented objects will be available from the links to the left shortly. Please contact me at m.firman (at) cs.ucl.ac.uk if you have any questions.
International Conference on Intelligent Robots and Systems (IROS) 2011
Abstract: Recent work in the domain of classification of point clouds has shown that topic models can be suitable tools for inferring class groupings in an unsupervised manner. However, point clouds are frequently subject to non-negligible amounts of sensor noise. In this paper, we analyze the effect on classification accuracy of noise added to both an artificial data set and data collected from a Light Detection and Ranging (LiDAR) scanner, and show that topic models are less robust to 'misspelled' words than the more naive k‑means classifier. Furthermore, standard spin images prove to be a more robust feature under noise than their derivative, 'angular' spin images. We additionally show that only a small subset of local features are required in order to give comparable classification accuracy to a full feature set.
A digital copy of Dr. Simon Prince's book 'Computer Vision: Models, Learning, and Inference', which forms a core of the syllabus, can be downloaded from www.computervisionmodels.com
I have an interest in the lyrical and musical content of music, and it seems sensible to try to use a computer to automate some of the process of discovering themes and trends in music.
As an experiment in Python, I wrote a program to analyse the lyrics of 33,000 songs from the US 40, from 1955 to the present day. I used Brian Langenberger's gdbm-based rhyming dictionary to automatically detect rhyme pairs in each song. This allowed the most popular pairs of rhyming words to be discovered, and changing trends in rhymes over time to be analysed.
A wordcloud showing the most popular rhymes in the whole corpus can be downloaded from the sidebar to the left.
As far as I am aware, this is the first time that rhyme analysis has been performed in this way, and on such a large scale. I presented a preliminary version of this work at the Comparative Innovations Workshop at King's College London, in May 2013. When I have the time I will publish a version of this work here, with explanations of the methodology used and more results.