My Research

The Devil is in the Decoder

Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-Chieh Chen, Alireza Fathi, Jasper Uijlings

Spotlight at British Machine Vision Conference (BMVC) 2017

Many machine vision applications require predictions for every pixel of the input image (for example semantic segmentation, boundary detection). Models for such problems usually consist of encoders which decreases spatial resolution while learning a high-dimensional representation, followed by decoders who recover the original input resolution and result in low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. Therefore this paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise prediction tasks. Our contributions are: (1) Decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce a novel decoder: bilinear additive upsampling. (3) We introduce new residual-like connections for decoders. (4) We identify two decoder types which give a consistently high performance.

Attention-based Extraction of Structured Information from Street View Imagery

Zbigniew Wojna, Alex Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, Julian Ibarz

Published at International Conference on Document Analysis and Recognition (ICDAR) 2017 Source code

Featured in news: Google Blogpost , Android Headlines , Android Headlines , Venture Beat , O’Reilly

We present a neural network model - based on CNNs, RNNs and a novel attention mechanism - which achieves 84.2% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state of the art (Smith'16), which achieved 72.46%. Furthermore, our new method is much simpler and more general than the previous approach. To demonstrate the generality of our model, we show that it also performs well on an even more challenging dataset derived from Google Street View, in which the goal is to extract business names from store fronts. Finally, we study the speed/accuracy tradeoff that results from using CNN feature extractors of different depths. Surprisingly, we find that deeper is not always better (in terms of accuracy, as well as speed). Our resulting model is simple, accurate and fast, allowing it to be used at scale on a variety of challenging real-world text extraction problems.

9th place in the world in The Digital Mammography DREAM Challenge

Stephen Morrell, Robert Kemp, Can Khoo, Karl Trygve Kalleberg, Gerard Cardoso, Zbigniew Wojna

Post competition write-up

The deep convnet used for the image classifier was a modified Inception_Resnet_V2 link. We used transfer learning and pre-training on an external dataset of approximately 25,000 cancerous and an equal number of non-cancerous mammograms (OPTIMAM). A random forest was used for processing metadata in sub challenge 2.

Semantic Instance Segmentation via Deep Metric Learning

Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin P Murphy

We propose a new method for semantic instance segmentation, by first computing how likely two pixels are to belong to the same object, and then by grouping similar pixels together. Our similarity metric is based on a deep, fully convolutional embedding model. Our grouping method is based on selecting all points that are sufficiently similar to a set of "seed points", chosen from a deep, fully convolutional scoring model. We show competitive results on the Pascal VOC instance segmentation benchmark.

Speed/accuracy trade-offs for modern convolutional object detectors

Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

Published at Computer Vision and Pattern Recognition (CVPR) 2017
Source code

Winner submission of 2017 MS COCO object detection challenge.

The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016] and SSD [Liu et al., 2015] systems, which we view as "meta-architectures" and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. On one extreme end of this spectrum where speed and memory are critical, we present a detector that achieves real time speeds and can be deployed on a mobile device. On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna

Published at Computer Vision and Pattern Recognition (CVPR) 2016
Source code

First single model to beat humans on ImageNet benchmark.

Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we explore ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and with using less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error on the validation set (3.6% error on the test set) and 17.3% top-1 error on the validation set.

Fast methods in training deep neural networks for image recognition

Under supervision of Prof John Shawe-Taylor, University College London, United Kingdom

Master of Research Thesis in Machine Learning

This thesis investigates the recent findings in the deep learning area. They form an introduction to PhD studies and review of the literature concerning the best performing architectures. The main limitation of neural machine vision is training time. The experiments are designed to explore models that converge quickly relating transfer learning. It achieves state-of-the-art results in Salient Object Subitizing problem due to better architecture in less than one hour of training on mid-class GPU. There is also conducted experiment replicating the fastest learning network on the ILSVRC2014 dataset - the biggest image recognition challenge.

Portfolio Selection System and Conveyor

Under supervision of Dr Michal Galas, Warsaw University (Poland) in collaboration with UK PhD Centre in Financial Computing (UK)

Research project in algorithmic trading as Master's Thesis in Computer Science

The goal of the project is to develop a scalable algorithmic trading platform and portfolio builder based on large groups of related securities, that allows for real-time testing in Java. I build a messaging system that can transfer with high-throughput market data to the other components. Take advantage of open-source Apache Kafka based on server Apache Zookeeper that enables highly reliable distributed coordination. Previously worked on portfolio selection engine and computations management. The portfolio consists of the securities that create a nonstationary linear system due to cointegration coefficients. The research is chiefly concerned about finding similarities within a huge number of dense time series.

Framework supporting development and backtesting of trading systems for financial markets

Under supervision of Dr Janusz Jablonowski, Warsaw University, Poland

Bachelor's Thesis in Computer Science

The goal of the project was to create the framework supporting developing, backtesting with historic market data and deploying automatic transactional systems for Warsaw Stock Exchange. As a result, there was created a Java package library, that simplifies analyzing the behavior of market data with complex stock indexes and automated trading on Warsaw Stock Exchange using user's implemented strategies. The framework allows for downloading, storing and managing historic data from the stock exchange. Moreover, it provides simple investing strategies that can be connected directly to a brokerage account. The target audiences are stock market players and market analyzers, who want to create and test advanced systems.

Comparison of decision systems and statistical methods used for time series prediction

Under supervision of Dr hab. Dominik Slezak, Warsaw University, Poland

Bachelor's Thesis in Mathematics

I performed the study of time series prediction methods in comparison to decision systems methods. The goal of my thesis was to disprove that simple mathematical models are more effective than complex decision systems in forecasting. Tests were executed based on data from Warsaw Stock Exchange employing R framework and Weka. The work is also an introduction to building stock exchange mathematical models.

PIKA - Portfolio of Actuarial and Catastrophic Instruments - pricing and simulation

Under supervision of Dr Kamil Kulesza, Centre for Industrial Applications of Mathematics and Systems Engineering (CIAMSE) of Polish Academy of Sciences, Poland

Winter School 2010: Industrial Mathematics

The target of the project was to examine and improve the statistical model of natural disaster occurrences. Pricing of derivatives for catastrophes was developed by theoretical methods, as well as using the Monte Carlo simulation.

Zbigniew Wojna

Deep Learning research for computer vision applications

Tensorflight

University College London

About Me

Contact Details