Bryson Research Group
Computational Systems Biology

Undergraduate Project Suggestions 2017/2018

Our lab works in bioinformatics and computational biology, using the Python and R languages.

We use Python for machine learning (scikit-learn and the Keras deep-learning framework), visualization (matplotlib) and also for general data processing such as handling biological sequence and structure data (BioPython).

We use R for the analysis of high-throughput ‘-omics’ data such as gene expression data using BioConductor.

We generally make new approaches available to the life science community by implementing them as web applications using HTML/CSS/JavaScript, Bootstrap, Django and MySQL.

These projects, except for the last one, are most suitable if you intend to study the COMPM058 Bioinformatics module during Term 2 or have already taken a bioinformatics module in your 3rd year (International Programme students, for example). For the machine learning projects, ideally you would also be studying a machine learning module (3rd year COMP3058 Artificial Intelligence and Neural Computing, or one of the 4th year Machine Learning modules).

Project 1: Optimization of an unsupervised biclustering method for cancer data classification (Co-supervised by Prof. Gyorgy Szabadkai and Dr Robert Bentham in Cell and Developmental Biology at UCL)

 

Skills developed: bioinformatics, algorithm design, R and C programming.

The medical and bioscience communities are currently creating very large ‘-omics’ datasets for diseases such as cancer; examples in cancer research include the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA). One of the challenges with these huge datasets is to “mine” information and patterns that may be useful for disease diagnosis or prognosis. We have recently published a biclustering (unsupervised learning) method called MCbiclust that can detect patterns within these large genomic datasets (Nucleic Acids Research, Volume 45, Issue 15, 6 September 2017, Pages 8712–8730).

The method is available as a BioConductor / R package. This first project involves developing a more computationally efficient version, since the current method is quite CPU intensive and often requires the Legion supercomputer for analysis.

The aims of this project are:

  1. Understand the algorithm itself and how it works.
  2. Devise an effective benchmarking procedure that tests the algorithm for accuracy and determines its computational complexity as different aspects of the problem grow in size.
  3. Optimize the algorithm, possibly including heuristics for the most intensive components, with the aim of increasing both its accuracy and its speed on typical problems.
  4. Optimize the implementation, possibly by reimplementing key stages of the R code in C.
  5. Experimentally test the newly developed version against the original and potentially publish it as a new version of the BioConductor package if the improvements have been substantial.
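
As an illustration of aim 2, the sketch below shows one generic way to estimate how running time scales with problem size: time a function at several input sizes, then fit a straight line to log(time) against log(size). It is written in Python for brevity and is not part of MCbiclust; the names `time_function`, `make_input` and `loglog_slope` are hypothetical, and the function under test would be replaced by a call to the method being benchmarked.

```python
import math
import time


def time_function(func, make_input, sizes, repeats=3):
    """Return the best wall-clock time in seconds of func for each input size.

    make_input(n) is a hypothetical helper that builds a test input of size n.
    Taking the minimum over several repeats reduces noise from other processes.
    """
    timings = []
    for n in sizes:
        data = make_input(n)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(data)
            best = min(best, time.perf_counter() - start)
        timings.append(best)
    return timings


def loglog_slope(sizes, timings):
    """Least-squares slope of log(time) against log(size).

    A slope near k suggests that running time grows roughly as n**k.
    All sizes and timings must be strictly positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in timings]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

In practice the sizes would be varied along each dimension of interest separately (for example number of genes, then number of samples), and the inputs would need to be large enough that each timing is well above the clock resolution.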

 

Project 2: Making an unsupervised biclustering method available to the general bioscience community as a web application (Co-supervised by Prof. Gyorgy Szabadkai and Dr Robert Bentham in Cell and Developmental Biology at UCL)

 

Skills developed: bioinformatics; web application design using Django and JavaScript; Python and R programming.

This second project also involves the further development of the MCbiclust method that is outlined in the description of Project 1.

Currently the MCbiclust method is only accessible to users who can write scripts in the R programming language. This greatly reduces the number of bioscience researchers who can employ the method. The key aim of this project is to make this new biclustering method available as a web application, thus opening it up to the full bioscience community.

Some of the key aims of the project are:

  1. Understand the algorithm itself and how it works.
  2. Determine an approach to allow some users to compute “shared biclustering profiles” (the CPU intensive part) while other users can employ these shared bicluster profiles to analyse their own data.
  3. Develop an interactive graphical Django web application that integrates JavaScript and R modules. (The method itself involves quite complex inter-related stages that potentially generate interactive graphical output.)
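
One possible reading of aim 2 is a compute-once, reuse-many cache: an expensive profile is computed the first time a given dataset and parameter combination is requested, and later requests reuse the stored result. The sketch below illustrates that idea only. `ProfileStore`, `profile_key` and the `compute` callback are hypothetical names, and the in-memory dictionary stands in for whatever persistent storage the web application would actually use.

```python
import hashlib
import json


def profile_key(dataset_id, params):
    """Derive a stable key from a dataset identifier and run parameters.

    Sorting the keys makes the result independent of parameter order.
    """
    canonical = json.dumps({"dataset": dataset_id, "params": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ProfileStore:
    """In-memory stand-in for a shared store of precomputed profiles."""

    def __init__(self):
        self._profiles = {}

    def get_or_compute(self, dataset_id, params, compute):
        """Return the stored profile, calling compute only on a cache miss.

        compute(dataset_id, params) is a hypothetical callback that runs
        the CPU-intensive step.
        """
        key = profile_key(dataset_id, params)
        if key not in self._profiles:
            self._profiles[key] = compute(dataset_id, params)
        return self._profiles[key]
```

A real deployment would also need to decide who is allowed to trigger the expensive computation and how long-running jobs are queued, which this sketch does not address.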

Project 3: Development of a protein disorder prediction server using machine learning and Django

 

Skills developed: bioinformatics; Python programming; machine learning; web application design using Django and JavaScript.

Supervised machine learning is extensively applied in the field of bioinformatics for all sorts of prediction tasks: predicting structural aspects of proteins from only their sequence, predicting protein function from sequence, and predicting cancer subtypes from gene expression data.

This project focuses on one particular structural aspect of proteins called intrinsic disorder. Normally proteins form one stable “native” fold which gives the protein its function. Intrinsically disordered regions of proteins do not do this; instead they jump between many different conformations. This allows certain proteins to have some unusual functions, such as the “entropic springs” that are employed within muscle and spider silk. Intrinsic disorder has also been implicated in a number of diseases.

The DisProt database holds information about protein disorder in which the different types of disorder have been characterized in terms of an ontology. The key idea for this project is not only to predict disordered regions within proteins, but also to predict the ontology terms for these disordered regions, making it essentially a multi-class prediction problem.

The project would involve the complete development of a bioinformatics web application: from processing of raw DisProt and PDB structural data; to the application of scikit-learn for machine learning; to the development of a Django web application so that medical and bioscience users can employ your newly developed method.
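
To make the multi-class framing concrete, the sketch below shows one common way of turning a protein sequence into per-residue training rows: each residue is described by the amino-acid composition of a window centred on it and paired with a class label. This is an illustrative assumption rather than the method the project prescribes; the function names, the window size and the label strings are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def window_features(sequence, half_width=7):
    """Return one feature vector per residue.

    Each vector holds the fraction of each standard amino acid in a
    window centred on the residue. Windows are truncated at the sequence
    ends, and non-standard residue codes are counted in the window length
    but not in any fraction.
    """
    features = []
    for i in range(len(sequence)):
        window = sequence[max(0, i - half_width): i + half_width + 1]
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for residue in window:
            if residue in counts:
                counts[residue] += 1
        features.append([counts[aa] / len(window) for aa in AMINO_ACIDS])
    return features


def build_training_rows(sequence, labels, half_width=7):
    """Pair each residue's feature vector with its class label.

    labels holds one entry per residue, for example "ordered" or a
    disorder ontology term (the label strings here are placeholders).
    """
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    return list(zip(window_features(sequence, half_width), labels))
```

Rows in this shape (a list of numeric vectors and a parallel list of labels) are the kind of input a scikit-learn classifier accepts, so the feature step can be developed and tested separately from the choice of model.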

 

Project 4: Designing an interactive PCR Primer Workbench web application for the life science community

 

Skills developed: bioinformatics; Python programming; web application design using Django and JavaScript; some biology.

Polymerase Chain Reaction (PCR) revolutionized molecular biology and won Kary Mullis the 1993 Nobel Prize in Chemistry. It allows minute quantities of specific types of DNA to be amplified and analysed. It has a wide range of applications, from DNA fingerprinting for forensics to helping in the sequencing of the human genome. In collaboration with the Royal Free Hospital, we have published a new approach to help identify different types of infectious organisms using PCR primers that have been computationally determined using a machine learning decision tree approach (J. Clin. Microbiol., July 2012, vol. 50, no. 7, 2419–2427).

This project aims to make these techniques much more widely available to the experimental and clinical community by developing an interactive web application for the optimal design of PCR primers that satisfy complex criteria across different phylogenetic species.
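
As a small illustration of what "criteria" can mean for a single primer, the sketch below scans a template for fixed-length candidates and keeps those whose GC content and estimated melting temperature fall within chosen ranges. The melting temperature uses the Wallace rule (2 °C per A/T plus 4 °C per G/C), a rough approximation for short oligonucleotides. The function names and all default thresholds are illustrative assumptions, and the sketch does not attempt the cross-species criteria described above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature in degrees Celsius (Wallace rule)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def candidate_primers(template, length=20, gc_range=(0.4, 0.6), tm_range=(50, 65)):
    """Yield (start, primer) for each window that meets the criteria.

    The default length and ranges are placeholder values for illustration.
    Windows containing ambiguous bases (anything other than A, C, G, T)
    are skipped.
    """
    template = template.upper()
    for start in range(len(template) - length + 1):
        primer = template[start:start + length]
        if set(primer) - set("ACGT"):
            continue
        if (gc_range[0] <= gc_content(primer) <= gc_range[1]
                and tm_range[0] <= wallace_tm(primer) <= tm_range[1]):
            yield start, primer
```

Reverse primers could be generated by applying `reverse_complement` to windows taken from the far end of the target region; a workbench would layer further checks (for example specificity across species) on top of simple filters like these.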

 

Project 5: Developing an animal monitoring system using Imagination Technologies' Creator Ci40 IoT framework

 

Skills developed: experience with embedded hardware; networking; convolutional neural networks (CNNs) for image recognition; Python and C programming.

The idea behind this project is to develop an Internet-of-Things application on a MIPS-based hardware kit (the Creator Ci40 IoT platform by Imagination Technologies).

In particular, the aim would be to develop distributed sensors (MikroBUS cameras, PIR sensors) on PIC32 microcontroller “Clicker” boards that communicate with a central Ci40 hub board using 6LoWPAN. The central Ci40 hub would orchestrate how the sensors are used to capture images when movement is detected, for example of different types of animals in the wild. The Ci40 board would then employ convolutional neural networks (CNNs) to perform image recognition and send up-to-date counts of the different animals to a central web application via a long-distance, low-power LoRa network.

Such a system could be used for animal monitoring in a remote location where only battery power is available and there is no mobile network coverage.
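
Because a low-power, long-range link carries only small messages, the per-species counts sent by the hub would probably need a compact encoding. The sketch below shows one way this could look, packing each count into three bytes with Python's `struct` module. The species table, the record layout and the function names are hypothetical and are not part of any Ci40 or LoRa API.

```python
import struct
from collections import Counter

# Hypothetical mapping from classifier labels to one-byte species identifiers.
SPECIES_IDS = {"fox": 1, "deer": 2, "badger": 3}
ID_TO_SPECIES = {value: key for key, value in SPECIES_IDS.items()}

# One record = 1-byte species id + 2-byte count, little-endian, no padding.
_RECORD = struct.Struct("<BH")


def tally(detections):
    """Count how often each known species label appears in detections."""
    return Counter(label for label in detections if label in SPECIES_IDS)


def encode_counts(counts):
    """Pack {species: count} into bytes, three bytes per species.

    Counts above 65535 are clamped to fit the 2-byte field.
    """
    payload = bytearray()
    for species, count in sorted(counts.items()):
        payload += _RECORD.pack(SPECIES_IDS[species], min(count, 0xFFFF))
    return bytes(payload)


def decode_counts(payload):
    """Inverse of encode_counts, as it might run on the web application side."""
    return {ID_TO_SPECIES[sid]: count for sid, count in _RECORD.iter_unpack(payload)}
```

With this layout a report covering ten species would occupy 30 bytes; whether that fits a single transmission depends on the radio settings chosen for the deployment.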