UCL DEPARTMENT OF COMPUTER SCIENCE

Research fields of interest:

    My research activities focus on structural bioinformatics and on sequence analysis. Past and present specific areas of interest include:
     
  • Prediction of protein-DNA interactions
  • Identifying and characterizing DNA-binding proteins and their specific target sequences is an important goal of post-genome biology. Recent strategies for determining protein-DNA binding specificities combine high-throughput experimental data with bioinformatics algorithms, with encouraging results. Structure-based approaches are particularly promising, as they can predict previously undetected binding sites and pave the way for the rational design of novel regulatory molecules. We aim to expand current knowledge of protein-DNA binding modes through structural bioinformatics methods for predicting the structure and specificity of DNA-binding proteins, focusing in particular on natively unfolded protein regions.
     
  • Model quality assessment
  • The quality of a protein structure model dictates how it can be used. When experimental structural data are available for the target proteins, structure prediction methods can be tested and compared effectively by standard procedures, which we applied to the CASP8 template-based models. However, everyday practice requires a priori estimates of model quality for successful applications to biomedical problems. During the CASP7 and CASP8 experiments I was involved in the assessment of the predictions submitted to the Quality Assessment category. In CASP10 I was a member of the assessment team that evaluated the models in the Structure Refinement category.
     
  • Analysis of protein-protein interaction data
  • Protein interactions can be mediated by specific motifs, surface patches or domains. The co-occurrence of such features in proteins binding to the same partner can hint at mutually exclusive interactions and, therefore, help infer their involvement in common biological processes. The identification of similar linear motifs in proteins interacting with the same partner allows for more reliable functional annotations. The challenge is to extend this approach to the search for discontinuous motifs brought together by the three-dimensional structure of the protein.
     
  • Analysis of sequence alignment accuracy
  • The reliability of homology-based models depends heavily on the quality of the sequence alignment between the target and the template. In turn, alignment quality is affected by the number of sequences included in the alignment and by their similarity distribution. Intuitively, increasing the number of aligned sequences yields more accurate results, but we showed, in collaboration with the group of Alfonso Valencia, that metagenomic sequences represent a relevant counterexample to this rule of thumb. We also developed a method to estimate the difficulty, and by extension the accuracy, of the pairwise alignment between two sequences given a multiple alignment that includes them and other similar ones.
     
  • Prediction of domain boundaries from protein sequence
  • Analysis of complete genome sequences has highlighted the abundance of protein chains that fold into two or more domains, that is, structural regions that usually look compact and independent and that can perform specialized biochemical functions. When only sequence data are available, singling out protein domains has important ramifications for both experimental and computational studies. For instance, a complete experimental structure or a correct computational model is harder to obtain for such proteins than for single-domain chains. I contributed to the development of the DPS predictor, which infers domain boundaries from patterns of PSI-BLAST alignments occurring in sequence similarity searches.
     
  • Access to manually curated protein models
  • Structural models resulting from synergistic computational and experimental efforts are not easy to retrieve. They are rarely publicly available and, in general, are only accessible via direct interaction with the authors. To make matters worse, the PDB has not accepted theoretical model depositions since mid-October 2006. I am part of the team that established and maintains the PMDB database for manually curated models and their supporting evidence.
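
As a small illustration of the linear-motif idea mentioned under "Analysis of protein-protein interaction data", the sketch below scans a set of proteins that bind the same partner for a shared sequence motif. It is a toy example and not the method used in the work described above: the function name `find_shared_motif`, the protein names, the sequences and the `P..P` pattern are all hypothetical and chosen only for demonstration.

```python
import re


def find_shared_motif(sequences, pattern):
    """Map each protein name to the 0-based start positions of a motif.

    `sequences` is a dict of {name: amino-acid string}; `pattern` is a
    regular expression describing the linear motif. Proteins without a
    match are omitted from the result.
    """
    # A lookahead makes the scan report overlapping occurrences too.
    regex = re.compile(f"(?=({pattern}))")
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() for m in regex.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical interactors of one binding partner (made-up sequences).
interactors = {
    "protA": "MSTAPPLPPRNGS",
    "protB": "GGPTLPAKQW",
    "protC": "MKKLLVVAAG",
}

# "P..P" is an assumed proline-rich pattern used purely as an example.
hits = find_shared_motif(interactors, "P..P")
print(hits)
```

Proteins that share a hit would then be candidates for binding the partner through the same motif, which is the kind of evidence that can support a functional annotation. Discontinuous motifs formed by the three-dimensional structure cannot be found by a sequence scan like this one.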