Figure 1: Nuclear pore complex, through which nuclear proteins pass between the cytoplasm and the nucleus of a cell.
Nuclear proteins are those localised to the nucleus of a cell. Not all of them are localised exclusively to the nucleus, however; many spend their life shuttling between the cytoplasm and the nucleus. They are imported into the nucleus in a folded state via large channels in the nuclear membrane called nuclear pore complexes (see figure 1). A protein is allowed to pass through once the cell recognises its nuclear localisation signal (NLS), an amino acid sequence which acts like a tag on the exposed surface of the protein. The enormous diversity of targeting signals makes the nuclear localisation of proteins an extremely complicated process. Furthermore, most nuclear localisation predictors (that is, programs designed to predict the localisation of a protein from its primary amino acid sequence) use only exclusively nuclear proteins in their training sets, so all shuttling proteins are excluded from the development of the model.
John Hawkins and Dr Mikael Bodén of the University of Queensland have created Nucleo, a protein sequence prediction service designed to determine the nuclear transport of proteins in eukaryotic cells. Nucleo accepts amino acid sequences in FASTA format and estimates the probability of nuclear import. To ensure the program learned to predict nuclear localisation in general, rather than exclusive nuclear localisation, the positive training set of nuclear proteins included dual localised proteins. The negative training set consisted of all known proteins with a known single localisation to another organelle, excluding the endoplasmic reticulum. While there is still some work to be done on Nucleo, it is the only prediction service trained on the full task of predicting nuclear matrix proteins imported via the nuclear pore complex.
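The input format and the training-set rule described above can be sketched in a few lines of Python. This is only an illustration, not Nucleo's code: the function names, the localisation strings ("nucleus", "endoplasmic reticulum") and the idea that each protein comes with a list of annotated locations are all assumptions made for the example.

```python
from typing import Dict, Iterable, Optional

def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    A header line starts with '>'; the following lines, up to the
    next header, are joined into a single sequence string.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records

# Hypothetical annotation strings, chosen for this sketch only.
NUCLEUS = "nucleus"
EXCLUDED = "endoplasmic reticulum"

def training_label(locations: Iterable[str]) -> Optional[int]:
    """Assign a training label from a protein's annotated locations.

    1    -> nuclear, including dual localised (shuttling) proteins
    0    -> single localisation to another compartment, except the ER
    None -> left out of the training set
    """
    locs = {loc.strip().lower() for loc in locations}
    if NUCLEUS in locs:
        return 1
    if len(locs) == 1 and EXCLUDED not in locs:
        return 0
    return None

if __name__ == "__main__":
    # Made-up sequences, used only to show the FASTA layout.
    example = ">protein_a\nMKRPAATKKAGQ\nAKKKKLD\n>protein_b\nMLSLRQSIRFFK\n"
    for name, seq in parse_fasta(example).items():
        print(name, len(seq))
    print(training_label(["Nucleus", "Cytoplasm"]))
```

The point of the rule is that a protein seen in both the nucleus and the cytoplasm still counts as a positive example, so shuttling proteins are not lost from the training data.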
As well as Nucleo, John and Mikael have built other predictors. In a soon-to-be-published paper, they describe the development of a predictor known as PProwler, which determines the localisation of a given protein from its amino acid sequence and also identifies peroxisomal proteins. Peroxisomes are the organelles of a cell which carry out fat metabolism. Lab biologists ran this predictor over the whole mouse genome and identified some potential candidates which were not previously known. In this way, predictors can be used to guide experimentation.
Due to the nature of this work, John often works in collaboration with experimental biologists; at UQ he has worked closely with Rohan Teasdale's group in the IMB. These collaborations give John a very efficient feedback cycle: the biologists provide him with data, he builds the predictors, and they then use the predictors to carry out experiments based on the results.
On a larger scale, his project aims to build up a computational model of processes in the human body, using the predictors and models he builds to get a sense of the function of each component of the whole human proteome.
Most of the code is written in Java, the majority of it by John and Mikael. The predictors are built using a machine learning technique known as support vector machines.
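To give a feel for what a support vector machine does with a protein sequence, here is a minimal Python sketch. It is not the Nucleo or PProwler code (which is in Java), and almost everything in it is an assumption made for illustration: the amino acid composition features, the linear kernel without an explicit bias term, the training loop (plain subgradient descent on the regularised hinge loss) and the toy sequences, which are invented strings rather than real proteins.

```python
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    n = max(len(seq), 1)  # avoid division by zero on empty input
    return [seq.count(aa) / n for aa in AMINO_ACIDS]

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(
    data: Sequence[Tuple[Sequence[float], int]],
    epochs: int = 200,
    lr: float = 0.1,
    reg: float = 0.01,
) -> List[float]:
    """Linear SVM trained by subgradient descent.

    Minimises (reg / 2) * |w|^2 + max(0, 1 - y * w.x) one example
    at a time; labels y must be +1 or -1.
    """
    w = [0.0] * len(AMINO_ACIDS)
    for _ in range(epochs):
        for x, y in data:
            violated = y * dot(w, x) < 1.0
            # Shrink step from the L2 regulariser.
            w = [(1.0 - lr * reg) * wi for wi in w]
            # Hinge-loss step, only when the margin is violated.
            if violated:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w: Sequence[float], seq: str) -> int:
    """Return +1 (predicted nuclear) or -1 (predicted non-nuclear)."""
    return 1 if dot(w, composition(seq)) > 0 else -1

# Invented toy sequences: +1 rich in basic residues, -1 hydrophobic.
TOY_EXAMPLES = [
    ("PKKKRKSEDPKRKR", 1),
    ("SRKRKREEDQNPKK", 1),
    ("MLLLAVLGLLAGIA", -1),
    ("MAFWLLGAVLTLAF", -1),
]

if __name__ == "__main__":
    training = [(composition(s), y) for s, y in TOY_EXAMPLES]
    weights = train_linear_svm(training)
    print(predict(weights, "KRKRPKKKRK"))
```

A real predictor would need far richer features and a properly curated training set; the sketch only shows the basic shape of the approach, in which sequences become numeric vectors and the SVM learns a boundary between the two classes.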
Software
John and Mikael have so far produced two predictors:
- Nucleo - Nuclear Protein Localisation Predictor; and
- PProwler - Protein Prowler Subcellular Localisation Predictor.
Contact
John Hawkins, Mikael Bodén
Division of Complex and Intelligent Systems Research (ITEE), University of Queensland
Publications
Hawkins, J., Davis, L. and Bodén, M., "Predicting Nuclear Proteins". Bioinformatics. Submitted.
Bodén, M. and Hawkins, J., "Evolving discriminative motifs for recognizing proteins imported to the peroxisome via the PTS2 pathway". CEC 2006. (Winner: Best Paper in Session).
Hawkins, J. and Bodén, M., "Detecting and Sorting Targeting Peptides with Recurrent Networks and Support Vector Machines". Journal of Bioinformatics and Computational Biology, 4(1), 2006.
Written by T. Curtis, November 2006.
