Prediction of a protein's three-dimensional structure from its primary sequence is the central problem in structural bioinformatics. The most commonly used methods of determining a protein's tertiary structure are X-ray crystallography and NMR spectroscopy. These methods work well for soluble globular proteins, but they are much more difficult to apply to membrane proteins. Membrane proteins are embedded in cell membranes, and affect how ions are transported into and out of cells. Of the more than 30 000 structures that have been deposited in the Protein Data Bank (PDB), only about 300 are membrane proteins, so finding ways to predict the structures of these proteins is of great importance in understanding their biological function.
Dr Zheng Yuan of the Institute for Molecular Bioscience (IMB) at UQ is part of Dr Rohan Teasdale's research group seeking methods of predicting the structures of these proteins. One method of interest is the support vector regression (SVR) approach. In order to gauge the suitability of this approach they have applied it to predicting three separate characteristics of proteins: accessible surface area (ASA), contact number and B-factor.
The ASA of a protein is the surface area of the protein that is accessible to a solvent. Using two different datasets (the first containing 28 alpha-helix proteins, the second containing 14 beta-barrel proteins), the group devised an SVR algorithm to estimate the relationship between protein sequence and relative ASA of residues located within transmembrane domains. The transmembrane domains of alpha-helix and beta-barrel proteins have different structures and lengths, so the group has developed two different prediction methods, depending on the type of protein. These methods, along with the prediction method for soluble proteins, have been made available to the public as ASAP (Protein Solvent Accessible Surface Area Predictor). The predicted and observed ASAs for the residues of the membrane protein Halorhodopsin are shown in Figure 1.
Figure 1: A plot showing predicted and observed ASAs (shown in red and blue respectively).
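As a rough illustration of how a protein sequence might be turned into numerical input for a regression method such as SVR, the sketch below one-hot encodes a sliding window of residues around each position. This is an assumption made for illustration only: the feature encoding, the window size and the names `AMINO_ACIDS` and `encode_windows` are hypothetical and are not taken from the group's ASAP method.

```python
# Hypothetical sketch: sliding-window one-hot encoding of a protein sequence.
# The encoding and window size are illustrative assumptions, not the
# features used by the group.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def encode_windows(sequence, half_width):
    """Return one feature vector per residue.

    Each vector concatenates one-hot encodings of the residues in a window
    of 2 * half_width + 1 positions centred on that residue. Positions that
    fall outside the sequence, and unrecognised letters, are all zeros.
    """
    features = []
    for centre in range(len(sequence)):
        row = []
        for pos in range(centre - half_width, centre + half_width + 1):
            one_hot = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence):
                idx = AMINO_ACIDS.find(sequence[pos])
                if idx >= 0:
                    one_hot[idx] = 1
            row.extend(one_hot)
        features.append(row)
    return features


# Toy sequence, purely for demonstration.
toy_features = encode_windows("MKTAY", half_width=1)
print(len(toy_features), "residues,", len(toy_features[0]), "features each")
```

Each feature vector would then be paired with the observed relative ASA of the central residue to train a regression model.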
The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of beta-carbon atoms in other residues within a sphere around the beta-carbon atom of the residue of interest. Contact number is partly conserved between protein folds, and is therefore useful in protein fold and structure prediction. Dr Yuan and his colleagues have used an SVR algorithm to predict contact number from protein sequence with greater accuracy than previous methods, achieving correlation coefficients of more than 0.70 between predicted and observed contact numbers.
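The definition of contact number above can be sketched in a few lines of code, together with the Pearson correlation coefficient commonly used to compare predicted and observed values. This is a minimal sketch under stated assumptions: the sphere radius is left as a parameter because the article does not give one, and the coordinates, the radius of 6.0 and the "predicted" values below are made-up toy data, not results from the study.

```python
import math


def contact_numbers(cb_coords, radius):
    """For each residue, count the beta-carbon atoms of OTHER residues that
    lie within `radius` of its own beta-carbon atom."""
    counts = []
    for i, centre in enumerate(cb_coords):
        n = sum(
            1
            for j, other in enumerate(cb_coords)
            if j != i and math.dist(centre, other) <= radius
        )
        counts.append(n)
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy beta-carbon coordinates; the radius is an arbitrary illustrative choice.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
observed = contact_numbers(toy_coords, radius=6.0)

# Hypothetical predicted values, invented only to demonstrate the comparison.
predicted = [1.2, 1.8, 0.9, 0.3]
print(observed, pearson(observed, predicted))
```

The pairwise loop is quadratic in the number of residues, which is adequate for a single protein; a spatial index would be a reasonable choice for larger datasets.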
The B-factors of protein crystal structures reflect the fluctuation of atoms about their average positions and provide important information about protein dynamics. Dr Teasdale's group has developed a method of predicting B-factor profiles from protein sequences alone, with most prediction accuracies greater than 70%.
The huge datasets of proteins used for these calculations have necessitated the use of the QCIF-funded HPC facility at UQ. Dr Yuan has used programs such as SVMlight and run C and Python code on the gust, storm and cyclone supercomputers to develop and test these prediction techniques, often running simulations for up to a month at a time.
Contacts
Dr Zheng Yuan, Dr Rohan Teasdale and colleagues
Institute for Molecular Bioscience (IMB), University of Queensland
Publications
Zheng Yuan, 'Better prediction of protein contact number using a support vector regression analysis of amino acid sequence', BMC Bioinformatics 2005, 6:248.
Zheng Yuan et al., 'Predicting the Solvent Accessibility of Transmembrane Residues from Protein Sequence', Journal of Proteome Research 2006, 5:1063-1070.
Yuan Z., Bailey T.L., Teasdale R.D., 'Prediction of protein B-factor profiles', Proteins 2005, 58:905-912.
Written by T. Curtis, August 2006
