Queensland Cyber Infrastructure Foundation

 Orthology and Paralogy in Eukaryotes

Paralogs are homologous genes derived from duplication events, while orthologs are homologous genes derived from speciation events.  Detecting orthologous relationships is an important step in mapping the evolutionary events that have shaped life on Earth: knowing how closely related the species that share a gene are may give us a better understanding of how long ago those species diverged.

Like his colleague Dr Robert Beiko, Dr Simon Wong of the Institute for Molecular Bioscience (IMB) at UQ is carrying out research that will allow him to map the orthology and paralogy of proteins in eukaryotic genomes, in work that may one day be extended to DNA.

The phylogenetic analysis of a protein usually begins with the identification of proteins with similar sequences.  It is generally assumed that proteins with similar sequences are derived from a common ancestral protein, so such proteins are likely to be homologous.  This identification is usually accomplished using software such as BLAST (Basic Local Alignment Search Tool), which compares each protein sequence against a collection of all proteins from several genomes of interest.  However, Dr Wong is hoping to make use of SSEARCH, a program that performs a much more rigorous search using the Smith-Waterman algorithm.  Where BLAST relies on heuristics for speed, Smith-Waterman computes the optimal local alignment between two sequences.  While SSEARCH is many times slower than BLAST, a large performance gain can be obtained from parallelisation, making the technique practical on multiprocessor machines.
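
To make the idea of a Smith-Waterman search concrete, here is a minimal Python sketch of the algorithm for a single pair of sequences.  It is an illustration only, not SSEARCH itself: the function name and the simple match, mismatch and linear gap scores are assumptions chosen for brevity, whereas real protein searches use a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Scoring is a simplifying assumption: a fixed match/mismatch score
    and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which makes it local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("AAAGGGTTT", "CCCGGGCCC")
print(score, top, bottom)
```

Every cell of the score matrix is filled in, so the cost of one comparison grows with the product of the two sequence lengths.  Each pair of sequences can be compared independently of every other pair, which is why this kind of search lends itself to parallelisation.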

Once these homologous proteins have been identified, it is necessary to determine which parts of the proteins are related.  Eukaryotic proteins often have multiple domains, so different sections of a protein sequence may have different evolutionary histories.  The result is sets of proteins that may not be related over their entire length but that contain regions that have evolved from a common ancestor.  Resolving this requires a large-scale approach: breaking the proteins down into individual amino acid residues and clustering those residues into groups of homologous residues.
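
As a rough illustration of what working at the level of single residues might look like, the Python sketch below turns one gapped pairwise alignment into a list of residue-to-residue pairs.  The function name, the protein labels and the `(protein, position)` representation are all assumptions made for this example, not details of Dr Wong's method.

```python
def residue_pairs(name_a, aligned_a, start_a, name_b, aligned_b, start_b):
    """Yield ((protein, position), (protein, position)) for aligned residues.

    aligned_a and aligned_b are the two rows of a pairwise alignment, with
    '-' marking gaps.  start_a and start_b are the positions in the full
    proteins at which the aligned regions begin (hypothetical inputs).
    """
    pos_a, pos_b = start_a, start_b
    for res_a, res_b in zip(aligned_a, aligned_b):
        # Only columns with a residue in both rows pair two residues up.
        if res_a != "-" and res_b != "-":
            yield (name_a, pos_a), (name_b, pos_b)
        # Advance each position counter only when a residue was consumed.
        if res_a != "-":
            pos_a += 1
        if res_b != "-":
            pos_b += 1


for pair in residue_pairs("P1", "HEAG-W", 10, "P2", "HE-GAW", 3):
    print(pair)
```

Because only the aligned region contributes pairs, two multi-domain proteins that share a single domain would be linked through the residues of that domain alone.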

One idea Dr Wong hopes to implement is to represent these residues as vertices in a graph.  The edges of the graph would represent homology, linking residues from evolutionarily related proteins.  The graph would have around 30 million vertices and about 3 to 4 billion edges, so building it is an enormous computational task.  This will be done using a clustering technique implemented as parallelised code running on APAC supercomputers at ANU in Canberra.
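
The article does not say which clustering technique will be used.  As one simple possibility, the Python sketch below groups residues into connected components of the homology graph using a disjoint-set (union-find) structure.  The class and function names, and the use of `(protein, position)` tuples as vertices, are assumptions made for this example; it is serial code and makes no attempt at the parallelisation the real graph would need.

```python
class DisjointSet:
    """Minimal union-find structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Walk up to the representative of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            self.parent[root_y] = root_x


def cluster_residues(edges):
    """Group vertices joined by homology edges into connected components."""
    sets = DisjointSet()
    for u, v in edges:
        sets.union(u, v)
    clusters = {}
    for vertex in list(sets.parent):
        clusters.setdefault(sets.find(vertex), set()).add(vertex)
    return list(clusters.values())


# Hypothetical homology edges between residues of three proteins.
edges = [
    (("P1", 10), ("P2", 3)),
    (("P2", 3), ("P3", 7)),
    (("P1", 11), ("P2", 4)),
]
for cluster in cluster_residues(edges):
    print(sorted(cluster))
```

An appealing property of this approach is that the edges never have to be held in memory together: each one is processed once and then discarded, and only the vertex table is kept.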

Once all related sequences have been found, it is then possible to build the phylogenetic trees, using programs such as MrBayes.  These phylogenetic trees, once built, give the evolutionary history of the sequences and allow the detection of orthology and paralogy relationships.  The trees are then compared with the species tree in a process known as reconciliation.  This process is also very computationally expensive, and requires the use of supercomputers such as those provided by QCIF and APAC.
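
The article does not describe how the reconciliation will be carried out.  One common textbook approach, sketched below in Python, maps each node of the gene tree to the last common ancestor of its species in the species tree, and labels a node as a duplication if it maps to the same place as one of its children, or as a speciation otherwise.  Trees are written as nested tuples, and every name in the example (the functions, genes and species) is hypothetical.

```python
def clades(species_tree):
    """List the leaf set (as a frozenset) of every node in the species tree."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    found = []
    for child in species_tree:
        found.extend(clades(child))
    found.append(frozenset().union(*found))
    return found


def lca(species_clades, species):
    """Smallest clade of the species tree that contains all given species."""
    return min((c for c in species_clades if species <= c), key=len)


def label_events(gene_tree, gene_to_species, species_clades, events):
    """Post-order walk that records each internal node's event in `events`.

    Returns (genes below this node, species clade this node maps to).
    """
    if isinstance(gene_tree, str):
        leaf_species = frozenset([gene_to_species[gene_tree]])
        return frozenset([gene_tree]), lca(species_clades, leaf_species)
    genes, species, child_maps = frozenset(), frozenset(), []
    for child in gene_tree:
        child_genes, child_map = label_events(
            child, gene_to_species, species_clades, events
        )
        genes |= child_genes
        species |= child_map
        child_maps.append(child_map)
    mapped = lca(species_clades, species)
    # A node that maps to the same clade as a child implies a duplication.
    events[genes] = "duplication" if mapped in child_maps else "speciation"
    return genes, mapped


species_tree = (("human", "mouse"), "yeast")
gene_tree = ((("h1", "m1"), ("h2", "m2")), "y1")
gene_to_species = {
    "h1": "human", "h2": "human",
    "m1": "mouse", "m2": "mouse",
    "y1": "yeast",
}

events = {}
label_events(gene_tree, gene_to_species, clades(species_tree), events)
for genes, kind in events.items():
    print(sorted(genes), kind)
```

In this toy example the node joining the two human and mouse gene pairs is labelled a duplication, so h1 and h2 are paralogs, while h1 and m1 are joined by a speciation node and are therefore orthologs.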

This research is still at a very early stage.  Dr Wong is currently working on the algorithm to build the homology graph.  A recent test run involving 500 nodes took about a day to complete, so the process still requires much refinement.

See also: Lateral Genetic Transfers in Prokaryotes

 

Contacts

Dr Simon Wong, Professor Mark Ragan
Institute for Molecular Bioscience, University of Queensland

Written by T. Curtis, September 2006