Homology Analysis for Structure, Function, and Malfunction
Fred Stevens
Sr. Biophysicist
Bldg: 202. Room: B229
E-mail: fstevens@anl.gov
Phone: (630) 252-3837
> Research
A major challenge for structural biology is to extract the maximum information from the amino acid sequence of a protein. The information relevant to functional genomics includes function(s), stability, and interaction partners. For two proteins closely linked by evolution, comparison of global amino acid sequences can provide good guidance for structural recognition and, in fewer cases than is often acknowledged, for function. However, confident assignment of functional attributes requires knowledge of the few amino acids that impart those functions. This information usually comes from multiple crystallographic analyses of a protein of known function that reveal its interactions with substrates and cofactors, or with their analogs. To assign function(s) to a protein accurately from its amino acid sequence, one must first recognize the protein's fold. Fold recognition identifies one or more structural homologs of the protein of interest, and the structural and functional studies of these homologs can then be mined to achieve accurate annotation. In some cases, the outcome will be accurate knowledge that the function remains unknown or only partially known. In other cases, substantive hypotheses suitable for efficient experimental testing will emerge.

Thus, maximizing the information obtained from protein sequence data depends on improving our ability to recognize the fold of a protein. Most soluble bacterial proteins appear to have known or partially known folds, and in many of the cases in which we do not recognize a fold, the fold has probably already been characterized in another protein. Although our structural coverage of human proteins is less complete, it remains likely that a high percentage of them can be analyzed structurally at a level sufficient to generate hypotheses that guide experimentation. Because most proteins with related structure and function are too evolutionarily dispersed to be considered “significantly similar” by statistically based comparisons, there is a fundamental need to replace “statistical significance” with some form of “evolutionary significance.” Under such a measure, significance would be assessed not by a global comparison, which is dominated by evolutionary noise, but at the limited number of positions that dominate the structural and functional properties of any protein.
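The contrast between global and position-restricted significance can be illustrated with a small calculation. The Python sketch below is not the author's method; it assumes an equal-length pairwise alignment (gaps written as `-`), independent positions, and a fixed chance probability of identity (`p_match = 0.06`, an assumed round figure for unrelated amino acid pairs), and the key positions are hypothetical. It computes a one-sided binomial tail probability for the observed number of identities, either over all aligned positions or over a chosen subset.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Probability of at least k successes in n independent trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def identity_significance(seq_a, seq_b, positions=None, p_match=0.06):
    """Return (identities, positions compared, chance tail probability) for an aligned pair.

    positions: indices to compare (hypothetical key sites); None means every position.
    p_match: assumed probability that two unrelated residues are identical by chance.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if positions is None:
        positions = range(len(seq_a))
    # Skip gap columns; only residue-residue pairs are compared.
    pairs = [(seq_a[i], seq_b[i]) for i in positions if "-" not in (seq_a[i], seq_b[i])]
    identities = sum(a == b for a, b in pairs)
    return identities, len(pairs), binomial_tail(identities, len(pairs), p_match)


# Toy example: a 40-residue alignment with only 6 identities, 5 of them at key sites.
query = "ACDEFGHIKLMNPQRSTVWY" * 2
shifted = query[1:] + query[0]  # differs from query at every position
key_sites = [5, 12, 19, 27, 33]  # hypothetical functional positions
identical = set(key_sites) | {2}
homolog = "".join(query[i] if i in identical else shifted[i] for i in range(len(query)))

print(identity_significance(query, homolog))             # global comparison
print(identity_significance(query, homolog, key_sites))  # key positions only
```

For this toy alignment the global tail probability is roughly 0.03, which is marginal at best once many database comparisons are made, while the five identities at the key sites give 0.06**5, about 7.8e-7. The point of the sketch is only that the same alignment can look insignificant globally and highly significant at the positions that matter.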
> Tri-Psi BLAST against NR database
- PSI-BLAST iteratively builds a position-specific scoring matrix from the sequences found in each round, effectively optimizing its similarity parameters for recognizing further homologs.
- The NCBI database of non-redundant protein sequences (NR) provides far more “training” data than the database of PDB sequences.
- Three rounds of PSI-BLAST appear to be near optimal for fold recognition while minimizing drift toward the identification of distant evolutionary remnants.
- Recognition of a sequence only in a late PSI-BLAST round does not by itself imply a functional relationship.
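The iterative idea in the first bullet can be sketched in a few lines of Python. This is a toy, not NCBI's implementation: it assumes ungapped sequences of equal length, a uniform background composition, a simple pseudocount, and a fixed score threshold in place of E-values, and the helper names (`build_pssm`, `score`, `iterative_search`) are hypothetical. Each round builds a log-odds position-specific scoring matrix (PSSM) from the query plus the current hits, rescans the database with it, and stops after a fixed number of rounds or when the hit set stops changing.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed uniform background; real searches use observed amino acid frequencies.
BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def build_pssm(aligned, pseudocount=1.0):
    """Return one {residue: log-odds score in bits} dict per column of equal-length sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned)
        n = sum(counts[aa] for aa in AMINO_ACIDS)  # gaps and unknown symbols are ignored
        column = {}
        for aa in AMINO_ACIDS:
            q = BACKGROUND[aa]
            # Pseudocounts weighted by background frequency keep unseen residues finite.
            p = (counts[aa] + pseudocount * q) / (n + pseudocount)
            column[aa] = log2(p / q)
        pssm.append(column)
    return pssm


def score(pssm, seq):
    """Sum the column scores for seq; symbols outside the alphabet contribute 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, seq))


def iterative_search(query, database, threshold, rounds=3):
    """Toy PSI-BLAST-style loop: rebuild the profile from the query plus current hits each round."""
    hits = set()
    for _ in range(rounds):
        pssm = build_pssm([query, *sorted(hits)])
        new_hits = {s for s in database if score(pssm, s) >= threshold}
        if new_hits == hits:  # converged: the profile recruits nothing new
            break
        hits = new_hits
    return hits


query = "ACDEFGHIKL"
close = "ACDEFGHMNP"    # 7 identities with the query
distant = "ACDQRSTMNP"  # 3 identities with the query, but shares close's 3 substitutions
decoy = "WYVWYVWYVW"    # no identities with any of the above
database = [close, distant, decoy]

print(iterative_search(query, database, threshold=10.0, rounds=1))  # one round only
print(iterative_search(query, database, threshold=10.0, rounds=3))  # up to three rounds
```

In this toy run, `distant` shares too few residues with the query to be found in round 1, but because it shares the substitutions seen in `close`, the profile built in round 2 recruits it. The same mechanism, repeated over many rounds, is what allows drift toward increasingly distant sequences. For real searches with NCBI BLAST+, the number of rounds is set with the `-num_iterations` option of `psiblast`.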
