Diane J. Rodi - Principal Investigator

Project Members:

Suneeta Mandava 
Satish Devarapalli
Lee Makowski 

 

 

This project analyzes the mechanism by which small molecules bind to proteins. Although many three-dimensional structures of protein-ligand complexes have been determined, the pace of rational drug design has been hindered by the lack of coherent, general rules governing small molecule-protein interactions. By studying the patterns of virally presented combinatorial peptides that bind to common metabolites such as ATP and glucose, and correlating those sequences with three-dimensional structures of known metabolite/protein pairs, we aim to create a database of peptide sequences that are predictive of metabolite binding in known protein sequences. Information derived from this work can eventually be extended to combinatorial chemistry-derived drugs to predict potential targets within the human body prior to clinical trials.

Our ability to rapidly sequence prokaryotic genomes is generating an unprecedented amount of DNA sequence data. These explosively growing databases, however, are fueling an increasing demand for novel high-throughput, sensitive and reliable analytical methods for the identification and characterization of genes, and/or their products, based on function [see recent review, Nature Insight]. Gene identification is providing us with a new paradigm not only for understanding the inner workings of cells at their most fundamental level, but also for predicting a cell’s response to both external and internal stimuli. The ability to quickly (and economically) identify proteins that are capable of binding particular chemicals would greatly accelerate both our utilization of, and defense against, these organisms. It is precisely the non-housekeeping ORFs, unrelated to previously characterized gene products, that are the critical elements in mining biological species for use by scientists in such nominally unrelated areas as medicine, agriculture, energy production, environmental clean-up and biological nonproliferation. Mapping the specialized pathways in a human pathogen can lead to the development of a target-specific small molecule inhibitor. Similarly, knowing the specific pathways utilized by a toxin-consuming microbe can greatly accelerate the development of methods to break down the same toxin in a hazardous waste site. The development of novel genomics tools for the classification of proteins based upon their binding to small molecule ligands is essential to achieving the goals of the microbial genome project at DOE. The primary utility of a protein/small molecule linkage map resides in its ability to suggest new functions for proteins and thereby provide guidance for further experimentation.
A technique that can selectively identify proteins that will bind environmental toxins, for instance, would be a major step towards addressing DOE mission goals.  The experiments here are designed to establish a novel tool with that capability.

 

RasMol view of amino acid residues (as predicted by phage-displayed peptides) contacting guanosine-5'-monophosphate

 

Of 4,405 predicted E. coli genes, only around 2,200, or 50%, have been characterized to any extent (Univ. of Wisconsin E. coli Genome Center, http://www.genetics.wisc.edu/). Within the EcoCyc database, which characterizes the known network of E. coli small molecule metabolism, it has been shown that out of 744 reactions catalyzed by 607 enzymes, 100 enzymes are multifunctional and 68 of the reactions are catalyzed by more than one enzyme, with the network containing a total of 791 chemical substrates [Ouzounis and Karp, 2000]. In theory, at the current rate of about 10 identifications per month, and in the absence of any significant rate acceleration, it will take almost 20 years to annotate E. coli alone [Thomas, 1999]. Whole genome comparisons have allowed for a general functional characterization of housekeeping genes, leaving the more specialized and/or unusual proteins without an assigned function. An automated method for specific ligand corroboration would not only confirm such functional predictions much faster, but also distinguish binding patterns for structurally related ligands (for example, CTP vs. dCTP) as well as for structurally related proteins. Large-scale mapping of protein/protein interactions is currently undergoing massive development and optimization [see for example Uetz et al., 2000]. An analogous evolution in the study of protein/small molecule interactions must take place, given the ubiquitous and essential role small molecules play in biological processes. The rate of evolution of electronic and laboratory approaches to functional genomics is not keeping pace with the rate of sequence generation [Rastan & Beeley, 1997]. In spite of the large number of functional genomics tools currently available, typically about 40% of predicted ORFs remain unidentified in terms of function, even after the application of sequence similarity comparisons, genomic context analysis, profile comparisons across multiple genomes, and structural genomics methods.
What kinds of proteins are these remaining unidentified ORFs? Many of these so-called ORFans closely resemble predicted gene products of unknown function in other bacteria. Characterization of the type(s) of small molecules to which an ORFan binds gives a vital clue as to its function within the cell. Depending upon the type of ligand, not only can a protein of unknown function be placed within a particular pathway, but it might also be identified as a cross-over protein tying together two previously unconnected pathways. The large fraction of ORFs that remain unidentified illustrates the critical need for developing new tools for functional analyses.
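As a rough check of the annotation estimate quoted above, the arithmetic can be reproduced from the figures given in the text. This is only a back-of-the-envelope sketch: it assumes that every gene not yet characterized still needs one identification, and that the rate stays constant.

```python
# Figures quoted in the text; variable names are illustrative.
predicted_genes = 4405
characterized_genes = 2200        # "around 2,200"
identifications_per_month = 10    # "about 10 identifications per month"

# Assumption: each remaining gene requires exactly one identification.
remaining = predicted_genes - characterized_genes
months = remaining / identifications_per_month
years = months / 12

print(f"{remaining} genes remaining, about {years:.1f} years at a constant rate")
```

Under these assumptions the result is a little over 18 years, in line with the "almost 20 years" cited above.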

This project seeks to apply a novel approach to genome-wide identification of small molecule binding proteins. Preliminary results demonstrate that the similarity between the sequence of a protein and the sequences of phage-displayed peptides affinity-selected against small molecules can be predictive of protein binding to that small molecule ligand. This analysis not only accurately predicted the position of the taxol binding site on tubulin [Rodi et al., 2001], but also led to the successful identification of human Bcl-2 as a taxol-binding protein [Rodi et al., 1999]. Affinity-selected peptides provide information analogous to that of a consensus binding sequence, and can be used in an analogous fashion to identify ligand binding sites. Can we use this method as a global approach to predicting which proteins bind ATP and to identifying the position of ATP binding sites? To answer this question we have selected a combination of protein structural analyses and phage-display technology to search for additional ATP-binding motifs by comparing peptides affinity-selected from random peptide libraries against two sets of data: the E. coli K12 genome, and the ATP-binding proteins of the PDB. Analysis of these data will establish the reliability of our approach both in identifying ATP-binding proteins and in identifying binding sites within ATP-binding proteins.
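The comparison described above can be pictured as sliding each affinity-selected peptide along a protein sequence and recording where the two agree. The following is a minimal sketch of that idea, not the project's published method: it assumes ungapped alignment, identity-only scoring, and an arbitrary `min_matches` threshold, and the function name and toy sequences are hypothetical.

```python
def similarity_profile(protein, peptides, min_matches=3):
    """Per-residue count of identities with affinity-selected peptides.

    Each peptide is slid along the protein without gaps. A window is kept
    only if it shares at least `min_matches` identical positions with the
    peptide; each matching position then adds 1 to that residue's score.
    """
    protein = protein.upper()
    profile = [0] * len(protein)
    for peptide in peptides:
        peptide = peptide.upper()
        k = len(peptide)
        for start in range(len(protein) - k + 1):
            window = protein[start:start + k]
            matches = [i for i in range(k) if peptide[i] == window[i]]
            if len(matches) >= min_matches:
                for i in matches:
                    profile[start + i] += 1
    return profile


# Toy data (hypothetical, for illustration only).
toy_protein = "MKLVDEGSGKTWQRNPHY"
toy_peptides = ["GSGKT", "ASGKT", "GSGRT"]

profile = similarity_profile(toy_protein, toy_peptides)
for residue, score in zip(toy_protein, profile):
    print(residue, "#" * score)
```

Residues with high scores are candidate positions for the ligand binding site. A realistic analysis would likely replace identity with a substitution matrix and assess the scores against a background of unselected peptides.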

M13: A single-stranded filamentous DNA bacteriophage

Phage display technology involves the expression of exogenous proteins, peptides or peptide libraries on the surface of bacteriophage. The design of random peptide phage display libraries typically involves the in-frame insertion of a short random peptide into the M13 minor coat protein, g3p, the functional activity of which remains intact. As a result, the peptide is displayed on the surface of the bacteriophage and the phage remains infectious for E. coli. 

It has long been known that certain peptide consensus sequences are predictive of small molecule binding. Novel protein sequences are routinely surveyed for the presence of these sequences. Unfortunately, consensus binding sequences have only been identified for a few small molecule ligands, limiting this potentially powerful method. Further, since peptides are capable of binding to any given small molecule in many different ways, no single consensus sequence is likely to identify all the proteins that bind to a particular ligand. Rather, the full peptide-binding potential of a particular small molecule is likely to include several, if not dozens, of peptide motifs. Realizing the potential utility of a method that could characterize a significant proportion of the peptides that bind a particular ligand, we chose to explore the use of phage-displayed peptide libraries as a source of this type of information. The phage display system of combinatorial peptide libraries has been extensively exploited to design and create novel proteins as well as to identify protein/protein interactions. These studies have been made possible by the relative ease of screening large numbers of library members, separating binding phage, and amplifying viral particles. Approximately 100 phage-displayed peptides were selected for affinity to taxol, and their sequences were compared to those of all available human proteins. This analysis not only accurately predicted the position of the taxol binding site on tubulin [Rodi et al., 2001], but also led to the successful identification of human Bcl-2 as a taxol-binding protein [Rodi et al., 1999]. These results demonstrate the validity of this method for the identification of protein targets of small molecule ligands.
The demonstration that sequences of affinity-selected, phage-displayed peptides can be predictive of small molecule binding when compared to the sequences of naturally occurring proteins provides the basis for a novel approach to the functional analysis of gene products at the whole-genome scale.
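At the whole-genome scale, the same kind of comparison can be used to order candidate proteins by how well they match a population of selected peptides. The sketch below is one simple way to do this under stated assumptions: each peptide contributes its best ungapped identity count anywhere in the protein, and the contributions are summed. The function names and toy sequences are hypothetical, and this is not presented as the scoring used in the published work.

```python
def best_window_identity(peptide, protein):
    """Highest number of identical positions over all ungapped placements."""
    k = len(peptide)
    if len(protein) < k:
        return 0
    return max(
        sum(a == b for a, b in zip(peptide, protein[start:start + k]))
        for start in range(len(protein) - k + 1)
    )


def rank_proteins(proteins, peptides):
    """Return (name, score) pairs sorted from best to worst match.

    `proteins` maps a protein name to its sequence. The score is the sum of
    each peptide's best window identity. Raw sums favor long proteins, so a
    real analysis would need some form of length or background correction.
    """
    scores = {
        name: sum(best_window_identity(p.upper(), seq.upper()) for p in peptides)
        for name, seq in proteins.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical, for illustration only).
toy_proteins = {
    "toy_control": "MPLLWWFFYYHHCCNNQQ",
    "toy_binder": "MKLVDEGSGKTWQRNPHY",
}
toy_peptides = ["GSGKT", "ASGKT", "GSGRT"]

ranking = rank_proteins(toy_proteins, toy_peptides)
for name, score in ranking:
    print(name, score)
```

In this toy case the protein carrying a stretch similar to the peptides ranks first, and the unrelated sequence scores zero.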

 

12-mer library of phage-displayed peptides that accurately predicted the ATP binding site (red) of phosphoenolpyruvate carboxykinase of E. coli

tfvtsstdtrrs idmqktnlahgp qttlnsdfprtr lrvppllsvnpr sssfittlsgpr tptalstdsiwi llpgsyttlsgr

elglsktrlspw ntmpnypvsksa ynlsvdptgpsq ssqadipttfss hlwmgataqttw tvstqvvepsws vhtskttgarlp
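A first step in analyzing a peptide population such as the one listed above is to check that every sequence has the expected length and to tabulate its amino acid composition. The following is a minimal sketch that uses only the 14 sequences shown here. The variable names are illustrative, and the composition is descriptive only, since no comparison is made against the unselected library.

```python
from collections import Counter

# The 12-mer sequences exactly as listed above.
listing = """
tfvtsstdtrrs idmqktnlahgp qttlnsdfprtr lrvppllsvnpr sssfittlsgpr tptalstdsiwi llpgsyttlsgr
elglsktrlspw ntmpnypvsksa ynlsvdptgpsq ssqadipttfss hlwmgataqttw tvstqvvepsws vhtskttgarlp
"""

peptides = listing.split()

# Sanity check: every entry should be a 12-residue peptide.
lengths = {len(p) for p in peptides}
print(f"{len(peptides)} peptides, lengths: {sorted(lengths)}")

# Residue composition across the whole population.
counts = Counter("".join(peptides).upper())
total = sum(counts.values())
for residue, n in counts.most_common():
    print(f"{residue}: {n} ({n / total:.1%})")
```

Comparing these frequencies with those of peptides drawn at random from the same library would indicate which residues are enriched by selection.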



Our work has demonstrated that the similarity between the sequences of ligand-selected peptides and proteins known to bind those ligands provides information about the regions of the protein chain involved in ligand binding [Rodi et al., 2001]. These results imply that peptides displayed on the surface of bacteriophage particles exhibit binding properties that resemble those of peptides of similar sequence on the surface of natural proteins. The structural context of a peptide displayed on a bacteriophage particle need not bear any particular relationship to that of similar peptides on natural proteins that exhibit similar binding properties. These results suggest that the interaction of a small molecule ligand with a peptide loop is not universally dependent on structural context, and further suggest that the first step in the recognition of a small molecule by a protein may, in many cases, be the interaction of the small molecule with a disordered, flexible loop. This is consistent with a growing body of evidence that disordered regions of proteins are frequently involved in intermolecular interactions. This body of work implies that where disordered regions of proteins are involved in small molecule ligand binding, the structural context will be relatively less important for determining ligand specificity. The importance of disordered regions in ligand binding supports our observation that the sequences of affinity-selected peptides are relevant for identification of ligand binding sites in naturally occurring proteins. Given these observations, libraries of phage-displayed peptides offer the potential for collecting statistically significant numbers of affinity-selected peptides relevant to the identification of binding sites in naturally occurring proteins.

This project is supported by the Office of Biological and Environmental Research of the Department of Energy.

Please visit our RELIC Database to process and manipulate experimental data on affinity-selected peptides. Fifteen programs are available to statistically analyze a population of peptides, search for motifs within peptides, predict which parts of a protein are involved in binding, and rank-order proteins against a set of peptides to predict which proteins are most likely to bind a given ligand.

 

2005 © Argonne National Laboratory