This project involves analysis of the mechanism by which small molecules bind to proteins. Although many three-dimensional structures of protein-ligand pairings have been obtained, the pace of rational drug design has been hindered by a lack of global, coherent rules underlying small molecule-protein interactions. By studying the pattern of virally-presented combinatorial peptides binding to common metabolites such as ATP and glucose, and correlating those sequences with three-dimensional structures of known metabolite/protein pairs, we aim to create a database of peptide sequences that are predictive for metabolite binding in known protein sequences. Information derived from this work can eventually be extended to combinatorial chemistry-derived drugs to predict potential targets within the human body prior to clinical trials.
Our ability to rapidly sequence prokaryotic genomes is generating an unprecedented amount of DNA sequence data. These explosively growing databases, however, are fueling an increasing demand for novel high-throughput, sensitive, and reliable analytical methods for the identification and characterization of genes, and/or their products, based on function [see recent review, Nature Insight]. Gene identification is providing us with a new paradigm for not only understanding the inner workings of cells at their most fundamental level, but also acquiring the ability to predict a cell’s response to both external and internal stimuli. The ability to quickly (and economically) identify proteins that are capable of binding particular chemicals would greatly accelerate both our utilization of, and defense against, these organisms. It is precisely the non-housekeeping ORFs, unrelated to previously characterized gene products, that are the critical elements in mining biological species for use by scientists in areas as nominally unrelated as medicine, agriculture, energy production, environmental clean-up, and biological nonproliferation. Mapping the specialized pathways in a human pathogen can lead to the development of a target-specific small molecule inhibitor. Similarly, knowing the specific pathways utilized by a toxin-consuming microbe can greatly accelerate the development of methods to break down the same toxin in a hazardous waste site. The development of novel genomics tools for the classification of proteins based upon their binding to small molecule ligands is essential to achieving the goals of the microbial genome project at DOE. The primary utility of a protein/small molecule linkage map resides in its ability to suggest new functions for proteins and thereby provide guidance for further experimentation. A technique that can selectively identify proteins that will bind environmental toxins, for instance, would be a major step towards addressing DOE mission goals.
The experiments here are designed to establish a novel tool with that capability.
RasMol view of amino acid residues (as predicted by phage-displayed peptides) contacting guanosine-5'-monophosphate
Of 4,405 predicted E. coli genes, only around 2,200, or 50%, have been
characterized to any extent (Univ. of Wisconsin E. coli Genome Center,
http://www.genetics.wisc.edu/). Within the EcoCyc database, which characterizes
the known network of E. coli small molecule metabolism, 744 reactions are
catalyzed by 607 enzymes; 100 of these enzymes are multifunctional, 68 of the
reactions are catalyzed by more than one enzyme, and the network contains a
total of 791 chemical substrates [Ouzounis and Karp, 2000]. In theory, at the
current rate of about 10 identifications per month, and in the absence of any
significant acceleration, it will take almost 20 years to annotate E. coli alone
[Thomas, 1999]. Whole genome comparisons have allowed for a general functional
characterization of housekeeping genes, leaving the more specialized and/or
unusual proteins without an assigned function. An automated method for specific
ligand corroboration would not only confirm these predictions much faster, but
also distinguish binding patterns for structurally related ligands (for example
CTP vs. dCTP) as well as for structurally related proteins. Large-scale
mapping of protein/protein interactions is currently undergoing massive
development and optimization [see for example Uetz, P. et al., 2000].
An analogous evolution in the study of protein/small molecule
interactions must take place, given the ubiquitous and essential role small
molecules play in biological processes. The rate of evolution of electronic and
laboratory approaches to functional genomics is not keeping pace with the rate
of sequence generation [Rastan & Beeley, 1997]. In spite of the large number
of functional genomics tools currently available, typically about 40% of
predicted ORFs remain unidentified in terms of function, even after the
application of sequence similarity comparisons, genomic context analysis,
profile comparisons across multiple genomes, and structural genomics methods.
What kinds of proteins are these remaining unidentified ORFs?
Many of these so-called ORFans closely resemble predicted gene
products of unknown function in other bacteria.
Characterization of the type(s) of small molecules to which an ORFan binds gives
a vital clue as to its function within the cell.
Depending upon the type of ligand, not only can a protein of unknown
function be placed within a particular pathway, but it might be identified as a
cross-over protein tying together two previously unconnected pathways. These
observations illustrate the critical need for developing new tools for
functional analyses.
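The annotation-time estimate quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; reading "remaining" as predicted genes minus characterized genes is our assumption about how the estimate was reached.

```python
# Figures quoted in the text above.
predicted_genes = 4405
characterized_genes = 2200          # "around 2,200"
identifications_per_month = 10      # "about 10 identifications per month"

# Assumption: the backlog is simply predicted minus characterized genes.
remaining = predicted_genes - characterized_genes
years_to_finish = remaining / identifications_per_month / 12

print(f"{remaining} genes remaining, about {years_to_finish:.1f} years at the current rate")
```

The result falls a little under two decades, consistent with the "almost 20 years" cited.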
This project seeks to apply a novel approach to
genome-wide identification of small molecule binding proteins.
Preliminary results demonstrate that the similarity between the
sequence of a protein and the sequences of phage-displayed peptides
affinity-selected against small molecules can be predictive for protein
binding to that small molecule ligand. This analysis not only accurately
predicted the position of the taxol binding site on tubulin [Rodi et al.,
2001], but also led to the successful identification of human Bcl-2 as a
taxol-binding protein [Rodi et al., 1999]. Affinity-selected peptides
provide information analogous to that of a consensus binding sequence, and
can be used in an analogous fashion to identify ligand binding sites.
Can we use this method as a global approach to predicting which
proteins bind ATP and to identifying the position of ATP binding sites?
To answer this question we have selected a combination of protein
structural analyses and phage-display technology to search for additional
ATP-binding motifs by comparing peptides selected from random peptide
libraries for affinity to ATP with two sets of data: the E. coli K12 genome
and the ATP-binding proteins of the PDB.
Analysis of these data will establish the reliability of our approach
both in identifying ATP-binding proteins and in identifying
binding sites within ATP-binding proteins.
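As a rough illustration of how affinity-selected peptides might be mapped onto a protein sequence, the sketch below slides each peptide along the protein and accumulates a per-residue score. It is a minimal sketch, not the RELIC implementation: the ungapped sliding window, the plain identity count (in place of a substitution matrix), the `min_matches` threshold, and the toy sequences are all assumptions made for illustration.

```python
def window_identity(peptide: str, segment: str) -> int:
    """Count aligned positions at which peptide and segment carry the same residue."""
    return sum(a == b for a, b in zip(peptide, segment))


def per_residue_scores(protein: str, peptides: list[str], min_matches: int = 3) -> list[int]:
    """Slide every peptide along the protein without gaps. Each window that
    reaches min_matches identities adds its identity count to the residues it covers."""
    scores = [0] * len(protein)
    for peptide in peptides:
        k = len(peptide)
        for start in range(len(protein) - k + 1):
            matches = window_identity(peptide, protein[start:start + k])
            if matches >= min_matches:
                for i in range(start, start + k):
                    scores[i] += matches
    return scores


# Hypothetical toy data, not taken from any experiment.
toy_protein = "MKTAYIAKQRGSGKSTLLNAL"
toy_peptides = ["GSGKST", "SGKSTL"]

scores = per_residue_scores(toy_protein, toy_peptides)
peak = max(scores)
# Residue indices (0-based) with the highest cumulative score.
candidate_site = [i for i, s in enumerate(scores) if s == peak]
print(candidate_site)
```

Residues with the highest cumulative score are the candidates for a ligand binding site. With a real peptide population the threshold and the scoring scheme would need to be chosen so that chance matches do not dominate.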
M13: A single-stranded filamentous DNA bacteriophage
Phage display technology involves the expression of exogenous proteins, peptides or peptide libraries on the surface of bacteriophage. The design of random peptide phage display libraries typically involves the in-frame insertion of a short random peptide into the M13 minor coat protein, g3p, the functional activity of which remains intact. As a result, the peptide is displayed on the surface of the bacteriophage and the phage remains infectious for E. coli.
It has long been known that certain peptide consensus
sequences are predictive for small molecule binding. Novel protein sequences are
routinely surveyed for the presence of these sequences.
Unfortunately, consensus binding sequences have only been identified for
a few small molecule ligands, limiting this potentially powerful method.
Further, since peptides are capable of binding to any given small
molecule in many different ways, no single consensus sequence is likely to
identify all the proteins that bind to a particular ligand.
Rather, the full peptide-binding potential of a particular small molecule
is likely to include several, if not dozens, of peptide motifs.
Realizing the potential utility of a method that could characterize a
significant proportion of the peptides that bind a particular ligand, we chose
to explore the use of phage-displayed peptide libraries as a source of this type
of information. The phage display
system of combinatorial peptide libraries has been extensively exploited to
design and create novel proteins as well as identify protein/protein
interactions. These studies have been made possible by the relative ease of
screening large numbers of library members, separation of binding phage, and
amplification of viral particles. Approximately
100 phage-displayed peptides were selected for
affinity to taxol, and their sequences were compared to those of all available human
proteins. This analysis not only
accurately predicted the position of the taxol binding site on tubulin [Rodi et
al., 2001], but led to the successful identification of human Bcl-2 as a taxol-binding
protein [Rodi et al., 1999]. These
results demonstrate the validity of this method for the identification of
protein targets of small molecule ligands.
The demonstration that sequences of affinity-selected, phage-displayed
peptides can be predictive for small molecule binding when compared to the
sequences of naturally occurring proteins provides the basis for a novel approach
to the functional analysis of gene products at the whole-genome scale.
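Extending the comparison from one protein to many, a set of candidate proteins can be rank ordered by how well the peptide population matches each of them. The sketch below is one simple way to do this, assuming an ungapped identity score and the mean of each peptide's best window as the ranking statistic; the function names, protein names, and sequences are hypothetical, and the actual analysis may use a different similarity measure.

```python
def best_window_score(protein: str, peptide: str) -> int:
    """Highest identity count of the peptide over all ungapped windows of the protein."""
    k = len(peptide)
    return max(
        (sum(a == b for a, b in zip(peptide, protein[s:s + k]))
         for s in range(len(protein) - k + 1)),
        default=0,  # protein shorter than the peptide: no window exists
    )


def rank_proteins(proteins: dict[str, str], peptides: list[str]) -> list[tuple[str, float]]:
    """Return (name, mean best-window score) pairs, best-matching protein first."""
    if not peptides:
        raise ValueError("at least one peptide is required")
    ranked = []
    for name, sequence in proteins.items():
        total = sum(best_window_score(sequence, p) for p in peptides)
        ranked.append((name, total / len(peptides)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not taken from any experiment.
toy_proteins = {
    "protA": "MKTAYIAKQRGSGKSTLLNAL",
    "protB": "MDEVLPWWNNFFHHCC",
}
toy_peptides = ["GSGKST", "SGKSTL"]

ranking = rank_proteins(toy_proteins, toy_peptides)
for name, score in ranking:
    print(name, score)
```

This statistic does not correct for protein length, so longer proteins have more chances of a high-scoring window; a genome-scale comparison would need some normalization or a comparison against scores from unselected peptides.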
12-mer library of phage-displayed peptides that accurately predicted the ATP binding site (red) of phosphoenolpyruvate carboxykinase of E. coli
Our work has demonstrated that the similarity between the sequences of ligand-selected
peptides and proteins known to bind those ligands provides information about the
regions of the protein chain involved in ligand binding [Rodi et al., 2001].
These results imply that peptides displayed on the surface of
bacteriophage particles exhibit binding properties that resemble the binding
properties of peptides of similar sequence on the surface of natural proteins.
The structural context of peptides displayed on the surface of a
bacteriophage particle does not necessarily possess any particular relationship
to that of similar peptides on natural proteins that exhibit similar binding
properties. These results suggest
that the interaction of a small molecule ligand with a peptide loop is not
universally dependent on structural context and further suggest that the first
step in the recognition of a small molecule by a protein may, in many cases, be
the interaction of the small molecule with a disordered, flexible loop.
This is consistent with a growing body of evidence that disordered
regions of proteins are frequently involved in intermolecular interactions.
This body of work implies that where disordered regions of proteins are
involved in small molecule ligand binding, the structural context will be
relatively less important for determining ligand specificity.
The importance of disordered regions in ligand binding supports our observations
that the sequences of affinity-selected peptides are relevant for identification of
ligand binding sites in naturally occurring proteins.
Given these observations, the use of libraries of phage-displayed
peptides provides the potential for collecting statistically significant numbers
of affinity-selected peptides relevant to the identification of binding sites in
naturally occurring proteins.
This project is supported by the Office of Biological and Environmental Research of the Department of Energy.
Please visit our RELIC Database to process and manipulate the experimental data of affinity-selected peptides. Fifteen programs are available to statistically analyze a population of peptides, search for motifs within peptides, predict which parts of a protein are involved in binding, and rank order proteins against a set of peptides to predict which proteins are most likely to bind that ligand.
2005 © Argonne National Laboratory