Identification of Functional Modules in Protein Complexes via Hyperclique Pattern Discovery
Proteins usually do not act in isolation in a cell but function within complicated cellular pathways, interacting with other proteins either in pairs or as components of larger complexes. Many protein complexes have been identified by large-scale experimental studies [1] [2]. However, because current protein complex data contain a large number of false-positive interactions, it is still difficult to obtain an accurate understanding of functional modules, which encompass groups of proteins involved in a common elementary biological function. In this project, we present a hyperclique pattern [5] discovery approach for extracting functional modules (hyperclique patterns) from protein complexes. A hyperclique pattern is a type of association pattern containing proteins that are highly affiliated with each other. The analysis of hyperclique patterns using the Gene Ontology suggests that proteins within the same hyperclique pattern are more likely to perform the same function and participate in the same biological process. More interestingly, the 3-D structural view of proteins within a hyperclique pattern reveals that these proteins physically interact with each other. In addition, we observe that several hyperclique patterns corresponding to different functions can participate in the same protein complex as independent modules, and that a hyperclique pattern can be involved in different complexes performing different higher-order biological functions, although the pattern corresponds to a specific elementary biological function. Finally, the results also indicate that our method can facilitate the functional annotation of uncharacterized proteins.
Protein Complex Data and Analysis Tools
Protein Complex Data: Two datasets [1][2] summarizing large-scale experimental studies of multi-protein complexes are available for the yeast Saccharomyces cerevisiae. Coupling different purification schemes (immunoprecipitation and tandem affinity purification (TAP)) and labeling schemes with mass spectrometry (MS), both studies used bait proteins to identify physiologically intact protein complexes. Independent research [3] [4] showed that the TAP-MS dataset by Gavin et al. [1] has relatively better accuracy for predicting protein functions; we therefore use this dataset to illustrate our method. The TAP-MS dataset contains a total of 1,440 distinct proteins within 232 multi-protein complexes.
Analysis Tools: The Gene Ontology was used to annotate the proteins of hyperclique patterns identified in the TAP-MS dataset. The graph drawing package GraphViz was used to produce the graph representation of the annotation. The functional description of each protein (if available) was obtained from the Saccharomyces Genome Database (SGD). The 3-D structure information of yeast proteins was obtained from the Protein Data Bank (PDB), and PyMOL was used for visualizing the 3-D structure of proteins within a hyperclique pattern.
Results
Analysis of the hyperclique pattern {Pre2, Pre4, Pre5, Pre6, Pre8, Pre9, Pup3, Scl1} using the Gene Ontology (GO).
The following is a 3-D structural view of the hyperclique pattern {Pre2, Pre4, Pre5, Pre6, Pre8, Pre9, Pup3, Scl1}, which lies within the proteasome protein complex (PDB ID: 1fnt).
A 3-D structural view of all proteins in the protein complex proteasome (PDB ID: 1fnt).
The hyperclique pattern {Pre2, Pre4, Pre5, Pre6, Pre8, Pre9, Pup3, Scl1} is contained in four experimentally identified protein complexes, which are shown in the following table.
| CID | Protein Complexes | Function Category |
| 106 | Blm3 Dam1 Dbp9 Ecm29 Est3 Gfa1 Ino4 Kap95 Lys12 Mds3 Nud1 Pda1 Pdb1 Pre10 Pre2 Pre3 Pre4 Pre5 Pre6 Pre8 Pre9 Pse1 Pup3 Rgr1 Rpt3 Rpt5 Scl1 Spa2 Srp1 Ulp1 YFL006W YGR081C YMR310C YPL012W Yra1 | Protein Synthesis and Turnover |
| 148 | Cdc6 Ecm29 Gfa1 Mlh2 Nas6 Pgk1 Pre1 Pre2 Pre3 Pre4 Pre5 Pre6 Pre7 Pre8 Pre9 Pup3 Rpn10 Rpn11 Rpn12 Rpn13 Rpn3 Rpn5 Rpn6 Rpn7 Rpn8 Rpn9 Rpt1 Rpt2 Rpt3 Rpt4 Rpt5 Rpt6 Scl1 Ubp6 | Protein Synthesis and Turnover |
| 157 | Blm3 Cdc6 Ecm29 Mlh2 Pgk1 Pre1 Pre10 Pre2 Pre3 Pre4 Pre5 Pre6 Pre7 Pre8 Pre9 Pup3 Rgr1 Rpn10 Rpn11 Rpn12 Rpn13 Rpn3 Rpn5 Rpn6 Rpn7 Rpn8 Rpn9 Rpt1 Rpt2 Rpt3 Rpt4 Rpt5 Rpt6 Scl1 Ubp6 YFL006W | Protein Synthesis and Turnover |
| 151 | Blm3 Cdc55 Cin1 Erg13 Hhf2 Hos2 Iml1 Kap95 Kel1 Lte1 Myo5 Pfk1 Pph21 Pph22 Pre1 Pre10 Pre2 Pre4 Pre5 Pre6 Pre7 Pre8 Pre9 Pup1 Pup2 Pup3 Rrd2 Rts1 Scl1 Sif2 Srp1 Tdh2 Tdh3 Tef4 Tpd3 YBL104C YCR033W YGL245W YGR161C YIL112W YKR029C Yef3 Yor1 Yra1 Zds1 Zds2 | Signaling |
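
The containment stated for the table above can be checked mechanically. The following is a minimal Python sketch, assuming each complex is represented as a set of protein names copied from the table; the names `PATTERN`, `COMPLEXES`, and `complexes_containing` are illustrative and are not part of the original analysis.

```python
# Proteins of the hyperclique pattern discussed above.
PATTERN = frozenset("Pre2 Pre4 Pre5 Pre6 Pre8 Pre9 Pup3 Scl1".split())

# Complex ID (CID) -> member proteins, copied from the table above.
COMPLEXES = {
    106: set((
        "Blm3 Dam1 Dbp9 Ecm29 Est3 Gfa1 Ino4 Kap95 Lys12 Mds3 Nud1 Pda1 Pdb1 "
        "Pre10 Pre2 Pre3 Pre4 Pre5 Pre6 Pre8 Pre9 Pse1 Pup3 Rgr1 Rpt3 Rpt5 "
        "Scl1 Spa2 Srp1 Ulp1 YFL006W YGR081C YMR310C YPL012W Yra1"
    ).split()),
    148: set((
        "Cdc6 Ecm29 Gfa1 Mlh2 Nas6 Pgk1 Pre1 Pre2 Pre3 Pre4 Pre5 Pre6 Pre7 "
        "Pre8 Pre9 Pup3 Rpn10 Rpn11 Rpn12 Rpn13 Rpn3 Rpn5 Rpn6 Rpn7 Rpn8 "
        "Rpn9 Rpt1 Rpt2 Rpt3 Rpt4 Rpt5 Rpt6 Scl1 Ubp6"
    ).split()),
    157: set((
        "Blm3 Cdc6 Ecm29 Mlh2 Pgk1 Pre1 Pre10 Pre2 Pre3 Pre4 Pre5 Pre6 Pre7 "
        "Pre8 Pre9 Pup3 Rgr1 Rpn10 Rpn11 Rpn12 Rpn13 Rpn3 Rpn5 Rpn6 Rpn7 "
        "Rpn8 Rpn9 Rpt1 Rpt2 Rpt3 Rpt4 Rpt5 Rpt6 Scl1 Ubp6 YFL006W"
    ).split()),
    151: set((
        "Blm3 Cdc55 Cin1 Erg13 Hhf2 Hos2 Iml1 Kap95 Kel1 Lte1 Myo5 Pfk1 "
        "Pph21 Pph22 Pre1 Pre10 Pre2 Pre4 Pre5 Pre6 Pre7 Pre8 Pre9 Pup1 Pup2 "
        "Pup3 Rrd2 Rts1 Scl1 Sif2 Srp1 Tdh2 Tdh3 Tef4 Tpd3 YBL104C YCR033W "
        "YGL245W YGR161C YIL112W YKR029C Yef3 Yor1 Yra1 Zds1 Zds2"
    ).split()),
}


def complexes_containing(pattern, complexes):
    """Return the sorted IDs of complexes that include every protein in `pattern`."""
    wanted = set(pattern)
    return sorted(cid for cid, members in complexes.items() if wanted <= members)


print(complexes_containing(PATTERN, COMPLEXES))  # [106, 148, 151, 157]
```

The subset test `wanted <= members` is all that "contained in" means here: every protein of the pattern appears among the members of the complex, regardless of which other proteins the complex also includes.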
The following shows the GO function annotation of protein complex 151. There are three hyperclique patterns in this complex. Proteins enclosed in a pair of < > form a hyperclique pattern.
A list of maximal hyperclique patterns at a support threshold of 0 and an h-confidence threshold of 60%.
| Hyperclique Pattern | Molecular Function Annotation | Biological Process Annotation |
| Cus1 Msl1 Prp3 Prp9 Sme1 Smx2 Smx3 Yhc1 YJR084W Brr1 Dib1 Ecm2 Hsh155 Lsm4 Mud1 Prp11 Prp19 Prp21 Prp31 Prp39 Prp40 Prp42 Prp6 Smd1 Snt309 Snu56 Srb2 YDL209C Clf1 Lea1 Luc7 Prp4 Rse1 Smb1 Smd3 Snp1 Snu66 Snu71 YLR424W | GO Function Annotation | GO Process Annotation |
| Brr1 Mud1 Prp39 Prp40 Prp42 Smd1 Snu56 Luc7 Rse1 Smd3 Snp1 Snu71 Smd2 | GO Function Annotation | GO Process Annotation |
| Ecm2 Hsh155 Prp19 Prp21 Snt309 YDL209C Clf1 Lea1 Rse1 YLR424W Prp46 Smd2 Snu114 | GO Function Annotation | GO Process Annotation |
| Emg1 Imp3 Imp4 Kre31 Mpp10 Nop14 Sof1 YMR093W YPR144C Krr1 YDR449C Enp1 | GO Function Annotation | GO Process Annotation |
| Dib1 Lsm4 Prp31 Prp6 Clf1 Prp4 Smb1 Snu66 YLR424W Prp46 Smd2 Snu114 | GO Function Annotation | GO Process Annotation |
| Cdc33 Dib1 Lsm4 Prp31 Prp6 Clf1 Prp4 Smb1 Snu66 YLR424W | GO Function Annotation | GO Process Annotation |
| Pre2 Pre4 Pre5 Pre8 Pup3 Pre6 Pre9 Scl1 | GO Function Annotation | GO Process Annotation |
| Clf1 Prp4 Smb1 Snu66 YLR424W Prp46 Smd2 Snu114 | GO Function Annotation | GO Process Annotation |
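
To make the thresholds used for the table above concrete, the sketch below computes the h-confidence measure as defined in [5]: the support of the whole pattern divided by the largest support of any single protein in it. This is a minimal Python illustration, not the mining algorithm used in the project. It only scores a given candidate pattern and does not enumerate maximal patterns. The function names and the toy data (`TOY`, with made-up protein names A to D) are hypothetical.

```python
def support_count(itemset, complexes):
    """Number of complexes that contain every protein in `itemset`."""
    items = set(itemset)
    return sum(1 for members in complexes if items <= members)


def h_confidence(pattern, complexes):
    """supp(pattern) / max over proteins p in pattern of supp({p}).

    `pattern` must be non-empty. Returns 0.0 if no protein of the
    pattern occurs in any complex.
    """
    max_single = max(support_count({p}, complexes) for p in pattern)
    if max_single == 0:
        return 0.0
    return support_count(pattern, complexes) / max_single


def is_hyperclique(pattern, complexes, min_hconf=0.6, min_support=0):
    """Check a candidate pattern against the support and h-confidence thresholds."""
    return (support_count(pattern, complexes) >= min_support
            and h_confidence(pattern, complexes) >= min_hconf)


# Hypothetical toy data: five "complexes" over made-up protein names.
TOY = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"A", "D"},
    {"C", "D"},
]

print(h_confidence({"A", "B"}, TOY))    # 3 / 4 = 0.75
print(is_hyperclique({"A", "B"}, TOY))  # True at the 60% threshold
print(h_confidence({"A", "D"}, TOY))    # 2 / 4 = 0.5
print(is_hyperclique({"A", "D"}, TOY))  # False at the 60% threshold
```

In the toy data, A occurs in four complexes and B in three, and the two occur together in three, so whenever either protein is present the other is present at least 75% of the time. That is the sense in which proteins in a hyperclique pattern are "highly affiliated with each other".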
References
1. A. Gavin et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415:141-147, 2002.
2. Y. Ho et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 415:180-183, 2002.
3. M. Deng, F. Sun, and T. Chen. Assessment of the reliability of protein-protein interactions and protein function prediction. Pacific Symposium on Biocomputing (PSB), 140-151, 2003.
4. C. Mering et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417, 2002.
5. H. Xiong, P. Tan, and V. Kumar. Mining strong affinity association patterns in data sets with skewed support distribution. In Proc. of the Third IEEE International Conference on Data Mining (ICDM), 387-394, 2003.