COMPLEAT

Background

COMPLEAT was developed at Harvard Medical School by the lab of Norbert Perrimon and the Drosophila RNAi Screening Center (DRSC). COMPLEAT broadens the scope of high-throughput data analysis by using protein complexes as its backend annotation, which makes it complementary to existing tools. It also offers several features that together provide a comprehensive data-mining environment, including network-based visualization and interactive querying.

Brief introduction to COMPLEAT

COMPLEAT is an online tool for analyzing high-throughput (or small-scale) datasets using protein complex enrichment analysis. Instead of conventional Gene Ontology- or pathway-based annotations, it uses a protein complex resource as its backend annotation data. There is no need to pre-select ‘hits’ from a study; users can simply upload the full dataset. Users can also upload multiple datasets and quickly zoom in on the complexes that are enriched in one or both datasets, as well as complexes that are differentially enriched between two datasets.

Citing COMPLEAT

If you use COMPLEAT in your research, please cite our paper:
A. Vinayagam, Y. Hu, M. Kulkarni, C. Roesel, R. Sopko, S. E. Mohr, N. Perrimon, Protein Complex-Based Analysis Framework for High-Throughput Data Sets. Sci. Signal. 6, rs5 (2013).

Assembly of the human, fly and yeast complex resource

Protein complexes were assembled in two ways. First, literature-annotated complexes were collected from various public resources and mapped to human, fly and yeast genes using the ortholog prediction tool DIOPT. Second, complexes were predicted from high-quality protein-protein interaction (PPI) networks for human, fly and yeast using CFinder and NetworkBlast.
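The literature-based route amounts to translating each complex's member list through an ortholog table. The sketch below is illustrative only: it assumes a simple in-memory dictionary standing in for DIOPT predictions, uses placeholder gene IDs, and applies a hypothetical 50% coverage threshold for keeping a mapped complex. Neither the function name `map_complex` nor the threshold comes from COMPLEAT.

```python
from typing import Dict, List, Optional, Set

# Hypothetical ortholog table: source-species gene -> orthologs in the target
# species. In COMPLEAT, DIOPT predictions fill this role; the gene IDs used
# below are placeholders, not real DIOPT output.
OrthologTable = Dict[str, Set[str]]


def map_complex(members: List[str], orthologs: OrthologTable,
                min_coverage: float = 0.5) -> Optional[Set[str]]:
    """Translate a complex's members into the target species.

    Returns the union of all target-species orthologs, or None when fewer
    than `min_coverage` of the source members have any ortholog. The
    coverage rule is an illustrative assumption, not a documented COMPLEAT
    criterion.
    """
    if not members:
        return None
    mapped: Set[str] = set()
    n_mapped = 0
    for gene in members:
        hits = orthologs.get(gene, set())
        if hits:
            n_mapped += 1
            mapped |= hits  # one-to-many orthologs all join the complex
    if n_mapped / len(members) < min_coverage:
        return None
    return mapped


# Usage with placeholder IDs: two of three members have orthologs.
human_to_fly = {"hA": {"fA"}, "hB": {"fB1", "fB2"}}
print(sorted(map_complex(["hA", "hB", "hC"], human_to_fly)))  # ['fA', 'fB1', 'fB2']
```

Taking the union of all orthologs keeps one-to-many mappings (common between human and fly gene families); a stricter variant might keep only the best-scoring ortholog per gene.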

Table 1. Original resources for literature-based complexes

Source	Focus	Original species	Ortholog mapping
CORUM	Protein complexes from literature	Mammal	Mammal -> Fly/Yeast; Mouse -> Human
PINdb	Protein complexes from literature	Human, Yeast	Human/Yeast -> Fly; Human <-> Yeast
CYC2008	Protein complexes from literature	Yeast	Yeast -> Human/Fly
CYC2008	Protein complexes from HT data	Yeast	Yeast -> Human/Fly
Gene Ontology	Protein complexes	Human, Fly, Yeast	No mapping
DPiM (manual)	Protein complexes from MS data	Fly	Fly -> Human/Yeast
KEGG module	Signaling pathway protein complexes	Not clear	Human -> Fly/Yeast
SignaLink	Core signaling pathways	Human, Fly	Human/Fly -> Yeast
flyReactome	Core signaling pathways	Fly	Fly -> Human/Yeast

Table 2. Numbers of complexes and proteins derived from different sources

Organism	Literature complexes	Literature proteins	Predicted complexes	Predicted proteins	Combined complexes	Combined proteins
Human	3638	7524	6251	6334	9881	9293
Fly	3077	5619	3639	3933	6703	6536
Yeast	2173	3280	5551	3366	7713	3994

Annotation of the human, yeast and fly complex resource

In addition to the annotations carried over from each complex's original resource, representative GO terms, common publications and common subcellular localizations are searchable and displayed in the complex detail view. Binary interactions between complex members are also tracked and displayed in the Cytoscape view of selected complexes.
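Tracking the binary interactions within a complex is essentially taking the subnetwork induced by its members. A minimal sketch, assuming PPIs arrive as a plain list of gene-ID pairs; the helper names are hypothetical, and SIF export is shown only as one common way to load a network into Cytoscape, not necessarily how COMPLEAT builds its view.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def complex_interactions(members: Iterable[str],
                         ppi_edges: Iterable[Edge]) -> List[Edge]:
    """Keep only the binary interactions whose partners are both members.

    Edges are treated as undirected: each pair is reported once, with its
    endpoints in sorted order, in the order first seen.
    """
    member_set: Set[str] = set(members)
    seen: Set[Edge] = set()
    result: List[Edge] = []
    for a, b in ppi_edges:
        if a in member_set and b in member_set:
            edge = (a, b) if a <= b else (b, a)
            if edge not in seen:
                seen.add(edge)
                result.append(edge)
    return result


def to_sif(edges: Iterable[Edge]) -> str:
    """Render edges in Cytoscape's Simple Interaction Format (SIF),
    using the conventional 'pp' (protein-protein) interaction type."""
    return "\n".join(f"{a}\tpp\t{b}" for a, b in edges)


# Usage with placeholder gene IDs: B-A duplicates A-B, and A-D leaves the complex.
edges = [("A", "B"), ("B", "A"), ("C", "A"), ("A", "D")]
print(to_sif(complex_interactions(["A", "B", "C"], edges)))
```

Using a set for membership keeps the filter linear in the number of PPI edges, which matters when scanning a genome-scale interaction network for every complex.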

Enrichment algorithms

Values from the input data are mapped to the complex members and sorted from highest to lowest. The complex score is computed as the interquartile mean (IQM), the mean of the middle 50% of the values, which preserves the direction (positive/negative, or up-/down-regulation) of the original data. A p-value is also computed to estimate the significance of each complex score by comparison with 1000 random complexes of the same size. Enriched complexes (shown in color) are those that meet the p-value cutoff selected by the user with the slider bar below the graph.
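The scoring steps above can be sketched in a few lines. This is a minimal illustration, assuming gene-to-value input held in a dictionary and a simple IQM that trims floor(n/4) values from each end. How COMPLEAT handles complex sizes not divisible by four, whether its test is one- or two-sided, and how it samples random complexes are not stated here, so those parts (and all function names) are assumptions.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def iqm(values: Sequence[float]) -> float:
    """Interquartile mean: drop floor(n/4) values from each end of the
    sorted list and average the rest (all values when n < 4)."""
    if not values:
        raise ValueError("iqm() needs at least one value")
    ordered = sorted(values, reverse=True)  # highest-to-lowest, as in the text
    k = len(ordered) // 4
    return statistics.fmean(ordered[k:len(ordered) - k])


def complex_score(members: Sequence[str],
                  data: Dict[str, float]) -> Tuple[float, int]:
    """Map input values onto complex members; return (IQM, number mapped)."""
    values = [data[g] for g in members if g in data]
    return iqm(values), len(values)


def empirical_p_value(observed: float, size: int, background: List[float],
                      n_random: int = 1000, seed: int = 0) -> float:
    """Compare |observed| with |IQM| of random same-size gene sets.

    Two-sided via absolute values, with a +1 correction so p is never 0;
    both choices are assumptions, not documented COMPLEAT details.
    """
    rng = random.Random(seed)
    hits = sum(
        abs(iqm(rng.sample(background, size))) >= abs(observed)
        for _ in range(n_random)
    )
    return (hits + 1) / (n_random + 1)


# Usage with made-up data: a 4-member complex whose members are all strongly up.
data = {f"bg{i}": ((i % 21) - 10) / 10 for i in range(200)}  # background in [-1, 1]
data.update({"c1": 2.4, "c2": 2.9, "c3": 3.1, "c4": 2.7})
score, n = complex_score(["c1", "c2", "c3", "c4"], data)
p = empirical_p_value(score, n, list(data.values()))
cutoff = 0.05  # stands in for the user's slider setting
print(round(score, 2), p <= cutoff)  # 2.8 True
```

Because the IQM discards the top and bottom quarters, a single outlier member cannot dominate the score, while a complex whose members consistently move down still gets a negative score. With 1000 random sets and the +1 correction, the smallest reportable p-value is 1/1001.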