PANGEA - PAthway, Network and Gene-set Enrichment Analysis

How should I start? Why do I need to do “Gene ID mapping”?
Drosophila gene identifiers (IDs), such as FBgn numbers, gene symbols and CG numbers, can change over time, for example when genes are split, merged or renamed. The data incorporated in this tool is updated periodically, so the IDs in the user’s gene list may not match those used by the tool. We therefore strongly recommend that users update the gene IDs in their query sets to the version used by this tool. To do this, click the “Gene ID mapping” option on the top menu bar or the hyperlinked text next to step 1. Paste the list of gene IDs into the box and click “Submit”. In the results table, IDs that differ from the version used by the tool are highlighted, as are duplicated genes. Check the highlighted genes and the associated notes to make sure the conversion is correct. The output list can then be used to populate the search page directly by clicking “Search With Results” on the left.
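The kind of check the results table performs can be sketched in a few lines. This is a minimal illustration, not PANGEA’s implementation: it assumes a hypothetical lookup table (current_id_of) from each submitted ID to the current ID of the same type, and flags IDs that changed as well as IDs that collapse onto the same current gene, as can happen after a merge.

```python
from collections import Counter


def check_ids(query_ids, current_id_of):
    """Map submitted IDs to current IDs and flag changed and duplicated ones.

    current_id_of is a hypothetical dict {submitted ID: current ID};
    IDs missing from it are reported as unmapped (current ID None).
    """
    mapped = [(q, current_id_of.get(q)) for q in query_ids]
    # Count each current ID so that two submitted IDs pointing to the same
    # gene (e.g. an old and a new name after a merge) are both flagged.
    counts = Counter(cur for _, cur in mapped if cur is not None)
    rows = []
    for query, current in mapped:
        rows.append({
            "query": query,
            "current": current,
            "changed": current is not None and current != query,
            "duplicated": current is not None and counts[current] > 1,
        })
    return rows


# Hypothetical IDs for illustration only.
for row in check_ids(["geneA", "oldB", "geneB", "unknownC"],
                     {"geneA": "geneA", "oldB": "geneB", "geneB": "geneB"}):
    print(row)
```

As in the tool, flagged rows are only a prompt for review; deciding which of two duplicated entries to keep is left to the user.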

What if I only have UniProt IDs?
The current version of PANGEA supports Entrez GeneIDs, official gene symbols and primary gene identifiers from model organism databases (MODs). To analyze a list of genes with other identifiers, such as UniProtKB IDs, first map them to a supported identifier type using the “Gene ID mapping” page on the top menu bar: specify the species, choose “UniProt” as the input type from the dropdown menu and click “Submit”. The results table contains the search term, Entrez GeneID, official gene symbol and primary gene identifier from the MOD.

How do I use “Enter your own background genes”?
Ideally, enrichment should be calculated relative to the complete set of genes that can be measured or observed in the experiment. If you do not enter any genes in this box, enrichment is calculated using the whole genome as the background. This is appropriate for genome-wide screens and profiles. A background set is strongly recommended when the whole genome or proteome is not measured, for example when a screen used a focused library of reagents (e.g. targeting kinases and phosphatases). In mass spectrometry analyses, many proteins are not detected, either because they are absent from the starting material (e.g. the soluble fraction) or because of their properties (e.g. membrane proteins) or low abundance; in this case, the complete set of proteins observed in the experiment, plus controls, should be used as the background.
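Why the background matters can be seen in a one-sided hypergeometric test, a common choice for over-representation analysis. This text does not state which test PANGEA uses, so treat the sketch below as an assumption. Both the query and the gene set are first restricted to the background, so genes that could never have been observed do not inflate the significance.

```python
from math import comb


def enrichment_p_value(query, gene_set, background):
    """One-sided hypergeometric test for over-representation of gene_set in query.

    Returns (overlap, p-value), where the p-value is P(X >= overlap) for
    X ~ Hypergeometric(N, K, n), computed over the background only.
    """
    background = set(background)
    query = set(query) & background      # drop query genes outside the background
    members = set(gene_set) & background  # gene-set members that could be observed
    N = len(background)                   # genes that could be observed
    K = len(members)                      # background genes in the gene set
    n = len(query)                        # query genes in the background
    k = len(query & members)              # overlap between query and gene set
    upper_tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, upper_tail / comb(N, n)


# Hypothetical genes: the same hit list tested against a focused library
# and against a larger "whole genome" background.
library = [f"g{i}" for i in range(10)]
genome = [f"g{i}" for i in range(100)]
hits = {"g0", "g1", "g2"}
pathway = {"g0", "g1", "g2"}
print(enrichment_p_value(hits, pathway, library))
print(enrichment_p_value(hits, pathway, genome))
```

With the larger background the same overlap looks far less likely by chance, which is how a whole-genome background can overstate enrichment for a focused screen.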

How was the “GO slim” annotation generated? What is the difference between the two GO slim subsets?
GO slims are reduced subsets of Gene Ontology (GO) terms; annotations to more specific terms are mapped up to these higher-level terms in the ontology. The two GO slim subsets are substantially different. The first subset is a very high-level grouping slim with fewer than 50 terms. It is useful for a broad overview, but not for identifying specific biological roles. The second subset is suited to enrichment analysis; for Drosophila, for example, it consists of about 150 terms representing mainly discrete processes, localizations and functional classes.
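This answer does not describe PANGEA’s exact mapping procedure. A common approach is to map each annotated term to every slim term found among its ancestors, including the term itself. The sketch below illustrates that approach, using a hypothetical toy ontology given as a {child: [parents]} dictionary, with placeholder term names rather than real GO IDs.

```python
def ancestors(term, parents):
    """Return term plus all its ancestors in a DAG given as {child: [parents]}."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def map_to_slim(annotations, parents, slim_terms):
    """Map {gene: set of annotated terms} to {gene: set of slim terms}."""
    slim_terms = set(slim_terms)
    result = {}
    for gene, terms in annotations.items():
        slim = set()
        for term in terms:
            # Keep only those ancestors (or the term itself) that are in the slim.
            slim |= ancestors(term, parents) & slim_terms
        result[gene] = slim
    return result


# Hypothetical toy ontology: specific1 -> mid -> root, specific2 -> root.
parents = {"specific1": ["mid"], "mid": ["root"], "specific2": ["root"]}
print(map_to_slim({"g1": {"specific1"}, "g2": {"specific2"}}, parents, {"mid", "root"}))
```

A smaller slim collapses more annotations onto the same few terms, which is why the coarse first subset gives an overview while the larger second subset keeps enough resolution for enrichment.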

What is the difference between “Direct GO term only” and “Using GO hierarchy”?
GO terms are organized in a hierarchy. In the “Direct GO term only” set, only direct gene-to-GO-term annotations are used, so each term behaves as a separate gene class. In the “Using GO hierarchy” sets, the hierarchy is taken into account: genes annotated to child terms are also counted for their parent terms in the enrichment.
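The difference between the two options can be sketched by building the term-to-genes table with and without propagating annotations up the hierarchy. This is a minimal illustration under assumptions: the ontology is a hypothetical toy {child: [parents]} dictionary, and PANGEA’s actual data structures are not described here.

```python
from collections import defaultdict


def term_to_genes(annotations, parents, use_hierarchy):
    """Build {GO term: set of genes} from {gene: set of directly annotated terms}.

    use_hierarchy=False: each term keeps only its directly annotated genes.
    use_hierarchy=True: each gene is also counted for every ancestor term.
    """
    def with_ancestors(term):
        # Walk up the DAG, collecting the term and all of its ancestors.
        seen, stack = set(), [term]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(parents.get(t, []))
        return seen

    result = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            targets = with_ancestors(term) if use_hierarchy else {term}
            for t in targets:
                result[t].add(gene)
    return dict(result)


parents = {"child": ["parent"]}          # hypothetical toy ontology
annotations = {"g1": {"child"}, "g2": {"parent"}}
print(term_to_genes(annotations, parents, use_hierarchy=False))
print(term_to_genes(annotations, parents, use_hierarchy=True))
```

Here g1 counts toward “parent” only when the hierarchy is used, so parent terms gain members and their enrichment statistics change accordingly.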

What is “Experiment Data Only”? What is “Excluding high-throughput experiments”? What are the criteria?
GO terms can be assigned to a gene based on different sources of evidence; broadly, experimental evidence, inference from sequence conservation, computational pipelines and author statements. Detailed information can be found on the GO website (http://geneontology.org/docs/guide-go-evidence-codes/). For “Experiment Data Only”, only GO annotations supported by experimental evidence are used. For some analyses, it may be desirable to exclude annotations inferred from high-throughput studies: for example, when analyzing data from a high-throughput study, the enrichment results may be biased if GO terms had been assigned from experiments using a similar protocol. The “Excluding high-throughput experiments” set uses annotations from all evidence types except high-throughput experiments.
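Both options amount to filtering annotations by their GO evidence code. The sketch below uses the experimental and high-throughput code groups from the GO evidence code guide. PANGEA’s exact code lists are not stated here, so treating the high-throughput codes as experimental for “Experiment Data Only” is an assumption, and the mode names are hypothetical.

```python
# Evidence code groups from the GO evidence code guide.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
HIGH_THROUGHPUT = {"HTP", "HDA", "HMP", "HGI", "HEP"}
# Assumption: "Experiment Data Only" keeps both low- and high-throughput experiments.
ALL_EXPERIMENTAL = EXPERIMENTAL | HIGH_THROUGHPUT


def filter_annotations(annotations, mode):
    """Filter (gene, term, evidence_code) tuples.

    mode (hypothetical names): "all" keeps everything, "experimental" keeps
    experimental codes only, "no_htp" keeps everything except high-throughput codes.
    """
    if mode not in ("all", "experimental", "no_htp"):
        raise ValueError(f"unknown mode: {mode}")
    kept = []
    for gene, term, code in annotations:
        if mode == "experimental" and code not in ALL_EXPERIMENTAL:
            continue
        if mode == "no_htp" and code in HIGH_THROUGHPUT:
            continue
        kept.append((gene, term, code))
    return kept


# Hypothetical annotations.
annotations = [("g1", "t1", "IDA"), ("g2", "t1", "IEA"), ("g3", "t2", "HDA"), ("g4", "t2", "TAS")]
print(filter_annotations(annotations, "experimental"))
print(filter_annotations(annotations, "no_htp"))
```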

How was the protein complex annotation generated? Why do some protein complex names sound like GO terms?
Protein complex annotation was obtained from the COMPLEAT database. Some complexes were curated from the literature, supplemented with functional modules from KEGG, while others were predicted from protein-protein interaction networks. If a complex had no name, it was named after the GO terms most enriched among its members. Detailed information can be found in the publication describing this resource (https://pubmed.ncbi.nlm.nih.gov/23443684/).

How do I use “Search Multiple”?
Use this feature if you need to run enrichment analysis on multiple gene lists and compare the results. For example, after single-cell RNA-seq profiling, you might enter the marker genes of each cell cluster to survey the pathways activated in different clusters. The input should contain two columns separated by a comma or tab: one column specifies the list name (e.g. the cell cluster ID) and the other specifies the gene ID:

                cluster0		CG12374
                cluster0		CG12057
                cluster0		mt:Cyt-b
                cluster1		CAH2
                cluster1		Idh
                cluster1		CG34026
                cluster2		Phae2
                cluster2		Mal-A8
                cluster2		CG15199
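If you prepare this input programmatically, or want to check it before pasting, the two-column format can be parsed into one gene list per name as below. This is a minimal sketch. How PANGEA itself treats extra whitespace or repeated rows is not documented here, so stripping spaces and dropping repeated genes within a list are assumptions.

```python
import re
from collections import defaultdict


def parse_multiple(text):
    """Group 'list name <comma or tab> gene ID' lines into {list name: [gene IDs]}."""
    groups = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # One or more commas/tabs separate the two columns.
        fields = re.split(r"[,\t]+", line)
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        name, gene = (f.strip() for f in fields)
        if gene not in groups[name]:  # assumption: skip repeated genes within a list
            groups[name].append(gene)
    return dict(groups)


example = "cluster0\t\tCG12374\ncluster0\t\tmt:Cyt-b\ncluster1,CAH2\ncluster1, Idh\n"
print(parse_multiple(example))
```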

Frequently Asked Questions