About DIOPT and DIOPT-DIST
The identification of orthologs is commonly used for
bioinformatics activities such as data mining and
establishing models for human diseases. Moreover, our group
notes that researchers analyzing the results of screens
performed at the Drosophila RNAi Screening Center (DRSC)
frequently wish to identify mammalian orthologs of the fly
genes that were "hits" (positive results) in their
screens.
In helping DRSC screeners to identify orthologs using
existing tools and algorithms, we recognized a need for a
user-friendly approach to viewing and comparing ortholog
predictions obtained using different tools and
algorithms. This was our motivation in developing DIOPT. To
facilitate identification of orthologs specifically of human
disease-associated genes, we further developed
DIOPT-DIST. Information about our approaches to development
of both tools is summarized below.
Cite DIOPT or DIOPT-DIST
If you use DIOPT or DIOPT-DIST in your research, we ask
that you please cite our paper:
Hu Y, Flockhart I, Vinayagam A, Bergwitz C,
Berger B, Perrimon N, Mohr SE. An Integrative Approach to
Ortholog Prediction for Disease-Focused and Other Functional
Studies. BMC Bioinformatics. 2011 Aug 31;12(1):357.
The DIOPT Approach
Many tools have emerged to meet the need to identify
orthologs. However, low coverage and heterogeneity of these
tools present an obstacle to scientists who want to identify
one or a few highest-confidence orthologs for a given gene
of interest or, conversely, want to cast a wide net and
follow up on all possible orthologs of a gene.
Our goal is to provide an easy-to-use resource that
facilitates summary, comparison and access to various
sources of ortholog predictions. DIOPT integrates human,
mouse, fly, worm, zebrafish and yeast ortholog predictions
made by Ensembl Compara, HomoloGene, Inparanoid, Isobase,
OMA, OrthoDB, orthoMCL, Phylome, RoundUp, and TreeFam. DIOPT lets
users find ortholog pairs for a specified gene or genes
identified by one, many or all of these published
approaches. This provides a streamlined method for
integration, comparison and access to orthology predictions
originating from algorithms based on sequence homology,
phylogenetic trees, and functional similarity. DIOPT
calculates a simple score indicating the number of tools
that support a given orthologous gene-pair relationship, as
well as a weighted score based on functional assessment
using high quality GO molecular function annotation of all
fly-human orthologous pairs predicted by each tool.
Differences in the algorithms used by each tool to predict
orthologous relationships are one source of difference in the
sets of predictions made by one tool versus another. However,
we also note that some of these differences might be
attributable to the use of different genome annotation
releases by some tools versus others, and that not all tools
cover all of the species that we include in the DIOPT tool
(see Tables 1, 2 and 3).
DIOPT also displays protein and domain alignments,
including percent amino acid identity, for predicted
ortholog pairs. These should help you to identify the most
appropriate matches among multiple possible orthologs.
The following summary figures and tables help to explain
our approach and summarize the tools and algorithms included
in DIOPT.
Figure 1: Summary of the DIOPT approach to
integration of results from multiple ortholog
prediction tools and algorithms. In green, tools
based on sequence alignment. In purple, tools based on
evolutionary relationships. In orange, a tool that
incorporates protein-protein interaction network data
into ortholog predictions.
Table 1: Summary Information and
Publications for the Tools Integrated in DIOPT
| Prediction Method | Source | Prediction Algorithm | Coverage | DIOPT Weight* | PMID |
| Compara | Ensembl | Phylogenetic approach | 57 species (vs. 64) | 0.931 | 19029536 |
| Homologene | NCBI | Combination of BBH*, tree and synteny | 21 species (vs. 66) | 1 | 11125071 |
| Inparanoid | Stockholm University, Sweden | BBH* approach to identify orthologs and in-paralogs | 100 species (vs. 7) | 1.005 | 11743721 |
| Isobase | MIT | Sequence and PPI* network alignments | 5 species | 0.957 | 21177658 |
| OMA | CBRG, ETH Zurich | BBH*, global sequence alignments | 1211 species (Mar 2012) | 1.019 | 17545180 |
| OrthoDB | University of Geneva | Phylogenetic approach | 1367 species (vs. 6) | 1.001 | 20972218 |
| orthoMCL | University of Pennsylvania | Markov Cluster algorithm | 150 species (vs. 5) | 0.903 | 12952885 |
| Phylome | Centre for Genomic Regulation (CRG), Spain | Reconstruction of evolutionary histories of all genes in a genome, also known as phylome. | 19 phylomes (Dec 2011) | 0.912 | 17962297 |
| RoundUp | Harvard Medical School | RSD*, modified BBH* | 1807 species (Dec 2011) | 1.003 | 16777906 |
| TreeFam | Wellcome Trust Sanger Institute | Manually curated based on trees | 79 species (vs. 8) | 0.963 | 16381935 |
* DIOPT weights are based on the mean semantic similarity
of high quality GO molecular function annotation of all
fly-human orthologous pairs predicted by each tool.
BBH, Best Blast Hits
RSD, Reciprocal Smallest Distance
PPI, Protein-Protein Interactions
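The two scores described above can be illustrated with a minimal Python sketch. The weights are copied from Table 1. The function names are hypothetical, and the weighted score is computed here as the sum of the weights of the supporting tools, which is an assumption made for illustration rather than a statement of how DIOPT computes it internally.

```python
# Weights copied from the "DIOPT Weight" column of Table 1.
DIOPT_WEIGHTS = {
    "Compara": 0.931,
    "Homologene": 1.0,
    "Inparanoid": 1.005,
    "Isobase": 0.957,
    "OMA": 1.019,
    "OrthoDB": 1.001,
    "orthoMCL": 0.903,
    "Phylome": 0.912,
    "RoundUp": 1.003,
    "TreeFam": 0.963,
}


def simple_score(supporting_tools):
    """Number of distinct tools that predict the ortholog pair."""
    return len(set(supporting_tools))


def weighted_score(supporting_tools):
    """ASSUMPTION: sum of the Table 1 weights of the supporting tools."""
    return sum(DIOPT_WEIGHTS[tool] for tool in set(supporting_tools))


# Hypothetical ortholog pair supported by three of the ten tools.
support = ["Compara", "OMA", "TreeFam"]
print(simple_score(support))
print(round(weighted_score(support), 3))
```

Under this assumption, a pair supported by tools with weights above 1 scores slightly higher than a pair supported by the same number of lower-weighted tools.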
Table 2A: Genome Release Information
for the Tools Integrated in DIOPT
| | Worm | Fish | Fly | Human | Mouse | Yeast |
| Compara | WormBase WBcel215 | Ensembl Zv9 | FlyBase BDGP5 | Ensembl GRCh37 | Ensembl GRCm38 | SGD EF3 |
| Homologene | NCBI (Mar 2012) | NCBI (Mar 2012) | NCBI (Mar 2012) | NCBI (Mar 2012) | NCBI (Mar 2012) | |
| OMA | Ensembl v46; WS170 | Ensembl v57; Zv8 | Ensembl v46; BDGP4.3 | Ensembl v64; GRCh37.5 | Ensembl v64; NCBIm37 | Ensembl v64; SGD EF 3 |
| Inparanoid | WS199 | ZFISH7.52 | FlyBase r5.13 | NCBI v36.52 | NCBI v37.52 | SGD |
| Isobase | Ensembl v59 | NA | Ensembl v59 | Ensembl v59 | Ensembl v59 | Ensembl v59 |
| OrthoDB | release unavailable | Zv9 | FlyBase r5.45 | GRCh37.p5 | NCBIm37 | |
| orthoMCL | WS206 | Zv8.56 | BDGP5.13.56 | GRCh37.56 | NCBI v37.56 | FungiDB |
| RoundUp | Uniprot Nov2011 | Uniprot Nov2011 | Uniprot Nov2011 | Uniprot Nov2011 | Uniprot Nov2011 | Uniprot Nov2011 |
| TreeFam | Ensembl v54 | Ensembl v54 | Ensembl v54 | Ensembl v54 | Ensembl v54 | Ensembl v54 |
| Phylome | NCBI v36 | NCBI v36 | NCBI v36 | NCBI v36 | NCBI v36 | NCBI v36 |
Table 2B: Additional Information About
Genome Releases
| Other Resource | Version |
| WormBase | release234 |
| FlyBase | release5.47 |
| RefSeq | release55 |
| EntrezGene | 22-Oct-12 |
Table 3. Maximum DIOPT score for
each orthologous relationship
| Orthologous Relationship | Max score | Relevant Tools |
| fly-human | 10 | All |
| fly-mouse | 10 | All |
| fly-worm | 10 | All |
| fly-fish | 9 | All but Isobase |
| fly-yeast | 9 | All but OrthoDB |
| human-mouse | 10 | All |
| human-worm | 10 | All |
| human-fish | 9 | All but Isobase |
| human-yeast | 9 | All but OrthoDB |
| mouse-worm | 9 | All but Phylome |
| mouse-fish | 8 | All but Phylome and Isobase |
| mouse-yeast | 9 | All but OrthoDB |
| fish-worm | 8 | All but Phylome and Isobase |
| fish-yeast | 8 | All but Isobase and OrthoDB |
| worm-yeast | 9 | All but OrthoDB |
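The maximum scores in Table 3 follow from counting which of the ten tools contribute predictions for a given species pair. The Python sketch below reproduces that arithmetic. The exclusion rules are read off the "Relevant Tools" column of Table 3 (Isobase missing for any pair involving fish, OrthoDB missing for any pair involving yeast, Phylome missing for three specific pairs); the function and variable names are hypothetical.

```python
ALL_TOOLS = [
    "Compara", "Homologene", "Inparanoid", "Isobase", "OMA",
    "OrthoDB", "orthoMCL", "Phylome", "RoundUp", "TreeFam",
]

# Pairs for which Table 3 lists Phylome as not relevant.
PHYLOME_GAPS = {
    frozenset(("mouse", "worm")),
    frozenset(("mouse", "fish")),
    frozenset(("fish", "worm")),
}


def missing_tools(species_a, species_b):
    """Tools that Table 3 excludes for this species pair."""
    pair = frozenset((species_a, species_b))
    missing = set()
    if "fish" in pair:
        missing.add("Isobase")
    if "yeast" in pair:
        missing.add("OrthoDB")
    if pair in PHYLOME_GAPS:
        missing.add("Phylome")
    return missing


def max_score(species_a, species_b):
    """Maximum simple DIOPT score: number of tools covering the pair."""
    return len(ALL_TOOLS) - len(missing_tools(species_a, species_b))


print(max_score("fly", "human"))
print(max_score("mouse", "fish"))
```

A simple score is best interpreted relative to this maximum, since a score of 8 for a mouse-fish pair means every relevant tool agrees, whereas 8 for a fly-human pair does not.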
The DIOPT-DIST Approach
Facilitating the identification of orthologs between a
model organism and humans is of particular relevance to
genes associated with human diseases. Keeping in mind that
researchers visiting or otherwise using resources from the
DRSC are often interested in identifying orthologs of
disease-associated genes, including genes more recently
identified through genome-wide association studies (GWAS),
we decided to take DIOPT results a step further by linking
them up with disease associations in curated resources.
DIOPT-DIST is an online-searchable tool that maps
gene-disease relationships from the NCBI Online Mendelian
Inheritance in Man (OMIM) database and GWAS datasets
contained in the GWAS catalog to genes in the C. elegans,
Drosophila, mouse, S. cerevisiae, and zebrafish genomes using
DIOPT ortholog predictions. Disease terms were extracted
from OMIM and GWAS FTP files and were categorized with MeSH
headings. Two additional categories were added for terms
outside the scope of MeSH disease annotation, such as disease
risk factors and traits (Table 4).
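The mapping step described above amounts to joining gene-disease records for human genes with DIOPT ortholog pairs. The following Python sketch illustrates the idea using the Aldh/ALDH2 example shown in Table 5. The record layout, the function name and the `min_score` filter are hypothetical and chosen only for illustration; they are not the DIOPT-DIST implementation.

```python
from collections import defaultdict

# One DIOPT ortholog pair, taken from the Table 5 example.
ortholog_pairs = [
    {"fly_id": "FBgn0012036", "fly_symbol": "Aldh",
     "human_gene_id": 217, "human_symbol": "ALDH2", "diopt_score": 8},
]

# Two of the OMIM disease associations listed for that human gene in Table 5.
disease_links = [
    {"human_gene_id": 217, "omim_id": "100650",
     "term": "{Hangover, susceptibility to}, 610251 (3)", "source": "OMIM"},
    {"human_gene_id": 217, "omim_id": "100650",
     "term": "Alcohol sensitivity, acute, 610251 (3)", "source": "OMIM"},
]


def map_diseases_to_model_genes(pairs, links, min_score=1):
    """Attach human disease terms to model-organism genes via ortholog pairs.

    min_score is a hypothetical filter on the simple DIOPT score.
    """
    links_by_gene = defaultdict(list)
    for link in links:
        links_by_gene[link["human_gene_id"]].append(link)

    rows = []
    for pair in pairs:
        if pair["diopt_score"] < min_score:
            continue
        for link in links_by_gene.get(pair["human_gene_id"], []):
            rows.append({**pair, **link})
    return rows


for row in map_diseases_to_model_genes(ortholog_pairs, disease_links):
    print(row["fly_symbol"], row["human_symbol"], row["diopt_score"], row["term"])
```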
Users can search (1) for human diseases related to a list
of genes from model organisms, or (2) for the corresponding
genes in model organisms by disease term, disease category
or OMIM ID. Both gene/locus OMIM IDs (prefixed with * or +
on the OMIM website, e.g. 100650) and disease phenotype OMIM
IDs (prefixed with # or % on the OMIM website, e.g. 610251)
can be searched at DIOPT-DIST (Table 5).
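The distinction between the two kinds of OMIM ID can be expressed as a small lookup on the prefix symbol. The sketch below is illustrative only: the function name is hypothetical, and the particular prefixes attached to the two example numbers are assumptions chosen from the pairs named above, not values confirmed by this page.

```python
GENE_PREFIXES = {"*", "+"}       # gene/locus entries on the OMIM website
PHENOTYPE_PREFIXES = {"#", "%"}  # disease phenotype entries


def classify_omim_entry(entry):
    """Return (kind, number) for an OMIM ID as displayed on the OMIM website."""
    entry = entry.strip()
    if entry and entry[0] in GENE_PREFIXES:
        return "gene/locus", entry[1:]
    if entry and entry[0] in PHENOTYPE_PREFIXES:
        return "phenotype", entry[1:]
    # No prefix symbol: the kind cannot be inferred from the number alone.
    return "unprefixed", entry


# Illustrative prefixes; only the numbers come from the text above.
print(classify_omim_entry("*100650"))
print(classify_omim_entry("#610251"))
```

When searching, only the numeric part is needed.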
Figure 2: The DIOPT-DIST Approach
Table 4. Disease Categories
| ID | Disease Category | Source |
| C01 | Bacterial Infections and Mycoses | MeSH heading |
| C02 | Virus Diseases | MeSH heading |
| C03 | Parasitic Diseases | MeSH heading |
| C04 | Neoplasms | MeSH heading |
| C05 | Musculoskeletal Diseases | MeSH heading |
| C06 | Digestive System Diseases | MeSH heading |
| C07 | Stomatognathic Diseases | MeSH heading |
| C08 | Respiratory Tract Diseases | MeSH heading |
| C09 | Otorhinolaryngologic Diseases | MeSH heading |
| C10 | Nervous System Diseases | MeSH heading |
| C11 | Eye Diseases | MeSH heading |
| C12 | Male Urogenital Diseases | MeSH heading |
| C13 | Female Urogenital Diseases and Pregnancy Complications | MeSH heading |
| C14 | Cardiovascular Diseases | MeSH heading |
| C15 | Hemic and Lymphatic Diseases | MeSH heading |
| C16 | Congenital, Hereditary, and Neonatal Diseases and Abnormalities | MeSH heading |
| C17 | Skin and Connective Tissue Diseases | MeSH heading |
| C18 | Nutritional and Metabolic Diseases | MeSH heading |
| C19 | Endocrine System Diseases | MeSH heading |
| C20 | Immune System Diseases | MeSH heading |
| C21 | Disorders of Environmental Origin | MeSH heading |
| C23 | Pathological Conditions, Signs and Symptoms | MeSH heading |
| C24 | Occupational Diseases | MeSH heading |
| C25 | Substance-Related Disorders | MeSH heading |
| C26 | Wounds and Injuries | MeSH heading |
| F03 | Mental Disorders | MeSH heading |
| Y01 | Disease risk factor, diagnosis or treatment | Added for terms not included in MeSH |
| Y02 | Trait | Added for terms not included in MeSH |
Table 5. DIOPT-DIST result page, one
example of OMIM terms
| FlyBase ID | Fly Symbol | Human GeneID | Human Symbol | DIOPT Score | OMIM ID | Disease/Trait | Source |
| FBgn0012036 | Aldh | 217 | ALDH2 | 8 | 100650 | {Hangover, susceptibility to}, 610251 (3) | OMIM |
| | | | | | 100650 | {Sublingual nitroglycerin, susceptibility to poor response to} (3) | OMIM |
| | | | | | 100650 | Alcohol sensitivity, acute, 610251 (3) | OMIM |
| | | | | | 100650 | {Esophageal cancer, alcohol-related, susceptibility to} (3) | OMIM |
Notes on Disease/Trait column:
Disease term: {Hangover, susceptibility to}, 610251 (3)
- 610251 is the OMIM ID for the phenotype.
- Braces, "{ }", indicate mutations that contribute to
susceptibility to multi-factorial disorders or to
susceptibility to infection.
- "(3)" means the molecular basis of the disorder is known.
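The notation explained above can be unpacked mechanically. The following Python sketch splits a Disease/Trait string into its parts using a regular expression. The function name and the returned field names are hypothetical, and the pattern assumes the layout seen in the Table 5 examples: an optional pair of curly braces around the name, an optional six-digit phenotype OMIM ID, and a single-digit key in parentheses at the end.

```python
import re

TERM_RE = re.compile(
    r"^(?P<name>.*?)"            # disease name, possibly wrapped in { }
    r"(?:,\s*(?P<omim>\d{6}))?"  # optional six-digit phenotype OMIM ID
    r"\s*\((?P<key>\d)\)\s*$"    # key in parentheses, e.g. (3)
)


def parse_disease_term(term):
    """Split an OMIM Disease/Trait string into its annotated parts."""
    term = " ".join(term.split())  # collapse line breaks and extra spaces
    match = TERM_RE.match(term)
    if match is None:
        raise ValueError(f"unrecognized Disease/Trait format: {term!r}")
    name = match.group("name").strip()
    # Curly braces mark susceptibility to a multi-factorial disorder or infection.
    susceptibility = name.startswith("{") and name.endswith("}")
    if susceptibility:
        name = name[1:-1].strip()
    return {
        "name": name,
        "susceptibility": susceptibility,
        "phenotype_omim_id": match.group("omim"),  # None when absent
        "mapping_key": int(match.group("key")),
    }


print(parse_disease_term("{Hangover, susceptibility to}, 610251 (3)"))
```

Terms without a phenotype OMIM ID, such as the sublingual nitroglycerin entry in Table 5, yield `None` for `phenotype_omim_id`.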
For more information about OMIM annotation, please go to
the OMIM help page
(http://omim.org/help/faq).
If you have questions, suggestions or comments on DIOPT,
please contact our informatics staff.