DIOPT Summary


The identification of orthologs is commonly used for bioinformatics activities such as data mining and establishing models for human diseases. Moreover, our group notes that researchers analyzing the results of screens performed at the Drosophila RNAi Screening Center (DRSC) frequently wish to identify mammalian orthologs of the fly genes that were "hits" (positive results) in their screens.

In helping DRSC screeners to identify orthologs using existing tools and algorithms, we recognized a need for a user-friendly approach to viewing and comparing ortholog predictions obtained using different tools and algorithms. This was our motivation in developing DIOPT. To facilitate identification of orthologs specifically of human disease-associated genes, we further developed DIOPT-DIST. Information about our approaches to development of both tools is summarized below.


If you use DIOPT or DIOPT-DIST in your research, we ask that you please cite our paper:

Hu Y, Flockhart I, Vinayagam A, Bergwitz C, Berger B, Perrimon N, Mohr SE. An Integrative Approach to Ortholog Prediction for Disease-Focused and Other Functional Studies. BMC Bioinformatics. 2011 Aug 31;12(1):357. PubMed Entry

The DIOPT Approach

Many tools have emerged to meet the need to identify orthologs. However, the low coverage and heterogeneity of these tools present an obstacle to scientists who want to identify one or a few highest-confidence orthologs for a given gene of interest or, conversely, want to cast a wide net and follow up on all possible orthologs of a gene.

Our goal is to provide an easy-to-use resource that facilitates summary, comparison and access to various sources of ortholog predictions. DIOPT integrates human, mouse, fly, worm, zebrafish and yeast ortholog predictions made by Ensembl Compara, HomoloGene, Inparanoid, Isobase, OMA, orthoMCL, Phylome, RoundUp, and TreeFam. DIOPT lets users find ortholog pairs for a specified gene or genes identified by one, many or all of these published approaches. This provides a streamlined method for integration, comparison and access to orthology predictions originating from algorithms based on sequence homology, phylogenetic trees, and functional similarity. DIOPT calculates a simple score indicating the number of tools that support a given orthologous gene-pair relationship, as well as a weighted score based on functional assessment using high-quality GO molecular function annotation of all fly-human orthologous pairs predicted by each tool. Differences in the algorithms used by each tool to predict orthologous relationships are one source of difference between the set of predictions made by one tool and those made by another. However, we also note that some of these differences might be attributable to the use of different genome annotation releases by different tools, and that not all tools cover all of the species that we include in the DIOPT tool (see Tables 1, 2 and 3).

DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. These should help you to identify the most appropriate matches among multiple possible orthologs.

The following summary figures and tables help to explain our approach and summarize the tools and algorithms included in DIOPT.

Figure 1: Summary of the DIOPT approach to integration of results from multiple ortholog prediction tools and algorithms. In green, tools based on sequence alignment. In purple, tools based on evolutionary relationships. In orange, a tool that incorporates protein-protein interaction network data into ortholog predictions.

Table 1: Summary Information and Publications for the Tools Integrated in DIOPT

Prediction Method | Source | Prediction Algorithm | Coverage | DIOPT Weight* | PMID
Compara | Ensembl | Phylogenetic approach | 70 species (vs. 81) | 0.931 | 19029536
Homologene | NCBI | Combination of BBH*, tree and synteny | 21 species (vs. 68) | 1 | 11125071
Inparanoid | Stockholm University, Sweden | BBH* approach to identify orthologs and in-paralogs | 273 species (vs. 8) | 1.005 | 11743721
Isobase | MIT | Sequence and PPI* network alignments | 5 species (vs. 2, Nov. 2014) | 0.957 | 21177658
OMA | CBRG, ETH Zurich | BBH*, global sequence alignments | 1706 species (Oct 2014) | 1.019 | 17545180
OrthoDB | University of Geneva | Phylogenetic approach | 3027 species (vs. 8) | 1.001 | 20972218
orthoMCL | University of Pennsylvania | Markov Cluster algorithm | 150 species (vs. 5) | 0.903 | 12952885
Phylome | Centre for Genomic Regulation (CRG), Spain | Reconstruction of evolutionary histories of all genes in a genome, also known as phylome | 1059 species, 120 Phylomes (vs. 4) | 0.912 | 17962297
RoundUp | Harvard Medical School | RSD*, modified BBH* | 2044 species (Apr 2013) | 1.003 | 16777906
TreeFam | Wellcome Trust Sanger Institute | Manually curated based on trees | 109 species (vs. 9) | 0.963 | 16381935

* DIOPT weights are based on the mean semantic similarity of high quality GO molecular function annotation of all fly-human orthologous pairs predicted by each tool.
   BBH, Best Blast Hits
   RSD, Reciprocal Smallest Distance
   PPI, Protein-Protein Interactions
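
The two scores described above can be illustrated with a short Python sketch. The weights are copied from Table 1. The function names are hypothetical, and summing the weights of the supporting tools is an assumption about how the weighted score is combined, not a description of the DIOPT implementation.

```python
# Weights copied from Table 1. Everything else here is an illustrative
# assumption, not the DIOPT implementation.
DIOPT_WEIGHTS = {
    "Compara": 0.931,
    "Homologene": 1.0,
    "Inparanoid": 1.005,
    "Isobase": 0.957,
    "OMA": 1.019,
    "OrthoDB": 1.001,
    "orthoMCL": 0.903,
    "Phylome": 0.912,
    "RoundUp": 1.003,
    "TreeFam": 0.963,
}


def simple_score(supporting_tools):
    """Count the distinct tools that predict a given ortholog pair."""
    return len(set(supporting_tools))


def weighted_score(supporting_tools, weights=DIOPT_WEIGHTS):
    """Sum the per-tool weights of the supporting tools (assumed rule)."""
    return sum(weights[tool] for tool in set(supporting_tools))


# Hypothetical ortholog pair supported by three tools.
tools = ["Compara", "OMA", "TreeFam"]
print(simple_score(tools))              # 3
print(round(weighted_score(tools), 3))  # 2.913
```

Under this assumed rule, a pair supported by tools with weights above 1 ranks slightly higher than a pair supported by the same number of lower-weighted tools.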

Table 2A: Genome Release Information for the Tools Integrated in DIOPT

Worm Fish Fly Human Mouse Yeast Fission Yeast Frog
Compara WBcel235 GRCz10 BDGP6 GRCh38.p3 GRCm38.p4 R64-1-1 JGI 4.2
Homologene WS195 Zv9 FlyBase r5.48 GRCh38 GRCm38.p2 R64-1-1 ASM294v2
OMA Ensembl v73 WBcel235 Ensembl v70 Zv9 Ensembl v73 BDGP5 Ensembl v75 GRCh37 Ensembl v75 GRCm38 Ensembl v73 (EF4) Ensembl Fungi v22 (ASM294v2) Ensembl v73 (JGI_4.2)
Inparanoid UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013
Isobase Ensembl v59 NA Ensembl v59 Ensembl v59 Ensembl v59 Ensembl v59
orthoMCL WS206 Zv8.56 BDGP5.13.56 GRCh37.56 NCBI v37.56 FungiDB GenBank
orthoDB Ensembl v75 FlyBase r5.55 Ensembl v75 Ensembl v75 UniProt Feb 2014 UniProt Feb 2014 Ensembl v75
RoundUp UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013
TreeFam Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69
Phylome UniProt UniProt UniProt UniProt UniProt UniProt UniProt UniProt

Table 2B: Additional Information About Genome Releases

Other Resource | Version

Table 3. Maximum DIOPT score for each orthologous relationship

Orthologous Relationship | Max score | Relevant Tools
fission yeast-baker's yeast | 7 | Inparanoid, OMA, orthoMCL, Phylome, RoundUp, Treefam, Homologene
fission yeast-worm | 6 | Homologene, Treefam, RoundUp, orthoMCL, Inparanoid, OMA
fission yeast-fly | 7 | Phylome, Homologene, Treefam, RoundUp, orthoMCL, Inparanoid, OMA
fission yeast-fish | 6 | Homologene, Inparanoid, OMA, orthoMCL, RoundUp, Treefam
fission yeast-frog | 4 | Treefam, Inparanoid, OMA, RoundUp
fission yeast-human | 7 | Homologene, Inparanoid, OMA, orthoMCL, Phylome, RoundUp, Treefam
fission yeast-mouse | 6 | orthoMCL, RoundUp, OMA, Inparanoid, Homologene, Treefam
baker's yeast-worm | 9 | orthoMCL, OMA, Treefam, RoundUp, Isobase, Compara, Inparanoid, Homologene, Phylome
baker's yeast-fly | 9 | Isobase, Treefam, RoundUp, Phylome, OMA, Inparanoid, Homologene, Compara, orthoMCL
baker's yeast-fish | 8 | Phylome, Homologene, Treefam, RoundUp, orthoMCL, Inparanoid, Compara, OMA
baker's yeast-frog | 6 | Treefam, Compara, Inparanoid, OMA, Phylome, RoundUp
baker's yeast-human | 9 | orthoMCL, Treefam, RoundUp, Phylome, Isobase, Inparanoid, Compara, Homologene, OMA
baker's yeast-mouse | 9 | Phylome, RoundUp, orthoMCL, OMA, Inparanoid, Homologene, Compara, Treefam, Isobase
worm-fly | 10 | Phylome, RoundUp, orthoMCL, OrthoDB, Isobase, Inparanoid, Compara, Homologene, OMA, Treefam
worm-fish | 8 | Homologene, Treefam, RoundUp, orthoMCL, OrthoDB, Inparanoid, Compara, OMA
worm-frog | 6 | OrthoDB, OMA, Treefam, Compara, Inparanoid, RoundUp
worm-human | 10 | RoundUp, Treefam, Phylome, orthoMCL, OrthoDB, Isobase, Inparanoid, Homologene, Compara, OMA
worm-mouse | 9 | Inparanoid, Treefam, RoundUp, orthoMCL, OrthoDB, Compara, Isobase, Homologene, OMA
fly-fish | 9 | OrthoDB, RoundUp, Treefam, orthoMCL, Inparanoid, Homologene, Compara, OMA, Phylome
fly-frog | 7 | OMA, Treefam, RoundUp, OrthoDB, Inparanoid, Compara, Phylome
fly-human | 10 | Inparanoid, RoundUp, Phylome, Treefam, orthoMCL, OrthoDB, Isobase, Homologene, Compara, OMA
fly-mouse | 10 | Phylome, Compara, Homologene, Inparanoid, Isobase, OMA, OrthoDB, RoundUp, Treefam, orthoMCL
fish-frog | 6 | Compara, Inparanoid, OMA, OrthoDB, RoundUp, Treefam
fish-human | 9 | Compara, Treefam, RoundUp, Phylome, orthoMCL, OrthoDB, OMA, Homologene, Inparanoid
fish-mouse | 8 | OrthoDB, orthoMCL, RoundUp, Inparanoid, OMA, Homologene, Compara, Treefam
frog-human | 7 | Compara, Inparanoid, OMA, OrthoDB, Phylome, RoundUp, Treefam
frog-mouse | 6 | OMA, Treefam, OrthoDB, Inparanoid, Compara, RoundUp
human-mouse | 10 | Compara, orthoMCL, RoundUp, Treefam, Phylome, OrthoDB, OMA, Isobase, Homologene, Inparanoid
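
Because the maximum score differs between species pairs, a raw score is easier to interpret relative to the maximum for that pair. The Python sketch below shows one way to do this, using three rows copied from Table 3. The helper names and the idea of reporting a fraction of the maximum are illustrative assumptions, not features of DIOPT itself.

```python
# Three rows copied from Table 3; a full lookup would list every pair.
# The helper names below are hypothetical.
RELEVANT_TOOLS = {
    ("fly", "human"): ["Inparanoid", "RoundUp", "Phylome", "Treefam",
                       "orthoMCL", "OrthoDB", "Isobase", "Homologene",
                       "Compara", "OMA"],
    ("fly", "frog"): ["OMA", "Treefam", "RoundUp", "OrthoDB", "Inparanoid",
                      "Compara", "Phylome"],
    ("fission yeast", "frog"): ["Treefam", "Inparanoid", "OMA", "RoundUp"],
}


def max_score(species_a, species_b):
    """Return the maximum simple score for a species pair, in either order."""
    key = (species_a, species_b)
    if key not in RELEVANT_TOOLS:
        key = (species_b, species_a)
    return len(RELEVANT_TOOLS[key])


def fraction_of_max(score, species_a, species_b):
    """Express a simple score as a fraction of the pair's maximum score."""
    return score / max_score(species_a, species_b)


print(max_score("human", "fly"))                   # 10
print(fraction_of_max(4, "fission yeast", "frog")) # 1.0
```

In this sketch, a score of 4 for a fission yeast-frog pair means that every relevant tool agrees, whereas a score of 4 for a fly-human pair reflects support from fewer than half of the relevant tools.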

The DIOPT-DIST Approach

Facilitating the identification of orthologs between a model organism and humans is of particular relevance to genes associated with human diseases. Keeping in mind that researchers visiting or otherwise using resources from the DRSC are often interested in identifying orthologs of disease-associated genes, including genes more recently identified through genome-wide association studies (GWAS), we decided to take DIOPT results a step further by linking them with disease associations in curated resources.

DIOPT-DIST is an online-searchable tool that maps gene-disease relationships from the NCBI Online Mendelian Inheritance in Man (OMIM) database and GWAS datasets contained in the GWAS catalog to genes in the C. elegans, Drosophila, mouse, S. cerevisiae, and zebrafish genomes using DIOPT ortholog predictions. Disease terms were extracted from OMIM and GWAS ftp files and were categorized with MeSH headings. Two additional categories were added for terms outside the scope of MeSH disease annotation, such as disease risk factors and traits (Table 4).

Users may search (1) for related human diseases using a list of genes from model organisms, or (2) for the corresponding genes in model organisms using a disease term, disease category or OMIM ID. Both gene/locus OMIM IDs (marked with * or + on the OMIM website, e.g., 100650) and disease phenotype OMIM IDs (marked with # or % on the OMIM website, e.g., 610251) can be searched at DIOPT-DIST (Table 5).
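
The two search directions can be pictured as joins between an ortholog table and a gene-disease table. The Python sketch below is a minimal illustration that uses the ALDH2 example shown in Table 5 as toy data. The record layout, function names and `min_score` parameter are assumptions made for illustration and do not describe the DIOPT-DIST database or code.

```python
from collections import defaultdict

# Toy records based on the Table 5 example; the layout is an assumption.
ORTHOLOGS = [
    # (model-organism gene ID, symbol, human GeneID, human symbol, DIOPT score)
    ("FBgn0012036", "Aldh", 217, "ALDH2", 8),
]
DISEASES = [
    # (human GeneID, gene/locus OMIM ID, disease/trait term, source)
    (217, "100650", "{Hangover, susceptibility to}, 610251 (3)", "OMIM"),
    (217, "100650", "Alcohol sensitivity, acute, 610251 (3)", "OMIM"),
]


def diseases_for_genes(gene_ids, min_score=1):
    """Search direction 1: model-organism genes -> human disease terms."""
    by_human_gene = defaultdict(list)
    for human_id, omim_id, term, source in DISEASES:
        by_human_gene[human_id].append((omim_id, term, source))
    wanted = set(gene_ids)
    hits = []
    for gene_id, symbol, human_id, human_symbol, score in ORTHOLOGS:
        if gene_id in wanted and score >= min_score:
            for omim_id, term, source in by_human_gene.get(human_id, []):
                hits.append((gene_id, symbol, human_symbol, score,
                             omim_id, term, source))
    return hits


def genes_for_disease(query):
    """Search direction 2: disease term or OMIM ID -> model-organism genes."""
    query = query.lower()
    human_ids = {
        human_id
        for human_id, omim_id, term, _source in DISEASES
        if query == omim_id or query in term.lower()
    }
    return [
        (gene_id, symbol, human_symbol, score)
        for gene_id, symbol, human_id, human_symbol, score in ORTHOLOGS
        if human_id in human_ids
    ]


print(len(diseases_for_genes(["FBgn0012036"])))  # 2
print(genes_for_disease("610251"))
```

Searching by the gene/locus ID (100650) or the phenotype ID (610251) returns the same fly gene in this toy data, which mirrors the two kinds of OMIM ID that can be searched.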

Figure 2: The DIOPT-DIST Approach


Table 4. Disease Categories

ID | Disease Category | Source
C01 | Bacterial Infections and Mycoses | MeSH heading
C02 | Virus Diseases | MeSH heading
C03 | Parasitic Diseases | MeSH heading
C04 | Neoplasms | MeSH heading
C05 | Musculoskeletal Diseases | MeSH heading
C06 | Digestive System Diseases | MeSH heading
C07 | Stomatognathic Diseases | MeSH heading
C08 | Respiratory Tract Diseases | MeSH heading
C09 | Otorhinolaryngologic Diseases | MeSH heading
C10 | Nervous System Diseases | MeSH heading
C11 | Eye Diseases | MeSH heading
C12 | Male Urogenital Diseases | MeSH heading
C13 | Female Urogenital Diseases and Pregnancy Complications | MeSH heading
C14 | Cardiovascular Diseases | MeSH heading
C15 | Hemic and Lymphatic Diseases | MeSH heading
C16 | Congenital, Hereditary, and Neonatal Diseases and Abnormalities | MeSH heading
C17 | Skin and Connective Tissue Diseases | MeSH heading
C18 | Nutritional and Metabolic Diseases | MeSH heading
C19 | Endocrine System Diseases | MeSH heading
C20 | Immune System Diseases | MeSH heading
C21 | Disorders of Environmental Origin | MeSH heading
C23 | Pathological Conditions, Signs and Symptoms | MeSH heading
C24 | Occupational Diseases | MeSH heading
C25 | Substance-Related Disorders | MeSH heading
C26 | Wounds and Injuries | MeSH heading
F03 | Mental Disorders | MeSH heading
Y01 | Disease risk factor, diagnosis or treatment | Added for terms not included in MeSH
Y02 | Trait | Added for terms not included in MeSH

Table 5. DIOPT-DIST result page, one example of OMIM terms

FlyBase ID | Fly Symbol | Human GeneID | Human Symbol | DIOPT Score | OMIM ID | Disease/Trait | Source
FBgn0012036 | Aldh | 217 | ALDH2 | 8 | 100650 | {Hangover, susceptibility to}, 610251 (3) | OMIM
 |  |  |  |  | 100650 | {Sublingual nitroglycerin, susceptibility to poor response to} (3) | OMIM
 |  |  |  |  | 100650 | Alcohol sensitivity, acute, 610251 (3) | OMIM
 |  |  |  |  | 100650 | {Esophageal cancer, alcohol-related, susceptibility to} (3) | OMIM
Notes on the Disease/Trait column, using the disease term "{Hangover, susceptibility to}, 610251 (3)" as an example:
  • 610251 is the OMIM ID for the phenotype.
  • Braces, "{ }", indicate mutations that contribute to susceptibility to multi-factorial disorders or to susceptibility to infection.
  • "(3)" means the molecular basis of the disorder is known.
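
The notation described in these notes can be split into its parts programmatically. The Python sketch below is a hypothetical parser, written only against the term formats shown in Table 5. It assumes that a phenotype OMIM ID, when present, is a six-digit number placed just before the parenthesized number, and it is not part of DIOPT-DIST or OMIM.

```python
import re

# Pattern written against the Table 5 examples only (an assumption).
_TERM = re.compile(
    r"^(?P<name>.*?)"                  # disease or trait name, maybe in braces
    r"(?:,\s*(?P<phenotype>\d{6}))?"   # optional six-digit phenotype OMIM ID
    r"\s*\((?P<key>\d)\)\s*$"          # parenthesized number, e.g. (3)
)


def parse_disease_term(term):
    """Split a Disease/Trait entry into its name, flags and phenotype ID."""
    match = _TERM.match(term.strip())
    if match is None:
        raise ValueError(f"unrecognized Disease/Trait term: {term!r}")
    name = match.group("name").strip()
    # Braces mark susceptibility to a multi-factorial disorder or infection.
    susceptibility = name.startswith("{") and name.endswith("}")
    if susceptibility:
        name = name[1:-1]
    return {
        "name": name,
        "susceptibility": susceptibility,
        "phenotype_omim_id": match.group("phenotype"),
        # "(3)" means the molecular basis of the disorder is known.
        "molecular_basis_known": match.group("key") == "3",
    }


print(parse_disease_term("{Hangover, susceptibility to}, 610251 (3)"))
```

For the hangover example, the parser reports the name "Hangover, susceptibility to", flags it as a susceptibility term, and extracts 610251 as the phenotype ID.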

For more information about OMIM annotation, please go to the OMIM help page (http://omim.org/help/faq).

If you have questions, suggestions or comments on DIOPT please contact our informatics staff.