DRSC >> DIOPT Summary


The identification of orthologs is commonly used for bioinformatics activities such as data mining and establishing models for human diseases. Moreover, our group notes that researchers analyzing the results of screens performed at the Drosophila RNAi Screening Center (DRSC) frequently wish to identify mammalian orthologs of the fly genes that were "hits" (positive results) in their screens.

In helping DRSC screeners to identify orthologs using existing tools and algorithms, we recognized a need for a user-friendly approach to viewing and comparing ortholog predictions obtained using different tools and algorithms. This was our motivation in developing DIOPT. To facilitate identification of orthologs specifically of human disease-associated genes, we further developed DIOPT-DIST. Information about our approaches to development of both tools is summarized below.


If you use DIOPT or DIOPT-DIST in your research, we ask that you please cite our paper:

Hu Y, Flockhart I, Vinayagam A, Bergwitz C, Berger B, Perrimon N, Mohr SE. An Integrative Approach to Ortholog Prediction for Disease-Focused and Other Functional Studies. BMC Bioinformatics. 2011 Aug 31;12(1):357.

The DIOPT Approach

Many tools have emerged to meet the need to identify orthologs. However, the low coverage and heterogeneity of these tools present an obstacle to scientists who want to identify one or a few highest-confidence orthologs for a given gene of interest or, conversely, want to cast a wide net and follow up on all possible orthologs of a gene.

Our goal is to provide an easy-to-use resource that facilitates summary and comparison of, and access to, various sources of ortholog predictions. DIOPT integrates human, mouse, fly, worm, zebrafish and yeast ortholog predictions made by Ensembl Compara, HomoloGene, Inparanoid, Isobase, OMA, orthoMCL, Phylome, RoundUp, and TreeFam. DIOPT lets users find ortholog pairs, for a specified gene or genes, that are identified by one, many or all of these published approaches. This provides a streamlined method for integrating, comparing and accessing orthology predictions that originate from algorithms based on sequence homology, phylogenetic trees, and functional similarity. DIOPT calculates a simple score indicating the number of tools that support a given orthologous gene-pair relationship, as well as a weighted score based on a functional assessment that uses high-quality GO molecular function annotation of all fly-human orthologous pairs predicted by each tool. Differences in the algorithms that the tools use to predict orthologous relationships are one source of difference between the predictions made by one tool and those made by another. However, some of these differences might also be attributable to the tools' use of different genome annotation releases, and not all tools cover all of the species included in DIOPT (see Tables 1, 2 and 3).
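As a rough illustration of the two scores described above, the following Python sketch counts the tools that support each gene pair and adds up their weights. The weights for Compara, Phylome, TreeFam and Panther are taken from Table 1. Summing the weights is an assumption made here for illustration, since this page does not give the exact formula for the weighted score. The function name `score_pairs` is hypothetical, and the tool-to-pair assignments in the example are invented.

```python
from collections import defaultdict

# DIOPT weights for four tools, as listed in Table 1.
TOOL_WEIGHTS = {"Compara": 0.931, "Phylome": 0.912, "TreeFam": 0.963, "Panther": 1.1}


def score_pairs(predictions, weights=TOOL_WEIGHTS):
    """Return {(gene_a, gene_b): (simple_score, weighted_score)}.

    predictions is an iterable of (tool, gene_a, gene_b) tuples.
    The simple score is the number of distinct supporting tools.
    The weighted score is ASSUMED here to be the sum of their weights.
    """
    supporters = defaultdict(set)
    for tool, gene_a, gene_b in predictions:
        supporters[(gene_a, gene_b)].add(tool)

    scores = {}
    for pair, tools in supporters.items():
        simple = len(tools)
        weighted = round(sum(weights[tool] for tool in tools), 3)
        scores[pair] = (simple, weighted)
    return scores


# Invented example: which tool predicts which pair is made up.
predictions = [
    ("Compara", "Aldh", "ALDH2"),
    ("TreeFam", "Aldh", "ALDH2"),
    ("Panther", "Aldh", "ALDH2"),
    ("Panther", "Aldh", "ALDH1B1"),
]
scores = score_pairs(predictions)
for pair, (simple, weighted) in sorted(scores.items()):
    print(pair, simple, weighted)
```

A pair supported by more tools gets a higher simple score, while the weighted score lets tools with higher weights in Table 1 count for slightly more.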

DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. These should help you to identify the most appropriate matches among multiple possible orthologs.

The following summary figures and tables help to explain our approach and summarize the tools and algorithms included in DIOPT.

Figure 1: Summary of the DIOPT approach to integration of results from multiple ortholog prediction tools and algorithms. In green, tools based on sequence alignment. In purple, tools based on evolutionary relationships. In orange, a tool that incorporates protein-protein interaction network data into ortholog predictions.

Table 1: Summary Information and Publications for the Tools Integrated in DIOPT

Prediction Method | Source | Prediction Algorithm | Coverage | DIOPT Weight* | PMID
Compara | Ensembl | Phylogenetic approach | 70 species (vs. 81) | 0.931 | 19029536
Homologene | NCBI | Combination of BBH*, tree and synteny | 21 species (vs. 68) | 1 | 11125071
Inparanoid | Stockholm University, Sweden | BBH* approach to identify orthologs and in-paralogs | 273 species (vs. 8) | 1.005 | 11743721
Isobase | MIT | Sequence and PPI* network alignments | 5 species (vs. 2, Nov. 2014) | 0.957 | 21177658
OMA | CBRG, ETH Zurich | BBH*, global sequence alignments | 1706 species (Oct 2014) | 1.019 | 17545180
OrthoDB | University of Geneva | Phylogenetic approach | 3027 species (vs. 8) | 1.001 | 20972218
orthoMCL | University of Pennsylvania | Markov Cluster algorithm | 150 species (vs. 5) | 0.903 | 12952885
Phylome | Centre for Genomic Regulation (CRG), Spain | Reconstruction of evolutionary histories of all genes in a genome, also known as phylome | 1059 species, 120 Phylomes (vs. 4) | 0.912 | 17962297
RoundUp | Harvard Medical School | RSD*, modified BBH* | 2044 species (Apr 2013) | 1.003 | 16777906
TreeFam | Wellcome Trust Sanger Institute | Manually curated based on trees | 109 species (vs. 9) | 0.963 | 16381935
Panther | University of Southern California | Phylogenetic approach | 79 species | 1.1 | 26578592
HGNC | European Bioinformatics Institute (EMBL-EBI) | Manually curated | 2 species | 1.5 |
ZFIN | Zebrafish Model Organism Database | Sequence similarity analysis and manual curation | 4 species | 1.5 |

* DIOPT weights are based on the mean semantic similarity of high quality GO molecular function annotation of all fly-human orthologous pairs predicted by each tool.
   BBH, Best Blast Hits
   RSD, Reciprocal Smallest Distance
   PPI, Protein-Protein Interactions

Table 2A: Genome Release Information for the Tools Integrated in DIOPT

Tool | Worm | Fish | Fly | Human | Mouse | Yeast | Fission Yeast | Frog | Rat
Compara | WBcel235 | GRCz10 | BDGP6 | GRCh38.p3 | GRCm38.p4 | R64-1-1 |  | JGI 4.2 | Rnor_6.0
Homologene | WS195 | Zv9 | FlyBase r5.48 | GRCh38 | GRCm38.p2 | R64-1-1 | ASM294v2 |  | Rnor_5.0
OMA | Ensembl v73 WBcel235 | Ensembl v70 Zv9 | Ensembl v73 BDGP5 | Ensembl v75 GRCh37 | Ensembl v75 GRCm38 | Ensembl v73 (EF4) | Ensembl Fungi v22 (ASM294v2) | Ensembl v73 (JGI_4.2) | Ensembl v73 (Rnor_5.0)
Inparanoid | UniProt Nov 2013 | UniProt Nov 2013 | UniProt Nov 2013 | UniProt Nov 2013 | UniProt Nov 2013 | UniProt Nov 2013 | UniProt Nov 2013 | UniProt Nov 2013 | UniProt Nov 2013
Isobase | Ensembl v59 | NA | Ensembl v59 | Ensembl v59 | Ensembl v59 | Ensembl v59 |  |  |
orthoMCL | WS206 | Zv8.56 | BDGP5.13.56 | GRCh37.56 | NCBI v37.56 | FungiDB | GenBank |  | Ensembl v53
OrthoDB |  | Ensembl v75 | FlyBase r5.55 | Ensembl v75 | Ensembl v75 | UniProt Feb 2014 | UniProt Feb 2014 | Ensembl v75 | Ensembl v75
RoundUp | UniProt Apr 2013 | UniProt Apr 2013 | UniProt Apr 2013 | UniProt Apr 2013 | UniProt Apr 2013 | UniProt Apr 2013 | UniProt Apr 2013 | UniProt Apr 2013 |
TreeFam | Ensembl v69 | Ensembl v69 | Ensembl v69 | Ensembl v69 | Ensembl v69 | Ensembl v69 | Ensembl v69 | Ensembl v69 | Ensembl v69
Phylome |  |  |  |  |  |  |  |  |
Panther | WormBase Apr 2014 | Ensembl Apr 2014 | FlyBase Apr 2014 | Ensembl Apr 2014 | MGI Apr 2014 | SGD Apr 2014 | PomBase Apr 2014 | Gene Apr 2014 | RGD Apr 2014
HGNC |  |  |  | HGNC Feb 2016 | HGNC Feb 2016 |  |  |  |
ZFIN |  | ZFIN May 2016 | ZFIN May 2016 | ZFIN May 2016 | ZFIN May 2016 |  |  |  |

Table 2B: Additional Information About Genome Releases

Other Resource | Version

Table 3. Maximum DIOPT score for each orthologous relationship

Orthologous Relationship | Max score | Relevant Tools
fission yeast-baker's yeast | 8 | Inparanoid, OMA, orthoMCL, Phylome, RoundUp, TreeFam, Homologene, Panther
fission yeast-worm | 7 | Homologene, TreeFam, RoundUp, orthoMCL, Inparanoid, OMA, Panther
fission yeast-fly | 8 | Phylome, Homologene, TreeFam, RoundUp, orthoMCL, Inparanoid, OMA, Panther
fission yeast-fish | 7 | Homologene, Inparanoid, OMA, orthoMCL, RoundUp, TreeFam, Panther
fission yeast-frog | 5 | TreeFam, Inparanoid, OMA, RoundUp, Panther
fission yeast-human | 8 | Homologene, Inparanoid, OMA, orthoMCL, Phylome, RoundUp, TreeFam, Panther
fission yeast-mouse | 7 | orthoMCL, RoundUp, OMA, Inparanoid, Homologene, TreeFam, Panther
baker's yeast-worm | 10 | orthoMCL, OMA, TreeFam, RoundUp, Isobase, Compara, Inparanoid, Homologene, Phylome, Panther
baker's yeast-fly | 10 | Isobase, TreeFam, RoundUp, Phylome, OMA, Inparanoid, Homologene, Compara, orthoMCL, Panther
baker's yeast-fish | 9 | Phylome, Homologene, TreeFam, RoundUp, orthoMCL, Inparanoid, Compara, OMA, Panther
baker's yeast-frog | 7 | TreeFam, Compara, Inparanoid, OMA, Phylome, RoundUp, Panther
baker's yeast-human | 10 | orthoMCL, TreeFam, RoundUp, Phylome, Isobase, Inparanoid, Compara, Homologene, OMA, Panther
baker's yeast-mouse | 10 | Phylome, RoundUp, orthoMCL, OMA, Inparanoid, Homologene, Compara, TreeFam, Isobase, Panther
rat-fission yeast | 6 | Inparanoid, OMA, orthoMCL, TreeFam, Homologene, Panther
rat-baker's yeast | 7 | Compara, Homologene, Inparanoid, OMA, orthoMCL, TreeFam, Panther
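
In every row of Table 3, the maximum score equals the number of tools listed, so the maximum for a species pair appears to be the count of tools that cover both species. The Python sketch below encodes three rows of the table and uses them to put an observed score in context. The helper names `max_score` and `fraction_of_max` are hypothetical, and reporting a score as a fraction of the maximum is an illustration, not a feature described on this page.

```python
# Three rows copied from Table 3: species pair -> relevant tools.
RELEVANT_TOOLS = {
    frozenset({"fission yeast", "frog"}): [
        "TreeFam", "Inparanoid", "OMA", "RoundUp", "Panther",
    ],
    frozenset({"baker's yeast", "fish"}): [
        "Phylome", "Homologene", "TreeFam", "RoundUp", "orthoMCL",
        "Inparanoid", "Compara", "OMA", "Panther",
    ],
    frozenset({"rat", "fission yeast"}): [
        "Inparanoid", "OMA", "orthoMCL", "TreeFam", "Homologene", "Panther",
    ],
}


def max_score(species_a, species_b):
    """Maximum simple score for a species pair, in either order."""
    return len(RELEVANT_TOOLS[frozenset({species_a, species_b})])


def fraction_of_max(score, species_a, species_b):
    """Hypothetical helper: an observed score relative to the maximum."""
    return score / max_score(species_a, species_b)


print(max_score("frog", "fission yeast"))           # 5 tools in the table row
print(fraction_of_max(3, "fission yeast", "rat"))   # 3 of 6 tools
```

Because the maximum differs between species pairs, the same raw score can indicate stronger or weaker agreement depending on which species are compared.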

The DIOPT-DIST Approach

Identifying orthologs between a model organism and humans is of particular relevance for genes associated with human diseases. Researchers visiting or otherwise using resources from the DRSC are often interested in identifying orthologs of disease-associated genes, including genes more recently identified through genome-wide association studies (GWAS). With this in mind, we decided to take DIOPT results a step further by linking them with disease associations in curated resources.

DIOPT-DIST is an online-searchable tool that maps gene-disease relationships from the NCBI Online Mendelian Inheritance in Man (OMIM) database and from GWAS datasets contained in the GWAS catalog to genes in the C. elegans, Drosophila, mouse, S. cerevisiae, and zebrafish genomes using DIOPT ortholog predictions. Disease terms were extracted from OMIM and GWAS ftp files and were categorized with MeSH headings. Two additional categories were added for terms outside the scope of MeSH disease annotation, such as disease risk factors and traits (Table 4).

The user may search for (1) the human diseases related to a list of genes from model organisms, or (2) the corresponding genes in model organisms by disease term, disease category or OMIM ID. Both gene/locus OMIM IDs (which start with * or + on the OMIM website, e.g., 100650) and disease phenotype OMIM IDs (which start with # or % on the OMIM website, e.g., 610251) can be searched at DIOPT-DIST (Table 5).
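
The mapping and the two search directions described above can be sketched in Python as a join between ortholog pairs and human gene-disease annotations. The Aldh/ALDH2 pair, its DIOPT score and the OMIM ID come from the Table 5 example. The second ortholog pair, the `min_score` cutoff and all function names are assumptions made for illustration, and this is not a description of how DIOPT-DIST is implemented.

```python
from collections import defaultdict

# (model-organism gene ID, human gene symbol, DIOPT score).
# The first row is the Table 5 example; the second is invented to show filtering.
ORTHOLOG_PAIRS = [
    ("FBgn0012036", "ALDH2", 8),
    ("FBgn9999999", "ALDH2", 1),  # hypothetical low-scoring pair
]

# Human gene symbol -> (gene/locus OMIM ID, disease term), from Table 5.
HUMAN_DISEASES = {
    "ALDH2": [
        ("100650", "Alcohol sensitivity, acute"),
        ("100650", "{Hangover, susceptibility to}"),
    ],
}


def build_disease_map(pairs, diseases, min_score=2):
    """Attach human disease terms to model-organism genes via ortholog pairs.

    min_score is a hypothetical cutoff on the DIOPT score.
    """
    gene_to_diseases = defaultdict(list)
    for model_gene, human_gene, score in pairs:
        if score < min_score:
            continue
        for omim_id, term in diseases.get(human_gene, []):
            gene_to_diseases[model_gene].append((human_gene, score, omim_id, term))
    return dict(gene_to_diseases)


def genes_for_omim_id(disease_map, omim_id):
    """Reverse search: model-organism genes linked to a given OMIM ID."""
    return sorted(
        gene
        for gene, rows in disease_map.items()
        if any(row[2] == omim_id for row in rows)
    )


disease_map = build_disease_map(ORTHOLOG_PAIRS, HUMAN_DISEASES)
print(disease_map["FBgn0012036"])              # search by model-organism gene
print(genes_for_omim_id(disease_map, "100650"))  # search by OMIM ID
```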

Figure 2: The DIOPT-DIST Approach


Table 4. Disease Categories

ID | Disease Category | Source
C01 | Bacterial Infections and Mycoses | MeSH heading
C02 | Virus Diseases | MeSH heading
C03 | Parasitic Diseases | MeSH heading
C04 | Neoplasms | MeSH heading
C05 | Musculoskeletal Diseases | MeSH heading
C06 | Digestive System Diseases | MeSH heading
C07 | Stomatognathic Diseases | MeSH heading
C08 | Respiratory Tract Diseases | MeSH heading
C09 | Otorhinolaryngologic Diseases | MeSH heading
C10 | Nervous System Diseases | MeSH heading
C11 | Eye Diseases | MeSH heading
C12 | Male Urogenital Diseases | MeSH heading
C13 | Female Urogenital Diseases and Pregnancy Complications | MeSH heading
C14 | Cardiovascular Diseases | MeSH heading
C15 | Hemic and Lymphatic Diseases | MeSH heading
C16 | Congenital, Hereditary, and Neonatal Diseases and Abnormalities | MeSH heading
C17 | Skin and Connective Tissue Diseases | MeSH heading
C18 | Nutritional and Metabolic Diseases | MeSH heading
C19 | Endocrine System Diseases | MeSH heading
C20 | Immune System Diseases | MeSH heading
C21 | Disorders of Environmental Origin | MeSH heading
C23 | Pathological Conditions, Signs and Symptoms | MeSH heading
C24 | Occupational Diseases | MeSH heading
C25 | Substance-Related Disorders | MeSH heading
C26 | Wounds and Injuries | MeSH heading
F03 | Mental Disorders | MeSH heading
Y01 | Disease risk factor, diagnosis or treatment | Added for terms not included in MeSH
Y02 | Trait | Added for terms not included in MeSH

Table 5. DIOPT-DIST result page, one example of OMIM terms

FlyBase ID | Fly Symbol | Human GeneID | Human Symbol | DIOPT Score | OMIM ID | Disease/Trait | Source
FBgn0012036 | Aldh | 217 | ALDH2 | 8 | 100650 | {Hangover, susceptibility to}, 610251 (3) | OMIM
FBgn0012036 | Aldh | 217 | ALDH2 | 8 | 100650 | {Sublingual nitroglycerin, susceptibility to poor response to} (3) | OMIM
FBgn0012036 | Aldh | 217 | ALDH2 | 8 | 100650 | Alcohol sensitivity, acute, 610251 (3) | OMIM
FBgn0012036 | Aldh | 217 | ALDH2 | 8 | 100650 | {Esophageal cancer, alcohol-related, susceptibility to} (3) | OMIM

Notes on the Disease/Trait column, using the disease term "{Hangover, susceptibility to}, 610251 (3)" as an example:
  • 610251 is the OMIM ID for the phenotype.
  • Braces, "{ }", indicate mutations that contribute to susceptibility to multifactorial disorders or to susceptibility to infection.
  • "(3)" means that the molecular basis of the disorder is known.

For more information about OMIM annotation, please go to the OMIM help page (http://omim.org/help/faq).
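
The notation explained in the notes above can be taken apart programmatically. The following Python sketch splits a Disease/Trait string into the disease term, the optional phenotype OMIM ID, the number in parentheses and a flag for the susceptibility braces. The function name, the output field names and the regular expression are assumptions based only on the four example strings in Table 5, so other OMIM annotations may need additional handling.

```python
import re

# Term, then an optional six-digit phenotype OMIM ID, then "(n)".
# This pattern is an assumption derived from the Table 5 examples.
DISEASE_TRAIT = re.compile(
    r"(?P<term>.+?)(?:,\s*(?P<phenotype_id>\d{6}))?\s*\((?P<key>\d)\)"
)


def parse_disease_trait(text):
    """Split a Disease/Trait string into its parts (hypothetical helper)."""
    match = DISEASE_TRAIT.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"Unrecognized Disease/Trait string: {text!r}")
    term = match.group("term")
    # Braces mark susceptibility to a multifactorial disorder or infection.
    susceptibility = term.startswith("{") and term.endswith("}")
    if susceptibility:
        term = term[1:-1]
    return {
        "term": term,
        "phenotype_omim_id": match.group("phenotype_id"),  # None if absent
        "mapping_key": int(match.group("key")),
        "susceptibility": susceptibility,
    }


hangover = parse_disease_trait("{Hangover, susceptibility to}, 610251 (3)")
acute = parse_disease_trait("Alcohol sensitivity, acute, 610251 (3)")
print(hangover)
print(acute)
```

Entries without a phenotype OMIM ID, such as "{Esophageal cancer, alcohol-related, susceptibility to} (3)", are returned with `phenotype_omim_id` set to `None`.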

If you have questions, suggestions or comments on DIOPT, please contact our informatics staff.