About


About DRscDB:

DRscDB is a user-friendly, manually curated single-cell RNA-seq (scRNA-seq) search database built from datasets in the published literature. DRscDB allows users to search, mine, and compare multiple genes and cell clusters across diverse species, including Drosophila and humans. Because its datasets are manually curated, DRscDB serves as a comprehensive repository of published scRNA-seq data and provides users with literature-derived marker genes for each dataset. Importantly, DRscDB can identify gene orthologs across species, which allows gene expression profiles to be compared efficiently at the single-cell level. Salient features of DRscDB include, but are not limited to:

Feedback can be sent to drsc@genetics.med.harvard.edu or submitted through the “software bug report form” at the bottom of the DRSC/TRiP home page.

Use Case 1 - Single Gene Search

1.) Landing page

User needs to specify the species of the input gene; DRscDB covers Drosophila, human, mouse and zebrafish.
User has the option to select the data by tissue.
User has the option to set the ortholog mapping criteria. Ortholog mapping is based on DIOPT.
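As a rough illustration of what ortholog mapping criteria can mean, the Python sketch below filters DIOPT-style predictions by a minimum score and optionally keeps only best-scoring matches. A DIOPT score counts how many prediction tools support an ortholog pair; the `OrthologPair` record, the `min_score` and `best_only` parameters, and all gene names and scores are hypothetical and do not reflect DRscDB's actual options or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrthologPair:
    """Hypothetical record for one DIOPT-style ortholog prediction."""
    input_gene: str
    ortholog: str
    species: str
    diopt_score: int      # number of prediction tools supporting this pair
    is_best_score: bool   # True if this is the top-scoring match for input_gene


def filter_orthologs(pairs, min_score=2, best_only=False):
    """Keep pairs meeting a minimum score; optionally keep best matches only."""
    return [
        p for p in pairs
        if p.diopt_score >= min_score and (p.is_best_score or not best_only)
    ]


# Illustrative values only, not real DIOPT output.
pairs = [
    OrthologPair("geneX", "HS_GENE_A", "human", 12, True),
    OrthologPair("geneX", "HS_GENE_B", "human", 3, False),
    OrthologPair("geneX", "MM_GENE_A", "mouse", 1, True),
]
print([p.ortholog for p in filter_orthologs(pairs)])
print([p.ortholog for p in filter_orthologs(pairs, best_only=True)])
```

Raising `min_score` trades sensitivity for stricter, higher-confidence ortholog calls.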

2.) Result page 1

This result page summarizes the number of relevant datasets expressing the input gene and its orthologs. The input gene is highlighted in blue.
User has the option to adjust the ortholog mapping criteria on this page.
User can select any gene in the summary table and click to view the results.

3.) Result page 2

This result page summarizes detailed information on the relevant datasets for the selected gene, displaying the PubMed ID, dataset name, tissue, cell type and marker annotation in a table.
The table can be searched, sorted and exported.
User has the option to view statistics either for expression at the cluster level or for the marker annotation.

4.) Result page 3 (cluster-level statistics)

User has the option to choose among three visualizations: dot plot, bar graph and heatmap.
The default is the dot plot, in which the size of each dot represents the percentage of cells expressing the input gene and the darkness of the dot reflects the average expression level.
A searchable, sortable and exportable table of cluster-level statistics is provided at the bottom.
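The two quantities encoded in the dot plot can be sketched in a few lines of Python. This is a minimal sketch, assuming the input is one gene's normalized expression value in every cell of each cluster and that the average is taken over all cells in the cluster; the function name `cluster_dot_stats` and the example values are hypothetical, and DRscDB's exact averaging may differ.

```python
def cluster_dot_stats(expr_by_cluster):
    """Per-cluster dot plot values for one gene.

    expr_by_cluster maps a cluster name to the gene's (assumed normalized)
    expression value in each cell of that cluster. The average is taken over
    all cells in the cluster, which is an assumption for illustration.
    """
    stats = {}
    for cluster, values in expr_by_cluster.items():
        if not values:
            continue  # skip empty clusters to avoid dividing by zero
        expressing = sum(1 for v in values if v > 0)
        stats[cluster] = {
            "pct_expressing": 100.0 * expressing / len(values),  # dot size
            "avg_expression": sum(values) / len(values),         # dot darkness
        }
    return stats


example = {"cluster_1": [0.0, 1.0, 3.0, 0.0], "cluster_2": [2.0, 2.0]}
for name, s in cluster_dot_stats(example).items():
    print(name, s["pct_expressing"], s["avg_expression"])
```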

5.) Result page 4 (marker gene statistics)

Marker gene statistics are visualized as a bar graph in which the bar height reflects fold enrichment and the darkness reflects the -log10 P value.
A searchable, sortable and exportable table of marker statistics is provided at the bottom.
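To make the bar graph encoding concrete, the sketch below converts marker statistics into a bar height (fold enrichment) and a shade derived from -log10 P. It assumes rows of `(cluster, fold_enrichment, p_value)` taken from the marker table; scaling the shade relative to the most significant row is an illustrative choice, not necessarily how DRscDB colors its bars, and the function name is hypothetical.

```python
import math


def marker_bar_encoding(markers):
    """Turn (cluster, fold_enrichment, p_value) rows into bar height and shade.

    Height is the fold enrichment; shade is -log10(P) scaled to [0, 1] relative
    to the most significant row (an assumed scaling for illustration).
    """
    # Clamp P values so that P = 0 does not produce an infinite -log10.
    neg_log_p = [-math.log10(max(p, 1e-300)) for _, _, p in markers]
    top = max(neg_log_p, default=0.0)
    bars = []
    for (cluster, fold, _), nlp in zip(markers, neg_log_p):
        shade = nlp / top if top > 0 else 0.0
        bars.append({"cluster": cluster, "height": fold,
                     "neg_log10_p": nlp, "shade": shade})
    return bars


rows = [("cluster_1", 4.0, 1e-10), ("cluster_2", 2.0, 1e-5)]
for bar in marker_bar_encoding(rows):
    print(bar)
```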

Use Case 2 – Single Gene List Enrichment

1.) Landing page

User can analyze any list of genes (e.g. hits from a cell-based screen, or a list of transcription factors) to find the tissues/cell types for which the list is enriched. For example, if the input is a list of hits from a cell-based screen (e.g. fitness genes), this enrichment analysis helps the user identify the most relevant tissue/cell type and design the follow-up in vivo experiment accordingly. Alternatively, the user might input a specific gene group, e.g. transcription factors (TFs), to identify the TFs expressed in the tissue/cell type of interest for follow-up study.
The species and the gene identifier type of the input genes need to be specified. Gene identifiers can be official gene symbols, Entrez Gene IDs or species-specific gene IDs (e.g. FBgn for Drosophila genes and MGI ID for mouse genes).

Information about species-specific gene identifiers

NCBI Taxonomy ID	Species name	Short name	Common name	Species-specific database	Example species-specific ID	Website
7165	Anopheles gambiae	ag	Mosquito	VectorBase	AGAP012829	https://vectorbase.org/
7227	Drosophila melanogaster	dm	Fly	FlyBase	FBgn0260768	https://flybase.org/
7955	Danio rerio	dr	Zebrafish	ZFIN	ZDB-GENE-010525-1	https://zfin.org/
9606	Homo sapiens	hs	Human	HGNC	10604	https://www.genenames.org/
10090	Mus musculus	mm	Mouse	MGI	1196256	https://www.informatics.jax.org/
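Before submitting a list, it can help to check which row of the table above an identifier belongs to. The Python sketch below guesses the source database from the ID pattern; the regular expressions are inferred only from the example IDs in the table and are simplifications, and `guess_id_source` is a hypothetical helper, not part of DRscDB. Plain numeric IDs (HGNC, MGI or Entrez Gene) cannot be told apart by pattern, so the species and ID type still have to be specified, as the page requires.

```python
import re

# Patterns inferred from the example IDs in the table above; they are a
# simplification and may not cover every valid ID in each database.
ID_PATTERNS = [
    (re.compile(r"^FBgn\d{7}$"), "FlyBase", "dm"),
    (re.compile(r"^ZDB-GENE-\d{6}-\d+$"), "ZFIN", "dr"),
    (re.compile(r"^AGAP\d{6}$"), "VectorBase", "ag"),
]


def guess_id_source(identifier, species=None):
    """Return (source, species short name) for an input identifier."""
    identifier = identifier.strip()
    for pattern, source, short_name in ID_PATTERNS:
        if pattern.match(identifier):
            return source, short_name
    if identifier.isdigit():
        # HGNC, MGI and Entrez Gene IDs are all plain numbers in this form,
        # so the species and ID type must be supplied by the user.
        return "numeric ID (type must be specified)", species
    return "gene symbol or unrecognized", species


for gid, sp in [("FBgn0260768", None), ("ZDB-GENE-010525-1", None), ("10604", "hs")]:
    print(gid, guess_id_source(gid, sp))
```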

Since enrichment is a computationally intensive process, gene ID mapping and enrichment are run as separate steps. We suggest that users first use the gene ID mapping page to convert their gene identifiers to the versions used by DRscDB. This is particularly important when FlyBase gene identifiers (FBgns) are used, because FBgns can be retired or merged between FlyBase releases.
User has the option to filter datasets by tissue and species.
User has the option to select the number of top marker genes (default: top 100).
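The two preparation steps described above, mapping input IDs to the database's current identifiers and keeping only the top N markers per cluster, can be sketched as follows. This is an illustration under stated assumptions: the mapping table is a plain dictionary (on DRscDB this is done by the gene ID mapping page), markers arrive as `(cluster, gene, log_fold_change, adjusted_p)` tuples, and ranking by adjusted P value then fold change is an assumed criterion. All IDs and gene names are hypothetical.

```python
from collections import defaultdict


def map_ids(input_ids, id_to_current):
    """Map input IDs to current identifiers; report IDs that could not be mapped."""
    mapped, unmapped = [], []
    for gid in input_ids:
        target = id_to_current.get(gid)
        if target is None:
            unmapped.append(gid)
        elif target not in mapped:  # merged IDs can map to the same gene
            mapped.append(target)
    return mapped, unmapped


def top_n_markers(marker_rows, n=100):
    """Keep the top-n markers per cluster.

    marker_rows holds (cluster, gene, log_fold_change, adjusted_p) tuples.
    Ranking by adjusted P value, then by fold change, is an assumed criterion.
    """
    by_cluster = defaultdict(list)
    for cluster, gene, lfc, p_adj in marker_rows:
        by_cluster[cluster].append((p_adj, -lfc, gene))
    return {cluster: [gene for _, _, gene in sorted(rows)[:n]]
            for cluster, rows in by_cluster.items()}


mapping = {"OLD_ID_1": "GENE_1", "OLD_ID_2": "GENE_1", "OLD_ID_3": "GENE_2"}
print(map_ids(["OLD_ID_1", "OLD_ID_2", "OLD_ID_X"], mapping))

markers = [("c1", "a", 2.0, 0.01), ("c1", "b", 1.0, 0.001),
           ("c1", "c", 3.0, 0.01), ("c2", "d", 1.0, 0.05)]
print(top_n_markers(markers, n=2))
```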

2.) Result page

The enrichment result is summarized in a table that lists the enriched gene sets together with enrichment statistics such as fold enrichment, P value, adjusted P value and the overlapping genes. User has the option to filter the result by different P value cut-offs. The table can be sorted by different parameters (e.g. fold change, P value, adjusted P value) and can be exported.
User has the option to view detailed information about each gene set by clicking on its name; both the full list of marker genes and the overlapping genes are displayed.
A bar graph is used to visualize the enrichment result. User has the option to customize the bar graph by selecting a subset of gene sets, showing different parameters (e.g. fold change, P value) and using different color schemes. The bar graph image can be exported.
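The statistics in the result table can be illustrated with a common over-representation setup. The page does not state which test DRscDB uses, so this sketch assumes a one-sided hypergeometric test, fold enrichment defined as (k/n)/(K/N) (overlap fraction of the input list over the gene set's share of the background), and Benjamini-Hochberg adjustment. The background, gene sets and the function names `enrich`, `hypergeom_sf` and `bh_adjust` are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) where X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrich(input_genes, gene_sets, background):
    """Test one input list against each gene set (e.g. top markers of a cluster)."""
    query = set(input_genes) & background
    N, n = len(background), len(query)
    rows = []
    for name, members in gene_sets.items():
        members = set(members) & background
        overlap = query & members
        k, K = len(overlap), len(members)
        fold = (k / n) / (K / N) if n and K else 0.0
        rows.append({"gene_set": name, "overlap": sorted(overlap),
                     "fold_enrichment": fold, "p_value": hypergeom_sf(k, N, K, n)})
    for row, p_adj in zip(rows, bh_adjust([r["p_value"] for r in rows])):
        row["p_adj"] = p_adj
    return sorted(rows, key=lambda r: r["p_value"])


background = {f"g{i}" for i in range(20)}
gene_sets = {"cluster_A": ["g0", "g1", "g2", "g10"],
             "cluster_B": ["g10", "g11", "g12", "g13"]}
for row in enrich(["g0", "g1", "g2", "g3"], gene_sets, background):
    print(row["gene_set"], row["overlap"], round(row["fold_enrichment"], 2),
          row["p_value"], row["p_adj"])
```

In practice the background would be all genes measured in the dataset; a library routine such as SciPy's hypergeometric distribution could replace the hand-written tail sum.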

Use Case 3 – Multiple Gene Lists Enrichment

1.) Landing page

User can analyze multiple gene lists together, e.g. the marker genes of each cluster from a newly obtained scRNA-seq dataset, and compare them with a relevant existing dataset to assign cell types. User might also compare the markers from one dataset, for example the blood cell markers from “A single-cell survey of Drosophila blood” (PubMed ID: 32396065), with the markers from a similar study, “Single-cell transcriptome maps of myeloid blood cell lineages in Drosophila” (PubMed ID: 32900993), to check data reproducibility. In addition, when this type of analysis is performed against a dataset from a different species, the result can facilitate the discovery of evolutionarily conserved transcriptomic architecture.
User needs to format the input into two columns separated by a comma (“,”) or a tab, with the 1st column containing the group name (e.g. the cluster name in a scRNA-seq dataset) and the 2nd column containing the gene identifier. The species and the gene identifier type of the input need to be specified. An example input is provided on the landing page.
User has the option to select the number of top marker genes.
User needs to select the publication whose clusters the input lists will be compared against.
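The two-column input format can be checked locally before pasting it into the form. The Python sketch below groups genes by group name, accepting either a comma or a tab as the separator on each line. It assumes there is no header row and that duplicate genes within a group should be dropped; `parse_gene_lists` and the example gene names are hypothetical, so compare against the example input on the landing page.

```python
import re


def parse_gene_lists(text):
    """Parse 'group,gene' or 'group<TAB>gene' lines into {group: [genes]}.

    Assumes no header row; blank lines are skipped and duplicates removed.
    """
    groups = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        parts = re.split(r"[,\t]", line, maxsplit=1)
        if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
            raise ValueError(f"line {lineno}: expected two columns, got {line!r}")
        group, gene = parts[0].strip(), parts[1].strip()
        genes = groups.setdefault(group, [])
        if gene not in genes:
            genes.append(gene)
    return groups


example = "cluster_1,geneA\ncluster_1\tgeneB\ncluster_2,geneC\ncluster_1,geneA\n"
print(parse_gene_lists(example))
```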

2.) Result page

The enrichment results comparing all input gene lists (e.g. 17 lists) with the marker genes of all clusters (e.g. 10 clusters) from the selected publication are summarized in a table (17 × 10) displaying the enrichment P values and fold enrichment values.
User has the option to visualize the result as a heatmap or dot plot, which can be customized by selecting different parameters (-log10 P value, fold enrichment, percent overlapping genes) and by adjusting the order of rows/columns, and which can be exported.
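The lists-by-clusters table behind the heatmap and dot plot can be sketched as a pair of matrices. This assumes, as an illustration only, a one-sided hypergeometric test against a background of `background_size` genes and a percent overlap defined as the share of each input list found among a cluster's markers; the function name, background size and gene names are hypothetical, not DRscDB's documented method.

```python
import math
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) where X ~ Hypergeometric(population N, K successes, n draws)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def enrichment_matrix(input_lists, cluster_markers, background_size):
    """Build lists-by-clusters matrices of -log10 P and percent overlap."""
    rows, cols = list(input_lists), list(cluster_markers)
    neg_log_p = [[0.0] * len(cols) for _ in rows]
    pct_overlap = [[0.0] * len(cols) for _ in rows]
    for i, row in enumerate(rows):
        query = set(input_lists[row])
        for j, col in enumerate(cols):
            markers = set(cluster_markers[col])
            k = len(query & markers)
            p = hypergeom_sf(k, background_size, len(markers), len(query))
            # log10(1/p) equals -log10(p); the clamp avoids dividing by zero.
            neg_log_p[i][j] = math.log10(1.0 / max(p, 1e-300))
            # Percent of the input list found among the cluster markers (assumed definition).
            pct_overlap[i][j] = 100.0 * k / len(query) if query else 0.0
    return rows, cols, neg_log_p, pct_overlap


lists = {"list_1": ["a", "b"], "list_2": ["c"]}
clusters = {"cluster_1": ["a", "b", "x"], "cluster_2": ["c", "y"]}
rows, cols, nlp, pct = enrichment_matrix(lists, clusters, background_size=10)
for r, values in zip(rows, nlp):
    print(r, [round(v, 3) for v in values])
```

Either matrix can then be passed to a plotting library as the heatmap values, with rows and columns reordered as needed.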