Online GESS Tool

(Version 1.1.2) details

BACKGROUND

RNAi is a widely used and valuable genetic tool for studying gene function, but it is vulnerable to off-target effects, which are a significant source of false positive results. A bioinformatics method, genome-wide enrichment of seed sequence matches (GESS), was previously developed to identify candidate off-targeted transcripts from primary screen results. The algorithm reveals microRNA (miRNA)-like effects by seed region analysis. For more information, please see "A bioinformatics method identifies prominent off-targeted transcripts in RNAi screens". The GESS online tool was developed to provide a quick and easy-to-use interface for running the GESS algorithm.

Figure 1: RNAi On-Target Effects versus RNAi miRNA-like Off-Target Effects

miRNA-like effects are off-target effects in which the seed region of an RNAi reagent binds to a transcript, mostly in its 3'UTR, and generates false positive results (see Figure 1).
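To make the idea of a seed match concrete, here is a minimal Python sketch. It is not the tool's implementation. This page does not define the seed window, so the sketch assumes the commonly used positions 2-8 of the antisense (guide) strand; the function names are hypothetical.

```python
# Sketch only: the seed window (positions 2-8 of the guide strand) is an
# assumption, not something stated on this page.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq):
    """Upper-case the sequence and write RNA (U) in the DNA alphabet (T)."""
    return seq.strip().upper().replace("U", "T")


def reverse_complement(seq):
    """Return the reverse complement of a DNA or RNA sequence."""
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def seed_of(seq, strand="sense", start=1, length=7):
    """Extract the assumed seed from a reagent given as sense or antisense."""
    guide = reverse_complement(seq) if strand == "sense" else normalize(seq)
    return guide[start:start + length]


def count_seed_matches(seed, transcript):
    """Count (possibly overlapping) sites in a transcript complementary to the seed."""
    site = reverse_complement(seed)
    target = normalize(transcript)
    count, pos = 0, target.find(site)
    while pos != -1:
        count += 1
        pos = target.find(site, pos + 1)
    return count
```

For example, `seed_of("GCAGCTTCATAACCGAAGA")` yields the 7-mer taken from the reverse complement of the sense strand, and `count_seed_matches` then looks for the complementary site in a 3'UTR sequence.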

GUIDE TO USE ONLINE GESS TOOL

Part 1: User Input

1- Upload si/shRNA file:

The user should upload a tab- or comma-separated text file or an Excel file containing si/shRNA data. If both active and inactive si/shRNAs are provided, the input file should contain three columns: the first with si/shRNA identifiers, the second with si/shRNA sequences and the third with the corresponding phenotype information. If only active si/shRNAs are provided, the input file should have two columns: one for si/shRNA identifiers and one for the sequences. No phenotype data is needed in this case.

Case 1: Input file contains both active and inactive si/shRNAs

If a user is providing both active and inactive si/shRNAs, she/he has to submit a file containing at least three columns in the following order: a sequence identifier, the si/shRNA sequence and the corresponding phenotype information. Please see below for the accepted input for each column.

Accepted input for each column in an si/shRNA file containing both active and inactive RNAi

Column 1: Any identifier given by the user
For example: sequence_1, D-200-200, etc.

Column 2: RNAi reagent sequence, can be sense strand or antisense strand
For example: tttgggcatccgcctgtaaa, CGACAGAAGCAUUCCCUAU, etc.

Column 3: Phenotype data to distinguish "active" and "inactive" RNAi
To indicate active RNAi: "YES", "TRUE", 1 (or any number equal to or greater than 1)
To indicate inactive RNAi: "NO", "FALSE", 0 (or any number smaller than 1)

Example layout for an input file containing both active and inactive si/shRNAs
1 GCAGCTTCATAACCGAAGA Yes
2 GAGCAGCCCTTTAAGGATT Yes
3 GAGCAGCCCTGGAAGGAC No
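The phenotype rules above can be sketched in a few lines of Python. This is an illustration of the documented rules, not the tool's own parser; the function names are hypothetical, and case-insensitive matching of "Yes"/"No" is an assumption based on the example layout.

```python
import csv
import io


def parse_phenotype(value):
    """Map a phenotype cell to True (active) or False (inactive)."""
    text = str(value).strip().upper()  # assumption: matching is case-insensitive
    if text in {"YES", "TRUE"}:
        return True
    if text in {"NO", "FALSE"}:
        return False
    # Numbers: >= 1 means active, < 1 means inactive.
    # float() raises ValueError for anything that is not a number.
    return float(text) >= 1


def read_rnai_rows(text, delimiter="\t"):
    """Read (identifier, sequence, is_active) tuples from delimited text."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(fields) < 3:
            continue  # skip blank or incomplete lines in this sketch
        identifier, sequence, phenotype = (f.strip() for f in fields[:3])
        rows.append((identifier, sequence, parse_phenotype(phenotype)))
    return rows
```

Calling `read_rnai_rows` on the example layout (tab-separated) would mark rows 1 and 2 as active and row 3 as inactive.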
Case 2: Input file contains only active si/shRNAs

In this case the program assumes that the input file contains only active si/shRNAs and generates a set of theoretical inactive si/shRNA seed sequences. To create the inactive set, the last nucleotide of each seed sequence is changed to its complement. As a result, there is an equal number of active and inactive si/shRNA seed sequences to analyze.
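As a rough illustration of this step, the following Python sketch derives a theoretical inactive seed from an active one by complementing its last nucleotide. The function name is hypothetical, and writing RNA seeds in the DNA alphabet (U as T) is a simplification made for this sketch.

```python
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theoretical_inactive_seed(seed):
    """Return the seed with its last nucleotide replaced by its complement."""
    # Simplification for this sketch: RNA input is written in the DNA alphabet.
    seed = seed.strip().upper().replace("U", "T")
    if not seed:
        raise ValueError("empty seed sequence")
    return seed[:-1] + _COMPLEMENT[seed[-1]]


def build_inactive_set(active_seeds):
    """One theoretical inactive seed per active seed, so both sets have equal size."""
    return [theoretical_inactive_seed(seed) for seed in active_seeds]
```

Because each active seed produces exactly one inactive counterpart, the two sets always have the same size.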

A sample layout for an input file without any phenotype/activity data is shown below. See Case 1 above for the accepted input formats of the identifier and sequence columns.

Example layout for an input file containing only active si/shRNAs
1 GCAGCTTCATAACCGAAGA
2 GAGCAGCCCTTTAAGGATT
3 GAGCAGCCCTGGAAGGAC

Indicate Input Strand: Users have to state whether their RNAi reagent sequences are sense or antisense strands.

Indicate RNAi Reagent Type: GESS analysis can be done using siRNA or shRNA sequences. If the input contains shRNA sequences, it is possible to trim them by two or three nucleotides. If shRNA is selected, another option appears in the user interface (see the image below) asking whether the user wants to trim the shRNA sequences. If one of the trim options is selected, the program removes the chosen number of nucleotides (two or three) from the shRNA sequences.

Figure 2: shRNA Trimming Options

2- Choose Reference Data Types or Upload a Custom Database file:

The user can either use the built-in database, which is available for human, mouse and fly, or provide a custom database file.

If the user wants to use the built-in database, she/he can choose the organism (Human, Mouse or Fly) and the transcript region (3'UTR, 5'UTR, CDS, Full Transcript for Protein Coding Genes, Full Transcript for All Genes) in which to search for seed matches. The default values are "Human" for organism and "3'UTR" for region.

If the user uploads a custom database file, she/he can do the GESS analysis for any organism of interest. The file should contain FASTA-formatted sequences. A sample file can be seen here.
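For users preparing a custom database, a quick way to check that a file parses as FASTA is a small reader like the Python sketch below. It is independent of the tool; the function name is hypothetical, and taking the text up to the first whitespace in the header as the identifier is an assumption.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {identifier: sequence}."""
    records = {}
    name = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            # Assumption: the identifier is the header text up to the first space.
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

Sequence lines that appear before the first `>` header are ignored in this sketch, and multi-line sequences are joined into one string.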

3- Options:

The user can change the parameters of the program and make it more or less stringent.

4- Advanced Options:

5- Job Id:

Providing a job identifier for a GESS analysis is optional. If one is provided, the result files are named after it; otherwise, the program generates a random identifier for the analysis.

6- Email:

The user should provide an email address in order to receive the GESS analysis results. If the analysis yields significant results, two text files will be sent to the user.

Part 2: Results

1- File with GESS analysis results:

This file contains the basic GESS analysis results. A sample file can be seen here. Each tested sequence is listed on its own line with the corresponding GESS analysis results. A detailed explanation of the GESS analysis results table can be found below.

Identifier for the tested sequence. If the built in database was used, version number is used in here.

Gene symbol for mouse and human, FlyBase identifier for fly. If a custom database is provided, this field is empty.

Tested transcripts are ranked from the one with the lowest p-value (rank = 1) to the one with the highest p-value (rank = A, the number of sequences tested).

Seed Match Frequency of the active si/shRNAs

Seed Match Frequency of the inactive si/shRNAs

Seed Match Enrichment (Seed Match Frequency of the active si/shRNAs / Seed Match Frequency of the inactive si/shRNAs )

Active RNAi: Enrichment is among active RNAi reagents

Inactive RNAi: Enrichment is among inactive RNAi reagents

Number of active si/shRNAs that have seed matches to the tested sequence

Number of active si/shRNAs that do NOT have seed matches to the tested sequence

Number of inactive si/shRNAs that have seed matches to the tested sequence

Number of inactive si/shRNAs that do NOT have seed matches to tested sequence

If one of the si/shRNA categories (siPhenMatch, siPhenNoMatch, siNoPhenMatch, siNoPhenNoMatch) has 20 or fewer events, the Fisher Exact Test p-value is used instead of the Yates Chi-Square p-value.

Either the Yates Chi-Square p-value or the Fisher Exact Test p-value, depending on the selected p-value method.

p-value adjusted according to Bonferroni correction method.

p-value adjusted according to Bonferroni Step-down correction method.

p-value adjusted according to Benjamini & Hochberg correction method.

"Yes" if statistically significant according to Bonferroni correction method, "No" otherwise.

"Yes" if statistically significant according to Bonferroni Step-down correction method, "No" otherwise.

"Yes" if statistically significant according to Benjamini and Hochberg correction method, "No" otherwise.

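Significance threshold for the Bonferroni correction method: α / A

Significance threshold for the Bonferroni Step-down correction method: α / (A + 1 - rank of sequence)

Significance threshold for the Benjamini & Hochberg correction method: α * rank of sequence / A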
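The statistics in this table can be sketched in plain Python as follows. This is a hedged illustration, not the tool's code. It assumes the four counts form a 2x2 table with a = siPhenMatch (active, match), b = siPhenNoMatch, c = siNoPhenMatch and d = siNoPhenNoMatch, that the Fisher test is two-sided, and that α defaults to 0.05; all function names are hypothetical.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def yates_chi_square_p(a, b, c, d):
    """Chi-square test p-value with Yates continuity correction (1 degree of freedom)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2 / denom
    return erfc(sqrt(chi2 / 2))  # survival function of chi-square with 1 df


def gess_p_value(a, b, c, d, small=20):
    """Use Fisher when any category has 20 or fewer events, Yates otherwise."""
    if min(a, b, c, d) <= small:
        return fisher_exact_two_sided(a, b, c, d)
    return yates_chi_square_p(a, b, c, d)


def significance_thresholds(rank, total, alpha=0.05):
    """Per-sequence thresholds for the three correction methods (A = total)."""
    return {
        "bonferroni": alpha / total,
        "bonferroni_step_down": alpha / (total + 1 - rank),
        "benjamini_hochberg": alpha * rank / total,
    }
```

With these thresholds, a sequence at a given rank would be compared against α / A, α / (A + 1 - rank) and α * rank / A respectively, matching the three formulas listed above.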


2- File with tested sequences and matching si/shRNAs:

This file contains tested sequences and the identifiers of their matching "active" si/shRNAs. Each line maps one tested sequence to its matching active si/shRNAs. Sequence identifiers are ordered according to their ranks in the results file, and only the significant results are reported here (p-value cut-off 0.05). A sample file can be seen here.

Error Handling

1- Errors Detected While Pre-processing the Input si/shRNA File:

The GESS program pre-processes the input si/shRNA file and, if it detects errors in more than 25% of the si/shRNA data, displays a warning message on the screen. In this case the GESS analysis is not started, and users have to fix the errors in their input file. The error type and the row numbers with invalid data are listed on the UI. Users can see the content of each row by hovering the mouse over the small box with the row number. Please see the screenshot below.


Figure 4: Error Page



If the error rate is less than or equal to 25% in the input file, the program ignores the invalid data and does the analysis using the valid data only. When the analysis is completed, the user is informed about ignored si/shRNA sequences via email.
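The 25% rule described in this section can be sketched as below. This page does not list the exact validity checks, so the rule used here (a non-empty identifier and a sequence made only of A, C, G, T or U) is an assumption, and the function name is hypothetical.

```python
import re

# Assumed validity rule for this sketch: only A, C, G, T, U (any case).
_VALID_SEQUENCE = re.compile(r"[ACGTU]+", re.IGNORECASE)


def preprocess(rows, max_error_rate=0.25):
    """Split rows into valid data and invalid row numbers, and decide whether to run.

    rows is a list of (identifier, sequence) pairs; row numbers start at 1.
    Returns (proceed, valid_rows, invalid_row_numbers).
    """
    valid, invalid_row_numbers = [], []
    for number, (identifier, sequence) in enumerate(rows, start=1):
        if identifier.strip() and _VALID_SEQUENCE.fullmatch(sequence.strip()):
            valid.append((identifier, sequence))
        else:
            invalid_row_numbers.append(number)
    error_rate = len(invalid_row_numbers) / len(rows) if rows else 0.0
    # More than 25% invalid rows: stop and report; otherwise continue with valid rows.
    proceed = error_rate <= max_error_rate
    return proceed, valid, invalid_row_numbers
```

When `proceed` is true, the analysis would continue with `valid` only, and `invalid_row_numbers` is what would be reported back to the user.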

2- Errors Detected after GESS analysis started

If the GESS analysis fails for any reason after the input has been successfully submitted to the tool, an email is sent to the user to inform him/her about the failure.

Version Details

Version 1.1.2

Version 1.1.1

Version 1.1