Title: | Gene Set Uncertainty in Enrichment Analysis |
---|---|
Description: | Toolbox for various enrichment analysis methods and quantification of uncertainty of gene sets, Schmid et al. (2016) <doi:10.1093/bioinformatics/btw030>. |
Authors: | Florian Schmid [aut], Christoph Muessel [aut], Johann M. Kraus [aut], Hans A. Kestler [aut], Hans A. Kestler [cre] |
Maintainer: | Hans A. Kestler <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.3.4 |
Built: | 2025-03-01 05:38:05 UTC |
Source: | https://github.com/cran/GiANT |
Toolbox for gene set analysis of uncertain gene sets.
Package: | GiANT |
Type: | Package |
Version: | 1.3 |
Date: | 2020-04-29 |
License: | Artistic-2.0 |
LazyLoad: | yes |
This package provides an approach for evaluating the fuzziness of a gene set. This is done by repeatedly performing gene set analyses on slightly modified versions of the gene set and comparing their enrichment scores. A utility for such uncertainty tests is provided in the evaluateGeneSetUncertainty
function.
The package also comprises a generic framework for different types of enrichment analyses (Ackermann and Strimmer). It establishes a customizeable pipeline that typically consists of the following steps:
Calculation of gene-level statistics: A gene-level statistic scores the relationship between the measurements for a specific gene and the class labels. Typical measures include correlation coefficients, the t statistic or the fold change between the groups (see gls
for gene-level statistics included in the package).
Transformation of gene-level statistic values: Optionally, the gene-level statistic values can be postprocessed, e.g. by taking the absolute value or the square for correlation values or by binarizing or ranking values. See transformation
for transformations included in the package.
Calculation of gene set statistics: Based on the (possibly transformed) gene-level statistics, the gene set(s) of interest is/are scored. Examples are the median, the mean or the enrichment score of the gene-level statistic values in the gene set(s). See gss
for gene set statistics included in the package.
Significance assessment: To assess the significance of the gene set statistic value(s) with respect to a null distribution, computer-intensive tests are performed. These tests repeatedly sample random label vectors or gene sets and calculate their gene set statistic values. These values can then be compared to the true gene set statistics. See significance
for significance assessment methods included in the package.
The package represents such analysis pipelines as configuration objects that can be created using the function gsAnalysis
. For predefined state-of-the-art methods, such as Gene Set Enrichment Analysis (Subramanian et al), Overrepresentation Analysis or Global Ancova (Hummel et al), it provides predefined configurations (see predefinedAnalyses
).
The main function for standard gene set analyses, geneSetAnalysis
, performs enrichment analyses based on pipeline configuration objects.
Florian Schmid, Christoph Müssel, Johann M. Kraus, Hans A. Kestler
Maintainer: Hans A. Kestler <[email protected]>
Ackermann, M., Strimmer, K. (2009) A general modular framework for gene set enrichment analysis. BMC Bioinformatics, 10(1), 47.
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., Mesirov, J. P. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Science of the United States of America, 102, 15545-15550.
Hummel, M., Meister, R., Mansmann, U. (2008) Globalancova: exploration and assessment of gene group effects. Bioinformatics, 24(1), 78–85.
data(exampleData) ################################## # Example 1: gene set analysis # ################################## res <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways[1], analysis = analysis.averageCorrelation(), adjustmentMethod = "fdr", # additional parameters for analysis.averageCorrelation labs = labels, method = "pearson", numSamples = 50) summary(res, mode="table") #################################### # Example 2: uncertainty analysis # #################################### resUncertainty <- evaluateGeneSetUncertainty( # parameters for evaluateGeneSetUncertainty dat = countdata, geneSet = pathways[[3]], analysis = analysis.averageCorrelation(), numSamplesUncertainty = 5, blockSize = 1, k = seq(0.1,0.9,by=0.1), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 5) plot(resUncertainty, main = names(pathways[3]))
data(exampleData) ################################## # Example 1: gene set analysis # ################################## res <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways[1], analysis = analysis.averageCorrelation(), adjustmentMethod = "fdr", # additional parameters for analysis.averageCorrelation labs = labels, method = "pearson", numSamples = 50) summary(res, mode="table") #################################### # Example 2: uncertainty analysis # #################################### resUncertainty <- evaluateGeneSetUncertainty( # parameters for evaluateGeneSetUncertainty dat = countdata, geneSet = pathways[[3]], analysis = analysis.averageCorrelation(), numSamplesUncertainty = 5, blockSize = 1, k = seq(0.1,0.9,by=0.1), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 5) plot(resUncertainty, main = names(pathways[3]))
Randomly generated example data representing a count matrix
data(exampleData)
data(exampleData)
The data consists of a variable countdata
with a matrix with 1113 genes and 96 samples. The count values are randomly generated numbers.
data(exampleData) print(countdata)
data(exampleData) print(countdata)
Creates a data frame summarizing an analysis. This table has one row per gene set, each comprising the adjusted and unadjusted p-values and the number of genes for the set.
createSummaryTable(object, orderBy = c("adjustedPValues", "rawPValues", "geneSetName"), significantOnly = FALSE, signLevel = object$signLevel)
createSummaryTable(object, orderBy = c("adjustedPValues", "rawPValues", "geneSetName"), significantOnly = FALSE, signLevel = object$signLevel)
object |
A result object as returned by |
orderBy |
Specifies which field should be used for the row ordering. By default, rows are ordered according to the adjusted p-values. |
significantOnly |
Specifies whether all gene sets ( |
signLevel |
If |
A data frame with one row for each included gene set and the columns "adjustedPValues", "rawPValues", "geneSetName" and "geneSetSize". For overrepresentation analyses, there is an additional column "intersectSize" specifying the size of the intersection of the core set and the corresponding gene set.
geneSetAnalysis
, hist.gsaResult
, summary
# load data data(exampleData) # perform gene set analyses for several pathways res <- geneSetAnalysis( # global parameters dat = countdata, geneSets = pathways, analysis = analysis.averageCorrelation(), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 10) tab <- createSummaryTable(res)
# load data data(exampleData) # perform gene set analyses for several pathways res <- geneSetAnalysis( # global parameters dat = countdata, geneSets = pathways, analysis = analysis.averageCorrelation(), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 10) tab <- createSummaryTable(res)
A robustness measure that quantifies the uncertainty of a gene set by performing a resampling experiment and can be used in the robustness
parameter of gsAnalysis
.
evaluateGeneSetUncertainty( ..., dat, geneSet, analysis, numSamplesUncertainty, blockSize = 1, k = seq(0.01, 0.99, by=0.01), signLevel = 0.05, preprocessGeneSet = FALSE, cluster = NULL)
evaluateGeneSetUncertainty( ..., dat, geneSet, analysis, numSamplesUncertainty, blockSize = 1, k = seq(0.01, 0.99, by=0.01), signLevel = 0.05, preprocessGeneSet = FALSE, cluster = NULL)
... |
Additional parameters for the different steps of the analysis pipeline, depending on the concrete configuration supplied in |
dat |
A numeric matrix of gene expression values for all analyzed genes. Here, each row corresponds to one gene, and each column corresponds to one sample. The rows must be named with the gene names used in the gene sets. |
geneSet |
A vector containing the names of genes in a gene set. All genes set must correspond to the row names of |
analysis |
The parameters of the analysis that is applied to the perturbed copies of the gene set. These parameters are described by an object of class |
numSamplesUncertainty |
The number of resampling experiments which should be applied to estimate the robustness of |
blockSize |
Number of genes in one resampled block. |
k |
A |
signLevel |
The significance level for the significance assessment of the gene sets (defaults to |
preprocessGeneSet |
Specifies whether the gene sets in |
cluster |
If the analyses should be applied in parallel for the different values of |
The uncertainty analysis repeatedly replaces parts of the original gene sets by random genes and calculating the gene set statistics for these randomized gene sets. This yields a distribution of gene set statistic values for slightly modified variants of the original gene set.
Returns a list (of class uncertaintyResult
) with the following elements:
uncertainty |
The calculated stability of the original gene set. |
confidenceValues |
A matrix of quantiles of |
uncertaintyEvaluations |
A list with one entry per value in
|
signLevel |
The significance level used for this analysis. |
originalGeneSetValues |
Result of |
geneSetAnalysis
, gsAnalysis
, gls
, transformation
, gss
, plot.uncertaintyResult
data(exampleData) res <- evaluateGeneSetUncertainty( # parameters for evaluateGeneSetUncertainty dat = countdata, geneSet = pathways[[1]], analysis = analysis.averageCorrelation(), numSamplesUncertainty = 10, k = seq(0.1,0.9, by=0.1), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 10)
data(exampleData) res <- evaluateGeneSetUncertainty( # parameters for evaluateGeneSetUncertainty dat = countdata, geneSet = pathways[[1]], analysis = analysis.averageCorrelation(), numSamplesUncertainty = 10, k = seq(0.1,0.9, by=0.1), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 10)
Filters gene sets according to different criteria.
filterGeneSets( geneSets, includedGenes = NULL, minIntersectSize = length(includedGenes), adjMatrix = NULL, steps = NULL)
filterGeneSets( geneSets, includedGenes = NULL, minIntersectSize = length(includedGenes), adjMatrix = NULL, steps = NULL)
geneSets |
A list of gene sets, where each gene set is a vector of gene names corresponding to the row names of |
includedGenes |
A vector of gene names whose occurence in each of the gene sets is checked. The further parameters how these genes are used to filter gene sets |
minIntersectSize |
If this parameter is not |
adjMatrix |
An optional adjacency matrix in which an entry is |
steps |
The maximum distance of interacting genes to the genes in |
Returns a filtered list of gene sets with the same structure as geneSets
.
geneSets <- list( gs1 = paste("gene",1:20,sep=""), gs2 = paste("gene",50:60,sep=""), gs3 = paste("gene",90:92,sep=""), gs4 = paste("gene",55:65,sep="") ) newGeneSets1 <- filterGeneSets( geneSets = geneSets, includedGenes = c("gene55","gene60")) newGeneSets2 <- filterGeneSets( geneSets = geneSets, includedGenes = c("gene1","gene55","gene20","gene100"), minIntersectSize = 2) examplePathway <- c("gene1","gene2","gene3","gene4") pathwayAdjMatrix <- matrix(0,100,100) rownames(pathwayAdjMatrix) <- paste("gene",1:100,sep="") colnames(pathwayAdjMatrix) <- paste("gene",1:100,sep="") # gene1 interacts with gene2 and gene3 # -> step 1 if gene1 is starting point pathwayAdjMatrix[1,2:3] <- 1 pathwayAdjMatrix[2:3,1] <- 1 # gene3 interacts with gene4 # -> step 2 if gene1 is starting point pathwayAdjMatrix[3,4] <- 1 pathwayAdjMatrix[4,3] <- 1 newGeneSets3 <- filterGeneSets( geneSets = geneSets, includedGenes = "gene1", adjMatrix = pathwayAdjMatrix, steps = 2)
geneSets <- list( gs1 = paste("gene",1:20,sep=""), gs2 = paste("gene",50:60,sep=""), gs3 = paste("gene",90:92,sep=""), gs4 = paste("gene",55:65,sep="") ) newGeneSets1 <- filterGeneSets( geneSets = geneSets, includedGenes = c("gene55","gene60")) newGeneSets2 <- filterGeneSets( geneSets = geneSets, includedGenes = c("gene1","gene55","gene20","gene100"), minIntersectSize = 2) examplePathway <- c("gene1","gene2","gene3","gene4") pathwayAdjMatrix <- matrix(0,100,100) rownames(pathwayAdjMatrix) <- paste("gene",1:100,sep="") colnames(pathwayAdjMatrix) <- paste("gene",1:100,sep="") # gene1 interacts with gene2 and gene3 # -> step 1 if gene1 is starting point pathwayAdjMatrix[1,2:3] <- 1 pathwayAdjMatrix[2:3,1] <- 1 # gene3 interacts with gene4 # -> step 2 if gene1 is starting point pathwayAdjMatrix[3,4] <- 1 pathwayAdjMatrix[4,3] <- 1 newGeneSets3 <- filterGeneSets( geneSets = geneSets, includedGenes = "gene1", adjMatrix = pathwayAdjMatrix, steps = 2)
The main function of the package that performs a gene set analysis for a list of gene sets.
geneSetAnalysis( ..., dat, geneSets, analysis, signLevel = 0.05, preprocessGeneSets = FALSE, adjustmentMethod = p.adjust.methods, cluster = NULL)
geneSetAnalysis( ..., dat, geneSets, analysis, signLevel = 0.05, preprocessGeneSets = FALSE, adjustmentMethod = p.adjust.methods, cluster = NULL)
... |
Additional parameters for the different steps of the analysis pipeline, depending on the concrete configuration supplied in |
dat |
A numeric matrix of gene expression values for all analyzed genes. Here, each row corresponds to one gene, and each column corresponds to one sample. The rows must be named with the gene names used in the gene sets. |
geneSets |
A list of gene sets, where each gene set is a vector of gene names corresponding to the row names of |
analysis |
An object of type |
signLevel |
The significance level for the significance assessment of the gene sets (defaults to |
preprocessGeneSets |
Specifies whether the gene sets in |
adjustmentMethod |
The method to use for the adjustment for multiple testing (see |
cluster |
If the analyses should be applied in parallel for the gene sets, this parameter must hold an initialized cluster as returned by |
This is the main interface function of the package for gene set enrichment analyses. Analyses usually consist of a pipeline of steps. Often, the first step is the calculation of a summary statistic for the relation of each gene to the class labels. These values or transformations thereof are employed to calculate a gene set statistic for each of the supplied gene sets. The significance of gene set enrichments can be determined according to different methods, and the robustness of gene sets can be evaluated by slightly modifying the gene sets. To provide a flexible mechanism for the plethora of different approaches arising from the different choices, basic pipeline configurations are encapsulated in gsAnalysis
objects which can be created using the gsAnalysis
function. Ready-to-use configuration objects for certain well-known methods are included in the package (see predefinedAnalyses
). Parameters of the chosen analysis pipeline can be set in the ...
parameter.
An object of the type gsaResult
with the following elements:
adjustedPValues |
A vector of p-values, one for each gene set. These values are already adjusted for multiple testing according to the |
rawPValues |
The raw unadjusted p-values, one for each gene set. |
res.all |
A list comprising the detailed results for each gene set. Each element of this list is another list with the following components:
|
signLevel |
The significance level used for this analysis. |
analysis |
The performed analysis (of type |
analysisType |
A character string identifying the analysis as an enrichment analysis pipeline ( |
adjustmentMethod |
The method used to adjust the p-values in |
Ackermann, M., Strimmer, K. (2009) A general modular framework for gene set enrichment analysis. BMC Bioinformatics, 10(1), 47.
gsAnalysis
, gls
, transformation
, gss
, global
, significance
, evaluateGeneSetUncertainty
, hist.gsaResult
, preprocessGs
# load data data(exampleData) # apply predefined analysis for gene set enrichment analysis res <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways[1], analysis = analysis.averageCorrelation(), adjustmentMethod = "fdr", # additional parameters for analysis.averageCorrelation labs = labels, method = "pearson", numSamples = 10)
# load data data(exampleData) # apply predefined analysis for gene set enrichment analysis res <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways[1], analysis = analysis.averageCorrelation(), adjustmentMethod = "fdr", # additional parameters for analysis.averageCorrelation labs = labels, method = "pearson", numSamples = 10)
Functions to calculate the gene-level statistic, as used in the gls
parameter of gsAnalysis
.
A gene-level statistic calculates a measure of correlation between the expression of a gene and the class labels.
gls.cor(dat, labs, method = "pearson") gls.regression(dat, labs) gls.foldChange(dat, labs, logMeasurements = TRUE) gls.tStatistic(dat, labs, pValue = FALSE, alternative = "two.sided") gls.moderateTStatistic(dat,labs) gls.nBinomTest(dat, labs, returnValue = c("pval", "qval", "foldChange", "log2FoldChange"), dispersionMethod = "blind", dispersionSharingMode = "fit-only", dispersionFitType = "local")
gls.cor(dat, labs, method = "pearson") gls.regression(dat, labs) gls.foldChange(dat, labs, logMeasurements = TRUE) gls.tStatistic(dat, labs, pValue = FALSE, alternative = "two.sided") gls.moderateTStatistic(dat,labs) gls.nBinomTest(dat, labs, returnValue = c("pval", "qval", "foldChange", "log2FoldChange"), dispersionMethod = "blind", dispersionSharingMode = "fit-only", dispersionFitType = "local")
dat |
A numeric matrix of gene expression values for all analyzed genes. Here, each row corresponds to one gene, and each column corresponds to one sample. The rows must be named with the gene names used in the gene sets. |
labs |
A vector of class labels for the samples in |
logMeasurements |
For |
method |
For |
pValue |
For |
alternative |
For |
returnValue |
For |
dispersionMethod |
For |
dispersionSharingMode |
For |
dispersionFitType |
For |
Standard functions for the calculation of gene-level statistics (to be used in an analysis pipeline defined by gsAnalysis
):
gls.cor
: Calculates the correlation of the gene expression values to the class labels.
gls.regression
: Calculates the slope of a linear regression of the gene expression values and the class labels.
gls.foldChange
: Calculates the (standard or log2) fold change between the measurements for the two classes.
gls.tStatistic
: Calculates the p-value or the statistic of a two-sample t test for the measurements of the two classes
gls.moderateTStatistic
: Calculates the moderate t statistic for the measurements of the two classes
gls.nBinomTest
: Applies the negative binomial test for sequencing data based on the DESeq2 package to test for differences between two classes (see results
).
Each of these function returns a numeric vector of gene-level statistics (one entry per gene).
geneSetAnalysis
, gsAnalysis
, gss
, transformation
Functions to calculate a gene set statistic, as used in the gss
parameter of gsAnalysis
. A gene set statistic summarizes a single gene set.
gss.mean(x, geneSetIndices) gss.sum(x, geneSetIndices) gss.wilcoxonRankTest(x, geneSetIndices) gss.maxmean(x, geneSetIndices) gss.median(x, geneSetIndices) gss.enrichmentScore(x, geneSetIndices, p = 1) gss.fisherExactTest(x, geneSetIndices) gss.gsz(x, geneSetIndices, w1 = 0.2, w2 = 0.5, preVar = 0, varConstant = 10)
gss.mean(x, geneSetIndices) gss.sum(x, geneSetIndices) gss.wilcoxonRankTest(x, geneSetIndices) gss.maxmean(x, geneSetIndices) gss.median(x, geneSetIndices) gss.enrichmentScore(x, geneSetIndices, p = 1) gss.fisherExactTest(x, geneSetIndices) gss.gsz(x, geneSetIndices, w1 = 0.2, w2 = 0.5, preVar = 0, varConstant = 10)
x |
A vector comprising one numeric value for each gene in the data set. This vector is usually obtained from the previous step, the gene-level statistic (see |
geneSetIndices |
A vector containing the indices of the genes in the gene set with respect to the full gene set (i.e., the indices of the rows containing the measurements for these genes in |
p |
Factor for |
w1 |
Weight for the median of the variance estimates for a gene set of size |
w2 |
Weight for the median of the variance estimates for a gene set across the whole gene list. Should be between 0 and 1. Default is |
preVar |
Parameter for incorporating the uncertainty of the observations. This is omitted by default ( |
varConstant |
Reference gene set size used for variance estimates. Default is |
Standard functions for the calculation of gene set statistics (to be used in an analysis pipeline defined by gsAnalysis
):
gss.mean
: Calculates the mean of the (transformed) gene-level statistic values for the genes in the set.
gss.sum
: Calculates the sum of the (transformed) gene-level statistic values for the genes in the set.
gss.wilcoxonRankTest
: Calculates a Wilcoxon test comparing the (transformed) gene-level statistic values for the genes in the set versus those of the genes not in the set.
gss.maxmean
: Calculates the maximum of the means of positive and negative statistic values, weighted by the overall proportion of positive/negative values (e.g. for correlation scores where the sign denotes the direction). Described in Efron and Tibshirani.
gss.median
: Calculates the median of the (transformed) gene-level statistic values for the genes in the set.
gss.enrichmentScore
: Calculates the enrichment score of the (transformed) gene-level statistic values for the genes in the set, as described in Subramanian et al.
gss.fisherExactTest
: Performs Fisher's exact test to check gene sets for overrepresentation in the differential genes. This should be used used in combination with the transformation transformation.adjustAndBinarize
.
gss.gsz
: Calculates the Gene Set Z-score of the (transformed) gene-level statistic values for the genes in the set. Described in Toronen et al.
Each method returns a single numeric value, the gene set statistic for the supplied gene set.
Efron, B., Tibshirani, R. (2007) On testing the significance of sets of genes. Annals of Applied Statistics, 1, 107-129.
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., Mesirov, J. P. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Science of the United States of America, 102, 15545–15550.
Toronen, P., Ojala, P. J., Marttinen, P., Holm L. (2009) Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function. BMC Bioinformatics, 10(1), 307.
geneSetAnalysis
, gsAnalysis
, gls
, transformation
Functions to perform global gene set analyses, as used in the globalStat
parameter of gsAnalysis
.
global.overrepresentation(dat, geneSet, coreSet) global.ancova(dat, geneSet, labs, ...) global.test(dat, geneSet, labs, ...)
global.overrepresentation(dat, geneSet, coreSet) global.ancova(dat, geneSet, labs, ...) global.test(dat, geneSet, labs, ...)
dat |
A numeric matrix of gene expression values for all analyzed genes. Here, each row corresponds to one gene, and each column corresponds to one sample. The rows must be named with the gene names used in the gene sets. |
geneSet |
A gene set in form of a vector of gene names corresponding to the row names of |
coreSet |
A gene set of interest resulting from an analysis of |
labs |
A vector of class labels for the samples in |
... |
Further parameters for |
Wrapper functions for global gene set analyses.
global.overrepresentation
: This function performs an overrepresentation analysis by rating the overlap of geneSet
and coreSet
with respect to the set of all genes using Fisher's exact test.
global.ancova
: This function performs a global gene set enrichment analysis using the global ANCOVA method by Hummel et al. It wraps the GlobalAncova
function in the GlobalAncova package.
global.test
: This function performs a global gene set enrichment analysis using a global test by Goeman et al. It wraps the gt
function in the globaltest package.
A list containing the following items:
pValue |
The p-value for the significance of |
intersectGeneSetCoreSet |
This element is only returned in case of an overrepresentation analysis and consists of a vector of genes included in both sets ( |
res.all |
The full result object returned by |
Hummel, M., Meister, R., Mansmann, U. (2008) GlobalANCOVA: exploration and assessment of gene group effects. Bioinformatics, 24(1), 78–85.
Goeman, J. J., van de Geer, S. A., de Kort, F., van Houwelingen, H. C. (2004) A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20(1), 93–99.
Defines the configuration of an analysis that can be performed using geneSetAnalysis
, and returns it as a wrapper object.
gsAnalysis(name, gls = NULL, glsParameterNames = NULL, transformation = NULL, transformationParameterNames = NULL, gss = NULL, gssParameterNames = NULL, globalStat = NULL, globalStatParameterNames = NULL, significance = NULL, significanceParameterNames = NULL, testAlternative = c("greater", "less"))
gsAnalysis(name, gls = NULL, glsParameterNames = NULL, transformation = NULL, transformationParameterNames = NULL, gss = NULL, gssParameterNames = NULL, globalStat = NULL, globalStatParameterNames = NULL, significance = NULL, significanceParameterNames = NULL, testAlternative = c("greater", "less"))
name |
A character string describing the analysis. |
gls |
The name of the function that calculates the gene-level statistic for a given dataset. If set to |
glsParameterNames |
A character vector of names of the parameters used by the gene-level statistic defined in |
transformation |
The name of the function that transforms the gene-level statistics values. If set to |
transformationParameterNames |
A character vector of names of the parameters used by the transformation defined in |
gss |
The name of the function that calculates the gene set statistics from untransformed or transformed gene-level statistic values. If set to |
gssParameterNames |
A character vector vector of names of the parameters used by the gene set statistic defined in |
globalStat |
If the gene set analysis consists of a global test and cannot be divided into the calculation of a gene-level statistic, a transformation and the calculation of the gene set statistics, this parameter defines the name of the function that performs this global analysis. In this case, the parameters |
globalStatParameterNames |
A character vector names of the parameters used by the global analysis defined in |
significance |
The name of a method that performs a significance assessment for the gene set statistic values. If set to |
significanceParameterNames |
A character vector of names of the parameters used by the significance assessment method defined in |
testAlternative |
Specifies the alternative hypothesis of the significance test for the gene set enrichment, which may be dependent on the chosen gene set statistic. Must be one of |
The function provides a way of flexibly defining the steps of the gene set analysis pipeline. This pipeline consists of a subset of the following steps, each of which may have specific parameters:
Gene-level analysis: A gene-level statistic scores the relationship between the measurements for a specific gene and the class labels. Typical measures include correlation coefficients, the t statistic or the fold change between the groups (see gls
for gene-level statistics included in the package).
Transformation of gene-level statistics: Optionally, the gene-level statistic values can be postprocessed, e.g. by taking the absolute value or the square for correlation values or by binarizing or ranking values. See transformation
for transformations included in the package.
Gene set analysis: Based on the (possibly transformed) gene-level statistics, the gene set(s) of interest is/are scored. Examples are the median, the mean or the enrichment score of the gene-level statistic values in the gene set(s). See gss
for gene set statistics included in the package.
Significance assessment: To assess the significance of the gene set statistic value(s) with respect to a null distribution, computer-intensive tests are performed. These tests repeatedly sample random label vectors or gene sets and calculate their gene set statistic values. These values can then be compared to the true gene set statistics. See significance
for significance assessment methods included in the package.
Global analysis: As an alternative to the above pipeline steps, it is possible to define a single, global method that directly calculates an enrichment p-value for a supplied data set and gene set. See global
for the global analysis tests included in the package.
Several state-of-the-art analyses have predefined configuration objects in which the above steps are defined accordingly (see predefinedAnalyses
).
An object of class gsAnalysis
with components corresponding to the above parameters.
predefinedAnalyses
, geneSetAnalysis
, evaluateGeneSetUncertainty
, gls
, transformation
, gss
, global
, significance
data(exampleData) # defines an analysis that corresponds to gsAna1() gsa <- gsAnalysis( name = "averageCorrelation", gls = "gls.cor", glsParameterNames = c("labs","method"), transformation = "transformation.abs", transformationParameterNames = NULL, gss = "gss.mean", gssParameterNames = NULL, globalStat = NULL, globalStatParameterNames = NULL, significance = "significance.sampling", significanceParameterNames = c("numSamples"), testAlternative = "greater") print(gsa) # apply the previously defined analysis res <- geneSetAnalysis( # global parameters dat = countdata, geneSets = pathways[1], analysis = gsa, # parameters for the specific analysis gsAna1 labs = labels, numSamples = 10)
data(exampleData) # defines an analysis that corresponds to gsAna1() gsa <- gsAnalysis( name = "averageCorrelation", gls = "gls.cor", glsParameterNames = c("labs","method"), transformation = "transformation.abs", transformationParameterNames = NULL, gss = "gss.mean", gssParameterNames = NULL, globalStat = NULL, globalStatParameterNames = NULL, significance = "significance.sampling", significanceParameterNames = c("numSamples"), testAlternative = "greater") print(gsa) # apply the previously defined analysis res <- geneSetAnalysis( # global parameters dat = countdata, geneSets = pathways[1], analysis = gsa, # parameters for the specific analysis gsAna1 labs = labels, numSamples = 10)
Plots the distribution of gene set statistic values obtained in different resampling settings of an enrichment analysis, and draws the statistic value of the input set as a vertical line.
## S3 method for class 'gsaResult' hist(x, signLevel = x$signLevel, subset = NULL, ask = FALSE, addLegend = TRUE, ...)
## S3 method for class 'gsaResult' hist(x, signLevel = x$signLevel, subset = NULL, ask = FALSE, addLegend = TRUE, ...)
x |
A result of a call to |
signLevel |
The significance level that should be applied for the plots. Default is the significance level used for the analysis in |
subset |
Indices for the results that should be included in the diagram. |
ask |
If set to true, the plot function will prompt for a user input for each new plot that is shown on an interactive device (see |
addLegend |
If set to true (default), a |
... |
Other parameters which can be used for histograms (see |
The function plots the distribution of gene set statistic values under the null hypothesis. It requires the
significance assessment step of the enrichment analysis configuration (parameter significance
or gsAnalysis
) to be a computer-intensive testing procedure that yields a distribution of gene set statistic p-values under the null hypothesis. Predefined configurations for which this plot works are analysis.gsea
, analysis.averageCorrelation
and analysis.averageTStatistic
.
A histogram is plotted for the analysis in x
. If x
includes the analyses for several gene sets, one histogram is plotted for each of the gene sets.
The statistic value of the input set is depicted as a vertical line.
The most common graphical parameters can be supplied as vectors (one entry per analyzed gene set) to vary them between the different analyses. These parameters are: main
, xlab
, ylab
.
Returns a list with all the underlying data for the plotted histograms as invisible object.
geneSetAnalysis
, predefinedAnalyses
, gsAnalysis
, evaluateGeneSetUncertainty
, plot.uncertaintyResult
# load data data(exampleData) res <- geneSetAnalysis( # global parameters dat = countdata, geneSets = pathways[3], analysis = analysis.averageCorrelation(), # additional parameters for analysis.averageCorrelation labs = labels, p = 1, numSamples = 10) # plot the histogram for the cell cycle control gene set hist(res, main = names(pathways[3]))
# load data data(exampleData) res <- geneSetAnalysis( # global parameters dat = countdata, geneSets = pathways[3], analysis = analysis.averageCorrelation(), # additional parameters for analysis.averageCorrelation labs = labels, p = 1, numSamples = 10) # plot the histogram for the cell cycle control gene set hist(res, main = names(pathways[3]))
Randomly generated example data to give samples in countdata a class label 0 or 1
data(exampleData)
data(exampleData)
The data consists of a variable labels
with a vector of 96 values. The values are a random sequence of 0's and 1's.
data(exampleData) print(labels)
data(exampleData) print(labels)
Merges all probes belonging to the same gene by identifying duplicate row names in a data matrix.
mergeProbesForGenes(dat, method = c("mean", "max", "min", "median"))
mergeProbesForGenes(dat, method = c("mean", "max", "min", "median"))
dat |
A numeric matrix of gene expression values for all analyzed genes. Here, each row corresponds to one probe, and each column corresponds to one sample. The rows must be named with the gene names and may contain duplicates if multiple probes correspond to the same gene. |
method |
The method which should be used to merge probes entries in |
A matrix of the same structure as dat
, but possibly with fewer rows if probes were merged.
dat <- matrix(1:6, nrow=3, ncol=2) rownames(dat) <- c("g1", "g2", "g1") newDat <- mergeProbesForGenes(dat, method = "mean")
dat <- matrix(1:6, nrow=3, ncol=2) rownames(dat) <- c("g1", "g2", "g1") newDat <- mergeProbesForGenes(dat, method = "mean")
Parses a GMT file as downloadable from MSigDB (presented in Subramanian et al.) and returns a list of gene sets.
parseGmt(file)
parseGmt(file)
file |
A file name. |
Parses a GMT file and returns a list of gene sets. Each list element named according to the included gene set. The gene set files can be downloaded from http://www.broadinstitute.org/gsea/msigdb.
A named list
of gene sets.
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., Mesirov, J. P. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Science of the United States of America, 102, 15545–15550.
geneSetAnalysis
, predefinedAnalyses
, gsAnalysis
A list comprising exemplary pathways. Each pathway is represented by a vectro of random genes from countdata which are associated to this pathway.
data(exampleData)
data(exampleData)
The data consists of a variable pathways
of class list
with 9 exemplary pathways. Each element of the list corresponds to one pathway - represented by a vector of gene names which are involved in this pathway.
data(exampleData) print(pathways)
data(exampleData) print(pathways)
For each percentage of original gene set genes, the quantiles of the distribution obtained by a resampling simulation are plotted. Significance threshold (quantile of the Null distribution) and the test statistic of the original gene set are drawn as horizontal lines.
## S3 method for class 'uncertaintyResult' plot(x, signLevel = x$signLevel, addLegend = TRUE, addMinimalStability = FALSE, ...)
## S3 method for class 'uncertaintyResult' plot(x, signLevel = x$signLevel, addLegend = TRUE, addMinimalStability = FALSE, ...)
x |
A result of a call to |
signLevel |
Only results with significance level smaller than the given value are plotted. |
addLegend |
If set to true (default), a |
addMinimalStability |
If set to true, a line is added to the plot giving the minimal stability. |
... |
Other parameters which can be used for histograms (see |
The function plots the quantiles of the resampling distributions for evaluated degrees of fuzziness. It requires the
significance assessment step of the enrichment analysis configuration (parameter significance
or gsAnalysis
) to be a computer-intensive testing procedure that yields a distribution of gene set statistic values under the null hypothesis. Predefined configurations for which this plot works are analysis.gsea
, analysis.averageCorrelation
and analysis.averageTStatistic
.
Three lines, corresponding to the different qunatiles with one dot per fuzziness evaluation (k
) are plotted for the analysis in x
. The significance threshold is shown as a green horizontal line. The statistic value of the original input set is depicted as a red horizontal line.
If addMinimalStability
is TRUE
, the lower bound of the stability is ploted as a dotted line.
No return value, called for creating a plot.
geneSetAnalysis
, predefinedAnalyses
, gsAnalysis
, evaluateGeneSetUncertainty
# load data data(exampleData) res <- evaluateGeneSetUncertainty( # parameters for evaluateGeneSetUncertainty dat = countdata, geneSet = pathways[[1]], analysis = analysis.averageCorrelation(), numSamplesUncertainty = 10, N = seq(0.1,0.9, by=0.1), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 10) # plot the results for the cell cycle control gene set plot(res, addMinimalStability = TRUE)
# load data data(exampleData) res <- evaluateGeneSetUncertainty( # parameters for evaluateGeneSetUncertainty dat = countdata, geneSet = pathways[[1]], analysis = analysis.averageCorrelation(), numSamplesUncertainty = 10, N = seq(0.1,0.9, by=0.1), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 10) # plot the results for the cell cycle control gene set plot(res, addMinimalStability = TRUE)
Plots a Venn diagramm of the overlaps of the core set and gene sets in an overrepresentation analysis.
plotOverrepresentation( object, signLevel = object$signLevel, subset = NULL, aggregate = FALSE, ask = FALSE, ...)
plotOverrepresentation( object, signLevel = object$signLevel, subset = NULL, aggregate = FALSE, ask = FALSE, ...)
object |
A result of a call to |
signLevel |
Only results with significance level smaller than the given value are included in the venn diagram. |
subset |
Indices for the results that should be included in the diagram. |
aggregate |
Specifies whether all gene sets should be plotted in a single Venn diagram (which is possible for at most four gene sets) or whether there should be one Venn diagram for each gene set. |
ask |
If set to true, the plot function will prompt for a user input for each new plot that is shown on an interactive device (see |
... |
Further parameters to be passed to |
No return value, called for creating a plot.
geneSetAnalysis
, predefinedAnalyses
, gsAnalysis
data(exampleData) # use the absolute correlation as a gene-level statistic stat <- abs(apply(countdata,1,cor,y = labels)) # define the core set as the 25% genes with the highest correlation coreSet <- rownames(countdata)[tail(order(stat), 25)] # perform an overrepresentation analysis resOverrep <- geneSetAnalysis( dat = countdata, geneSets = pathways, analysis = analysis.customOverrepresentation(), coreSet = coreSet, adjustmentMethod = "fdr") # plot a Venn diagram plotOverrepresentation(resOverrep, subset = 1:3, aggregate = TRUE)
data(exampleData) # use the absolute correlation as a gene-level statistic stat <- abs(apply(countdata,1,cor,y = labels)) # define the core set as the 25% genes with the highest correlation coreSet <- rownames(countdata)[tail(order(stat), 25)] # perform an overrepresentation analysis resOverrep <- geneSetAnalysis( dat = countdata, geneSets = pathways, analysis = analysis.customOverrepresentation(), coreSet = coreSet, adjustmentMethod = "fdr") # plot a Venn diagram plotOverrepresentation(resOverrep, subset = 1:3, aggregate = TRUE)
Predefined analysis configurations that can be used in geneSetAnalysis
analysis.gsea() analysis.overrepresentation() analysis.customOverrepresentation() analysis.averageCorrelation() analysis.averageTStatistic() analysis.globalTest() analysis.globalAncova()
analysis.gsea() analysis.overrepresentation() analysis.customOverrepresentation() analysis.averageCorrelation() analysis.averageTStatistic() analysis.globalTest() analysis.globalAncova()
The above functions return configurations for state-of-the-art analysis pipelines that can be used in geneSetAnalysis
. All configurations are preconfigured collections of standard methods for the different pipeline steps. The following lists the methods chosen for the different steps and their parameters. For more detailed descriptions of these methods, please refer to the linked manual pages.
analysis.gsea
defines the Gene Set Enrichment Analysis (GSEA) method by Subramanain et al.
Here, the gene-level statistic the absolute correlation calculated by gls.cor
with the associated parameters labs, method
and a preprocessing by transformation.abs
. As a gene set statistic, the enrichment score (function gss.enrichmentScore
with parameter p
) is calculated. The significance is assessed in a permutation test using significance.permutation
with testAlternative = "greater"
and free parameter numSamples, labs
.
analysis.overrepresentation
calculates an overrepresentation analysis using the gene-level statistic gls.tStatistic
with parameters pValue
(should be TRUE
), alternative
and labs
. The resulting values are then transformed via transformation.adjustAndBinarize
(parameters are the adjMethod
and threshold
). Finally gss.fisherExactTest
is used as gene set statistic.
analysis.customOverrepresentation
calculates an overrepresentation analysis using a user-defined core set coreSet
. That is, instead of calculating this core set internally based on differential expression as the standard overrepresentation analysis, this function allows for defining custom core sets. It internally uses the global analysis global.overrepresentation
.
analysis.averageCorrelation
calculates the gene-level statistic as the absolute correlation using gls.cor
(with parameters labs, method
) and transformation.abs
. The gene set statistic is the mean correlation calculated by gss.mean
. The significance is assessed by comparing the gene set statistic to randomly sampled gene sets using significance.sampling
(with the parameter numSamples
and the preset parameter testAlternative = "greater"
).
analysis.averageTStatistic
uses the absolute t statistic as the gene-level statistic by applying gls.tStatistic
(with parameters labs, pValue, alternative
) and transformation.abs
. The gene set statistic is the mean t statistic in the gene set as returned by gss.mean
. The significance is assessed by comparing the gene set statistic to randomly sampled gene sets using significance.sampling
(with the parameter numSamples
and the preset parameter testAlternative = "greater"
).
analysis.globalTest
performs a global gene set enrichment analysis by Goeman et al. by applying the global.test
function which in turn wraps the gt
function in the globaltest package.
analysis.globalAncova
applies the global ANCOVA method by Hummel et al. using the global method global.ancova
which wraps the GlobalAncova
function in the GlobalAncova package.
All functions return an object of class gsAnalysis
that specifies the corresponding analysis parameters for geneSetAnalysis
.
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., Mesirov, J. P. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Science of the United States of America, 102, 15545–15550.
Hummel, M., Meister, R., Mansmann, U. (2008) GlobalANCOVA: exploration and assessment of gene group effects. Bioinformatics, 24(1), 78–85.
Goeman, J. J., van de Geer, S. A., de Kort, F., van Houwelingen, H. C. (2004) A global test for groups of genes: testing association with a clinical outcome. Bioinformatics, 20(1), 93–99.
data(exampleData) # apply a gene set analysis based on the average absolute correlation resAvCor <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways[1], analysis = analysis.averageCorrelation(), adjustmentMethod = "fdr", # additional parameters for analysis.averageCorrelation labs = labels, method = "pearson", numSamples = 10) # apply an overrepresentation analysis resOverrep <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways, analysis = analysis.overrepresentation(), adjustmentMethod = "fdr", # additional parameters for analysis.overrepresentation pValue = TRUE, threshold = 0.1, labs = labels ) # apply a global analysis using GlobalAncova resGA <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways[1], analysis = analysis.globalAncova(), adjustmentMethod = "fdr", # additional parameters for analysis.globalAncova labs = labels, method = "approx")
data(exampleData) # apply a gene set analysis based on the average absolute correlation resAvCor <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways[1], analysis = analysis.averageCorrelation(), adjustmentMethod = "fdr", # additional parameters for analysis.averageCorrelation labs = labels, method = "pearson", numSamples = 10) # apply an overrepresentation analysis resOverrep <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways, analysis = analysis.overrepresentation(), adjustmentMethod = "fdr", # additional parameters for analysis.overrepresentation pValue = TRUE, threshold = 0.1, labs = labels ) # apply a global analysis using GlobalAncova resGA <- geneSetAnalysis( # parameters for geneSetAnalysis dat = countdata, geneSets = pathways[1], analysis = analysis.globalAncova(), adjustmentMethod = "fdr", # additional parameters for analysis.globalAncova labs = labels, method = "approx")
This function removes all genes that are not part of the experiment (not in rownames(dat)
) from the specified gene sets which. All names are set to lower case.
preprocessGs( dat, geneSets)
preprocessGs( dat, geneSets)
dat |
A numeric matrix of gene expression values for all analyzed genes. Here, each row corresponds to one gene, and each column corresponds to one sample. The rows must be named with the gene names used in the gene sets. Here, only the row names (i.e. the gene names) are used by the function. |
geneSets |
A list of gene sets to be processed, where each gene set is a vector of gene names corresponding to the row names of |
A list of preprocessed gene sets, where each gene set only contains those genes that are also present in dat
#values are not important, only the row names are used dat <- matrix(0,100,10) rownames(dat) <- paste("gene",1:100,sep="") geneSets <- list( gs1 = paste("GENE",1:20,sep=""),# all genes in the analyzed data gs2 = paste("Gene",101:110,sep=""),#no gene in the analyzed data gs3 = paste("gene",90:110,sep="")#some genes in the analyzed data ) newGeneSets <- preprocessGs(dat = dat, geneSets = geneSets)
#values are not important, only the row names are used dat <- matrix(0,100,10) rownames(dat) <- paste("gene",1:100,sep="") geneSets <- list( gs1 = paste("GENE",1:20,sep=""),# all genes in the analyzed data gs2 = paste("Gene",101:110,sep=""),#no gene in the analyzed data gs3 = paste("gene",90:110,sep="")#some genes in the analyzed data ) newGeneSets <- preprocessGs(dat = dat, geneSets = geneSets)
Functions to assess the significance of the gene-level statistics, as used in the significance
parameter of gsAnalysis
. These functions are based on applying the same analysis to randomly modified data sets or gene sets and comparing their statistic values to the original gene set statistic value.
significance.sampling( ..., dat, geneSet, analysis, glsValues, numSamples = 1000) significance.permutation( ..., dat, geneSet, analysis, glsValues, numSamples = 1000, labs) significance.restandardization( ..., dat, geneSet, analysis, glsValues, numSamples = 1000, labs)
significance.sampling( ..., dat, geneSet, analysis, glsValues, numSamples = 1000) significance.permutation( ..., dat, geneSet, analysis, glsValues, numSamples = 1000, labs) significance.restandardization( ..., dat, geneSet, analysis, glsValues, numSamples = 1000, labs)
... |
Additional parameters for the different steps of the analysis pipeline, depending on the concrete configuration supplied in |
dat |
The original data set as a numeric matrix of gene expression values for all analyzed genes. Here, each row corresponds to one gene, and each column corresponds to one sample. The rows must be named with the gene names used in the gene sets. |
geneSet |
The original gene set in form of a vector of gene names corresponding to the row names of |
analysis |
The analysis applied to the original gene set (that should also be applied to the modified gene sets). This is an object of type |
glsValues |
A vector containing the (possibly transformed) gene-level statistic values for each gene in the original data set |
numSamples |
The number of random samples that should be taked to calculate the null distribution for the significance assessment. Default is |
labs |
A vector of class labels for the samples in |
Standard methods for the significance assessment of a gene set statistic (to be used in an analysis pipeline defined by gsAnalysis
):
significance.sampling
: This function repeatedly draws random gene sets. Their gene set statistic values form the null distribution.
significance.permutation
: This function repeatedly permutes the labels of the data set. The gene set statistic values for the original gene set on the permuted data set form the null distribution.
significance.restandardization
: This function applies both a gene set sampling and a label permutation. The permutation statistic values are standardized by their mean and standard deviation and then restandardized based on the gene set sampling statistic values. These restandardized values form the null distribution (Efron and Tibshirani).
significance.sampling
returns a list with the following elements:
gssValues |
A vector of gene set statistic values, one entry per sample. |
randomGeneSets |
A matrix containing the gene sets which were sampled randomly from the set of all genes. |
significance.permutation
returns a list with the following elements:
gssValues |
A vector of gene set statistics, one entry per sample. |
permutations |
A matrix, where each column contains the indices of one permutation. |
significance.restandardization
returns a list with the following elements:
gssValues |
A vector of gene set statistics, one entry per sample. |
samplingValues |
A list of sub-lists, each containing one sampling result as defined above. |
permutationValues |
A list of sub-lists, each containing one permutation result as defined above. |
Efron, B., Tibshirani, R. (2007) On testing the significance of sets of genes. Annals of Applied Statistics, 1, 107-129.
geneSetAnalysis
, gsAnalysis
, hist.gsaResult
Prints a summary of a gene set analysis result object.
## S3 method for class 'gsaResult' summary(object, mode = c("summary", "table"), orderBy = c("adjustedPValues", "rawPValues", "geneSetName"), significantOnly = FALSE, signLevel = object$signLevel, ...)
## S3 method for class 'gsaResult' summary(object, mode = c("summary", "table"), orderBy = c("adjustedPValues", "rawPValues", "geneSetName"), significantOnly = FALSE, signLevel = object$signLevel, ...)
object |
A result object as returned by |
mode |
Specifies the type of information that is displayed: By default ( |
orderBy |
If |
significantOnly |
If |
signLevel |
If |
... |
Currently unused |
Returns a data.frame with a summary of the produced results such as data set name, p-values, size of the given gene set.
geneSetAnalysis
, hist.gsaResult
, createSummaryTable
data(exampleData) # perform gene set analyses for several pathways res <- geneSetAnalysis( # global parameters dat = countdata, geneSets = pathways, analysis = analysis.averageCorrelation(), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 10) #summarize the analyses summary(res, mode = "summary") summary(res, mode = "table", orderBy = "rawPValues")
data(exampleData) # perform gene set analyses for several pathways res <- geneSetAnalysis( # global parameters dat = countdata, geneSets = pathways, analysis = analysis.averageCorrelation(), # additional parameters for analysis.averageCorrelation labs = labels, numSamples = 10) #summarize the analyses summary(res, mode = "summary") summary(res, mode = "table", orderBy = "rawPValues")
Functions to transform the gene-level statistic values prior to the calculation of the gene set statistics, as used in the transformation
parameter of gsAnalysis
. Most of the functions wrap existing R functions.
transformation.abs(x) transformation.square(x) transformation.localFdr(x, statistic="pvalue", cutoff.method="fndr", pct0=0.75) transformation.binarize(x, quant) transformation.rank(x) transformation.adjust(x, adjMethod = "fdr") transformation.adjustAndBinarize(x, adjMethod = "fdr", threshold = 0.05)
transformation.abs(x) transformation.square(x) transformation.localFdr(x, statistic="pvalue", cutoff.method="fndr", pct0=0.75) transformation.binarize(x, quant) transformation.rank(x) transformation.adjust(x, adjMethod = "fdr") transformation.adjustAndBinarize(x, adjMethod = "fdr", threshold = 0.05)
x |
A numeric vector of gene-level statistic values, one per gene. These values are calculated by the previous step (see |
statistic |
Specifies the null model for |
cutoff.method |
Type of cut-off method used in |
pct0 |
Fraction of data used by |
quant |
For |
adjMethod |
The method to use for the adjustment for multiple testing (see |
threshold |
The threshold for differential expression of a gene (defaults to |
Standard transformation functions for gene-level statistics (to be used in an analysis pipeline defined by gsAnalysis
):
transformation.abs
: Calculates the absolute values of the elements in x
(a wrapper for abs
).
transformation.square
: Squares all elements in x
.
transformation.localFdr
: Calculates the local fdr for the elements in x
. This is a wrapper for fdrtool
.
transformation.binarize
: Binarizes the values in x
by using the quant
quantile as a threshold.
transformation.rank
: Ranks the values in x
and returns the rank vector.
transformation.adjust
: Adjusts for multiple testing according to the adjustment method specified in adjMethod
.
transformation.adjustAndBinarize
: Adjusts for multiple testing according to the adjustment method specified in adjMethod
and binarizes the resulting p-values according to threshold
(values smaller than the threshold become 1 others 0).
All functions return a vector of transformed values having the same length as x
.
geneSetAnalysis
, gsAnalysis
, gss
, gls