decoupler.mt.gsea
- decoupler.mt.gsea = <decoupler._Method.Method object>
Gene Set Enrichment Analysis (GSEA) [STM+05].
Features are ranked based on a continuous statistic (e.g., expression, score, or correlation). The enrichment score (ES) for a feature set is computed by walking down the ranked list and increasing a running-sum statistic when a feature is in the set, and decreasing it when it is not.
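$$
P(i) = \begin{cases}
\dfrac{|r_i|}{\sum_{j \in F} |r_j|} & \text{if } i \in F \\
-\dfrac{1}{N - k} & \text{otherwise}
\end{cases}
$$
Where:
$F$ is a feature set
$R$ is the ranking of the feature statistics in descending order
$r_i$ is the value for feature $i$
$r_j$ is the value for feature $j$ in $F$
$k$ is the number of features in $F$
$N$ is the total number of features in $R$
$N - k$ is the number of features not in $F$ but present in $R$
For each feature $i$ in $R$, the function $P(i)$ is applied and stored as a sequence $S$.
The enrichment score $ES$ corresponds to the maximum deviation from zero of the running (cumulative) sum of $S$.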
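The running-sum walk described above can be sketched in a few lines of NumPy. This is a minimal illustration, not decoupler's implementation: the function name `enrichment_score` is hypothetical, and the step sizes (absolute statistic normalized over the set for a hit, a constant `1 / (N - k)` for a miss) are assumed from the standard GSEA formulation with a weighting exponent of 1.

```python
import numpy as np

def enrichment_score(stats, in_set):
    """Running-sum enrichment score for one feature set (illustrative sketch).

    stats  : 1D array-like of feature statistics, one value per feature
    in_set : 1D boolean array-like, True where the feature belongs to the set
    Assumes the set is neither empty nor equal to the full feature list.
    """
    stats = np.asarray(stats, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    # Rank features by their statistic in descending order
    order = np.argsort(-stats, kind="stable")
    r, hit = stats[order], in_set[order]
    n, k = r.size, int(hit.sum())
    # Hits add a weighted increment, misses subtract a constant decrement
    steps = np.where(hit, np.abs(r) / np.abs(r[hit]).sum(), -1.0 / (n - k))
    running = np.cumsum(steps)
    # The score is the running-sum value farthest from zero (sign preserved)
    return running[np.argmax(np.abs(running))]

es_top = enrichment_score([5, 4, 3, 2, 1], [True, True, False, False, False])
es_bottom = enrichment_score([5, 4, 3, 2, 1], [False, False, False, True, True])
```

A set concentrated at the top of the ranking gives a positive score, and a set concentrated at the bottom gives a negative one.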
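When multiple random permutations are done (times > 1), statistical significance is assessed via empirical testing.
$$
p_{value} = \begin{cases}
\dfrac{\#\{ES_{rand} \geq ES\}}{P} & \text{if } ES \geq 0 \\
\dfrac{\#\{ES_{rand} \leq ES\}}{P} & \text{if } ES < 0
\end{cases}
$$
Where:
$ES$ is the observed enrichment score
$ES_{rand}$ are the enrichment scores of the random permutations
$P$ is the total number of permutations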
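The empirical test can be illustrated as follows. This is a sketch under stated assumptions, not the library's code: the helper names (`running_es`, `permutation_scores`, `empirical_pvalue`) are hypothetical, the null distribution is built here by shuffling set membership across features (the source does not specify the permutation scheme), and the p-value is taken as the fraction of all permutation scores at least as extreme as the observed one, in the direction of its sign.

```python
import numpy as np

def running_es(stats, in_set):
    # Compact running-sum score; assumes 0 < set size < number of features
    order = np.argsort(-stats, kind="stable")
    r, hit = np.abs(stats[order]), in_set[order]
    steps = np.where(hit, r / r[hit].sum(), -1.0 / (hit.size - hit.sum()))
    running = np.cumsum(steps)
    return running[np.argmax(np.abs(running))]

def permutation_scores(stats, in_set, times=1000, seed=42):
    # Assumed null model: shuffle which features belong to the set
    rng = np.random.default_rng(seed)
    return np.array([running_es(stats, rng.permutation(in_set))
                     for _ in range(times)])

def empirical_pvalue(es, es_rand):
    # Fraction of permutation scores at least as extreme as the observed one
    es_rand = np.asarray(es_rand, dtype=float)
    if es >= 0:
        return float(np.mean(es_rand >= es))
    return float(np.mean(es_rand <= es))

stats = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])
in_set = np.array([True, True, False, False, False, False])
es = running_es(stats, in_set)
es_rand = permutation_scores(stats, in_set, times=200, seed=0)
pval = empirical_pvalue(es, es_rand)
```

Fixing the seed makes the permutation scores, and therefore the p-value, reproducible between runs.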
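Additionally, the enrichment score $ES$ is updated to a normalized enrichment score $NES$.
$$
NES = \begin{cases}
\dfrac{ES}{\mu_{+}} & \text{if } ES \geq 0 \\
\dfrac{ES}{|\mu_{-}|} & \text{if } ES < 0
\end{cases}
$$
Where:
$\mu_{+}$ is the mean of positive values in the permutation scores $ES_{rand}$
$\mu_{-}$ is the mean of negative values in the permutation scores $ES_{rand}$
Finally, the obtained p-values are adjusted by Benjamini-Hochberg correction.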
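The normalization and multiple-testing steps can be sketched like this. Both functions are hypothetical illustrations rather than decoupler's API: `normalized_es` assumes the score is divided by the mean of the same-signed permutation scores (keeping its sign), and `benjamini_hochberg` is a plain NumPy version of the standard step-up procedure.

```python
import numpy as np

def normalized_es(es, es_rand):
    # Divide by the mean of the same-signed permutation scores
    es_rand = np.asarray(es_rand, dtype=float)
    pos, neg = es_rand[es_rand > 0], es_rand[es_rand < 0]
    if es >= 0:
        return es / pos.mean() if pos.size else np.nan
    return es / abs(neg.mean()) if neg.size else np.nan

def benjamini_hochberg(pvals):
    # Standard Benjamini-Hochberg step-up adjustment
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, walking from the largest p-value down
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(m)
    adj[order] = np.clip(adj_sorted, 0.0, 1.0)
    return adj

nes_pos = normalized_es(0.6, [0.2, 0.4, -0.3, -0.1])
nes_neg = normalized_es(-0.4, [0.2, 0.4, -0.3, -0.1])
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Returning NaN when no permutation score shares the sign of the observed score is a choice made for this sketch; the source does not say how that case is handled.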
- Parameters:
data – anndata.AnnData instance, pandas.DataFrame, or a tuple of (matrix, samples, features). All methods assume that input values follow a normal distribution unless otherwise specified. Therefore, when working with observational count data, some form of normalization is required (e.g., scanpy's library-size normalization followed by log1p). Using raw integer counts is not recommended, as they follow a Poisson distribution. Feature scaling on normalized counts is also acceptable, but note that it changes the results by assuming equal importance across features, and outcomes will vary depending on which observations are included.
No normalization or transformation is required when using contrast-level feature statistics such as log fold changes or Wald test statistics.
net – DataFrame in long format. Must include source and target columns, and optionally a weight column.
tmin (default: 5) – Minimum number of targets per source. Sources with fewer targets will be removed.
layer – Layer key name of an anndata.AnnData instance.
raw (default: False) – Whether to use the .raw attribute of anndata.AnnData.
empty (default: True) – Whether to remove empty observations (rows) or features (columns).
bsize (default: 250000) – For large datasets in sparse format, this parameter controls how many observations are processed at once. Increasing this value speeds up computation but uses more memory.
verbose (default: False) – Whether to display progress messages and additional execution details.
times – Number of random permutations to do.
seed – Random seed to use.
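To make the net and tmin parameters concrete, the sketch below builds a small long-format network with pandas and applies a minimum-targets filter. The helper `filter_net` is hypothetical and only mimics the documented behavior; it assumes that targets are counted after restricting the network to features present in the data, which the parameter description does not state explicitly.

```python
import pandas as pd

# Long-format network: one row per source-target pair, optional weight column
net = pd.DataFrame({
    "source": ["S1", "S1", "S1", "S2", "S2"],
    "target": ["G1", "G2", "G3", "G3", "G4"],
    "weight": [1.0, 1.0, -1.0, 1.0, 1.0],
})

# Features measured in the data (G4 is absent, G5 is not in the network)
features = ["G1", "G2", "G3", "G5"]

def filter_net(net, features, tmin=5):
    # Keep only targets present in the data (assumption of this sketch)
    kept = net[net["target"].isin(features)]
    # Count remaining targets per source and drop sources below tmin
    n_targets = kept.groupby("source")["target"].transform("nunique")
    return kept[n_targets >= tmin].reset_index(drop=True)

filtered = filter_net(net, features, tmin=3)
```

With tmin=3, source S1 keeps its three targets, while S2 is removed because only one of its targets is present in the data.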
- Returns:
Enrichment scores and, if applicable, p-values adjusted by Benjamini-Hochberg correction.
Example
import decoupler as dc
adata, net = dc.ds.toy()
dc.mt.gsea(adata, net, tmin=3)