decoupler.mt.waggr

decoupler.mt.waggr = <decoupler._Method.Method object>

Weighted Aggregate (WAGGR) [BiMVSB+22].

This approach aggregates the molecular features $x_i$ of one observation with the corresponding feature weights $w_i$ of a given feature set $j$ into an enrichment score $ES$, where $i = 1, \dots, n$ indexes the features.

This method can use any aggregation function, which by default is the weighted mean.

$$ES = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

Another simpler option is the weighted sum.

$$ES = \sum_{i=1}^{n} w_i x_i$$
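The two aggregations above can be sketched in plain Python. This is a minimal illustration, not the decoupler source: the function names mirror the `wmean` and `wsum` options of the `fun` parameter, but the bodies and the input values are assumptions.

```python
def wsum(x, w):
    """Weighted sum: ES = sum_i w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def wmean(x, w):
    """Weighted mean: ES = sum_i w_i * x_i / sum_i w_i."""
    return wsum(x, w) / sum(w)


# Hypothetical feature values of one observation and weights of one feature set.
x = [2.0, 4.0, 6.0]
w = [1.0, 1.0, 2.0]

print(wsum(x, w))   # 2 + 4 + 12 = 18.0
print(wmean(x, w))  # 18 / 4 = 4.5
```

The weighted mean divides by the sum of the weights, so it is undefined when the weights of a set sum to zero; the weighted sum has no such restriction.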

Alternatively, this method can also take any defined function $f$ as long as it aggregates $x_i$ and $w$ into a single $ES$.

$$ES = f(w_i, x_i)$$

This functionality makes it relatively easy to implement and try new enrichment methods.
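As an illustration of a user-defined $f$, the sketch below scales the weighted sum by the total absolute weight, which can be convenient when weights are signed. The name `wmean_abs` and the input values are hypothetical; only the requirement that the function takes `x` and `w` and returns a single float comes from the `fun` parameter description.

```python
def wmean_abs(x, w):
    """Hypothetical custom aggregation: weighted sum divided by the total absolute weight."""
    total = sum(abs(wi) for wi in w)
    return sum(wi * xi for wi, xi in zip(w, x)) / total


# Hypothetical feature values and signed weights.
x = [1.0, 3.0, 5.0]
w = [1.0, -1.0, 2.0]

print(wmean_abs(x, w))  # (1 - 3 + 10) / 4 = 2.0
```

A function of this shape would be supplied through the `fun` parameter. Whether further constraints apply to it (for example on the array types it receives) is not stated here, so treat this as a starting point rather than a drop-in implementation.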

When multiple random permutations are done (times > 1), statistical significance is assessed via empirical testing.

$$p_{value} = \frac{\left|\{ES_{rand} \geq ES\}\right|}{P}$$

Where:

$ES_{rand}$ are the enrichment scores of the random permutations
$P$ is the total number of permutations

Additionally, $ES$ is updated to a normalized enrichment score $NES$.

$$NES = \frac{ES - \mu(ES_{rand})}{\sigma(ES_{rand})}$$

Where:

$\mu$ is the mean
$\sigma$ is the standard deviation
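The empirical test and the normalization can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the null distribution is built by shuffling the feature values of the observation, the standard deviation is the population one, and the helper name `permutation_stats` is hypothetical. The `times` and `seed` arguments mirror the documented parameters.

```python
import random
import statistics


def wmean(x, w):
    """Weighted mean used as the aggregation function."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def permutation_stats(x, w, times=1000, seed=42):
    """Return (ES, NES, p_value) for one observation and one feature set."""
    rng = random.Random(seed)
    es = wmean(x, w)
    es_rand = []
    for _ in range(times):
        shuffled = list(x)
        rng.shuffle(shuffled)  # assumption: permute the feature values
        es_rand.append(wmean(shuffled, w))
    # Empirical p-value: fraction of random scores at least as large as ES.
    p_value = sum(r >= es for r in es_rand) / times
    mu = statistics.fmean(es_rand)
    sigma = statistics.pstdev(es_rand)  # assumption: population standard deviation
    # Guard against a degenerate null in which every permutation scores the same.
    nes = (es - mu) / sigma if sigma > 0 else 0.0
    return es, nes, p_value


# Hypothetical data: the two weighted features carry the largest values.
x = [0.1, 0.5, 2.0, 3.5, 0.2, 0.3]
w = [0.0, 0.0, 1.0, 2.0, 0.0, 0.0]

es, nes, p_value = permutation_stats(x, w)
print(es, nes, p_value)
```

With a fixed seed the result is reproducible. The p-value here is one-sided, following the formula as written above.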

Finally, the obtained $p_{value}$ are adjusted by Benjamini-Hochberg correction.
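For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines of plain Python. The function name `bh_adjust` is hypothetical and the input p-values are made up for illustration; how the library groups p-values before adjusting them is not specified here.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
padj = bh_adjust(pvals)
print(padj)  # approximately [0.02, 0.04, 0.04, 0.02]
```

Each p-value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing in the original p-value order.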

Parameters:

data –
anndata.AnnData instance, pandas.DataFrame, or a tuple of (matrix, samples, features). All methods assume that input values follow a normal distribution unless otherwise specified. Therefore, when working with observational count data, some form of normalization is required (e.g., scanpy’s library-size normalization followed by log1p). Using raw integer counts is not recommended, as they follow a Poisson distribution.

Feature scaling on normalized counts is also acceptable, but note that it changes the results by assuming equal importance across features, and outcomes will vary depending on which observations are included.

No normalization or transformation is required when using contrast-level feature statistics such as log fold changes or Wald test statistics.
net – DataFrame in long format. Must include source and target columns, and optionally a weight column.
tmin (default: 5) – Minimum number of targets per source. Sources with fewer targets will be removed.
layer – Layer key name of an anndata.AnnData instance.
raw (default: False) – Whether to use the .raw attribute of anndata.AnnData.
empty (default: True) – Whether to remove empty observations (rows) or features (columns).
bsize (default: 250000) – For large datasets in sparse format, this parameter controls how many observations are processed at once. Increasing this value speeds up computation but uses more memory.
verbose (default: False) – Whether to display progress messages and additional execution details.
fun – Function to compute the enrichment statistic from omics readouts (x) and feature weights (w). The provided function must accept x and w arguments and output a single float. By default, ‘wmean’ and ‘wsum’ are implemented.
times – Number of random permutations to do.
seed – Random seed to use.
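To make the input requirements above more concrete, the sketch below shows, in plain Python, what library-size normalization followed by log1p computes for a single observation, and how a `tmin` filter acts on a long-format network. Both helpers (`normalize_log1p`, `filter_net`) are hypothetical, the `target_sum` of 1e4 is an assumed scaling constant, and the network is represented as (source, target, weight) tuples instead of a DataFrame to keep the example dependency-free.

```python
import math
from collections import Counter


def normalize_log1p(counts, target_sum=1e4):
    """Scale one observation's counts to target_sum, then apply log1p."""
    total = sum(counts)  # assumes a non-empty observation (total > 0)
    return [math.log1p(c / total * target_sum) for c in counts]


def filter_net(net, tmin=5):
    """Drop sources that have fewer than tmin targets."""
    n_targets = Counter(source for source, _, _ in net)
    return [row for row in net if n_targets[row[0]] >= tmin]


# Hypothetical raw counts for one observation.
counts = [0, 5, 15]
normalized = normalize_log1p(counts)

# Hypothetical long-format network: (source, target, weight).
net = [
    ("T1", "G1", 1.0),
    ("T1", "G2", 1.0),
    ("T1", "G3", -1.0),
    ("T2", "G1", 1.0),
    ("T2", "G4", 1.0),
]
kept = filter_net(net, tmin=3)
print(sorted({source for source, _, _ in kept}))  # ['T1']
```

In practice the normalization would be done with a dedicated tool such as scanpy, and the filtering is handled by the `tmin` parameter itself; the sketch only illustrates the effect of each step.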

Returns:

Enrichment scores $ES$ and, if applicable, adjusted $p_{value}$ by Benjamini-Hochberg.

Example

import decoupler as dc

adata, net = dc.ds.toy()
dc.mt.waggr(adata, net, tmin=3)
