decoupler.mt.zscore

decoupler.mt.zscore = <decoupler._Method.Method object>

Z-score (ZSCORE) [YAS+21].

This approach computes the mean value of the molecular features for known targets, optionally subtracts the overall mean of all measured features, and divides the result by the standard error of the mean, that is, the standard deviation of all features divided by the square root of the number of targets.

This formulation was originally introduced in KSEA, which explicitly subtracts the global mean when computing the enrichment score $ES$:

$$ES = \frac{(\mu_s - \mu_p) \times \sqrt{m}}{\sigma}$$

Where:

  • $\mu_s$ is the mean of the targets

  • $\mu_p$ is the mean of all features

  • $m$ is the number of targets

  • $\sigma$ is the standard deviation of all features
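The KSEA formula can be sketched in pure Python for a single observation and a single source. This is a minimal illustration under stated assumptions, not decoupler's implementation: targets are treated as unweighted (a plain mean, even though `net` may carry a weight column), and $\sigma$ is taken as the sample standard deviation (ddof=1). The names `zscore_ksea`, `values`, and `targets` are hypothetical.

```python
import math
import statistics


def zscore_ksea(values: dict[str, float], targets: list[str]) -> float:
    """ES with global-mean subtraction (KSEA flavor), for one observation.

    `values` maps every measured feature to its value; `targets` lists the
    features annotated to one source. Unweighted mean is an assumption.
    """
    all_vals = list(values.values())
    target_vals = [values[t] for t in targets if t in values]
    m = len(target_vals)                   # number of measured targets
    mu_s = statistics.fmean(target_vals)   # mean of the targets
    mu_p = statistics.fmean(all_vals)      # mean of all features
    sigma = statistics.stdev(all_vals)     # assumed sample SD (ddof=1)
    return (mu_s - mu_p) * math.sqrt(m) / sigma


values = {"g1": 2.0, "g2": 1.0, "g3": 3.0, "g4": 0.0, "g5": -1.0, "g6": 1.0}
es = zscore_ksea(values, ["g1", "g2", "g3"])
print(round(es, 4))  # 1.2247
```

Here $\mu_s = 2$, $\mu_p = 1$, $m = 3$ and $\sigma = \sqrt{2}$, so $ES = \sqrt{3}/\sqrt{2} \approx 1.2247$.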

In the RoKAI implementation, however, the global mean subtraction is omitted:

$$ES = \frac{\mu_s \times \sqrt{m}}{\sigma}$$
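The RoKAI variant differs only in dropping $\mu_p$. The sketch below uses the same assumptions as a plain reading of the formula (unweighted targets, sample standard deviation with ddof=1); `zscore_rokai` is a hypothetical name. It also shows that on centered data ($\mu_p = 0$) the RoKAI score equals the KSEA score of the uncentered data, so the two flavors diverge only when the features are not centered around zero.

```python
import math
import statistics


def zscore_rokai(values: dict[str, float], targets: list[str]) -> float:
    """ES without global-mean subtraction (RoKAI flavor), for one observation."""
    all_vals = list(values.values())
    target_vals = [values[t] for t in targets if t in values]
    m = len(target_vals)
    mu_s = statistics.fmean(target_vals)  # mean of the targets
    sigma = statistics.stdev(all_vals)    # assumed sample SD (ddof=1)
    return mu_s * math.sqrt(m) / sigma


values = {"g1": 2.0, "g2": 1.0, "g3": 3.0, "g4": 0.0, "g5": -1.0, "g6": 1.0}
print(round(zscore_rokai(values, ["g1", "g2", "g3"]), 4))  # 2.4495

# Center the features so that mu_p = 0; the standard deviation is unchanged
mean_all = statistics.fmean(values.values())
centered = {k: v - mean_all for k, v in values.items()}
# Equals the KSEA score of the raw data: (2.0 - 1.0) * sqrt(3) / sqrt(2)
print(round(zscore_rokai(centered, ["g1", "g2", "g3"]), 4))  # 1.2247
```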

A two-sided $p$-value is then calculated from the enrichment score using the survival function $\mathrm{sf}$ of the standard normal distribution:

$$p = 2 \times \mathrm{sf}\bigl(\lvert ES \rvert\bigr)$$
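The two-sided $p$-value needs only the standard normal survival function. A minimal sketch, using the identity $\mathrm{sf}(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ from the standard library instead of a statistics package (`scipy.stats.norm.sf` computes the same quantity); `two_sided_pvalue` is a hypothetical name.

```python
import math


def two_sided_pvalue(es: float) -> float:
    """Two-sided p-value p = 2 * sf(|ES|) under the standard normal."""
    # Standard normal survival function: sf(x) = 0.5 * erfc(x / sqrt(2))
    sf = 0.5 * math.erfc(abs(es) / math.sqrt(2))
    return 2 * sf


print(two_sided_pvalue(0.0))             # 1.0
print(round(two_sided_pvalue(1.96), 3))  # 0.05
```

Because the absolute value is taken first, positive and negative scores of the same magnitude receive the same $p$-value.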

Finally, the resulting $p$-values are adjusted with the Benjamini-Hochberg correction.
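The Benjamini-Hochberg step-up procedure can be sketched as follows. The text above does not say over which set of $p$-values the correction is applied (for example, across sources within each observation), so this hypothetical `benjamini_hochberg` helper simply adjusts whatever list it receives. `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` is an established library alternative.

```python
def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down: adj = min over ranks >= k of p * n / rank
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Capping at 1.0 and taking the running minimum keeps the adjusted values monotone in the original ranking and never above 1.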

Parameters:
  • data – anndata.AnnData instance, pandas.DataFrame, or a tuple of (matrix, samples, features). All methods assume that input values follow a normal distribution unless otherwise specified. Therefore, when working with observational count data, some form of normalization is required (e.g., scanpy’s library-size normalization followed by log1p). Raw integer counts are not recommended, since they follow a Poisson distribution rather than a normal one.

    Feature scaling of normalized counts is also acceptable, but note that it changes the results: it assumes equal importance across features, and because each feature is scaled with statistics computed across observations, the outcome depends on which observations are included.

    No normalization or transformation is required when using contrast-level feature statistics such as log fold changes or Wald test statistics.

  • net – DataFrame in long format. Must include source and target columns, and optionally a weight column.

  • tmin (default: 5) – Minimum number of targets per source. Sources with fewer targets will be removed.

  • layer – Layer key name of an anndata.AnnData instance.

  • raw (default: False) – Whether to use the .raw attribute of anndata.AnnData.

  • empty (default: True) – Whether to remove empty observations (rows) or features (columns).

  • bsize (default: 250000) – For large datasets in sparse format, this parameter controls how many observations are processed at once. Increasing this value speeds up computation but uses more memory.

  • verbose (default: False) – Whether to display progress messages and additional execution details.

  • flavor – Which formulation to use when calculating the z-score: either KSEA (with global mean subtraction) or RoKAI (without it).
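The input requirements above (normalized data and a long-format network filtered by tmin) can be sketched without any dependencies. This is a hypothetical illustration, not decoupler's preprocessing: `normalize_log1p` and `filter_by_tmin` are made-up helpers, the target_sum of 1e4 is an assumed choice (scanpy's `normalize_total` defaults to the median library size), and counting targets only among the observed features before applying tmin is an assumption.

```python
import math
from collections import Counter

# Hypothetical long-format network: (source, target, weight) rows
net = [
    ("TF1", "g1", 1.0), ("TF1", "g2", 1.0), ("TF1", "g3", -1.0),
    ("TF2", "g4", 1.0), ("TF2", "g5", 1.0),
]


def normalize_log1p(counts: dict[str, int], target_sum: float = 1e4) -> dict[str, float]:
    """Library-size normalization followed by log1p, for one observation."""
    total = sum(counts.values())
    return {g: math.log1p(c * target_sum / total) for g, c in counts.items()}


def filter_by_tmin(rows, observed: set[str], tmin: int = 5):
    """Keep sources with at least `tmin` targets among the observed features."""
    rows = [r for r in rows if r[1] in observed]
    n_targets = Counter(source for source, _, _ in rows)
    return [r for r in rows if n_targets[r[0]] >= tmin]


raw = {"g1": 10, "g2": 0, "g3": 5, "g4": 85, "g5": 0}
norm = normalize_log1p(raw)
kept = filter_by_tmin(net, observed=set(norm), tmin=3)
print(sorted({r[0] for r in kept}))  # ['TF1']
```

TF2 has only two targets, so it is dropped at tmin=3, mirroring how sources with fewer targets than tmin are removed.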

Returns:

Enrichment scores $ES$ and, if applicable, Benjamini-Hochberg-adjusted $p$-values.

Example

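import decoupler as dc

# Load a toy AnnData object and a matching prior-knowledge network
adata, net = dc.ds.toy()
# Score each source, keeping only sources with at least 3 targets
dc.mt.zscore(adata, net, tmin=3)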