decoupler.mt.viper

decoupler.mt.viper = <decoupler._Method.Method object>

Virtual Inference of Protein-activity by Enriched Regulon analysis (VIPER) [ASG+16].

This approach first ranks features based on their absolute values and computes a one-tail score.

\begin{align} w &= \frac{w}{max(|w|)} \\ l_{orig} &= 1_{w \neq 0} \\ l &= \frac{l_{orig}}{\sum_{i=1}^{k} \frac{l_i}{max(l_{orig})}max(l_{orig})} \\ q^{norm} &= \Phi^{-1}(2|q-0.5| + (1 + max(|q-0.5|))) \\ S_1 &= \sum_{i=1}^{k}q_i^{norm}l_i(1-|w_i|) \\ \end{align}

Where:

$w \in [-1, +1]$ is a vector of interaction weights across features
$l \in [0, 1]$ is a vector of interaction likelihoods across features
$q \in [0, 1]$ is a vector of quantiles based on the molecular readouts across features
$k$ is the number of features in $q$
$\Phi^{-1}$ is the inverse of the cumulative distribution function of the standard normal distribution
$q^{norm} \in [-\infty,+\infty]$ are the z-scores of the deviation of quantiles from 0.5

$S_1$ encodes the magnitude of the enrichment score, irrespective of the interaction signs in net.

Then, $q$ are z-transformed and weighted by their interaction strength and likelihood.

$$S_2 = \sum_{i=1}^{k}w_il_i(\Phi^{-1}(q_i))$$

In this case, $S_2$ takes the direction (sign) of interactions into consideration.
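As a rough illustration of the $S_2$ formula, the sketch below computes it for a single source using only the Python standard library. The helper names (`quantiles`, `signed_score`), the toy values, and the rank-based quantile convention `rank / (k + 1)` are assumptions made for this example; they are not taken from decoupler's implementation.

```python
from statistics import NormalDist


def quantiles(values):
    """Map readouts to quantiles in (0, 1) as rank / (k + 1).

    Assumed convention for this sketch; ties are not handled.
    """
    k = len(values)
    order = sorted(range(k), key=lambda i: values[i])
    q = [0.0] * k
    for rank, i in enumerate(order, start=1):
        q[i] = rank / (k + 1)
    return q


def signed_score(q, w, l):
    """S_2 = sum_i w_i * l_i * Phi^{-1}(q_i)."""
    inv_cdf = NormalDist().inv_cdf
    return sum(wi * li * inv_cdf(qi) for qi, wi, li in zip(q, w, l))


# Toy values for five features (hypothetical).
readouts = [2.1, -0.3, 0.8, -1.7, 0.1]
w = [1.0, 0.0, 0.5, -1.0, 0.0]  # interaction weights
l = [0.4, 0.0, 0.2, 0.4, 0.0]   # interaction likelihoods
s2 = signed_score(quantiles(readouts), w, l)
```

Here the positively weighted targets have high readouts and the negatively weighted target has a low one, so `s2` comes out positive.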

Afterwards, a summary score $S_3$ is obtained.

$$S_3 = \begin{cases} (|S_2| + S_1) \times \mathrm{sgn}(S_2) & \text{if } S_1 > 0 \\ S_2 & \text{if } S_1 < 0 \end{cases}$$
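The case distinction for $S_3$ can be sketched as a small function. The names `sign` and `summary_score` are hypothetical. The formula does not cover $S_1 = 0$; this sketch returns $S_2$ there, which is what both branches would give.

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def summary_score(s1, s2):
    """Combine the one-tail score S_1 and the signed score S_2 into S_3."""
    if s1 > 0:
        # S_1 adds magnitude while S_2 keeps deciding the direction.
        return (abs(s2) + s1) * sign(s2)
    # S_1 <= 0: only the signed score is used.
    return s2
```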

An enrichment score $ES$ is obtained by comparing $S_3$ to a null model generated through an analytical approach that shuffles features.

$$ES = S_3\sqrt{\sum_{i=1}^{k}l_{orig,i}^{2}}$$

It is reported together with a $p_{value}$:

$$p_{value} = \Phi(ES)$$
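A minimal sketch of these two formulas, taken literally as written, using the standard library's `statistics.NormalDist` for $\Phi$. The function names and the toy inputs are assumptions for illustration only.

```python
import math
from statistics import NormalDist


def enrichment_score(s3, l_orig):
    """ES = S_3 * sqrt(sum_i l_orig_i ** 2)."""
    return s3 * math.sqrt(sum(li ** 2 for li in l_orig))


def p_value(es):
    """p_value = Phi(ES), following the formula as written."""
    return NormalDist().cdf(es)


# Hypothetical source with four targets among six features.
l_orig = [1, 0, 1, 1, 0, 1]
es = enrichment_score(-1.2, l_orig)  # -1.2 * sqrt(4) = -2.4
p = p_value(es)
```

Because $l_{orig}$ is an indicator vector before any pleiotropic correction, the scaling factor here is simply the square root of the number of targets.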

Additionally, when multiple sources are computed simultaneously, a pleiotropic correction is applied.

In brief, all possible pairs of sources AB are generated under two conditions:

both A and B are significantly enriched (p < reg_sign=0.05)
they share at least n_targets=10 features
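The two conditions can be sketched as a filter over pairs of sources. This is an assumption-laden illustration: `candidate_pairs` is a hypothetical helper, sources are represented as plain sets of target names, and pairs are treated as unordered, which may differ from how decoupler enumerates them.

```python
from itertools import combinations


def candidate_pairs(targets, pvals, reg_sign=0.05, n_targets=10):
    """Return (A, B, shared features) for pairs meeting both conditions.

    targets: dict mapping source name to a set of target features.
    pvals:   dict mapping source name to its p-value.
    """
    significant = sorted(s for s in targets if pvals[s] < reg_sign)
    pairs = []
    for a, b in combinations(significant, 2):
        shared = targets[a] & targets[b]
        if len(shared) >= n_targets:
            pairs.append((a, b, shared))
    return pairs


# Toy sources over integer feature ids (hypothetical).
targets = {
    "A": set(range(0, 15)),
    "B": set(range(3, 20)),
    "C": set(range(0, 15)),
}
pvals = {"A": 0.01, "B": 0.001, "C": 0.2}
pairs = candidate_pairs(targets, pvals)
```

In this toy input only the pair A and B qualifies: C shares enough features with both but is not significantly enriched.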

Subsequently, an $ES$ and its associated $p_{value}$ are computed for both A ($pA$) and B ($pB$) based only on the shared features. Then the pleiotropy score ($PS$) is computed.

$$PS = \begin{cases} \frac{1}{(1+|\log_{10}(pB) - \log_{10}(pA)|)^{\frac{20}{n_a}}} & \text{if } pA < pB \\ \frac{1}{(1+|\log_{10}(pA) - \log_{10}(pB)|)^{\frac{20}{n_b}}} & \text{if } pA > pB \end{cases}$$

Where:

$n_a$ is the number of test pairs involving the source A
$n_b$ is the number of test pairs involving the source B
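A small sketch of the $PS$ formula follows. The function name is hypothetical, and treating the constant 20 in the exponent as a `penalty` argument is an assumption made for this example. The formula leaves $pA = pB$ undefined; in that case the log difference is zero and this sketch returns 1.

```python
import math


def pleiotropy_score(p_a, p_b, n_a, n_b, penalty=20):
    """PS = 1 / (1 + |log10(pA) - log10(pB)|) ** (penalty / n).

    n is n_a when A is the more significant source (pA < pB),
    and n_b otherwise.
    """
    diff = abs(math.log10(p_a) - math.log10(p_b))
    n = n_a if p_a < p_b else n_b
    return 1.0 / (1.0 + diff) ** (penalty / n)


# A is two orders of magnitude more significant than B and is
# involved in two test pairs: PS = 1 / 3 ** 10.
ps = pleiotropy_score(1e-4, 1e-2, n_a=2, n_b=5)
```

The larger the gap between the two p-values, the smaller the score, and a source involved in more test pairs receives a milder exponent.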

This score is used to update $l_{orig}$.

$$l_{orig, i} = \begin{cases} PS \times 1_{\{i \in A\}} & \text{if } pA < pB \\ PS \times 1_{\{i \in B\}} & \text{if } pA > pB \end{cases}$$

A new $ES$ and $p_{value}$ are calculated following all the previous steps but using the updated $l_{orig}$.

Finally, the obtained $p_{value}$ are adjusted by Benjamini-Hochberg correction.
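For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines of plain Python. This is a generic textbook version written for illustration (the function name is hypothetical), not the routine decoupler calls internally.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Each raw p-value is scaled by `m / rank`, and the running minimum keeps the adjusted values monotone and capped at 1.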

Parameters:

data – anndata.AnnData instance, pandas.DataFrame, or a tuple of (matrix, samples, features). All methods assume that input values follow a normal distribution unless otherwise specified. Therefore, when working with observational count data, some form of normalization is required (e.g., scanpy’s library-size normalization followed by log1p). Using raw integer counts is not recommended, as they follow a Poisson distribution.

Feature scaling on normalized counts is also acceptable, but note that it changes the results by assuming equal importance across features, and outcomes will vary depending on which observations are included.

No normalization or transformation is required when using contrast-level feature statistics such as log fold changes or Wald test statistics.
net – Dataframe in long format. Must include source and target columns, and optionally a weight column.
tmin (default: 5) – Minimum number of targets per source. Sources with fewer targets will be removed.
layer – Layer key name of an anndata.AnnData instance.
raw (default: False) – Whether to use the .raw attribute of anndata.AnnData.
empty (default: True) – Whether to remove empty observations (rows) or features (columns).
bsize (default: 250000) – For large datasets in sparse format, this parameter controls how many observations are processed at once. Increasing this value speeds up computation but uses more memory.
verbose (default: False) – Whether to display progress messages and additional execution details.
pleiotropy – Whether correction for pleiotropic regulation should be performed.
reg_sign – If pleiotropy, p-value threshold for considering significant regulators.
n_targets – If pleiotropy, integer indicating the minimal number of overlapping targets to consider for analysis.
penalty – If pleiotropy, number higher than 1 indicating the penalty for the pleiotropic interactions. 1 = no penalty.
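To make the expected shape of net and the effect of tmin concrete, the sketch below builds a small long-format network with pandas and drops sources with fewer than tmin targets. The source and target names are made up, and the filtering shown is a simplified stand-in for what the method does internally, where the check may also depend on which targets are present in data.

```python
import pandas as pd

# Long-format network: one row per source-target interaction.
net = pd.DataFrame({
    "source": ["TF1", "TF1", "TF1", "TF2", "TF2"],
    "target": ["G1", "G2", "G3", "G1", "G4"],
    "weight": [1.0, -1.0, 0.5, 1.0, 1.0],  # optional column
})

tmin = 3
# Number of distinct targets per source, aligned to the rows of net.
n_per_source = net.groupby("source")["target"].transform("nunique")
filtered = net[n_per_source >= tmin]
```

With `tmin = 3`, TF2 has only two targets and is removed, leaving the three TF1 interactions.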

Returns:

Enrichment scores $ES$ and, if applicable, adjusted $p_{value}$ by Benjamini-Hochberg.

Example

import decoupler as dc
adata, net = dc.ds.toy()
dc.mt.viper(adata, net, tmin=3)
