decoupler.mt.viper

decoupler.mt.viper = <decoupler._Method.Method object>

Virtual Inference of Protein-activity by Enriched Regulon analysis (VIPER) [ASG+16].

This approach first ranks features based on their absolute values and computes a one-tail score.

\begin{align}
w &= \frac{w}{\max(|w|)} \\
l_{orig} &= 1_{w \neq 0} \\
l &= \frac{l_{orig}}{\sum_{i=1}^{k} \frac{l_i}{\max(l_{orig})}\max(l_{orig})} \\
q^{norm} &= \Phi^{-1}(2|q-0.5| + (1 + \max(|q-0.5|))) \\
S_1 &= \sum_{i=1}^{k}q_i^{norm}l_i(1-|w_i|)
\end{align}

Where:

  • $w \in [-1, +1]$ is a vector of interaction weights across features

  • $l \in [0, 1]$ is a vector of interaction likelihoods across features

  • $q \in [0, 1]$ is a vector of quantiles based on the molecular readouts across features

  • $k$ is the number of features in $q$

  • $\Phi^{-1}$ is the inverse of the cumulative distribution function of the standard normal distribution

  • $q^{norm} \in (-\infty,+\infty)$ are the z-scores of the deviation of quantiles from 0.5

$S_1$ encodes the magnitude of the enrichment score, irrespective of the interaction signs in net.

Then, $q$ are z-transformed and weighted by their interaction strength and likelihood.

$$S_2 = \sum_{i=1}^{k}w_il_i(\Phi^{-1}(q_i))$$

In this case, $S_2$ takes the direction (sign) of interactions into consideration.
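The signed score can be illustrated with a minimal sketch that uses only the Python standard library. The helper names `rank_quantiles` and `signed_score` are hypothetical, and deriving quantiles as rank / (k + 1) without tie handling is an assumption made here so that every quantile stays strictly inside (0, 1); it is not taken from the text above.

```python
from statistics import NormalDist

def rank_quantiles(values):
    """Map readouts to quantiles in (0, 1) via rank / (k + 1) (assumed scheme, no tie handling)."""
    k = len(values)
    order = sorted(range(k), key=lambda i: values[i])
    q = [0.0] * k
    for rank, i in enumerate(order, start=1):
        q[i] = rank / (k + 1)
    return q

def signed_score(w, l, q):
    """S_2 = sum_i w_i * l_i * Phi^{-1}(q_i)."""
    inv_cdf = NormalDist().inv_cdf
    return sum(wi * li * inv_cdf(qi) for wi, li, qi in zip(w, l, q))

# Toy source with three targets: two activating (+1) and one repressing (-1).
readouts = [2.0, -1.0, 0.5]
weights = [1.0, -1.0, 1.0]
likelihoods = [1 / 3, 1 / 3, 1 / 3]

quantiles = rank_quantiles(readouts)
s2 = signed_score(weights, likelihoods, quantiles)
print(quantiles, s2)
```

In this toy case the repressed target has the lowest readout and an activated target has the highest, so both contribute positively and the score comes out positive.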

Afterwards, a summary score $S_3$ is obtained.

$$S_3 = \begin{cases} (|S_2| + S_1) \times \mathrm{sgn}(S_2) & \text{if } S_1 > 0 \\ S_2 & \text{if } S_1 < 0 \end{cases}$$
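As a rough illustration, the piecewise rule can be written as a small function. The name `summary_score` is hypothetical, and returning $S_2$ when $S_1$ is exactly zero is an assumption, since the formula leaves that case unspecified.

```python
def summary_score(s1, s2):
    """Combine the one-tail score S_1 and the signed score S_2 into S_3."""
    if s1 > 0:
        sign = (s2 > 0) - (s2 < 0)  # sgn(S_2): -1, 0 or +1
        return (abs(s2) + s1) * sign
    # S_1 < 0 per the formula; S_1 == 0 is also routed here (assumption).
    return s2

boosted = summary_score(0.5, -1.0)     # S_1 > 0 adds magnitude, keeps the sign of S_2
unchanged = summary_score(-0.5, -1.0)  # S_1 < 0 leaves S_2 as is
print(boosted, unchanged)
```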

An enrichment score $ES$ is obtained by comparing $S_3$ to a null model generated through an analytical approach that shuffles features.

$$ES = S_3\sqrt{\sum_{i=1}^{k}l_{orig,i}^{2}}$$

Together with a $p_{value}$:

$$p_{value} = \Phi(ES)$$
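The two formulas above can be transcribed literally as follows. This is only a sketch with hypothetical function names, using the standard library's normal distribution; it mirrors the equations as written here and makes no claim about how the library computes them internally.

```python
import math
from statistics import NormalDist

def enrichment_score(s3, l_orig):
    """ES = S_3 * sqrt(sum_i l_orig_i^2)."""
    return s3 * math.sqrt(sum(li ** 2 for li in l_orig))

def p_value(es):
    """p_value = Phi(ES), exactly as in the formula above."""
    return NormalDist().cdf(es)

# With indicator likelihoods, the scaling factor is sqrt(number of targets).
es = enrichment_score(0.5, [1, 1, 1, 1])
print(es, p_value(es))
```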

Additionally, when multiple sources are computed simultaneously, a pleiotropic correction is employed.

In brief, all possible pairs of sources AB are generated under two conditions:

  1. both A and B are significantly enriched (p < reg_sign=0.05)

  2. they share at least n_targets=10 features

Subsequently, an $ES$ and its associated $p_{value}$ are computed for both A ($pA$) and B ($pB$) based only on the shared features. Then the pleiotropy score ($PS$) is computed.

$$PS = \begin{cases} \frac{1}{(1+|\log_{10}(pB) - \log_{10}(pA)|)^{\frac{20}{n_a}}} & \text{if } pA < pB \\ \frac{1}{(1+|\log_{10}(pA) - \log_{10}(pB)|)^{\frac{20}{n_b}}} & \text{if } pA > pB \end{cases}$$

Where:

  • $n_a$ is the number of test pairs involving the source A

  • $n_b$ is the number of test pairs involving the source B
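A minimal sketch of the pleiotropy score follows, with the constant 20 copied from the formula. The function name `pleiotropy_score` is hypothetical, and returning `None` when both p-values are equal is an assumption, because the formula does not define that case.

```python
import math

FORMULA_CONSTANT = 20  # numerator of the exponent in the PS formula

def pleiotropy_score(p_a, p_b, n_a, n_b):
    """PS for a pair of sources A and B, evaluated on their shared features."""
    if p_a == p_b:
        return None  # tie: not covered by the formula (assumption)
    gap = abs(math.log10(p_a) - math.log10(p_b))
    n = n_a if p_a < p_b else n_b
    return 1.0 / (1.0 + gap) ** (FORMULA_CONSTANT / n)

# One order of magnitude between the p-values, 20 test pairs for source A.
ps = pleiotropy_score(0.001, 0.01, n_a=20, n_b=5)
print(ps)
```

The score always lies in (0, 1]: a larger gap between the two p-values, or fewer test pairs for the selected source, pushes it closer to 0.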

This score is used to update $l_{orig}$.

$$l_{orig, i} = \begin{cases} PS \times 1_{\{i \in A\}} & \text{if } pA < pB \\ PS \times 1_{\{i \in B\}} & \text{if } pA > pB \end{cases}$$

A new $ES$ and $p_{value}$ are calculated following all the previous steps but using the updated $l_{orig}$.

Finally, the obtained $p_{value}$ are adjusted by Benjamini-Hochberg correction.
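For readers unfamiliar with the Benjamini-Hochberg procedure, the following standalone sketch shows the standard step-up adjustment in plain Python. The function name `benjamini_hochberg` is hypothetical and the code is illustrative only; it is not the library's own routine.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(p_values)
    # Visit p-values from largest to smallest so that monotonicity can be
    # enforced with a running minimum.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```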

Parameters:
  • data

    anndata.AnnData instance, pandas.DataFrame, or a tuple of (matrix, samples, features). All methods assume that input values follow a normal distribution unless otherwise specified. Therefore, when working with observational count data, some form of normalization is required (e.g., scanpy’s library-size normalization followed by log1p). Using raw integer counts is not recommended, as they follow a Poisson distribution.

    Feature scaling on normalized counts is also acceptable, but note that it changes the results by assuming equal importance across features, and outcomes will vary depending on which observations are included.

    No normalization or transformation is required when using contrast-level feature statistics such as log fold changes or Wald test statistics.

  • net – Dataframe in long format. Must include source and target columns, and optionally a weight column.

  • tmin (default: 5) – Minimum number of targets per source. Sources with fewer targets will be removed.

  • layer – Layer key name of an anndata.AnnData instance.

  • raw (default: False) – Whether to use the .raw attribute of anndata.AnnData.

  • empty (default: True) – Whether to remove empty observations (rows) or features (columns).

  • bsize (default: 250000) – For large datasets in sparse format, this parameter controls how many observations are processed at once. Increasing this value speeds up computation but uses more memory.

  • verbose (default: False) – Whether to display progress messages and additional execution details.

  • pleiotropy – Whether correction for pleiotropic regulation should be performed.

  • reg_sign – If pleiotropy, p-value threshold for considering significant regulators.

  • n_targets – If pleiotropy, integer indicating the minimal number of overlapping targets to consider for analysis.

  • penalty – If pleiotropy, number higher than 1 indicating the penalty for the pleiotropic interactions. 1 = no penalty.
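To make the long-format net and the effect of tmin more concrete, here is a small sketch that uses plain tuples instead of a DataFrame. The helper `filter_by_tmin` and the toy rows are hypothetical. The sketch counts every listed target per source; whether the library counts only targets that are also present in data is not stated above.

```python
from collections import defaultdict

def filter_by_tmin(rows, tmin=5):
    """Keep rows whose source has at least `tmin` distinct targets."""
    targets = defaultdict(set)
    for source, target, _weight in rows:
        targets[source].add(target)
    keep = {source for source, found in targets.items() if len(found) >= tmin}
    return [row for row in rows if row[0] in keep]

# Long format: one (source, target, weight) row per interaction.
net_rows = [
    ("T1", "G1", 1.0),
    ("T1", "G2", -1.0),
    ("T1", "G3", 0.5),
    ("T2", "G1", 1.0),
]

filtered = filter_by_tmin(net_rows, tmin=3)
print(filtered)
```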

Returns:

Enrichment scores $ES$ and, if applicable, $p_{value}$ adjusted by Benjamini-Hochberg.

Example

import decoupler as dc
adata, net = dc.ds.toy()
dc.mt.viper(adata, net, tmin=3)