decoupler.pp.extract

decoupler.pp.extract(data, layer=None, raw=False, empty=True, shuffle=True, verbose=False, bsize=250000)

Extracts matrix, rownames and colnames from data.

Parameters:

data (AnnData | DataFrame | tuple[ndarray, ndarray, ndarray]) – anndata.AnnData instance, pandas.DataFrame, or a tuple of (matrix, samples, features). All methods assume that input values follow a normal distribution unless otherwise specified. Therefore, when working with observational count data, some form of normalization is required (e.g., scanpy’s library-size normalization followed by log1p). Using raw integer counts is not recommended, as they follow a Poisson distribution.

Feature scaling on normalized counts is also acceptable, but note that it changes the results by assuming equal importance across features, and outcomes will vary depending on which observations are included.

No normalization or transformation is required when using contrast-level feature statistics such as log fold changes or Wald test statistics.
layer (str | None (default: None)) – Layer key name of an anndata.AnnData instance.
raw (bool (default: False)) – Whether to use the .raw attribute of anndata.AnnData.
empty (bool (default: True)) – Whether to remove empty observations (rows) or features (columns).
shuffle (bool (default: True)) – Whether to shuffle features to ensure ties are broken.
verbose (bool (default: False)) – Whether to display progress messages and additional execution details.
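
The normalization advice given for data can be illustrated with a minimal NumPy sketch that builds the (matrix, samples, features) tuple form. This is not scanpy's or decoupler's implementation: the helper normalize_log1p, the target_sum value of 1e4, and the toy sample and feature names are assumptions made for illustration only.

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Scale each observation (row) to `target_sum` total counts, then apply log1p.

    Hypothetical helper for illustration; `target_sum=1e4` is an assumed value.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero: all-zero rows are left as all zeros.
    lib_size[lib_size == 0] = 1.0
    return np.log1p(counts / lib_size * target_sum)


# Toy raw counts: 2 observations (rows) x 3 features (columns).
counts = np.array([[10, 0, 90],
                   [5, 5, 0]])
samples = np.array(["S1", "S2"])
features = np.array(["G1", "G2", "G3"])

# Tuple in the (matrix, samples, features) order described for `data`.
data = (normalize_log1p(counts), samples, features)
```

A tuple built this way matches the third accepted form of data. With an anndata.AnnData object, the equivalent preprocessing would typically be done with scanpy before extraction.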

Return type:

Returns:

Matrix, rownames and colnames from data.
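
To make the relationship between the three returned objects concrete, the sketch below shows conceptually what removing empty observations and features means when the names must stay aligned with the matrix. It is not decoupler's code: the helper drop_empty is hypothetical, and treating "empty" as "all values are zero" is an assumption.

```python
import numpy as np


def drop_empty(mat, rows, cols):
    """Drop all-zero rows and columns, keeping names aligned with the matrix.

    Hypothetical helper; assumes "empty" means every value is zero.
    """
    mat = np.asarray(mat)
    keep_rows = np.any(mat != 0, axis=1)  # observations with at least one non-zero value
    keep_cols = np.any(mat != 0, axis=0)  # features with at least one non-zero value
    return (
        mat[keep_rows][:, keep_cols],
        np.asarray(rows)[keep_rows],
        np.asarray(cols)[keep_cols],
    )


mat = np.array([[1.0, 0.0, 2.0],
                [0.0, 0.0, 0.0],   # empty observation
                [3.0, 0.0, 0.0]])  # the middle column is an empty feature
X, obs_names, var_names = drop_empty(mat, ["S1", "S2", "S3"], ["G1", "G2", "G3"])
```

After filtering, X has one row per entry in obs_names and one column per entry in var_names, which is the invariant a caller relies on when using the returned triple.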

Example

import decoupler as dc

adata, net = dc.ds.toy()
X, obs_names, var_names = dc.pp.extract(adata)
