decoupler.pp.pseudobulk

decoupler.pp.pseudobulk(adata, sample_col, groups_col, layer=None, raw=False, empty=False, mode='sum', skip_checks=False, bsize=250000, verbose=False)

Summarizes omic profiles across cells, grouped by sample and optionally by group categories.

By default this function expects raw integer counts as input and sums them per sample and group (mode='sum'), but other modes are available.
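The default aggregation can be pictured with the minimal NumPy sketch below. It is not decoupler's implementation. It assumes a dense cells × features array, and both the helper name pseudobulk_sum and the "sample_group" labelling of output rows are hypothetical choices made for illustration.

```python
import numpy as np


def pseudobulk_sum(X, samples, groups=None):
    """Sum the rows of X (cells x features) that share a sample (and group) label.

    Illustrative sketch only; not decoupler's implementation.
    """
    X = np.asarray(X)
    if groups is None:
        keys = np.asarray(samples, dtype=str)
    else:
        # Hypothetical naming scheme: one "sample_group" key per cell
        keys = np.array([f"{s}_{g}" for s, g in zip(samples, groups)])
    # labels: sorted unique keys; inverse: position of each cell's key in labels
    labels, inverse = np.unique(keys, return_inverse=True)
    out = np.zeros((labels.size, X.shape[1]), dtype=X.dtype)
    # Add each cell's row into the row of its pseudobulk profile
    np.add.at(out, inverse, X)
    return labels, out


# Toy data: 4 cells x 3 features of integer counts
X = np.array([[1, 0, 2], [3, 1, 0], [0, 0, 5], [2, 2, 2]])
samples = ["s1", "s1", "s2", "s2"]
groups = ["T", "T", "B", "T"]
labels, out = pseudobulk_sum(X, samples, groups)
```

Here the two T cells of s1 collapse into one row ([4, 1, 2]), while s2 yields separate B and T rows. Passing groups=None gives one row per sample, analogous to setting groups_col=None.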

This function also computes quality control metrics that help decide whether some samples or features should be filtered out. In the returned AnnData, the number of cells aggregated into each pseudobulk profile is stored in .obs['psbulk_n_cells'], the total sum of counts per profile in .obs['psbulk_counts'], and the proportion of cells with a non-zero value for each feature in .layers['psbulk_props'].
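The arithmetic behind these three metrics is sketched below with NumPy. This is an assumption-based illustration, not decoupler's code: the helper name pseudobulk_qc is hypothetical, and cells are assumed to be already assigned to one key per pseudobulk profile.

```python
import numpy as np


def pseudobulk_qc(X, keys):
    """Compute three per-profile QC metrics for cells grouped by `keys`.

    Sketch of the metrics psbulk_n_cells, psbulk_counts and psbulk_props;
    not decoupler's code. Returns (labels, n_cells, counts, props).
    """
    X = np.asarray(X, dtype=float)
    labels, inverse = np.unique(np.asarray(keys, dtype=str), return_inverse=True)
    # Number of cells per pseudobulk profile (cf. psbulk_n_cells)
    n_cells = np.bincount(inverse, minlength=labels.size)
    summed = np.zeros((labels.size, X.shape[1]))
    np.add.at(summed, inverse, X)
    # Total counts per profile (cf. psbulk_counts)
    counts = summed.sum(axis=1)
    # Fraction of cells with a non-zero value, per profile and feature (cf. psbulk_props)
    nonzero = np.zeros_like(summed)
    np.add.at(nonzero, inverse, (X != 0).astype(float))
    props = nonzero / n_cells[:, None]
    return labels, n_cells, counts, props


X = np.array([[1, 0, 2], [3, 1, 0], [0, 0, 5], [2, 2, 2]])
labels, n_cells, counts, props = pseudobulk_qc(X, ["a", "a", "b", "b"])
```

A profile built from few cells, or a feature whose proportion is low in most profiles, is the kind of case these metrics are meant to flag before downstream analysis.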

Parameters:
  • adata (AnnData) – Annotated data matrix with observations (rows) and features (columns).

  • sample_col (str) – Column of adata.obs from which to extract the sample names.

  • groups_col (str | None) – Column of adata.obs from which to extract the group names. Set to None to ignore groups.

  • layer (str | None (default: None)) – Key in adata.layers of the layer to use as input.

  • raw (bool (default: False)) – Whether to use the .raw attribute of anndata.AnnData.

  • empty (bool (default: False)) – Whether to remove empty observations (rows) or features (columns).

  • mode (str | Callable | dict (default: 'sum')) – How values are aggregated within each pseudobulk profile. Available options are 'sum', 'mean' and 'median'. Callables, such as lambda functions, are also accepted for custom aggregations. Alternatively, a dictionary of callables can be provided, in which case each result is stored in a separate layer of the returned object, and the result of the first entry in the dictionary is also stored in .X by default. To switch between layers, see decoupler.swap_layer.

  • skip_checks (bool (default: False)) – Whether to skip input checks. Set to True when working with data that contain both positive and negative values, or when values are not integers and mode='sum'.

  • verbose (bool (default: False)) – Whether to display progress messages and additional execution details.
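The behaviour described for mode (a string, a single callable, or a dictionary of callables whose first result fills .X) can be sketched as follows. This is a standalone illustration, not decoupler's API: the names pseudobulk_modes and BUILTIN and the "custom" layer name are hypothetical, and the convention that each callable receives one profile's cells × features block and reduces it along axis 0 is an assumption.

```python
import numpy as np

# Built-in reducers; each maps a (cells x features) block to one row
BUILTIN = {
    "sum": lambda m: m.sum(axis=0),
    "mean": lambda m: m.mean(axis=0),
    "median": lambda m: np.median(m, axis=0),
}


def pseudobulk_modes(X, keys, mode="sum"):
    """Aggregate X per key with a string, a callable, or a dict of callables.

    Assumption: every callable receives one profile's (cells x features) block
    and reduces it along axis 0. Returns (labels, main, layers), where `main`
    is the result of the first entry, mirroring how .X is described as filled.
    """
    X = np.asarray(X, dtype=float)
    labels, inverse = np.unique(np.asarray(keys, dtype=str), return_inverse=True)
    if isinstance(mode, str):
        mode = {mode: BUILTIN[mode]}
    elif callable(mode):
        mode = {"custom": mode}  # hypothetical layer name for a single callable
    # One output matrix ("layer") per named aggregation
    layers = {
        name: np.vstack([func(X[inverse == i]) for i in range(labels.size)])
        for name, func in mode.items()
    }
    main = layers[next(iter(layers))]
    return labels, main, layers


X = np.array([[1, 0, 2], [3, 1, 0], [0, 0, 5], [2, 2, 2]])
modes = {"sum": BUILTIN["sum"], "median": lambda m: np.median(m, axis=0)}
labels, main, layers = pseudobulk_modes(X, ["a", "a", "b", "b"], modes)
```

Because Python dictionaries preserve insertion order, "first entry" is well defined here: main holds the summed profiles, and the medians are kept in their own layer.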
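To illustrate why skip_checks and empty exist, the sketch below shows one plausible form of such input validation and of empty-row/column removal. Both helpers (check_counts, drop_empty) are hypothetical: decoupler's actual checks may differ, and treating "empty" as "all values are zero" is an assumption.

```python
import numpy as np


def check_counts(X, mode="sum"):
    """Hypothetical input check: reject negatives, and non-integers when mode='sum'."""
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("negative values found; consider skip_checks=True")
    if mode == "sum" and not np.array_equal(X, np.round(X)):
        raise ValueError("non-integer values with mode='sum'; consider skip_checks=True")


def drop_empty(X):
    """Remove all-zero rows and columns (assumed meaning of 'empty')."""
    X = np.asarray(X)
    keep_rows = (X != 0).any(axis=1)
    keep_cols = (X != 0).any(axis=0)
    return X[keep_rows][:, keep_cols], keep_rows, keep_cols


check_counts([[1, 0], [2, 3]])           # raw integer counts: passes
check_counts([[0.5, 1.0]], mode="mean")  # non-integers are accepted outside mode='sum'
trimmed, rows, cols = drop_empty([[0, 0, 0], [1, 0, 2]])
```

A check of this kind would reject log-normalized or centered data, which is the situation the skip_checks description points to.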

Return type:

AnnData

Returns:

New AnnData object containing summarized pseudobulk profiles by sample and optionally by group.

Example

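import decoupler as dc

# Load an example single-cell dataset
adata = dc.ds.covid5k()
# Aggregate cells into one profile per individual and cell type
pdata = dc.pp.pseudobulk(adata, sample_col="individual", groups_col="celltype")
pdata  # in a notebook, displays a summary of the returned AnnData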