decoupler.pp.pseudobulk
- decoupler.pp.pseudobulk(adata, sample_col, groups_col, layer=None, raw=False, empty=False, mode='sum', skip_checks=False, bsize=250000, verbose=False)
Summarizes omic profiles across cells, grouped by sample and optionally by group categories.
By default, this function expects raw integer counts as input and sums them per sample and group (mode='sum'), but other modes are available.
This function also produces quality control metrics that help assess whether some samples or features need to be filtered. The number of cells that belong to each sample is stored in adata.obs['psbulk_n_cells'], the total sum of counts per sample in .obs['psbulk_counts'], and the proportion of cells that have a non-zero value for a given feature in .layers['psbulk_props'].
- Parameters:
  - adata (AnnData) – Annotated data matrix with observations (rows) and features (columns).
  - sample_col (str) – Column of adata.obs from which to extract the sample names.
  - groups_col (str | None) – Column of adata.obs from which to extract the group names. Can be set to None to ignore groups.
  - layer (str | None (default: None)) – Layer key name of an anndata.AnnData instance.
  - raw (bool (default: False)) – Whether to use the .raw attribute of anndata.AnnData.
  - empty (bool (default: False)) – Whether to remove empty observations (rows) or features (columns).
  - mode (str | Callable | dict (default: 'sum')) – How to perform the pseudobulk. Available options are sum, mean or median. It also accepts callback functions, such as a lambda, to perform custom aggregations. Additionally, a dictionary of callback functions can be provided, in which case the result of each callback is stored in a different layer of the output and the result of the first callback in the dictionary is stored in .X by default. To switch between layers, see decoupler.swap_layer.
  - skip_checks (bool (default: False)) – Whether to skip input checks. Set to True when working with positive and negative data, or when counts are not integers and mode='sum'.
  - verbose (bool (default: False)) – Whether to display progress messages and additional execution details.
- Return type:
AnnData
- Returns:
New AnnData object containing summarized pseudobulk profiles by sample and optionally by group.
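To make the aggregation and the quality control metrics described above more concrete, here is a minimal pure-Python sketch of the idea. It is not decoupler's implementation: the helper toy_pseudobulk, its return layout, and the toy data are all hypothetical and only mirror what the description says is computed (a per-sample-and-group sum, the number of cells, the total counts, and the proportion of non-zero cells per feature).

```python
from collections import defaultdict


def toy_pseudobulk(counts, samples, groups):
    """Sum per-cell counts for every (sample, group) pair.

    Hypothetical helper for illustration only; it is not part of decoupler.
    counts  : list of per-cell count vectors (one list of ints per cell)
    samples : sample label of each cell
    groups  : group label of each cell
    """
    # Collect the rows (cells) that belong to each (sample, group) pair
    cells = defaultdict(list)
    for row, sample, group in zip(counts, samples, groups):
        cells[(sample, group)].append(row)

    result = {}
    for key, rows in cells.items():
        n_cells = len(rows)
        columns = list(zip(*rows))
        # Column-wise sum, the analogue of mode='sum'
        summed = [sum(col) for col in columns]
        # Fraction of cells with a non-zero value for each feature
        props = [sum(v != 0 for v in col) / n_cells for col in columns]
        result[key] = {
            "profile": summed,
            "n_cells": n_cells,     # analogue of psbulk_n_cells
            "counts": sum(summed),  # analogue of psbulk_counts
            "props": props,         # analogue of psbulk_props
        }
    return result


# Four cells, two features, two samples, one cell type (made-up data)
counts = [[1, 0], [2, 3], [0, 0], [4, 1]]
samples = ["s1", "s1", "s2", "s2"]
groups = ["T", "T", "T", "T"]

pb = toy_pseudobulk(counts, samples, groups)
print(pb[("s1", "T")])
```

A pseudobulk profile with few cells or a low total count is a natural candidate for filtering, which is the kind of decision these metrics are meant to support.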
Example
import decoupler as dc

# Load the example dataset
adata = dc.ds.covid5k()

# Aggregate cells per individual and cell type (default mode='sum')
pdata = dc.pp.pseudobulk(adata, sample_col="individual", groups_col="celltype")
pdata
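The example above relies on the default mode='sum'. As a rough illustration of the other mode options listed in the parameters (a name, a callback, or a dictionary of callbacks), the sketch below models the dispatch in plain Python. The helper aggregate is hypothetical, and the callback signature it assumes (one list of values per feature) is an assumption rather than decoupler's documented contract.

```python
from statistics import mean, median

# Built-in aggregation names mirrored from the parameter description
BUILTIN_MODES = {"sum": sum, "mean": mean, "median": median}


def aggregate(rows, mode="sum"):
    """Aggregate cell rows feature by feature.

    Hypothetical helper for illustration only; not decoupler's code.
    Returns (X, layers), where layers is empty unless mode is a dict.
    """
    columns = list(zip(*rows))

    def apply(func):
        # Assumed callback signature: one list of values per feature
        return [func(list(col)) for col in columns]

    if isinstance(mode, dict):
        # One layer per callback; the first callback also fills X
        layers = {name: apply(func) for name, func in mode.items()}
        first = next(iter(mode))
        return layers[first], layers

    func = BUILTIN_MODES[mode] if isinstance(mode, str) else mode
    return apply(func), {}


rows = [[1, 0], [2, 3], [6, 3]]  # three cells, two features (made-up data)
x_sum, _ = aggregate(rows)
x_median, _ = aggregate(rows, mode="median")
x_max, _ = aggregate(rows, mode=max)
x_first, layers = aggregate(rows, mode={"total": sum, "peak": max})
print(x_sum, x_median, x_max, x_first, sorted(layers))
```

With a dictionary, the order of the keys matters in this sketch because the first entry determines what ends up in X, matching the behavior the parameter description gives for .X.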