decoupler.pp.filter_by_expr

decoupler.pp.filter_by_expr(adata, group=None, lib_size=None, min_count=10, min_total_count=15, large_n=10, min_prop=0.7, inplace=True)

Determine which genes have sufficiently large counts to be retained in a statistical analysis.

Adapted from the function filterByExpr of edgeR (https://rdrr.io/bioc/edgeR/man/filterByExpr.html).

Parameters:
  • adata (AnnData) – Annotated data matrix with observations (rows) and features (columns).

  • group (str | None (default: None)) – Name of the adata.obs column to group by. If None, it assumes that all samples belong to one group.

  • lib_size (float | None (default: None)) – Library size. If None, defaults to the sum of reads per sample.

  • min_count (int (default: 10)) – Minimum count required per gene in at least some samples.

  • min_total_count (int (default: 15)) – Minimum total count required per gene across all samples.

  • large_n (int (default: 10)) – Number of samples per group that is considered to be “large”.

  • min_prop (float (default: 0.7)) – Minimum proportion of samples in the smallest group that express the gene.

  • inplace (bool (default: True)) – Whether to filter adata in place. If False, the genes to be kept are returned instead.

Return type:

None | ndarray

Returns:

If inplace=False, an array of the genes to be kept; otherwise None.

Example

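import decoupler as dc

# Load the example dataset
adata = dc.ds.covid5k()
# Aggregate cells into pseudobulk profiles per individual and cell type
pdata = dc.pp.pseudobulk(adata, sample_col="individual", groups_col="celltype")
# Filter lowly expressed genes in place (inplace=True by default)
dc.pp.filter_by_expr(pdata)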
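
To show how the parameters above interact, the filtering rule can be sketched in plain NumPy. This is a minimal illustration based on the logic of edgeR's filterByExpr, not decoupler's actual implementation. The function name filter_by_expr_sketch, the groups argument (an array of labels rather than the name of an adata.obs column), and the toy count matrix are all assumptions made for the example. A gene is kept when its counts per million (CPM) reach a cutoff derived from min_count in enough samples, and its total count reaches min_total_count.

```python
import numpy as np


def filter_by_expr_sketch(counts, groups=None, lib_size=None, min_count=10,
                          min_total_count=15, large_n=10, min_prop=0.7):
    """Hypothetical re-implementation of the edgeR filterByExpr rule.

    counts: samples x genes matrix of raw counts.
    Returns a boolean mask with one entry per gene (True = keep).
    """
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[0]

    # Library size defaults to the total count of each sample.
    if lib_size is None:
        lib_size = counts.sum(axis=1)
    lib_size = np.broadcast_to(np.asarray(lib_size, dtype=float), (n_samples,))

    # Size of the smallest group (all samples form one group if groups is None).
    if groups is None:
        min_sample_size = n_samples
    else:
        _, sizes = np.unique(np.asarray(groups), return_counts=True)
        min_sample_size = sizes.min()

    # For "large" groups, only a proportion of the extra samples is required.
    if min_sample_size > large_n:
        min_sample_size = large_n + (min_sample_size - large_n) * min_prop

    # Translate min_count into a CPM cutoff using the median library size.
    cpm_cutoff = min_count / np.median(lib_size) * 1e6
    cpm = counts / lib_size[:, None] * 1e6

    tol = 1e-14
    keep_cpm = (cpm >= cpm_cutoff).sum(axis=0) >= min_sample_size - tol
    keep_total = counts.sum(axis=0) >= min_total_count - tol
    return keep_cpm & keep_total


# Toy data: 4 samples (rows) x 3 genes (columns), two groups of two samples.
counts = np.array([
    [100, 0, 900],
    [120, 1, 880],
    [90, 0, 910],
    [110, 2, 890],
])
groups = ["a", "a", "b", "b"]

keep = filter_by_expr_sketch(counts, groups)
print(keep)  # genes 0 and 2 are kept, gene 1 is dropped
```

In this toy example the second gene never reaches the CPM cutoff and its total count of 3 is below min_total_count, so it is the only gene filtered out.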