decoupler.pp.filter_by_expr
- decoupler.pp.filter_by_expr(adata, group=None, lib_size=None, min_count=10, min_total_count=15, large_n=10, min_prop=0.7, inplace=True)
Determine which genes have sufficiently large counts to be retained in a statistical analysis.
Adapted from the function filterByExpr of edgeR (https://rdrr.io/bioc/edgeR/man/filterByExpr.html).
- Parameters:
  - adata (AnnData) – Annotated data matrix with observations (rows) and features (columns).
  - group (str | None (default: None)) – Name of the adata.obs column to group by. If None, all samples are assumed to belong to one group.
  - lib_size (float | None (default: None)) – Library size. If None, defaults to the sum of reads per sample.
  - min_count (int (default: 10)) – Minimum count required per gene for at least some samples.
  - min_total_count (int (default: 15)) – Minimum total count required per gene across all samples.
  - large_n (int (default: 10)) – Number of samples per group that is considered to be “large”.
  - min_prop (float (default: 0.7)) – Minimum proportion of samples in the smallest group that express the gene.
  - inplace (bool (default: True)) – Whether to perform the operation on the same object (in place).
- Return type:
- Returns:
  If inplace=False, array of genes to be kept.
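To make the interplay of min_count, min_total_count, large_n and min_prop more concrete, here is a minimal NumPy sketch of the filtering rule as documented for edgeR's filterByExpr (without the design-matrix option). It assumes decoupler follows the same logic, which the description above suggests but does not spell out. filter_by_expr_sketch is a hypothetical helper written for illustration; it is not part of decoupler, and the real implementation may differ in details.

```python
import numpy as np


def filter_by_expr_sketch(counts, groups=None, lib_size=None, min_count=10,
                          min_total_count=15, large_n=10, min_prop=0.7):
    """Hypothetical helper: boolean mask over genes, following edgeR's rule.

    counts: array of shape (n_samples, n_genes), samples in rows as in AnnData.
    groups: optional sequence of group labels, one per sample.
    """
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[0]
    if lib_size is None:
        # Default library size: total count of each sample.
        lib_size = counts.sum(axis=1)
    lib_size = np.broadcast_to(np.asarray(lib_size, dtype=float), (n_samples,))

    # Size of the smallest group (all samples if no grouping is given).
    if groups is None:
        min_sample_size = n_samples
    else:
        _, group_sizes = np.unique(np.asarray(groups), return_counts=True)
        min_sample_size = group_sizes.min()

    # Above large_n, only a proportion min_prop of the extra samples is required.
    if min_sample_size > large_n:
        min_sample_size = large_n + (min_sample_size - large_n) * min_prop

    # Express min_count as a counts-per-million cutoff at the median library size.
    cpm_cutoff = min_count / np.median(lib_size) * 1e6
    cpm = counts / lib_size[:, None] * 1e6

    tol = 1e-14
    # Enough samples must reach the CPM cutoff, and the total count must be large enough.
    keep_cpm = (cpm >= cpm_cutoff).sum(axis=0) >= min_sample_size - tol
    keep_total = counts.sum(axis=0) >= min_total_count - tol
    return keep_cpm & keep_total


# Toy data: 4 samples (rows) x 3 genes (columns), two groups of two samples.
counts = np.array([
    [20, 0, 980],
    [30, 1, 969],
    [25, 0, 975],
    [15, 2, 983],
])
mask = filter_by_expr_sketch(counts, groups=["a", "a", "b", "b"])
# Genes 0 and 2 are kept; gene 1 never reaches the CPM cutoff and is dropped.
```

With every library size equal to 1000, the default min_count=10 translates to a cutoff of 10,000 counts per million, which a gene must reach in at least as many samples as the smallest group contains (two here).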
Example
import decoupler as dc
adata = dc.ds.covid5k()
pdata = dc.pp.pseudobulk(adata, sample_col="individual", groups_col="celltype")
dc.pp.filter_by_expr(pdata)