Preprocessing

Data

`pp.extract`(data[, layer, raw, empty, ...])	Extracts the matrix, row names, and column names from `data`.

Network

`pp.read_gmt`(path)	Read a GMT file and return the feature sets as a network.
`pp.prune`(features, net[, tmin, verbose])	Removes sources of a `net` with fewer than `tmin` targets shared with `features`.
`pp.adjmat`(features, net[, verbose])	Converts a network in long format into a regulatory adjacency matrix (targets x sources).
`pp.idxmat`(features, net[, verbose])	Indexes and returns feature sets as a decomposed sparse matrix.
`pp.shuffle_net`(net[, target, weight, seed, ...])	Shuffle a network to make it random.
`pp.net_corr`(net[, data, tmin, verbose])	Checks the correlation across the sources in a network.
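
To make the network steps above concrete, here is a minimal plain-Python sketch of reading GMT text, pruning sources by a minimum target count, and building a targets x sources matrix. It does not call the library: `parse_gmt`, `prune`, and `adjmat` are hypothetical stand-ins written for illustration, and the choices they make (unit weights, dropping edges whose target is absent from `features`, a dense nested-list matrix) are assumptions rather than the documented behavior of the `pp` functions.

```python
# Illustrative stand-ins only; these are NOT the library's implementations.

def parse_gmt(text):
    """Parse GMT text into long-format (source, target) pairs."""
    pairs = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source = fields[0]  # fields[1] is the description column, ignored here
        for target in fields[2:]:
            if target:
                pairs.append((source, target))
    return pairs


def prune(features, net, tmin=5):
    """Keep edges whose target is in `features`, then drop sources
    left with fewer than `tmin` targets (assumed semantics)."""
    feature_set = set(features)
    kept = [(s, t, w) for s, t, w in net if t in feature_set]
    counts = {}
    for s, _, _ in kept:
        counts[s] = counts.get(s, 0) + 1
    return [(s, t, w) for s, t, w in kept if counts[s] >= tmin]


def adjmat(features, net):
    """Build a dense targets x sources weight matrix as nested lists."""
    sources = sorted({s for s, _, _ in net})
    row = {f: i for i, f in enumerate(features)}
    col = {s: j for j, s in enumerate(sources)}
    mat = [[0.0] * len(sources) for _ in features]
    for s, t, w in net:
        if t in row:
            mat[row[t]][col[s]] = w
    return sources, mat


# Two toy feature sets; SET_B shares only one target with `features`.
gmt_text = "SET_A\tna\tG1\tG2\tG3\nSET_B\tna\tG2\tG9\n"
net = [(s, t, 1.0) for s, t in parse_gmt(gmt_text)]  # unit weights assumed
features = ["G1", "G2", "G3", "G4"]

pruned = prune(features, net, tmin=2)
sources, mat = adjmat(features, pruned)
```

With `tmin=2`, `SET_B` is removed because only one of its targets appears in `features`, so the resulting matrix has one column (`SET_A`) and one row per feature.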

AnnData

`pp.get_obsm`(adata, key)	Extracts values stored in `.obsm` as a new AnnData object.
`pp.swap_layer`(adata, key[, X_key, inplace])	Swaps an `AnnData.X` for a given layer key.
`pp.pseudobulk`(adata, sample_col, groups_col)	Summarizes omic profiles across cells, grouped by sample and optionally by group categories.
`pp.filter_samples`(adata[, min_cells, ...])	Remove pseudobulked samples with an insufficient number of cells and total counts.
`pp.filter_by_expr`(adata[, group, lib_size, ...])	Determine which genes have sufficiently large counts to be retained in a statistical analysis.
`pp.filter_by_prop`(adata[, min_prop, ...])	Determine which genes are expressed in a sufficient proportion of cells across samples.
`pp.knn`(adata[, key, bw, max_nn, cutoff])	Adds K-Nearest Neighbors similarities based on spatial distances.
`pp.bin_order`(adata, order[, names, label, nbins])	Bins features along a continuous, ordered process such as pseudotime.
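
The pseudobulking and sample-filtering ideas listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: `pseudobulk_sum` is a hypothetical helper, summation is assumed as the aggregation mode, and the `min_cells` threshold is applied on its own without any total-count criterion.

```python
from collections import defaultdict

# Illustrative stand-in only; this is NOT the library's implementation.

def pseudobulk_sum(counts, samples, groups):
    """Sum per-cell count vectors within each (sample, group) pair.

    Returns the summed profiles and the number of cells behind each one.
    """
    n_features = len(counts[0])
    sums = defaultdict(lambda: [0] * n_features)
    n_cells = defaultdict(int)
    for row, sample, group in zip(counts, samples, groups):
        key = (sample, group)
        profile = sums[key]
        for j, value in enumerate(row):
            profile[j] += value
        n_cells[key] += 1
    return dict(sums), dict(n_cells)


# Four toy cells measured over three features.
counts = [[1, 0, 2], [3, 1, 0], [0, 0, 5], [2, 2, 2]]
samples = ["s1", "s1", "s1", "s2"]
groups = ["T", "T", "B", "T"]

profiles, n_cells = pseudobulk_sum(counts, samples, groups)

# Keep only pseudobulk profiles backed by at least `min_cells` cells.
min_cells = 2
kept = {key: prof for key, prof in profiles.items() if n_cells[key] >= min_cells}
```

Here only the `("s1", "T")` profile survives the filter, since it is the only one built from two cells.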