Preprocessing

Data

pp.extract(data[, layer, raw, empty, ...])

Extracts the matrix, row names and column names from data.

Network

pp.read_gmt(path)

Reads a GMT file and returns the feature sets as a network.
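As an illustration of the idea, and not the library's own code, the sketch below parses GMT-formatted text, where each tab-separated line holds a set name, a description, and then the set members, into long-format (source, target) pairs. The helper name parse_gmt and the example set names are hypothetical.

```python
import io


def parse_gmt(handle):
    """Parse GMT lines into a list of (source, target) pairs.

    Hypothetical helper for illustration; it is not pp.read_gmt.
    """
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets without members
        source, _description, *targets = fields
        # One row per member of the set, ignoring empty trailing fields
        rows.extend((source, target) for target in targets if target)
    return rows


# Made-up GMT content: name, description, then members
example = io.StringIO(
    "SET_A\tfirst example set\tTP53\tMDM2\n"
    "SET_B\tna\tMYC\n"
)
net = parse_gmt(example)
print(net)
```

Each set becomes a source and each member a target, which is the long format the other network functions on this page take as net.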

pp.prune(features, net[, tmin, verbose])

Removes sources from a network that have fewer than tmin targets shared with features.
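A minimal sketch of the pruning idea, assuming the network is a list of (source, target, weight) tuples and that edges whose target is absent from features are dropped before counting. The helper prune_sketch is hypothetical and does not reproduce the library's implementation.

```python
from collections import Counter


def prune_sketch(features, net, tmin):
    """Keep only sources with at least tmin targets present in features."""
    feature_set = set(features)
    # Assumption: edges pointing to unmeasured targets are discarded first
    kept = [(s, t, w) for s, t, w in net if t in feature_set]
    n_targets = Counter(s for s, _, _ in kept)
    return [(s, t, w) for s, t, w in kept if n_targets[s] >= tmin]


features = ["G1", "G2", "G3", "G4"]
net = [
    ("TF_A", "G1", 1.0), ("TF_A", "G2", 1.0), ("TF_A", "G9", 1.0),
    ("TF_B", "G3", -1.0), ("TF_B", "G4", 1.0), ("TF_B", "G1", 1.0),
]
pruned = prune_sketch(features, net, tmin=3)
print(pruned)
```

Here TF_A has only two targets among the features (G9 is not measured), so it is removed, while TF_B keeps all three of its edges.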

pp.adjmat(features, net[, verbose])

Converts a network in long format into a regulatory adjacency matrix (targets x sources).
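To make the targets x sources layout concrete, the following sketch builds a dense adjacency matrix from (source, target, weight) tuples using plain Python lists. The helper adjmat_sketch, the alphabetical source ordering, and the choice to ignore targets missing from features are assumptions made for illustration.

```python
def adjmat_sketch(features, net):
    """Return (sources, matrix) with one row per feature and one column per source."""
    row = {f: i for i, f in enumerate(features)}
    # Only sources with at least one measured target get a column
    sources = sorted({s for s, t, _ in net if t in row})
    col = {s: j for j, s in enumerate(sources)}
    mat = [[0.0] * len(sources) for _ in features]
    for s, t, w in net:
        if t in row:
            mat[row[t]][col[s]] = w
    return sources, mat


features = ["G1", "G2", "G3"]
net = [
    ("TF_A", "G1", 1.0), ("TF_A", "G3", -1.0),
    ("TF_B", "G2", 0.5), ("TF_B", "G9", 1.0),
]
sources, mat = adjmat_sketch(features, net)
print(sources)
print(mat)
```

Rows follow the order of features and columns follow sources, so mat[i][j] is the weight of the edge from sources[j] to features[i], or 0.0 when no such edge exists.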

pp.idxmat(features, net[, verbose])

Indexes and returns feature sets as a decomposed sparse matrix.

pp.shuffle_net(net[, target, weight, seed, ...])

Shuffles a network to randomize it.
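One simple way to randomize a network is to permute its target column with a seeded generator, which keeps the number of edges per source intact. The sketch below assumes (source, target, weight) tuples; the helper shuffle_targets is hypothetical and may differ from what pp.shuffle_net does internally.

```python
import random


def shuffle_targets(net, seed=42):
    """Permute targets across edges; sources and weights stay in place."""
    rng = random.Random(seed)  # seeded for reproducibility
    targets = [t for _, t, _ in net]
    rng.shuffle(targets)
    return [(s, t, w) for (s, _, w), t in zip(net, targets)]


net = [
    ("TF_A", "G1", 1.0), ("TF_A", "G2", -1.0),
    ("TF_B", "G3", 0.5), ("TF_B", "G4", 1.0),
]
shuffled = shuffle_targets(net, seed=0)
print(shuffled)
```

A permutation like this can assign the same target to a source twice when targets repeat across sources, so a real workflow may need to deduplicate edges afterwards.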

pp.net_corr(net[, data, tmin, verbose])

Checks the correlation across the sources in a network.

AnnData

pp.get_obsm(adata, key)

Extracts values stored in .obsm as a new AnnData object.

pp.swap_layer(adata, key[, X_key, inplace])

Swaps AnnData.X for the layer stored under a given key.

pp.pseudobulk(adata, sample_col, groups_col)

Summarizes omic profiles across cells, grouped by sample and optionally by group categories.
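The grouping step can be pictured as follows: cells sharing the same sample and group label are collapsed into one profile. This sketch sums raw counts, which is one common choice but an assumption here, and works on plain lists instead of an AnnData object. The helper pseudobulk_sketch is hypothetical.

```python
def pseudobulk_sketch(counts, samples, groups):
    """Sum per-cell count vectors for each (sample, group) pair.

    Returns the summed profiles and the number of cells behind each one.
    """
    sums = {}
    n_cells = {}
    for row, sample, group in zip(counts, samples, groups):
        key = (sample, group)
        if key not in sums:
            sums[key] = [0] * len(row)
            n_cells[key] = 0
        sums[key] = [a + b for a, b in zip(sums[key], row)]
        n_cells[key] += 1
    return sums, n_cells


# Four cells x two genes
counts = [[1, 0], [2, 3], [0, 5], [4, 4]]
samples = ["s1", "s1", "s1", "s2"]
groups = ["T", "T", "B", "T"]
profiles, n_cells = pseudobulk_sketch(counts, samples, groups)
print(profiles)
print(n_cells)
```

Tracking the number of cells per profile alongside the sums is useful because later filtering steps typically rely on it.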

pp.filter_samples(adata[, min_cells, ...])

Removes pseudobulked samples with an insufficient number of cells and total counts.

pp.filter_by_expr(adata[, group, lib_size, ...])

Determines which genes have sufficiently large counts to be retained in a statistical analysis.

pp.filter_by_prop(adata[, min_prop, ...])

Determines which genes are expressed in a sufficient proportion of cells across samples.

pp.knn(adata[, key, bw, max_nn, cutoff])

Adds K-Nearest Neighbors similarities based on spatial distances.

pp.bin_order(adata, order[, names, label, nbins])

Bins features along a continuous, ordered process such as pseudotime.
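As a rough picture of binning along an ordered variable, the sketch below splits a pseudotime-like vector into equal-width bins and averages one feature within each bin. Equal-width bins and mean aggregation are assumptions chosen for illustration, and the helper bin_means is hypothetical; pp.bin_order may bin and summarize differently.

```python
def bin_means(order, values, nbins):
    """Average values within equal-width bins of the ordering variable."""
    lo, hi = min(order), max(order)
    width = (hi - lo) / nbins
    sums = [0.0] * nbins
    counts = [0] * nbins
    for o, v in zip(order, values):
        # The maximum falls on the right edge, so clamp it into the last bin;
        # a zero width (all values equal) puts everything in the first bin
        idx = min(int((o - lo) / width), nbins - 1) if width > 0 else 0
        sums[idx] += v
        counts[idx] += 1
    # Empty bins are reported as None instead of a misleading zero
    return [s / c if c else None for s, c in zip(sums, counts)]


pseudotime = [0.0, 0.1, 0.4, 0.6, 0.9, 1.0]
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
binned = bin_means(pseudotime, expression, nbins=2)
print(binned)
```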