scanpy.get.aggregate
scanpy.get.aggregate(adata, by, func, *, axis=None, mask=None, dof=1, layer=None, obsm=None, varm=None)
Aggregate data matrix based on some categorical grouping.
This function is useful for pseudobulking as well as plotting.
The aggregation to perform is specified by func, which can be a single metric or a list of metrics. Each metric is computed within each group and stored as a new layer in the output AnnData object.
If none of layer, obsm, or varm is passed, X is used as the aggregation data.
- Parameters:
- adata
AnnData AnnData to be aggregated.
- by
str | Collection[str] Key(s) of the column(s) to group by.
- func
Union[Literal['count_nonzero', 'mean', 'sum', 'var', 'median'], Iterable[Literal['count_nonzero', 'mean', 'sum', 'var', 'median']]] How to aggregate.
- axis
Optional[Literal['obs', 0, 'var', 1]] (default: None) Axis on which to find the group-by column.
- mask
ndarray[tuple[int, ...], dtype[bool]] | str | None (default: None) Boolean mask (or key of the column containing the mask) to apply along the axis.
- dof
int (default: 1) Degrees of freedom for variance.
- layer
str | None (default: None) If not None, key of the layer to use as the aggregation data.
- obsm
str | None (default: None) If not None, key of the obsm entry to use as the aggregation data.
- varm
str | None (default: None) If not None, key of the varm entry to use as the aggregation data.
- Return type:
AnnData
- Returns:
Aggregated AnnData.
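As a rough illustration of the dof parameter, the sketch below assumes it acts as the delta degrees of freedom in the variance denominator (n - dof), analogous to ddof in NumPy. It uses only the Python standard library, and group_variance is a hypothetical helper written for this example, not part of scanpy.

```python
from statistics import fmean


def group_variance(values, dof=1):
    """Variance of one group with denominator n - dof (hypothetical helper)."""
    n = len(values)
    if n - dof <= 0:
        raise ValueError("need more than `dof` observations in the group")
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values) / (n - dof)


expression = [1.0, 2.0, 3.0, 4.0]  # toy values for one gene in one group
sample_var = group_variance(expression, dof=1)      # divides by n - 1
population_var = group_variance(expression, dof=0)  # divides by n
```

Under this assumption, dof=1 gives the usual unbiased sample variance, while dof=0 gives the population variance.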
Examples
Calculating mean expression and number of nonzero entries per cluster:
>>> import scanpy as sc, pandas as pd
>>> pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
>>> pbmc.shape
(2638, 13714)
>>> aggregated = sc.get.aggregate(
...     pbmc, by="louvain", func=["mean", "count_nonzero"]
... )
>>> aggregated
AnnData object with n_obs × n_vars = 8 × 13714
    obs: 'louvain'
    var: 'n_cells'
    layers: 'mean', 'count_nonzero'
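To make the contents of the 'mean' and 'count_nonzero' layers concrete, here is a minimal pure-Python sketch of the same idea on a toy matrix. The data, the labels, and the aggregate_rows helper are made up for illustration; scanpy's own implementation operates on AnnData objects and is not shown here.

```python
from collections import defaultdict


def aggregate_rows(matrix, labels):
    """Group rows by label; return per-group column means and nonzero counts."""
    groups = defaultdict(list)
    for row, label in zip(matrix, labels):
        groups[label].append(row)

    means, nonzero = {}, {}
    for label, rows in groups.items():
        columns = list(zip(*rows))  # transpose: one tuple per column (gene)
        means[label] = [sum(col) / len(col) for col in columns]
        nonzero[label] = [sum(1 for v in col if v != 0) for col in columns]
    return means, nonzero


# Four toy cells (rows) by two genes (columns), with a cluster label per cell.
X = [[1.0, 0.0], [3.0, 2.0], [0.0, 0.0], [0.0, 4.0]]
clusters = ["a", "a", "b", "b"]
means, nonzero = aggregate_rows(X, clusters)
```

Each group collapses to a single row per metric, which is why the aggregated object above has one observation per cluster and one layer per requested metric.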
We can group over multiple columns:
>>> pbmc.obs["percent_mito_binned"] = pd.cut(pbmc.obs["percent_mito"], bins=5)
>>> sc.get.aggregate(
...     pbmc, by=["louvain", "percent_mito_binned"], func=["mean", "count_nonzero"]
... )
AnnData object with n_obs × n_vars = 40 × 13714
    obs: 'louvain', 'percent_mito_binned'
    var: 'n_cells'
    layers: 'mean', 'count_nonzero'
Note that this filters out any combination of groups that wasn’t present in the original data.
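The note above can be illustrated with a small standard-library sketch: grouping by tuples of labels only creates groups for combinations that actually occur, so the number of output groups may be smaller than the product of the category counts. The labels below are invented for illustration and do not come from the pbmc dataset.

```python
from collections import Counter
from itertools import product

louvain = ["T", "T", "B", "B", "B"]
mito_bin = ["low", "high", "low", "low", "low"]

# One group per observed (cluster, bin) pair, with its number of cells.
observed = Counter(zip(louvain, mito_bin))

# All combinations that could exist in principle.
possible = set(product(set(louvain), set(mito_bin)))
missing = possible - set(observed)
```

Here the pair ("B", "high") never occurs, so it gets no group, mirroring how unobserved combinations are absent from the aggregated result.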